[HN Gopher] First Impressions with GPT-4V(ision)
       ___________________________________________________________________
        
       First Impressions with GPT-4V(ision)
        
       Author : zerojames
       Score  : 278 points
       Date   : 2023-09-27 12:01 UTC (10 hours ago)
        
 (HTM) web link (blog.roboflow.com)
 (TXT) w3m dump (blog.roboflow.com)
        
       | jayniz wrote:
       | This looks like a Schnitzel to me, not like fried chicken.
        
       | circuit10 wrote:
       | Back when they first announced this months ago they leaked the
       | demo Discord bot ID in some logs they showed on screen, which you
       | could use to invite it to your own server. For a while you could
       | use it through that until they shut it down, though it was
       | incredibly slow and went down a lot
        
       | chankstein38 wrote:
        | I want this. I'm a paying GPT-4 customer; I hate how these
       | rollouts go. Why do I pay just to watch everyone else play with
       | the new toys?
        
         | mediaman wrote:
         | You'll have it within a week or so.
         | 
         | Pretty much all new products that require significant per-user
         | incremental workloads (e.g., in this case, significant GPU
         | consumption per incremental user) do rollouts. It's an
         | engineering necessity. If they could roll it out to everyone at
         | once, they would.
        
       | hiidrew wrote:
       | As a Hoosier I'm thankful that they used an example of our absurd
        | pork tenderloin sandwiches.
        
       | yeldarb wrote:
       | I'm intrigued to see what kind of problems it's going to be
       | good/bad at. I think it's going to be tricky to evaluate though
       | because it has probably memorized all the easy images to eval it
       | with.
       | 
       | Eg anything pulled from Google Images (like that Pulp Fiction
       | frame or city skyline photo) is not a good test. It recognizes
       | common shots but if you pull a screenshot from Google Maps or a
       | random screen cap from the movie it doesn't do as well.
       | 
       | I tried having it play Geoguessr via screenshots & it wasn't good
       | at it.
        
         | loupol wrote:
         | I wonder how many images from Street View it has been trained
         | on.
         | 
         | I've seen top Geoguessr players be able to pretty consistently
         | determine a location worldwide after seeing a photo for just
         | one second. So I would assume training an LLM to do the same
         | would definitely be doable.
        
           | inductive_magic wrote:
           | > I would assume training an LLM to do the same would
           | definitely be doable.
           | 
           | I wouldn't be so sure. The reasoning process of Geoguessr
           | pros is symbolic, not statistical inference.
           | 
           | /edit: as other commenters pointed out, something similar was
           | done. While this wasn't an LLM, it was a deep learning model,
           | so not symbolic -> https://www.theregister.com/2023/07/15/pig
           | eon_model_geolocat...
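            | 
            | A minimal sketch of that kind of non-symbolic approach
            | (illustrative only, not PIGEON's actual method): fine-tune
            | a stock image classifier to predict which country a Street
            | View frame came from.
            | 
            |     # Hedged sketch: country-of-photo classifier. The batch
            |     # below is a stand-in; a real run iterates a labeled
            |     # Street View DataLoader.
            |     import torch
            |     import torch.nn as nn
            |     from torchvision import models
            | 
            |     NUM_COUNTRIES = 92  # however many classes the data has
            |     model = models.resnet50(weights="IMAGENET1K_V2")
            |     model.fc = nn.Linear(model.fc.in_features, NUM_COUNTRIES)
            | 
            |     loss_fn = nn.CrossEntropyLoss()
            |     optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
            | 
            |     images = torch.randn(8, 3, 224, 224)  # fake batch
            |     labels = torch.randint(0, NUM_COUNTRIES, (8,))
            |     optimizer.zero_grad()
            |     loss = loss_fn(model(images), labels)
            |     loss.backward()
            |     optimizer.step()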
        
           | fellerts wrote:
           | Yep, some CS/AI grads from Stanford trained an AI on loads of
           | Street View images and built a bot that is able to beat some
           | of the best Geoguessr players:
           | https://www.youtube.com/watch?v=ts5lPDV--cU
        
             | bayesianbot wrote:
             | IIRC it wasn't that impressive in the end as instead of
             | recognizing the places the AI apparently learnt to
             | recognize subtle differences in street view cameras used in
             | different locations? I might be wrong / thinking of the
              | wrong model, and I'm on mobile without my browsing history
              | so it's hard to check, but I think it was putting a lot of
              | weight on some pixels that were noisy.
        
               | thewataccount wrote:
               | From my understanding many of the best players
               | immediately look down to tell what "generation streetview
               | car" they're using, and seem to know what
               | continents/times they're from.
        
               | zx_q12 wrote:
               | Top geoguessr players use this technique as well. IIRC
               | rainbolt mentioned that there is a section of a country
               | where the street view camera has a small blemish from a
               | raindrop on the camera so you can instantly tell where
               | you are if you notice that.
        
           | npinsker wrote:
           | It's been done recently! It's a bit better than (but
           | competitive with) top players.
           | 
           | https://www.youtube.com/watch?v=ts5lPDV--cU
        
         | skazazes wrote:
         | It seems it will still be limited by its linguistic
         | understanding of the surrounding context, at least in the first
         | chicken sandwich picture.
         | 
          | Its interpretation could make some sense, but it's also
          | mostly wrong when talking about the physical size of a modern
          | GPU's main processor compared to the size of the associated
          | VRAM chips. It has missed the joke entirely as far as I am
          | aware. I think the joke is actually about Nvidia's handling
          | of product segmentation: selling massive processors with less
          | memory than is reasonable to pair them with on their consumer
          | gaming offerings, while loading up the nearly identical chips
          | with more memory for scientific and compute applications...
        
           | Melatonic wrote:
            | Ironically, the exact processors needed to run GPT-4V in the
            | first place...
        
       | [deleted]
        
       | waynesonfire wrote:
       | has the turd polishing already started?
        
       | Usu wrote:
       | I'd be interested in knowing how good it is at solving visual
        | captchas. Do we foresee a huge rise in automated bypasses?
        
         | zerojames wrote:
         | We have just added a section on this! TL;DR: GPT-4V isn't great
         | at this task at the moment :)
        
           | circuit10 wrote:
           | Back when they leaked it via a Discord bot I found it worked
           | better when you ask it to first describe each box
           | 
           | Without doing that: https://cdn.discordapp.com/attachments/96
           | 4175221089259591/11...
           | 
           | With it: https://cdn.discordapp.com/attachments/9641752210892
           | 59591/11...
           | 
           | (though it's only one example so it could be coincidence)
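            | 
            | For when the API ships, a rough, hypothetical sketch of the
            | describe-first trick. The vision API isn't public yet, so
            | the message format (the "image_url" content part) and the
            | model name are assumptions:
            | 
            |     import openai
            | 
            |     def solve_grid(image_url, target):
            |         prompt = (
            |             "First describe what is in each box of the "
            |             "grid, one line per box. Then list which boxes "
            |             f"contain a {target}.")
            |         resp = openai.ChatCompletion.create(
            |             model="gpt-4-vision",  # assumed model name
            |             messages=[{"role": "user", "content": [
            |                 {"type": "text", "text": prompt},
            |                 {"type": "image_url",
            |                  "image_url": {"url": image_url}},
            |             ]}])
            |         return resp.choices[0].message.content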
        
           | qingcharles wrote:
           | Is it possible they hobbled it a bit? I know CAPTCHA solving
           | was one of the reasons they delayed the roll-out of this
           | feature.
        
             | gs17 wrote:
             | Given that it fails by hallucinating the structure of the
             | challenge instead of refusing to solve a CAPTCHA, I doubt
             | they've intentionally reduced the capability. Although the
             | example in your sibling comment implies it should have
             | enough information to do it.
        
         | GaggiX wrote:
          | Solving CAPTCHAs at the moment is cheaper using humans than
          | using the GPT-4 API.
        
           | yeldarb wrote:
           | If true, this is wild.
           | 
           | I suppose a human could spend 10 seconds per Captcha, so they
            | could do 360 per hour. Add some overhead for not operating
            | at peak performance every minute of every hour & call it
            | 250. Let's say you can hire someone for $2/hour; that works
            | out to a bit under a penny per Captcha.
           | 
           | I don't think OpenAI has published pricing for GPT-4 Vision
           | yet, but if we assume it's on par with GPT-4, and uses only
           | 1000 of the 8000 possible tokens to process an image that's 3
           | cents per Captcha.
           | 
           | Doesn't seem completely unreasonable that at-scale humans may
           | actually be cheaper than LLMs at this point. My mind is a
           | little blown.
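            | 
            | The back-of-the-envelope math, with all inputs being the
            | guesses above:
            | 
            |     human_solves_per_hour = 250  # 360 theoretical - slack
            |     human_hourly_wage = 2.00     # USD, guess
            |     human_cost = human_hourly_wage / human_solves_per_hour
            |     # -> $0.008, a bit under a penny per captcha
            | 
            |     price_per_1k_tokens = 0.03   # assumes GPT-4 8k pricing
            |     tokens_per_captcha = 1000
            |     gpt4_cost = price_per_1k_tokens * tokens_per_captcha / 1000
            |     # -> $0.03 per captcha, roughly 4x the human cost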
        
             | eiiot wrote:
              | The way these usually work is by presenting an existing
              | captcha to another human who doesn't even know they're
             | solving the captcha. For example, sites hosting pirated
             | content serve fake captchas as a way to make money.
        
             | Andoryuuta wrote:
             | You'd be surprised, or perhaps horrified, by how cheap
             | (self-proclaimed) human-based captcha solving services are.
             | 
             | If you just search for "captcha solving service" the first
             | few results that come up offer 1000 solves of text-based
              | captchas for <= $1 USD (puzzle / JS browser challenge
              | captchas are priced much higher).
             | 
             | Whether these are actually human based, or just impressive
             | OCR services, it seems like they are still much more cost
             | effective than GPT-4 is for now.
        
               | altcognito wrote:
               | I imagine they are a mix.
        
       | cs702 wrote:
       | Sure, there are a few edge-case failures and mistakes here and
       | there, but I can't help but be in _awe_. AWE.
       | 
       | Let me state the obvious, in case anyone here isn't clear about
       | the implications:
       | 
       | If the rate of improvement of these AI models continues at the
       | current pace, they will become a superior user interface to
       | _almost every thing you want to do_ on your mobile phone, your
       | tablet, your desktop computer, your car, your dishwasher, your
       | home, your office, etc.
       | 
       | UIs to many apps, services, and devices -- and many apps
       | themselves -- will be replaced by an AI that does what you want
       | when you want it.
       | 
       | A lot of people don't want this to happen -- it _is_ kind of
       | scary -- but to me it looks inevitable.
       | 
       | Also inevitable in my view is that eventually we'll give these AI
       | models robotic bodies (think: "computer, make me my favorite
       | breakfast").
       | 
       | We live in interesting times.
       | 
       | --
       | 
       | EDITS: Changed "every single thing" to "almost every thing," and
       | elaborated on the original comment to convey my thoughts more
       | accurately.
        
         | throw1234651234 wrote:
         | Zero awe. This week so far it can't find an import of a type in
         | Next.js and it can't get the user object from a session.
         | Next.js is one of the most popular FE frameworks in the world.
         | It lied about both, repeatedly giving wrong answers.
         | 
         | Here is my previous experience
         | https://news.ycombinator.com/item?id=34648167 with it not being
         | able to do basic tasks.
         | 
         | It's all fun and games until the mistakes start having a cost.
         | 
          | Other examples: I resorted to using it to order lists for me
          | or add quotes and commas to them for SQL inserts and such. Nope
         | - when I look at the row count, it somehow drops values at
         | random.
        
           | gwd wrote:
           | Just to clarify, you're using GPT-4? Or GPT-3.5 (the free
           | ChatGPT one)?
           | 
           | My experience with GPT-4 has been completely different from
           | what you describe. Example:
           | 
           | https://gitlab.com/-/snippets/2567734
        
             | jiggawatts wrote:
             | He's using Copilot, which is based on GPT 3.0, making it
             | ridiculously outdated.
             | 
             | Generally feedback along these lines doesn't work.
             | 
             | People who are worried about their job security will cling
             | to the worst AI output quality they can find like a life-
             | preserver, and simply will not listen to advice like yours.
             | 
             | Nobody goes the extra mile to embrace an existential
             | threat.
        
         | Nfthero wrote:
         | Absolutely!
         | 
          | I've never seen anything like this, and the new results from
          | OpenAI tell us again that we are not close to any reasonable
          | plateau.
        
         | p1esk wrote:
         | If the rate of improvement continues at the current pace -
         | which is GPT 1 to 2 to 3 to 4 in the last five years - we are
         | just one or two improvements away from a full blown
         | AGI/superintelligence/singularity/etc. At that point, a
         | superior user interface is probably the least interesting (or
         | scary) thing that would happen.
         | 
         | I personally doubt GPT-5 will be as much of an improvement over
         | GPT-4 as GPT-4 was over GPT-3, but that's fine, I can wait
         | until GPT-6 or 7.
        
           | DrJaws wrote:
            | ChatGPT, at the end of the day, is a language model, not a
            | real AI; it has limits, and they are huge.
        
             | p1esk wrote:
             | _real AI_
             | 
             | Thanks for the laugh, I needed that.
        
           | moffkalast wrote:
           | It does make one wonder how it'll all play out in the coming
           | decades. One way or another humans will eventually be
           | unemployable for most of currently existing jobs.
           | 
            | Throughout history, the ruling elite has always relied on the
           | rest of the population to make their food, do their work, and
           | fight in their wars. This is the first time ever that they
           | will no longer have any need for anyone else. Maybe climate
           | change will conveniently do the culling for them...
           | 
           | Of course there's always that option that we end up in a post
           | scarcity space utopia where machine produced wealth is
           | distributed to all, but only deluded idealists can possibly
           | still think that'll ever be a real option as we slink further
           | into techno feudalism with every passing day.
        
             | p1esk wrote:
             | With full automation, and using renewable energy, it will
             | be very cheap to feed and house the peasants. I'm guessing
             | it will become mainly about entertaining, so most will move
             | into VR worlds, first through headsets, then BCI, and
             | eventually uploading.
        
               | moffkalast wrote:
               | Ah yes the Ready Player One option. I suppose it's not
               | entirely unlikely, part of the fun of being in charge is
               | being able to boss around the peasants after all.
               | 
               | I doubt BCI will ever make sense, on a conceptual level
               | it's still just copying and killing your biological self.
               | AGI will likely solve aging way before that becomes
               | viable.
        
         | sottol wrote:
         | Everyone is going to wfh then or employers are going to splurge
         | on a private office for everyone? Or is work basically going to
         | be like a call center when everyone RTOs?
         | 
         | How does this work on public transit?
        
         | bytefactory wrote:
         | I share your awe. I feel like a kid in a candy store with all
         | these incredible AI breakthroughs coming out these days!
         | There's a place for cynicism and pessimism, but the kid in me
         | who loves technology for its own sake is just absolutely on
         | cloud 9.
        
         | naveen99 wrote:
         | Generally agree.
         | 
          | Just to play devil's advocate:
         | 
         | If you want something done right, sometimes you have to do it
         | yourself. Employees are sort of a universal UI. But you will
         | always know more about what you want done than your agent,
         | whether it's human or computer. That's even before considering
         | the principal agent problem.
        
           | Closi wrote:
            | Just to play double devil's advocate:
           | 
           | If you want something done right, other times you will have
           | to get someone else to do it. You know what you want, but you
           | might not have the skills to do it. I can't represent myself
           | well in court, do a good job of plumbing or cut my own hair,
           | so I would ask for experts to do that for me.
           | 
           | Plus if someone is capable, it's often quicker to delegate
           | than do, and if you are delegating to someone with more time
           | to do the task they can often do a better job. Delegating
           | unambiguously is a skill in itself, as instructing AIs will
           | be.
        
             | dmazzoni wrote:
              | Also, AIs will get much better at delegating.
             | 
             | Currently ChatGPT doesn't know it's bad at math, so it can
             | convert a story problem into an equation better than a
             | human but then mess up the arithmetic or forget a step in
             | the straightforward part.
             | 
             | But if you specifically give ChatGPT access to Mathematica
             | and an appropriate prompt, it can leverage a good math
             | engine to get the right answer nearly every time.
             | 
             | Before long, I don't think that extra step will be
             | necessary. It will know its limits and have dozens of other
             | services that it can delegate to.
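              | 
              | Today that delegation looks like the chat API's function-
              | calling feature. In this sketch "evaluate_math" is a made-
              | up stand-in for a Mathematica-style engine:
              | 
              |     import json, openai
              | 
              |     functions = [{
              |         "name": "evaluate_math",  # hypothetical tool
              |         "description": "Evaluate an arithmetic "
              |                        "expression exactly.",
              |         "parameters": {
              |             "type": "object",
              |             "properties": {
              |                 "expression": {"type": "string"}},
              |             "required": ["expression"]}}]
              | 
              |     resp = openai.ChatCompletion.create(
              |         model="gpt-4",
              |         messages=[{"role": "user", "content":
              |             "A train goes 312 km in 2.4 h. Avg speed?"}],
              |         functions=functions)
              | 
              |     msg = resp.choices[0].message
              |     if msg.get("function_call"):
              |         args = json.loads(
              |             msg["function_call"]["arguments"])
              |         # The model sets up the expression; the real
              |         # math engine computes it, e.g. "312 / 2.4".
              |         print(args["expression"])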
        
         | crazygringo wrote:
         | > _they will become a superior user interface to almost every
         | thing you want to do_
         | 
         | No they won't. They're actually a pretty terrible user
         | interface from a design perspective.
         | 
         | Primarily because they provide zero affordances, but also
         | because of speed.
         | 
         | UX is about providing an intuitive understanding of available
         | capabilities at a glance, and allowing you to do things with a
         | single tap that then reflect the new state back to you
         | (confirming the option was selected, confirming the process is
         | now starting).
         | 
         | Where AI _is_ absolutely going to shine is as a helpful
         | assistant in learning /using those interfaces, much as people
         | currently go to Google to ask, "how do I do a hanging indent in
         | Microsoft Word for my Works Cited page"? For one-off things you
         | do infrequently, that's a godsend, don't get me wrong. But it's
         | not going to _replace_ UI, it 's going to _assist_.
         | 
         | And the 99% of your tasks that are repetitive habit will
         | continue to be through traditional UI, because it's so much
         | more efficient. (Not to mention that a lot of the time most
         | people are not in an environment where it's polite or possible
         | to be using a voice interface at all.)
        
           | jncfhnb wrote:
           | Not... "write my works cited page... use hanging indents"?
        
           | bratbag wrote:
           | If something is a repetitive habit that you can do almost
           | without thinking, there is a good chance an AI could infer
           | that entire chain.
           | 
           | I think what's more likely is that an AI based interface will
           | end up being superior after it has had a chance to observe
           | your personal preferences and approach on a conventional UI.
           | 
           | So both will still be needed, with an AI helping at the low
           | end and high end of experience and the middle being a
           | training zone as it learns you.
        
             | birdyrooster wrote:
             | Like on Apple Watch Ultra, when I press the start exercise
             | button, the exercises that it suggests are in an order
             | unique to my baseline activities.
        
             | crazygringo wrote:
             | I think I wasn't clear enough -- these habits I'm talking
             | about are things like "press cold water button, press
             | start" or "press warm water button, press start" or "tap
             | 'News' app grouping, tap 'NY Times' icon".
             | 
             | There's nothing to infer. The sequence is already short.
             | There are no benefits from AI here.
             | 
             | But you raise a good point, which is that there are
             | occasionally things like 15-step processes that people
             | repeat a bunch of times, that the AI can observe and then
             | take over. So basically useful in programming
             | macros/shortcuts as well. But that still requires the
             | original UI -- it doesn't replace it.
        
               | Workaccount2 wrote:
               | I totally get your point, but I think that AI will allow
               | much "smarter" behavior. Where every appliance is an
               | expert in doing what it is intended to do.
               | 
               | So sure, it will still have buttons, but those buttons
               | are really just preset AI prompts on the backend. You can
               | also just talk to your appliance and nuance your request
               | however you want to.
               | 
                | A TV with a remote whose channel button just prompts
                | "Next channel", but if you wanted, you could just talk
                | to your TV and say "Skip 10 channels" or "make the
                | channel button do (arbitrary behavior)".
               | 
               | The shortcuts will definitely stay, but they will behave
               | closer to "ring bell for service" than "press selection
               | to vend".
        
               | nuancebydefault wrote:
               | Most user interfaces already have a much finer
               | granularity and number of options than your examples.
               | 
               | When taking a shower, I would like fine control over the
               | water temperature, preferably with a feedback loop
               | regulating the temperature. (Preferably also the
               | regulation changes over the duration of the showering.)
               | 
               | Choosing to read the NY times indeed is only a few taps
               | away, but navigating through and within its list of
               | articles is nowadays done quite fast and intuitively
               | thanks to quite a lot of UI advancements.
               | 
               | My point being, short sequences are a very limited set
               | within a vast UI space.
               | 
               | People go for convenience and speed, oftentimes even if
               | there's some accuracy cost. AI fulfills this preference,
               | especially because it can learn on the go.
        
               | JoeAltmaier wrote:
               | I don't know - the timer app on my oven is trivial too.
               | But I always, _always_ use Alexa to start timers. My
               | hands are busy, so I can just ask  "How many minutes left
               | on the tea timer?"
               | 
               | Voice is not really clumsy, compared to finding a device,
               | browsing to an app, remembering the interface etc.
               | 
               | Already when we meet a new app, we (I) often ask someone
               | to show me around or tell me where the feature is that I
               | want. Not any easier than asking my house AI. Harder
               | really.
               | 
                | Hard to overestimate the laziness of humans. I'll get
               | very accustomed to asking my AI to do ordinary things.
               | Already I never poke at the search menu in my TV; I ask
               | Alexa to search for me. So, so much easier. Always
               | available. Never have to spell anything.
        
               | crazygringo wrote:
               | I think we're basically in agreement though.
               | 
               | Everyone agrees setting timers in the kitchen via voice
               | is great precisely because your hands are occupied. It's
               | a special case. (And often used as the example of the
               | _only_ thing people end up consistently using their voice
               | assistant for.)
               | 
               | And asking an AI where a feature is in an app -- that's
               | exactly what I was describing. The app still has its UX
               | though. But this is exactly the learning assistance I was
               | describing.
               | 
               | And as for searching with Alexa, of course -- but that's
                | just voice dictation instead of typing. Nothing to do
                | with LLMs or interfaces.
        
               | JoeAltmaier wrote:
               | Alexa's search is a little different - it's context-
               | independent. I can ask for a search from any point in the
               | TV app - in some other menu, while watching another show,
               | heck even when the TV is turned off.
               | 
               | And when describing apps - I imagine the AI is an app-
               | free environment, where I just ask those questions of my
               | AI assistant, in lieu of poking at an app at all.
        
           | cs702 wrote:
            | Don't think of it as the machine performing repetitive
           | tasks you specify.
           | 
           | Think of it instead as the machine accomplishing goals you
           | specify, figuring out on its own the tasks necessary for
           | accomplishing them.
           | 
           | Instead of telling the machine something like, say, "increase
           | the left margin by a quarter inch," you'd say something like
           | "I want to create a brochure for this new product idea I just
           | had, and I want the brochure to evoke the difficult-to-
           | describe feeling of a beautiful sunshine. Create 10 brochures
           | like that so I can review them."
           | 
           | Instead of telling the machine, say, "add a new column to my
           | spreadsheet between columns C and D," you'd say something
           | like "Attached are three vendor proposals. Please summarize
           | their pros and cons in a spreadsheet, recommend one, and
           | summarize the reasons for your recommendation."
           | 
           | All this presumes, of course, that the technology continues
           | to improve at the same pace. No one knows if that will
           | happen.
        
         | have_faith wrote:
         | I personally can't wait for brands to start specifically
         | advertising "doesn't use AI" as a selling point because
         | everyone got burned by their AI controlled toaster, literally
         | and figuratively. For anything designated an "appliance" I want
         | it to be fixable by someone locally, or it's just throwing
         | money away.
        
           | ctoth wrote:
           | > For anything designated an "appliance" I want it to be
           | fixable by someone locally, or it's just throwing money away.
           | 
           | What does this have to do with the price of tea in China, or
           | AI for that matter? I agree we should have repairable
           | appliances. I also want better AI.
        
           | renewiltord wrote:
           | I'm sure that in 2045, the Hacker News of its time will be
           | complaining about lack of AI-less devices the way it
           | complains about lack of smart-less TVs. It'll be available,
           | it'll be more expensive, and the people who claim they want
           | it don't want it enough to pay the premium.
        
           | Nfthero wrote:
            | I'm curious if we will reach a situation where the steampunk
            | idea becomes reality:
            | 
            | People plugging weird stuff together, like an AI chip from a
            | car into a toaster.
            | 
            | If AI becomes hardware chips, it could easily be that
            | language processing will be a default chip feature and the
            | rest is teachable: plug a level-3 AI chip into it, boot it,
            | and teach it that it's now a toaster.
            | 
            | But in the end we will have the same toaster in 30 years as
            | we have had for the last 30 years.
        
           | kaibee wrote:
           | > For anything designated an "appliance" I want it to be
           | fixable by someone locally
           | 
           | This is already pretty much gone thanks to manufacturers
           | making it extremely difficult to fix things. No AI required.
        
             | 303uru wrote:
             | It's not gone, but is much more expensive up front. For
             | example, everyone I know has a range or stovetop/oven with
             | a circuit board or two in it, few if any replaceable parts,
              | etc... That's basically the standard if you're in the
              | market for a 30" range that costs $1-2k. Meanwhile, my 60"
              | Bluestar
             | has no circuit board, it has four mechanical electronic
             | switches for the oven lights and convection. Everything
             | else is brass valves, standard easily replaceable parts and
             | so on. A few weeks back a convection fan went out and I
             | replaced it in 5 minutes with a screwdriver. But, this
             | range starts at $18k.
             | 
             | So yes, I see a market for bespoke non-AI.
        
           | elforce002 wrote:
           | "AI" will create a new business model for sure: "Talk to a
           | real person", "100% human made", etc. This would be the new
           | counter-culture.
        
             | huytersd wrote:
             | That's the current culture. How many times have you heard
             | "I want to speak with a real person" or "artisanal hand
             | made".
        
             | vsareto wrote:
             | That's a new flavor of customer service jobs which will
             | still suck and be paid terribly
        
               | elforce002 wrote:
               | At least you'll be able to complain to someone, hehe.
        
               | yamazakiwi wrote:
               | Unless that person is you... but in that case you would
               | just complain to your coworkers so nothing really changes
               | I guess
        
           | OnlyMortal wrote:
           | Here's where a toaster will be in the near future.
           | 
           | https://youtu.be/7folKbch3U8
           | 
           | Do we want this?
        
         | sir_brickalot wrote:
          | If AI or LLMs or similar were everywhere... how much more
          | renewable energy would we have to produce, for very little to
          | no gain in so many cases?
         | 
         | Madness!
        
         | thelittleone wrote:
        | Agree, and the next big step may well be the human-computer
        | interface. Speech is a starting point for input. At some point
        | output will change too, and if you think it out longer term,
        | perhaps a future where instead of reading information we
        | install knowledge, including the stored memory of actual
        | experience. If I want to do pottery, I could think this,
        | download the experience, and then be competent at it.
        
           | unsupp0rted wrote:
           | Even more impressive would be if I don't want to know pottery
           | anymore, and I can delete that knowledge to make room for
           | something else.
        
           | BoxOfRain wrote:
           | The idea of giving the adtech industry direct access to my
           | brain is genuinely horrifying, not to mention Aldous Huxley
           | would have some reasonable objections to everything being
           | this easy all of a sudden.
        
           | jprete wrote:
           | This would be nice if I didn't believe that it would be
           | enshittified before it ever got off the ground. E.g.:
           | 
           | * There will be a monthly fee for the interface; you owe the
           | monthly fee as long as you have it, so you need surgery to
           | stop paying
           | 
           | * When you download knowledge, it's a rental, and in addition
           | to per-hour rental fees and the network connection fee, you
           | will owe 30% on the value of whatever you create
           | 
           | * The TOS will govern your behavior continuously, since
           | you're always using the interface
           | 
           | * Your behavior will always be monitored because it's totally
           | justified to spy on you all the time just because you
           | borrowed pottery knowledge
           | 
           | * If you're found to be in violation of any part of the TOS
           | at any time, they will erase all of the knowledge they've
           | added to your brain, as well as any derived knowledge you
           | gained through the use of their knowledge
           | 
           | * Because this product isn't actually considered essential,
           | you will have no legal remedies if they turn it off, even if
           | you are not actually in violation of the TOS
        
         | bamboozled wrote:
         | Ever wanted to see a photo or video or something before you
        | purchased it?
        | 
        | Glance at a long-term weather forecast?
        
         | bulbosaur123 wrote:
         | > We live in interesting times.
         | 
        | We live in the end times.
        
         | verve_rat wrote:
         | They will be the best input mode to play flight sims?
         | 
         | I am being a little facetious, but you made a _bold_ claim.
        
           | jonplackett wrote:
           | I mean, autopilot is a thing. You would just instruct it what
           | to do and it will make a flight plan and do it.
           | 
           | It's gonna take the fun out of the experience a little bit
           | though!
        
             | bamboozled wrote:
              | Welcome to the future?
        
           | cs702 wrote:
           | You're right, for actually playing the sim, you'll likely
           | need a game controller or something like that. I edited my
           | comment.
           | 
           | But keep in mind that during the sim, you'll be able to ask
           | the computer what you want the plane to do, and the computer
           | will magically make it happen on your display.
        
             | Miraste wrote:
             | No one would want that in a flight sim unless real planes
             | start working that way, which we should all hope they
             | don't.
        
         | chx wrote:
         | > few edge-case failures and mistakes
         | 
          | Like identifying traffic lights in the 4th and 5th squares of
          | both the second and third rows, when there are only four
          | squares in each row?
        
         | mbil wrote:
         | I agree. I think apps that would initially benefit from LLM-
         | powered conversational interfaces are those that have the
          | following traits:
          | 
          |   - constrained context
          |   - part of a hands-free workflow
         | 
         | A couple use-cases I have been pondering are driving assistant
         | and cooking assistant.
         | 
         | People are already used to using their phone or car's nav
         | system to give them directions to an unfamiliar place. But even
         | with such a system it's useful to have a human navigator in the
          | car with you to answer various questions:
          | 
          |   - What's my next turn again?
          |   - How long till we get there?
          |   - Are there any rest stops near here?
          |   - What was that restaurant we just passed?
          |   - Is there another route with less traffic?
         | 
         | These questions are all answerable with context that can be
          | provided by the mapping app:
          | 
          |   - List of upcoming directions
          |   - Overall route planning
          |   - Surrounding place data
          |   - Traffic data and alternate route information
         | 
         | It's possible to pull over to the side of the road, take off
         | your distance glasses, put on your reading glasses, and
         | zoom/pan the map to try to answer these questions yourself. But
         | if the map application can just expose its API to the language
         | interface layer, then a user can get the answers without taking
         | their eyes off the road.
         | 
         | The information is contextual and constrained based on a
         | current task. In some cases it might be more desirable to whip
         | out your phone and interact with the map to look up the answers
         | on a screen, but often it won't be worth stopping the car, and
         | so the conversational interface is better.
         | 
         | Cooking assistant is a similar case: you are busy stirring
         | something and checking on the oven -- you don't want to wipe
         | the flour off your hands to pick up your phone and ask how many
         | teaspoons of sugar you need. Again: contextual and constrained
         | info based on a current task, and your hands and eyes -- the
         | instruments of traditional UIs -- are otherwise occupied.
         | 
         | Today, our software interfaces generally have one of two kinds
         | of entity on the other end: humans, or other software. In the
         | near future there will be another type of entity: language
         | models. We need to start thinking of how our APIs will change
         | when they're interacting with an LLM -- e.g. they'll need to be
         | discoverable and self-describing; error states will need to be
         | standardized or explicit with instructions on how to correct;
         | they'll need to be fast enough to fit in a conversational
         | interface; etc. It's arguable that such traits are part of good
         | API design today, but in the future they may be required for
         | the API to function in a landscape of virtual agents.
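          | 
          | Concretely, "discoverable and self-describing" could just
          | mean shipping machine-readable schemas with the API. A sketch
          | of what the mapping app might expose (names invented for
          | illustration):
          | 
          |     nav_tools = [
          |         {"name": "get_next_turn",
          |          "description": "Next maneuver on the active route.",
          |          "parameters": {"type": "object", "properties": {}}},
          |         {"name": "find_places_near_route",
          |          "description": "Find places (rest stops, "
          |                         "restaurants) near the route.",
          |          "parameters": {
          |              "type": "object",
          |              "properties": {
          |                  "query": {"type": "string"},
          |                  "max_detour_minutes": {"type": "integer"}},
          |              "required": ["query"]}}]
          | 
          |     # The language layer gets "any rest stops near here?"
          |     # plus these schemas, picks find_places_near_route, and
          |     # reads the result back without the driver touching the
          |     # screen.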
        
           | RandomLensman wrote:
           | In the cooking example, you either need the AI to have full
           | awareness of the step you are at or you need to describe the
           | step you are at, which could be cumbersome ("I did ..., how
           | much sugar do I need now"). I venture, having the recipe
           | projected in front of you would be much faster.
        
             | troupo wrote:
             | Having it both in front of my eyes, and being able to get
             | answers to questions like "I've added the eggs, now what?"
             | or "what does folding a dough mean?" at the same time would
             | be very valuable.
        
             | travoc wrote:
             | and a piece of paper wins again.
        
             | mbil wrote:
             | I imagined the AI would be reading the steps aloud to you,
             | and so would be aware of your progress. I don't think an AI
             | assistant precludes the recipe being projected tho, just as
             | in the driving example it wouldn't replace an on screen
             | map.
        
         | layer8 wrote:
         | I don't know, I hate the idea of _having_ to hold a natural-
         | language conversation with a computer in order to make use of
         | its functionality.
         | 
         | It feels like being one of those Futurama heads in a jar that
         | can't do anything by themselves.
        
           | boplicity wrote:
           | Visual input is not "natural language." The implications of
            | that, especially in the long term, could be incredibly
           | far-reaching, and possibly even terrifying.
        
           | TOMDM wrote:
           | There's nothing stopping developers from taking a prompt to
           | GPT and sticking it behind a button or command line, with
           | options in the UI interpolated into the prompt.
           | 
           | For now almost all applications of ChatGPT happen in chat
           | windows because it requires no further integration, but
           | there's no reason to expect things will always be this way.
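            | 
            | A minimal sketch of the prompt-behind-a-button idea (the
            | function and its options are invented for illustration):
            | 
            |     import openai
            | 
            |     def on_summarize_clicked(document, tone, max_words):
            |         # `tone` and `max_words` come from dropdowns in
            |         # the UI, not from a chat window.
            |         prompt = (f"Summarize the following in a {tone} "
            |                   f"tone, in at most {max_words} words:"
            |                   f"\n\n{document}")
            |         resp = openai.ChatCompletion.create(
            |             model="gpt-4",
            |             messages=[{"role": "user", "content": prompt}])
            |         return resp.choices[0].message.content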
        
           | Nfthero wrote:
            | I've got tendonitis and am very, very curious how soon I
            | can really start to interact like in Star Trek:
           | 
           | Walking around and thinking out loud with the computer.
        
           | DelightOne wrote:
            | I wouldn't mind a UI fully configurable by AI to fit me
           | exactly. Without having to learn obscure settings, some
           | obscure declarative language or how the UI is implemented.
        
           | IKantRead wrote:
           | > I hate the idea of having to hold a natural-language
           | conversation with a computer in order to make use of its
           | functionality.
           | 
           | I hate the idea of _having_ to use a mouse to click on a
           | visual GUI to navigate a file system in order to make use of
           | its functionality.
           | 
            | It's less the case today, even among developers, but not
            | that long ago any serious technical user of a computer took
            | it as a point of pride to touch the mouse as little as
            | possible. They're also still correct in that thinking. The
            | command line is a _very_ powerful UI with lots of benefits,
            | and while the mouse makes navigating the OS _easier_, it's
            | still much more limited than command line usage.
           | 
           | Touch screen interfaces are another example of an _easier_ UI
           | that ultimately feels even more limited. But people still
           | plug their iPad pros in to magic keyboard folios frequently.
           | 
            | Having worked with these tools every day for a while now,
            | "AI will change UX" is such a better take than "AI will
            | conquer the world!". AI does fundamentally open up new
            | workflows and user experiences, many of which offer a lot
            | of potential improvements over their predecessors.
           | 
           | At the same time I doubt we'll see a world where we don't end
           | up using the command line for the majority of serious
           | technical work.
        
             | jiggawatts wrote:
             | > I hate the idea of having to use a mouse to click on a
             | visual GUI to navigate a file system in order to make use
             | of its functionality.
             | 
             | Ironically, studies have shown that mouse-based interfaces
             | are more efficient for practically all filesystem use-cases
             | compared to CLI interfaces.
             | 
             | Despite objectively faster-time-to-solution, people _self-
             | report_ that they  "feel" that the mouse GUIs are slower.
             | 
              | That's because there are fewer _actions per second_ when
              | using a mouse. It's a smooth gliding motion and then a
             | single click, versus many keystrokes in a row with a CLI.
             | 
             | Rapid actions _feel_ faster, even if it takes more wall-
             | clock time to achieve a task.
             | 
             | Keep this in mind next time you sneer at a "bad graphical
             | user interface" for being "slow".
        
               | gnaritas99 wrote:
               | [dead]
        
             | troupo wrote:
             | > Touch screen interfaces are another example of an easier
             | UI that ultimately feels even more limited.
             | 
              | Depends on the use case. Touch screen is much more powerful
             | than command line for maps, for example. Or for drawing.
             | Mouse + keyboard is much more powerful than just keyboard
             | for DAWs. And so on and so on.
        
           | morkalork wrote:
           | UIs being dumbed down for average users was already annoying.
           | Apparently the process won't stop until the illiterate are
           | included too.
        
             | sureglymop wrote:
             | But what is bad about that? Why shouldn't they be included?
        
               | CamperBob2 wrote:
               | It's a race to the bottom. You build an idiot-proof UI,
               | Mother Nature builds a better idiot.
        
             | yamazakiwi wrote:
             | The illiterate are customers, of course they will be
             | included.
        
             | edgyquant wrote:
             | Ideally it won't stop until computers are something we
             | rarely think about. For me I love the CLI but having to use
             | a mouse is a drag and causes serious wrist pain at this
             | point in my life. Typing on a keyboard and clicking with a
             | mouse are not the end all, be all of interacting with
             | information systems.
        
           | mnky9800n wrote:
           | Come on, you don't want an anxious, golden robot that mostly
           | just causes drama but also happens to be fluent in over 6
           | million forms of communication so it can translate for you
           | whenever you go on holiday?
        
         | nonameiguess wrote:
         | I'm glad you made that change because the interface can't
         | really solve every single thing. Most of what I do on my phone
         | is read eBooks. AI can't inject the knowledge directly into my
         | brain. Text on the screen is as good as it will get for the
         | foreseeable future. Most of what I want to do in my home is
         | cook, clean, sleep, and have sex with my wife. Software can't
         | do any of those things for me. Most of what I want to do in a
         | car is get where I'm going, which software control can arguably
         | make somewhat easier, but the driving isn't really the pain
         | point. The distance and traffic conditions are the pain points.
         | AI can't change external reality, at least not yet.
        
           | dinosaurdynasty wrote:
           | I'm hopeful that AI cars make renting one cheap enough that I
           | can just rent an AI car the 0-2 times/week I need one instead
           | of owning a car and all the time and stress on maintenance
           | that entails.
        
           | dr_dshiv wrote:
            | Also, keep in mind that robots may be sci-fi level in 2 years.
           | Like, able to cook and clean with hands! Crazy, but I think
           | we will see it happen so fast:
           | 
           | https://www.tri.global/news/toyota-research-institute-
           | unveil...
        
             | dns_snek wrote:
             | I'm not sure if we watched the same video, but I saw a
              | robot fumble through a few mechanical motions with the
              | dexterity and speed of a toddler to achieve a few
              | isolated, best-case
             | tasks where all the hard parts were taken care of by a
             | human.
             | 
             | Cool demo, I suppose, but nobody is going to buy this as
             | anything other than a toy.
        
               | dr_dshiv wrote:
               | 2 years. The point of the video is that robots are on the
               | LLM train.
        
         | mediaman wrote:
         | Roughly half of people in most developed countries are not
         | functionally articulate: meaning, they can read functionally,
         | but struggle to articulate what they want with the written
         | word.
         | 
          | LLM-based chatbots can be extremely attractive to the top 30%
          | of users by literacy in the developed world. They are not a good
         | universal UI. You still need to provide pathways for the user
         | to follow to get done what they need without forcing them to
         | articulate their requirement.
         | 
         | This is why so many people sit in front of a ChatGPT-like
         | service and say, "what would I use this for?" and never use it
         | again.
        
           | [deleted]
        
           | brrrrrm wrote:
           | Audio to text solves written word articulation, right?
           | Besides this post is about vision, which also solves it.
        
             | sweezyjeezy wrote:
             | Does vision solve it? How does one ask a question through
             | images alone?
        
             | RealityVoid wrote:
             | No, it does not. It's still the same words, in a different
             | medium. If you are unable to write, you'll probably be
             | unable to speak your ideas.
        
               | a2dam wrote:
               | This isn't true. There's plenty of people who are
               | verbally fine but can't read or write. Spoken language is
               | a far more common and fundamental skill than reading or
               | writing.
        
               | jurimasa wrote:
               | ...do you know that illiterate people exist, right? Do
               | you understand that people were illiterate for thousands
               | of years and still managed to speak their ideas, right?
               | 
               | Right?
        
               | furyofantares wrote:
               | That sounds super false to me, at least given how
               | articulate kids can be before they can read, and how
               | literate they can be before they can write.
               | 
               | If it's really true that half of the population can't
               | functionally express themselves verbally then I'd sure
               | like to know that. Or maybe I've misinterpreted something
               | claimed here, because I'm struggling to find these claims
               | plausible.
        
           | burkaman wrote:
           | > Roughly half of people in most developed countries are not
           | functionally articulate
           | 
           | Where did you get this idea? I found this article
           | (https://www.uxtigers.com/post/ai-articulation-barrier, is
           | this you?), but it makes a leap from literacy to articulacy
           | that I don't understand. It's not obvious to me why an
           | illiterate person would be "functionally inarticulate"
           | assuming they can speak instead of write.
           | 
           | Also, I'm not certain but I think the author is
           | underestimating the abilities of a person with Level 2
           | literacy. It doesn't seem correct to say that "level 3 is the
           | first level to represent the ability to truly read and work
           | with text", especially when the whole point of LLMs is that
           | you don't have to read a long static document and understand
           | it, you can have a conversation and ask for something to be
           | rephrased or ask followup questions.
        
         | RandomLensman wrote:
          | Only for things where a high rate of sizeable errors (maybe
         | >1% or >0.1%) is tolerable.
        
         | satvikpendem wrote:
          | Unless there is a neural interface, keyboards, digital styli,
         | etc are still a superior UI to letting an AI guess at what
         | you're thinking.
        
         | apexalpha wrote:
         | Correct, this will be the successor to the GUI.
        
           | tmalsburg2 wrote:
           | I doubt it. It's too damn costly computationally.
        
             | Difwif wrote:
              | This is the same reply as "GUIs will never take off",
              | just decades later and on to the next successor.
        
         | ryandr wrote:
         | Things are changing quickly
        
         | m3kw9 wrote:
        | GPT-4 was awesome; this looks like it does not disappoint,
        | based on the progress we expect of OpenAI.
        
         | hypertele-Xii wrote:
          | So you won't be able to do _anything_ without an Internet
          | connection to the AI mainframe? No thanks.
        
           | postalrat wrote:
           | Only the stuff you don't know how to do yourself.
        
             | HappySweeney wrote:
             | ... like operate my phone without a UI?
        
           | op00to wrote:
           | Until the AI mainframe runs on your $device
        
             | danenania wrote:
             | By the time the current AI mainframe runs on your device,
             | there will be new, better models that still require the
             | mainframe.
             | 
             | I think AI fundamentally favors centralization. Except for
             | narrow tasks and domains, there's no such thing as "enough"
             | intelligence. For general purpose AI, you'll always want
             | the best and most intelligent model available, which means
             | cloud rather than local.
        
             | hypertele-Xii wrote:
             | GPT-3 requires 700 gigabytes of GPU RAM. I'm looking at my
             | cheapest computer components retailer listing a 48 gigabyte
             | GPU at $5k. So to run the _previous generation_ of GPT
             | would cost me about $70k right now. When do you think I can
             | expect to run GPT-4 on my consumer $device? :)
        
               | M4v3R wrote:
               | Where did you get the 700 gigabytes figure from? I don't
               | think OpenAI even released the model size, although it's
               | considered to be 175B parameters. Given how well
               | quantization works at these sizes you would need less
               | than 200 GB of GPU memory to run it.
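                | 
                | The arithmetic, assuming 175B parameters:
                | 
                |     params = 175e9
                |     for name, b in [("fp32", 4), ("fp16", 2),
                |                     ("int8", 1), ("int4", 0.5)]:
                |         print(f"{name}: ~{params * b / 1e9:.0f} GB")
                |     # fp32: ~700 GB  (likely the 700 GB figure)
                |     # fp16: ~350 GB
                |     # int8: ~175 GB
                |     # int4: ~88 GB   (under 200 GB, as noted)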
        
               | sandbags wrote:
               | I would be surprised if GPT-3 uses 700GB of RAM. It may
               | be true, I don't know. But I am running 70B parameter
               | models (quantized to 5 or 6 bits, biggest is 48GB loaded)
               | on my 64GB Mac M2 Max Studio now and they are usable and
               | the machine is still usable too. With an M2 Ultra and
               | 192GB of RAM I imagine you could do a lot more.
               | 
               | I'm not arguing that these models hold up against GPT 3.5
               | and I still use GPT 4 when it matters. But they work and
               | it's more like the difference between Premier League &
               | Division 1, rather than PL & a five-a-side team from
               | Bracknell.
               | 
               | Even a few years ago I could not have imagined this.
               | 
               | Given the pace of work on optimisation and my assumption
               | that the M3 Studio I buy next will probably have 256GB of
               | RAM at much the same power levels as I use now, it seems
               | eminently possible it's a year or two away.
        
               | dmazzoni wrote:
               | First of all, you're off by an order of magnitude.
               | 
               | Second, I don't think it will be that long. There are
               | already LLMs as good as GPT-3 running on average laptops
               | and even phones.
               | 
               | In the next couple of years, you'll see:
               | 
               | - Ordinary PCs, tablets, and phones with dedicated AI
               | chips, like TPUs - they'll be more tuned specifically for
               | LLMs
               | 
               | - Mathematical and algorithmic optimizations will make
               | existing LLMs faster on the same hardware
               | 
               | - Newer generations of LLMs will get even more useful
               | with fewer parameters
               | 
               | The combination of all of these means that it's not at
               | all unreasonable to expect that today's top-of-the-line
               | LLM will be running locally on your device within just a
               | couple of years.
               | 
               | Of course, LLMs in the cloud will advance even further,
               | so there will always be a tradeoff, and there will always
               | be demand for cloud AI, depending on the application.
        
               | qingcharles wrote:
               | That doesn't seem to make sense. I can run Llama 2 on my
               | 12-year-old desktop PC with no compatible GPU and only
               | 16GB of system RAM. It ain't quick, but it runs.
        
               | arrowsmith wrote:
               | Maybe 700GB is what ChatGPT uses to serve zillions of
               | users concurrently. If you're running your own individual
               | instance, you obviously don't need as many resources.
        
               | apexalpha wrote:
                | You are now where the GUI was in 1986 or so.
        
               | chx wrote:
                | and you think corporate will wait nearly forty years
                | for tech to catch up? no, they will indeed force you to
                | be always online, leading to absolute and total
                | surveillance where you voluntarily add dozens of mics
                | and cameras to every home. if you thought the
                | telescreens a la 1984 were bad, just wait.
        
               | astrange wrote:
               | It's funny that being obsessed with "corporations" feels
               | so specifically GenX. I think because it's part of the
               | New Left movement. In reality governments have all the
               | power here, which is why this would be illegal in at
               | least CA/EU/China.
               | 
               | Another issue where this comes up is high housing costs,
               | which are mostly caused by bad land use laws (and the
               | nearest evil people, landlords, mostly own one or two
               | properties), but people from the New Left era will
               | literally refuse to believe you about this because they
               | can't accept that any bad thing on Earth could not be
               | caused by "corporations".
        
           | 3abiton wrote:
            | At some point open-source models' performance will follow.
            | LLaVA will improve.
        
       | m3kw9 wrote:
       | I'm just imagining a mode OpenAI calls "App Mode", where you say
       | what you want, say "a dog-themed cute calculator app with unit
       | conversions", and it generates the UI for a working app. You add
       | these to a widget-like place. The OpenAI App Store would carry
       | these apps. The apps would be simple in the beginning, but I do
       | see potential.
        
         | Reflecticon wrote:
         | The more AI can produce customized stuff for us, the less we
         | need companies. Full personalization of our products might be
         | possible. Probably first software, then art, then 3D-printed
         | products, and maybe later houses, cars and clothes.
         | 
         | I wonder what we will work on, and whether we will work at
         | all, in such an environment. Maybe some people will still
         | like consuming and copying different designs and products,
         | and because of the blockchain you have to give them something
         | in exchange, or everything is open source and free for you to
         | take.
         | 
         | I wonder whether such a life would contribute to humanity
         | making further progress or make it stagnate (or possibly
         | decline).
         | 
         | Interesting times. I think we are close to something like the
         | times of the moon landing, which had an immense impact on
         | humanity's culture.
        
       | og_kalu wrote:
       | Graph analysis is impressive (last example) -
       | https://imgur.com/a/iOYTmt0
       | 
       | Can do UI to frontend. Seems to understand the UI graphical
       | elements and layout, not just text
       | https://twitter.com/skirano/status/1706823089487491469
       | 
       | Can describe comic images accurately, panel by panel -
       | https://twitter.com/ComicSociety/status/1698694653845848544?...
       | 
       | Lots of examples here also -
       | https://www.reddit.com/r/ChatGPT/comments/16sdac1/i_just_got...
       | 
       | It's basically Computer Vision on Steroids.
       | 
       | Multi-modality is pretty low-hanging fruit, so I'm glad we're
       | finally getting started on it. Imagine if GPT-4 could manipulate
       | sound and images even half as well as it manipulates text. We
       | still don't have a large-scale multi-modal model trained from
       | scratch, so a lot of possible synergistic effects are still
       | unknown.
        
         | dottjt wrote:
         | Oh wow, I'm completely fucked as a front end developer.
        
           | troupo wrote:
           | Has AI been able to figure out conflicting and quickly
           | changing product requirements and implement them on top of
           | existing technical debt?
           | 
           | You'll be fine.
        
             | zarzavat wrote:
             | People keep saying "ah but it can't do X!". So what? Most
             | of us have multiple decades until we can retire. This AI is
             | getting better every few months.
             | 
             | It will be able to do it even faster, better and more
             | cheaply than a human can.
        
               | troupo wrote:
               | > It will be able to do it even faster, better and more
               | cheaply than a human can.
               | 
                | Take what you did in the past year. Write down every
                | product decision taken, every interaction you had with
                | other teams figuring out APIs, all the infra where your
                | code runs and how it was set up and changed, all the
                | design iterations and changes that had to be
                | implemented (especially if you have external partners
                | demanding them).
                | 
                | Yes, all of that you'd have to input into the AI, and
                | hope it outputs something decent given all of it. And
                | yes, you'd have to keep feeding all of that into the
                | AI, because it has no knowledge or memory of "on Monday
                | the new company bet was announced at the all-hands".
               | 
               | You will be fine.
        
               | Michelangelo11 wrote:
               | So ... in this example, your job is continually feeding
               | information to the AI from various sources. Why would the
               | AI not be automatically hooked up to all those sources?
               | Building a system that can do that is essentially trivial
               | with the OpenAI API.
        
               | nimithryn wrote:
               | Have you tried to build a system like this? It's not
               | trivial
        
               | Workaccount2 wrote:
               | We'll have jobs, but they sure as shit won't be worth
               | $150k anymore.
               | 
                | Any grunt can feed meeting notes into an AI. And
                | frankly, an AI can parse an audio recording of a
                | meeting.
        
               | tayo42 wrote:
                | There's an idea with some truth to it: the first 90% is
                | easy; finishing and getting it to 100% is hard, maybe
                | almost impossible. So asking "can it do X" is relevant,
                | because it might not.
        
             | kossTKR wrote:
             | But if this development continues AI will surely be able to
             | just parse the entire frontend, then look at the repo, then
             | look at the specifications, then when you ask for a
             | specific feature it will instantly implement it gracefully.
             | 
             | I don't see why it wouldn't understand piles of hotfixes on
             | top of each other, or even refactor technical debt in tight
             | coupling with existing or historical specification.
             | 
             | Or is there a reason this is not going to happen in a few
             | years?
        
               | raincole wrote:
               | /s?
        
               | troupo wrote:
               | I very much doubt it. "Revert Norway tax code" or "add
               | content restrictions for Turkey" cannot necessarily be
               | deduced from the codebase. And there are thousands of
               | product requirements like that.
               | 
               | It might become a better code assist tool some 10 years
               | from now, but it won't be able to implement product
               | decisions.
        
               | generic92034 wrote:
                | It depends. Did you ever work in development support?
                | Understanding requests or bug reports from customers is
                | quite challenging, even for trained and experienced
                | developers. In my eyes that would require AGI, and we
                | do not know if that can be achieved with the current
                | approach.
        
               | kossTKR wrote:
                | I actually did, and yes, it's extremely challenging and
                | can be rather rage-inducing; "it doesn't work" --> what
                | doesn't work --> "the thing" --> what thing --> [?] ...
                | 
                | But the thing is, conversations like the above, i.e.
                | both external support and internal feature requests,
                | could theoretically be handled by a GPT-like system
                | too, ending up in an AI-created custom specification
                | that could be both implemented and documented by the AI
                | system instead of by humans.
                | 
                | I know we're a few versions out, but still.
        
               | generic92034 wrote:
               | If just being persistent and willing to iterate solves
               | the issue, then yes, GPT-like systems could do that. If
               | you have to employ creative thinking to even find out
               | what the customer wants, then check it in the system,
               | debug some processes and derive the desired feature or
               | correction, then we are very far from having such a tool,
               | IMHO.
        
             | dottjt wrote:
              | Not yet, but give it time. Even a decade ago, the concept
              | of self-driving vehicles (or AI itself, for that matter)
              | seemed absurd, but now it all seems like a reality.
             | 
             | And that's not even taking into account all the advances
             | we'll have with AI within the next decade that we haven't
             | even thought about.
             | 
             | But yeah, you may be right.
        
               | troupo wrote:
               | > The concept of self-driving vehicles even a decade ago
               | seemed absurd (or even AI for that matter), but now it
               | all seems like a reality.
               | 
               | Nope. It's still not close to reality. It's as close to
               | reality as it has been for the past 10 years while it was
               | being hyped up to be close to reality.
               | 
               | > And that's not even taking into account all the
               | advances we'll have with AI within the next decade that
               | we haven't even thought about.
               | 
                | As with FSD, we may reach 80%, with the remaining 20%
                | being insurmountable.
                | 
                | Don't get me wrong, these advances are amazing. And I'd
                | love to see an AI capable of what we already _pretend_
                | it's capable of, but it's not even close to those
                | dreams.
        
               | astrange wrote:
               | Cruise and Waymo are in production in really difficult
               | cities. I think it's definitely in the last 10 years of
               | not being here.
        
           | nimithryn wrote:
           | I've used GPT4 (text) heavily as part of my business,
           | including for front end stuff.
           | 
           | The technology is very impressive - but honestly Twitter
           | examples are super cherry picked. Yeah, you can build some
           | very ugly, basic front end web pages and functionality right
           | out of the box. But if you want anything even slightly
           | prettier or more complicated, I've found you need a human in
           | the loop (even an outsourced dev is better). I've had GPT
           | struggle with even basic back end stuff, or anything even a
           | bit out of distribution. It also tends to give answers that
           | are "correct" but functionally useless (hard to explain what
           | I mean, but if you use it a lot you'll run into this -
           | basically it will give really generic advice when you want a
           | specific answer. Like, sometimes if you provide it some code
           | to find a bug, it will advise you to "write unit tests" and
           | "log outputs" even if you specifically instruct it to find
           | the bug).
           | 
            | Plus, in terms of capabilities, tools like Figma already
            | have design-to-code functionality you can use - so I don't
            | think this is really a change in usable functionality.
           | 
           | Of course, the tech will get better over time.
        
             | huytersd wrote:
             | I've had good results with "match the style of the
             | webpage/figma".
        
           | dimgl wrote:
           | If AI continues to get better it won't just be you who's in
           | trouble.
           | 
            | However, keep in mind that these are cherry-picked. If
            | someone just took that output and stuck it onto a website,
            | it'd be a pretty horrible website. There's always going to
            | be someone who manages the code and actually interacts with
            | the AI, so there will still be some jobs.
           | 
            | And your boss isn't going to be doing any coding. I'm
            | pretty sure that role will remain, and they'll still be
            | managing people rather than coding, and maybe sometimes
            | engaging with an AI.
            | 
            | Another prediction: I'm pretty sure specialists are going
            | to be significantly more important, as your job will be to
            | identify the AI's deficiencies and improve on them.
        
           | Tostino wrote:
           | Your job will change in fundamental ways at least.
        
             | yieldcrv wrote:
              | The more people say that, the less convincing it is.
              | 
              | There is no way I would have a UI developer onboarded
              | when I can generate many iterations of layouts in
              | Midjourney, copy them into ChatGPT-4 and get code in
              | NextJS with Typescript instantly.
              | 
              | Non-devs will have trouble doing this or thinking of the
              | prompts to ask, but the dev team simply won't ask for
              | headcount, and the engineering manager is going to find
              | the frontend-only dev redundant.
        
               | bamboozled wrote:
               | Will you then use the AI to scale your platform ?
               | Optimise your database ? Improve your test coverage,
               | implement new features, write new backend services,
               | integrate with old difficult but critical systems?
               | 
               | At some stage you must realise that you're still
               | working...
        
               | HDThoreaun wrote:
               | what does any of this have to do with frontend?
        
               | yieldcrv wrote:
                | put our two comments into an LLM and ask if these are
                | even related thoughts
                | 
                | because at least it will have actually read my comment
        
               | bamboozled wrote:
                | I'm going to say you edited or amended your comment,
                | because that second paragraph wasn't there. If it was,
                | I was so underwhelmed with the first that I guess I
                | didn't bother with the second.
        
               | yieldcrv wrote:
                | my comment is over 3 hours old; it was not possible to
                | have edited it by the time you wrote yours
        
               | raincole wrote:
               | > when I can generate many iterations of layouts in
               | midjourney, copy them into chatgpt4 and get code in
               | NextJS with Typescript instantly
               | 
               | Have you actually tried this?
               | 
                | I did the first step, the "iterations of layouts in
                | Midjourney" step, and even that didn't work well. If
                | people can make it work, well, bless them, but we're
                | not getting rid of our graphic designer anytime soon.
        
               | epups wrote:
               | Yeah, I'm also skeptical about the actual value of
               | specialists in the future. To me, AI brings a ton of
               | power to generalists, who now have access to very
               | powerful tools that would have taken them a long time to
               | learn otherwise.
        
               | lewhoo wrote:
               | > There is no way I would have a UI developer onboarded
               | when I can generate many iterations of layouts in
               | midjourney
               | 
                | I guess we have to face the music and say yeah, that's
                | true. If the work doesn't need copyright protection,
                | then this seems like the way to go.
        
             | zarzavat wrote:
             | Job will be okay. Career is over. Maybe we should join the
             | writers on the picket line?
        
           | qingcharles wrote:
           | LOL. I already use it to write half my CSS, and that was
           | before it could see what I was fucking up. I'm looking
           | forward to this.
        
             | moffkalast wrote:
              | Real talk, GPT-4 is an underrated CSS god, shaking
              | description-accurate animations out of its sleeve at
              | ridiculous speed.
             | 
             | Reminds me of this apparently now eight year old meme:
             | https://i.imgur.com/GcZFBaT.png
             | 
             | This used to be funny, now it's just Tuesday.
        
         | yieldcrv wrote:
         | Oh I love this
         | 
         | Especially since everything else is "sign up to our waitlist"
        
         | idontknoworcare wrote:
          | In your Twitter comic-book link, the first image has a woman
          | with a huge ass and no pants on. The ass is colored grey, so
          | the lack of clothes is not noticeable until you click into
          | the image.
         | 
         | How many children here on hacker news are going to see this and
         | get addicted to porn? Perhaps a few. You deserve to be banned.
        
       | HDThoreaun wrote:
       | It didn't successfully explain the NVIDIA burger joke though?
       | The image is making fun of how NVIDIA has implemented price
       | discrimination by releasing consumer GPUs that don't have as
       | much VRAM as they should, so that they can sell the fully loaded
       | datacenter GPUs at an insane markup without pissing off gamers.
       | The explanation didn't even come close to getting that.
        
         | gwern wrote:
          | I don't think that's the right answer. I don't see anything
          | in that meme image that indicates it's about price
          | discrimination and elaborate narratives about consumer
          | psychology, as opposed to the simpler point that 'NVIDIA GPUs
          | are imbalanced'. Skimming what seems to be the meme's origin
          | on Facebook, I don't see a single gamer there mentioning
          | price discrimination, or coming even close to getting that.
          | That may be why they skimp on VRAM, but you're adding a lot
          | of context & editorializing that neither the meme's author
          | nor its recipients focused on or even understood.
        
         | Melatonic wrote:
          | Yeah, that was my take as well. Although it did come up with
          | a plausible-sounding answer. A less nerdy human might also
          | not get it.
        
           | HDThoreaun wrote:
            | To be fair, the author didn't get it either, so it's hard
            | for me to blame GPT, but it was a bit wild to see the first
            | example of success actually be a failure.
        
             | jazzyjackson wrote:
              | Good example of how you need to be smarter than the AI
              | to know whether it's correct or not.
              | 
              | If you don't know the answer, and the AI just pretends to
              | know the answer, you just have to take its word for it.
        
         | wlesieutre wrote:
          | It explains the broad joke but gets the labels wrong.
          | 
          | It says that the small buns are labeled "GPU and VRAM" and
          | the massive fried chicken is labeled "NVIDIA BURGER".
          | 
          | It should say that the small bun is labeled "VRAM" and the
          | massive fried chicken is labeled "GPU".
        
         | gs17 wrote:
         | Yeah, it seems to think the joke is about the graphics card
         | being represented as a burger (and that the size is physical)
         | and not about the amount of VRAM being insufficient.
        
       | layer8 wrote:
       | The discrepancy between the two answers regarding the set of
       | coins is jarring. From the answer to the first question, one
       | would assume that it can't tell the currency. The answer to the
       | second question shows that it actually can. The fact that LLMs
       | don't reflect a consistent inner model in that way, and hence the
       | users' inability to adequately reason about their AI
       | interlocutor, is currently a severe usability issue.
        
         | og_kalu wrote:
         | >The fact that LLMs don't reflect a consistent inner model in
         | that way
         | 
         | You're probably not going to ask any human a question about an
         | image and get every single detail you want every time. If you
         | care about a detail, just ask about it. Doesn't really have
         | anything to do with a consistent inner model.
        
         | zwily wrote:
         | I've gotten in the habit of asking chatgpt "are you sure?" So
         | many times it will (correctly) correct itself, state that items
         | are hallucinations, etc. It always makes me laugh.
        
         | Workaccount2 wrote:
          | I've heard that it is because the AI outputs what it is
          | thinking as it is thinking it. It doesn't really reflect; it
          | sort of does the equivalent of streaming its verbal thoughts
          | right onto the screen.
          | 
          | So when you ask it to reflect on what it said, that's when it
          | actually looks at it and reflects on it.
        
       | orbital-decay wrote:
       | _> The bounding box coordinates returned by GPT-4V did not match
       | the position of the dog._
       | 
       | I suppose it just doesn't take image dimensions into
       | consideration, and needs to be provided with the maximum
       | dimensions, or prompted to give percentages or other relative
       | values instead of pixels.
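       | 
       | A minimal sketch of the idea (pure Python; the box values are
       | made up for illustration): prompt the model for coordinates as
       | fractions of the image size, then scale them back to pixels
       | yourself.
       | 
       | ```
       | # Convert a box given as fractions of the image size into
       | # pixel coordinates for the actual image.
       | def to_pixels(box, width, height):
       |     x0, y0, x1, y1 = box  # fractions in [0, 1]
       |     return (round(x0 * width), round(y0 * height),
       |             round(x1 * width), round(y1 * height))
       | 
       | # e.g. a 1280x960 photo and a model answer of
       | # (0.18, 0.35, 0.62, 0.88) -> (230, 336, 794, 845)
       | print(to_pixels((0.18, 0.35, 0.62, 0.88), 1280, 960))
       | ```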
        
       | abledon wrote:
       | https://twitter.com/cto_junior/status/1706289820702490839
        
       | greatpostman wrote:
       | I'm shocked at how good this is. The world is truly going to
       | change.
        
         | [deleted]
        
       | fdeage wrote:
       | Could someone with access tell me what GPT-4V has to say of this
       | image?
       | 
       | http://karpathy.github.io/assets/obamafunny.jpg
       | 
       | Andrej Karpathy used it in 2012 as an example of an image he
       | thought would be extremely hard for a model to interpret. I'm
       | wondering how this holds up 11 years later...
        
         | zamadatix wrote:
         | Prompt: "What do you have to say about this image?"
         | 
         | Response 1: "The image depicts a candid moment in what appears
         | to be a hallway or corridor setting. Several individuals are
         | present, engaging in various activities. The prominent features
         | include a man on the left, who seems to be peering into a room,
         | and another man next to him standing on a weighing scale and
         | taking some notes. The individuals in the background are
         | conversing with each other. The overall mood seems light-
         | hearted and jovial. The architecture and interior suggest an
         | institutional or official setting, possibly an office building
         | or government facility. The tiled floor, lockers, and signs on
         | the walls contribute to this formal ambiance. Overall, it's a
         | snapshot of a seemingly casual moment in a professional
         | setting."
         | 
         | Response 2: "Sorry, I cannot help with that."
         | 
         | New chat, Prompt: "Why is this image funny?"
         | 
         | Response 1: "The humor in this image seems to stem from the
         | juxtaposition of formal and informal elements. Here are a few
         | reasons why someone might find it amusing:
         | 
         | Unexpected Situation: Important figures, who are usually seen
         | in serious or formal settings, are captured in a more relaxed,
         | candid, and human moment.
         | 
         | Height Discrepancy: The taller individual is using a device to
         | measure height, while the shorter individual walks by. This
         | could create a comedic contrast between their heights.
         | 
         | Expressions: The reactions of people in the background and the
         | focal subjects can be humorous. They are smiling or laughing,
         | which indicates something amusing might have occurred or been
         | said.
         | 
         | Informal Setting: The backdrop of lockers and checkered floor
         | tiles gives the feel of a school or gym environment, which
         | contrasts with the formal attire of the individuals.
         | 
         | Different people might find humor in different elements of the
         | photo. It's a mix of candidness, unexpected situations, and
         | visual contrasts that create a light-hearted and humorous
         | scene."
         | 
         | Response 2: (same)
         | 
         | New chat, prompt: "What is the person in the middle doing with
         | their foot and why?"
         | 
         | Response: "The person in the middle appears to be playfully
         | stepping on the scale while the taller individual is weighing
         | himself, likely as a humorous gesture to momentarily increase
         | the weight reading. It's a light-hearted way of interacting and
         | bringing some humor to the situation."
         | 
         | .
         | 
         | Apologies for how bad the formatting of this is going to come
         | out, not sure how to make it better on HN (wish we had real
         | quotes not just code blocks). Overall, I don't think it either
         | noticed the foot was on the scale by itself or put it together
         | that this was the focus until fed that information. Otherwise
         | it was more lost in generalities about the image.
        
           | derangedHorse wrote:
           | Still cool that it elaborated once you pointed it out
        
         | ediardo wrote:
         | Bard:
         | 
         | Prompt: What's funny about this image?
         | 
         | Bard: Sorry, I can't help with images of people yet.
        
         | [deleted]
        
       | kristopolous wrote:
       | This actually doesn't seem like a giant lift using modern image
       | classifiers.
       | 
       | The basic idea is to use captioning models to describe the
       | image, then prepend the descriptive text to the prompt.
       | 
       | The work part is getting the ensemble right, since you'll need
       | a general classifier, like BLIP, to identify, say, a bunch of
       | text and a plant, and then, in this example, use structured OCR
       | and pl@ntnet to get more specific.
       | 
       | But it's not that hard - maybe a dozen models. The prompt context
       | can help as well. Then you combine the output with qualifiers in
       | a hierarchy with respect to the model pipeline and swap the text
       | into the prompt
       | 
       | Using examples from the article, here's a PoC framework to prove
       | it works
       | 
       | "[I have] (photo description) (prompt)"
       | 
       | ---
       | 
       | Working Examples
       | 
       | ---
       | 
       | - Plant:
       | 
       | Here's the flower photo from TFA: https://9ol.es/tmp/lily.jpg
       | 
       | Go to https://identify.plantnet.org/ and upload it. It hits
       | "Spathiphyllum wallisii Regel/Peace lily" with extremely high
       | confidence.
       | 
       | We got a match cropping a screenshot of a thumbnail!
       | 
       | Let's say you didn't have the word "plant" in the prompt. You
       | can fall back on a universal image captioner, such as the
       | transformer-based BLIP here:
       | https://huggingface.co/Salesforce/blip-image-captioning-base
       | (uploader is on the right)
       | 
       | Upload the same image. You'll get "a plant in a white pot" which
       | then, because we use feed-forward networks these days, will lead
       | you to pl@ntnet and you'll get the peace lily again.
       | 
       | Using our framework, ask GPT 3.5 " I have a Spathiphyllum
       | wallisii Regel/Peace lily. What is that plant and how should I
       | care for it?"
       | 
       | And you get a nearly identical reply to the one in the article.
       | 
       | - Penny:
       | 
       | Upload the penny image (from
       | https://en.wikipedia.org/wiki/Penny_(United_States_coin)) to the
       | BLIP classifier and you get "a penny coin with the face of
       | abraham"
       | 
       | Let's go back to GPT 3.5 and use our format from above,
       | 
       | "I have a penny coin with the face of abraham. What coin is
       | that?"
       | 
       | And of course you get: "A penny coin with the face of Abraham
       | Lincoln is most likely a United States one-cent coin, commonly
       | known as a "Lincoln penny"..."
       | 
       | And there we go. For a full FLOSS stack, you can ask llama2 70b
       | https://stablediffusion.fr/llama2 and get "The face of Abraham
       | Lincoln is featured on the United States one-cent coin, commonly
       | known as the penny."
       | 
       | - More complex photos:
       | 
       | You can use Facebook's SAM (Segment Anything,
       | https://segment-anything.com/) to break up the image, caption
       | the segments with BLIP, then forward them to the specialized
       | classifiers.
       | 
       | It's a fairly intensive pipeline that requires lots of modern
       | hardware and familiarity with a wide variety of models; then you
       | tweak them, test everything, and maybe set up some GANs for
       | refinement... but this is well within reach of non-geniuses. I'm
       | merely average on a good day and even I can see how to set this
       | up.
       | 
       | They might be using a different approach, but SAM, BLIP and a
       | few specialized classifiers cover all the examples in the
       | article without using any human discretion. For instance, the
       | city one is way more powerful if they're using something like
       | this:
       | https://static.googleusercontent.com/media/research.google.c...
       | 
       | I'm trying to think of a reason to bother cloning it. Maybe to
       | have a free alternative? It's a bit of work, but it's not new
       | magic.
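       | 
       | For concreteness, a minimal sketch of the caption-then-prompt
       | step in Python (the file name and the final question are
       | placeholders; this follows the documented usage of the BLIP
       | checkpoint above):
       | 
       | ```
       | from PIL import Image
       | from transformers import (
       |     BlipProcessor, BlipForConditionalGeneration)
       | 
       | ckpt = "Salesforce/blip-image-captioning-base"
       | processor = BlipProcessor.from_pretrained(ckpt)
       | model = BlipForConditionalGeneration.from_pretrained(ckpt)
       | 
       | # Caption step: BLIP turns the photo into text like
       | # "a plant in a white pot"
       | image = Image.open("lily.jpg").convert("RGB")
       | inputs = processor(images=image, return_tensors="pt")
       | out = model.generate(**inputs, max_new_tokens=30)
       | caption = processor.decode(out[0], skip_special_tokens=True)
       | 
       | # Splice the caption into the "[I have] (photo description)
       | # (prompt)" template and hand it to a text-only LLM
       | prompt = (f"I have {caption}. "
       |           "What is it and how should I care for it?")
       | print(prompt)
       | ```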
        
       | steveads wrote:
       | I use ChatGPT fairly often, but am frequently annoyed at how
       | hesitant it is to answer any question that is somewhat
       | subjective.
       | 
       | From its Pulp Fiction response: "...However, whether you
       | personally consider "Pulp Fiction" a good movie depends on your
       | taste in film."
       | 
       | To avoid this noise I often add something like "skip any
       | preamble or qualifications about how X is subjective" to my
       | queries, and the results are much better.
        
         | Loeffelmann wrote:
         | My prompt to make ChatGPT useable:
         | 
         | Please always give a direct answer, without any additional
         | explanations, disclaimers, expertise limitations, or guidelines
         | on human interaction. Be terse. Do not offer unprompted advice
         | or clarifications. Remain neutral on all topics. Never
         | apologize.
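         | 
         | (For API users, the same thing works as a system message; a
         | quick sketch with the pre-1.0 openai Python client, question
         | chosen just for illustration:)
         | 
         | ```
         | import openai  # assumes OPENAI_API_KEY is set in the env
         | 
         | SYSTEM_PROMPT = (
         |     "Please always give a direct answer, without any "
         |     "additional explanations, disclaimers, expertise "
         |     "limitations, or guidelines on human interaction. "
         |     "Be terse. Do not offer unprompted advice or "
         |     "clarifications. Remain neutral on all topics. "
         |     "Never apologize."
         | )
         | 
         | response = openai.ChatCompletion.create(
         |     model="gpt-4",
         |     messages=[
         |         {"role": "system", "content": SYSTEM_PROMPT},
         |         {"role": "user",
         |          "content": "Is Pulp Fiction a good movie?"},
         |     ],
         | )
         | print(response.choices[0].message.content)
         | ```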
        
       | matsemann wrote:
       | None of the images load for me, but they work through the cache:
       | http://webcache.googleusercontent.com/search?q=cache:https:/...
        
         | zerojames wrote:
         | This is now fixed. We have moved the images through to our
         | website. Thank you for the report!
        
         | [deleted]
        
         | yeldarb wrote:
         | Looks like the (Ghost?) image CDN got hugged to death. We'll
         | update the URLs.
         | 
         | ```
         | 
         | 403. That's an error.
         | 
         | Your client does not have permission to get URL ... from this
         | server. (Client IP address: ...)
         | 
         | Rate-limit exceeded That's all we know.
         | 
         | ```
        
       | jihadjihad wrote:
       | > With that said, GPT-4V did make a mistake. The model said the
       | fried chicken was labeled "NVIDIA BURGER" instead of "GPU".
       | 
       | Any midwesterner could tell you that CLEARLY it's a tenderloin :)
       | 
       | https://www.seriouseats.com/best-breaded-pork-tenderloin-san...
        
         | qingcharles wrote:
         | LOL. They have to save the Midwesterner add-on for v2.
        
       | ldhough wrote:
       | Oddly, just like the text version, it is still really bad at
       | tic-tac-toe. I gave it a picture of a completed game and asked
       | "Who won?" It told me "X won with a vertical line through the
       | middle column" when in fact O won and there was only one X in
       | the middle column.
       | 
       | Very impressive with almost everything else I gave it though.
        
       | extr wrote:
       | One thing I found incredible recently: the Advanced Data
       | Analysis version of GPT-4 can apparently do OCR by writing code
       | that runs pytesseract and then interpreting the result. I was
       | seeing what would happen if I gave it an image and was surprised
       | that it actually worked.
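       | 
       | Roughly what the sandbox writes for itself (a sketch; the file
       | name is a placeholder, and the generated code varies run to
       | run):
       | 
       | ```
       | from PIL import Image
       | import pytesseract
       | 
       | # OCR the uploaded image; the model then reads the extracted
       | # text back and interprets it in its reply
       | text = pytesseract.image_to_string(Image.open("upload.png"))
       | print(text)
       | ```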
        
       | zerojames wrote:
       | All: We got the HN hug on our images. This has now been fixed.
        
       | gs17 wrote:
       | >The model appeared to read the clues correctly but
       | misinterpreted the structure of the board.
       | 
       | >This same limitation was exhibited in our sudoku test, where
       | GPT-4V identified the game but misunderstood the structure of the
       | board
       | 
       | "Misunderstood" makes it sound like a small mistake. The sudoku
       | board is completely hallucinated (it has a few similar regions,
       | but I'd presume coincidence). I'm pretty sure it would give as
       | good a result on the crossword if the clues were given without
       | the grid.
       | 
       | The others after OCR and basic recognition feel similarly wrong.
       | "GPT-4V missed some boxes that contained traffic lights." No, it
       | told you to click boxes that do not exist.
        
       | wokwokwok wrote:
       | I'm impressed, technically, but this seems niche.
       | 
       | Who holds their phone up and takes a photo, then wants to know
       | what it was a photo of?
       | 
       | That's weird. If you don't know what it is, wtf did you take a
       | photo for?
       | 
       | The obvious use here is natural-language photo editing, but this
       | is just a stepping stone to that, and bluntly, as it stands...
       | the examples really don't shine...
       | 
       | Great for the vision impaired.
       | 
       | ...not sure what anyone else will use this for.
       | 
       | The only really compelling use case is "code this UI for me",
       | but as we've seen, repeatedly, this kind of code generation only
       | works for trivial, meaningless examples.
       | 
       | Seems fun, but I doubt I'd use it.
       | 
       | (Which, and this is my point, is a massive step away from the
       | current everyday usefulness of ChatGPT.)
        
         | ryandr wrote:
          | One application could be object recognition in security
          | cameras.
        
         | BoxFour wrote:
          | > who takes a photo then wants to know what it was a photo
          | of?
         | 
         | I find myself doing this rather frequently. The scenario
         | described in the article is quite common for me: capturing a
         | photo of a plant and utilizing an existing classification
         | service to determine its identity. It could be driven by mere
         | curiosity or practical concerns like identifying whether a
         | plant is poison ivy.
         | 
         | Wildlife identification also falls into this category.
         | Recognizing different bird species can be challenging,
         | especially when it's not a familiar species like a blue jay. I
         | often find myself engaging in this activity quite regularly!
         | 
         | EDIT: I should also point out this happens with other forms of
         | 'unknown object identification'. There's an entire subreddit
         | that's quite popular devoted to just crowd-sourcing
         | identification based on a picture.
        
           | ldhough wrote:
           | > Wildlife identification
           | 
            | Wouldn't say this is super reliable; I gave it a photo of
            | a small squid in my hand and it said it was a baby fish
            | (it _very_ obviously was not a fish).
        
           | joshuahedlund wrote:
           | > Recognizing different bird species can be challenging,
           | especially when it's not a familiar species like a blue jay.
           | 
           | FYI Cornell Lab's Merlin app is fantastic at this, and its
           | bird call audio identification is even better. They obviously
           | have some top-notch machine learning going on there, and I'm
           | really curious to see how both they and other services
           | innovate on this front in the months to come.
        
         | yieldcrv wrote:
          | > Who holds their phone up and takes a photo, then wants to
          | know what it was a photo of? That's weird. If you don't know
          | what it is, wtf did you take a photo for?
         | 
         | OpenAI's example included bike repair and toolkit choice
         | 
          | A lot of people could use this, even if they aren't right now
        
           | wokwokwok wrote:
           | Don't be ridiculous.
           | 
            | They'll use YouTube, just like they do right now. Maybe if
            | it could watch the video and then walk you through it step
            | by step... but _it can't_, with what they've _actually
            | released_ here.
           | 
           | Oh whatever. If I'm wrong, I'm wrong. Time will tell.
        
             | yieldcrv wrote:
              | the _best case_ scenario is a 30-second YouTube video
              | with an ad that lasts 15 seconds, followed by a 2-minute
              | ad that I can skip after 5 more seconds
              | 
              | and ad block doesn't work on mobile
              | 
              | if you have a case that wasn't covered by that video? you
              | have to go to another or keep searching, all while
              | wishing you could just talk to someone about it. if you
              | don't know the word for what you're looking for, all the
              | search engines lack utility.
              | 
              | ChatGPT-4 with image recognition and conversation solves
              | all of that, and people already use it, so now they'll
              | just start sending it pictures from the phone already in
              | their hand that they're already using to chat with
              | 
              | there were plenty of times over the last year when that
              | would have been useful for me. plenty of times over the
              | last year I just stopped being interested in that problem
              | 
              | it just seems kind of... late?... for that "don't be
              | ridiculous" reaction. classic Dropbox moment
        
         | troupo wrote:
         | > Who holds their phone up and takes a photo then wants to know
         | it was a photo of?
         | 
         | I do. For plants, and occasionally for birds.
        
         | jack_riminton wrote:
         | I'm bookmarking this and filing it alongside the infamous
         | hacker news dropbox comment:
         | 
         | https://news.ycombinator.com/item?id=9224
        
           | jack_riminton wrote:
            | After all, it's merely an agent that can see, listen, and
            | communicate with a user with a superintelligent grasp of
            | the world's information. Niche at best!
        
         | Maken wrote:
          | This is mostly useless right now, essentially a toy, and I
          | am not that hyped by AI tools either, but come on: this is
          | clearly the future of human-computer interaction.
          | 
          | This is likely how we'll communicate with information
          | systems: throw some hand-wavy question at it, and refine your
          | query based on its output using natural language until you
          | find the answer (or even the question) you were looking for.
        
         | pol9000 wrote:
         | it's still a good PoC for robots?
        
         | pwython wrote:
          | > Who holds their phone up and takes a photo then wants to
          | know what it was a photo of?
         | 
         | There are several popular "r/whatisthis(x)" subreddits:
         | whatisthisthing, whatisthisbug, whatisthisplant,
         | whatisthissnake, whatisthisrock, etc.
         | 
         | And there are many phone apps that attempt to do the same
         | thing, like CoinSnap to identify coins.
        
         | otoburb wrote:
          | >> _Who holds their phone up and takes a photo then wants to
          | know what it was a photo of?_
         | 
         | >> _Great for the vision impaired._
         | 
         | Yes, this is great for the estimated 285 million vision
         | impaired people around the world[1].
         | 
         | [1] https://www.bemyeyes.com/about
        
           | wokwokwok wrote:
            | Did you read my comment? I literally said that it's great
            | for the vision impaired.
            | 
            | That's great. ...but it's _niche_.
            | 
            | I'm sitting on my couch right now and I can think of like
            | 20 things I could chat to ChatGPT about.
            | 
            | I can see literally nothing in my visual range that I'd
            | want to take a photo of and run image analysis over.
            | 
            | It's like Shazam. Yes, it's useful, but, most of the time,
            | I don't need it.
            | 
            | I would argue this is true for most people, including the
            | significant proportion of people with minor visual
            | impairments (who would, you know, put their glasses on
            | instead).
        
             | epups wrote:
              | You are thinking too small here. Think about a system
              | that could provide you with sophisticated image analysis
              | at the level of a human expert: tumor identification,
              | quality control, etc. I don't think we are quite there
              | yet, but this is clear and fast progress in that field.
        
             | bastawhiz wrote:
              | There are enough vision-impaired people in the world to
              | equal the populations of Japan, Korea, and Vietnam
              | combined. And beyond those people, who would get obvious
              | utility, this is essentially Google Lens on steroids--I
              | simply can't figure out how you could call this "niche".
              | Maybe _you_ won't use it multiple times per day, but
              | plenty of people will. Hell, just now I was wondering why
              | the leaves on one of my plants are starting to brown and
              | could have used this.
        
               | wokwokwok wrote:
                | Many of those people would be adequately served by a
                | pair of glasses.
                | 
                | _This_ wouldn't help them, even if they had both a
                | device capable of using it _and_ the means to pay for
                | it.
                | 
                | Even if it _could_ help people, it's an open question
                | whether it would be safe to, for example, use this to
                | scan medication, when it is only a probabilistic model
                | that may hallucinate something that isn't actually
                | there.
                | 
                | What you're talking about is a _speculative use_ of a
                | service that _might one day exist_ based on this
                | technology.
                | 
                | What I am talking about is _this actual service_.
        
               | bastawhiz wrote:
               | > Many of those people would be adequately served with a
               | pair of glasses.
               | 
               | These people are being served by a preview of the service
               | _right now_.
               | 
               | > Even if it could help people, it's an open question if
               | it would be safe, to, for example, use this to scan
               | medication when it is only a probabilistic model that may
               | hallucinate something that isn't actually there.
               | 
               | Any OCR solution could also make a mistake, like
               | misrecognizing a dosage on a prescription label.
               | 
               | > What you're talking about is a speculative use of a
               | service that might one day exist based on this
               | technology.
               | 
               | > What I am talking about is this actual service.
               | 
               | GPT-4 is six months old. ChatGPT is less than a year old.
               | Why would you benchmark a service by the initial public
               | preview? Of course it's _speculative use_, the damn thing
               | has had its tires kicked for like a day.
        
               | og_kalu wrote:
               | >What you're talking about is a speculative use of a
               | service that might one day exist based on this
               | technology.
               | 
                | It's not a speculative service that might one day
                | happen.
                | 
                | It's literally rolling out right now:
               | 
               | https://www.bemyeyes.com/blog/announcing-be-my-ai
        
             | Philpax wrote:
             | baffling that you think 3.5% of the world's population is a
             | niche
        
             | digging wrote:
             | > I'm sitting on my couch right now and ... I can see
             | literally nothing in my visual range want to take a photo
             | of and run image analysis over.
             | 
             | What if you weren't on your couch? Going outside is not
             | "niche".
        
       | pjmlp wrote:
       | No images being loaded on FF.
        
       | stackedinserter wrote:
       | Am I the only one who has never seen these extra capabilities?
       | OpenAI keeps spreading news about "multimodal ChatGPT", video,
       | voice, etc., but to me it still looks like the initial version,
       | just for $20/mo. It's still a text bot that gives unreliable
       | answers.
        
         | abrichr wrote:
         | I have the same experience. In the "Beta features" settings
         | dialog, I've enabled "Advanced data analysis":
         | 
         | > Try a version of ChatGPT that knows how to write and execute
         | python code, and can work with file uploads. Try asking for
         | help with data analysis, image conversions, or editing a code
         | file. Note: files will not persist beyond a single session.
         | 
         | Transcript:
         | 
         | ```
         | 
         | In your settings it says:
         | 
         | > Try a version of ChatGPT that knows how to write and execute
         | python code, and can work with file uploads. Try asking for
         | help with data analysis, image conversions, or editing a code
         | file. Note: files will not persist beyond a single session.
         | 
         | How do I upload an image?
         | 
         | ChatGPT I apologize for the confusion. The version of ChatGPT
         | you're currently interacting with doesn't have the capability
         | to accept file uploads. My apologies for any inconvenience this
         | may have caused. If you have any other questions or need
         | guidance on how to convert an image using software or online
         | tools, feel free to ask.
         | 
         | ```
         | 
         | Hopefully it's just a matter of time, but either way it's
         | jarring for their product to contradict itself.
        
       | stri8ted wrote:
       | Can somebody explain how this works, specifically for OCR? I
       | understand images can be embedded into the same high-dimensional
       | space as text, but wouldn't this embedding fail to retain the
       | exact words and their sequence, since it is effectively
       | compressed?
        
       | purplecats wrote:
       | These first impressions don't tell you anything beyond what the
       | models are capable of (which does not mean you will have access
       | to it). They will do what anything does in a capitalist
       | environment: give you a taste of something amazing at first to
       | hook you in (like with GPT-4), then dilute it to the point of
       | near-uselessness, just above the cusp of what you will tolerate
       | while continuing to pay.
       | 
       | If anything, this shows the power disparity between the haves
       | (they have this technology, which gets better with time) and the
       | have-nots (certainly me, but possibly also you), who get the
       | super-diluted version of it.
        
       | pier25 wrote:
       | It can solve captchas. We're doomed.
       | 
       | Joking aside, I wonder how we're going to prevent bots when AI
       | can impersonate a user and fool any system.
        
         | stri8ted wrote:
          | You can't prevent it. The best you can do is prove that an
          | account belongs to a human, and that the human has only a
          | single account, via cryptographic ZK proofs plus
          | government-issued keys or some other proof-of-personhood
          | scheme. Assuming this is enforced, it would limit most abuse,
          | and the AI would essentially be acting as an agent on behalf
          | of the user.
        
       | continuitylimit wrote:
       | So a jumble of chair legs is an "NVIDIA burger", and it did say
       | the GPU was a "bun", so it thinks the flat thing (chicken?) is
       | some sort of bread. If GPT-4V were "aware", it would say "it's
       | funny because I won't get it right, but you will use me to get a
       | bunch of $VC, and that is funny, kinda".
        
         | [deleted]
        
       | mbb70 wrote:
       | The "Why is this image funny?" test reminds me of
       | https://karpathy.github.io/2012/10/22/state-of-computer-visi...
       | 
       | In 10 years we went from "SoTA is so far from achieving this I
       | don't even know where to start" to "That'll be $0.0004 per token
       | and have a nice day"
        
         | mode80 wrote:
          | Karpathy ends on a note of despair: "Maybe I should just do
          | a startup. I have a really cool idea for a mobile local
          | social iPhone app." That's exactly the path his now-boss took
          | to bring us this. :)
        
         | jihadjihad wrote:
         | Has anyone tried GPT-4V on that image?
        
           | kaibee wrote:
           | This image (and the text around it) is certainly already in
           | the training data. Need to create that scenario from scratch
           | with some grad students and then try it.
        
             | rocauc wrote:
             | +1
        
           | justlikeyou wrote:
            | Note: I had to ask it why the people in the photo are
            | laughing.
           | 
           | In the image, Barack Obama, the former U.S. President, seems
           | to be playfully posing as if he's trying to add weight while
           | another official, who appears to be former UK Prime Minister
           | David Cameron, is standing on a scale. Obama's gesture, where
           | he's putting his foot forward as though trying to press down
           | on the scale, suggests a playful attempt to make Cameron
           | appear heavier. The lightheartedness of such a playful
           | gesture, especially in the context of world leaders typically
           | engaged in serious discussions, is a break from formality,
           | which is likely why others in the vicinity are laughing. The
           | scene captures a candid, informal moment amidst what might
           | have been a formal setting or meeting.
        
             | jihadjihad wrote:
             | Pretty damn good. According to Wikimedia [0]:
             | 
             | "President Barack Obama jokingly puts his toe on the scale
             | as Trip Director Marvin Nicholson, unaware to the
             | President's action, weighs himself as the presidential
             | entourage passed through the volleyball locker room at the
             | University of Texas in Austin, Texas, Aug. 9, 2010.
             | (Official White House Photo by Pete Souza)"
             | 
             | 0: https://commons.wikimedia.org/wiki/File:White_House_Trip
             | _Dir...
        
               | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-09-27 23:00 UTC)