[HN Gopher] GPT Unicorn: A Daily Exploration of GPT-4's Image Ge...
       ___________________________________________________________________
        
       GPT Unicorn: A Daily Exploration of GPT-4's Image Generation
       Capabilities
        
       Author : imdsm
       Score  : 51 points
       Date   : 2023-04-13 20:40 UTC (2 hours ago)
        
 (HTM) web link (adamkdean.co.uk)
 (TXT) w3m dump (adamkdean.co.uk)
        
       | dr_dshiv wrote:
        |  \
        |   \
        |    \  ,__,
        |     \ (oo)____
        |       (__)    )\
        |          ||--|| *
       | 
       | "Draw an ASCII unicorn" (GPT4)
        
       | Mystery-Machine wrote:
        | Would be great if these days had dates as well. Otherwise,
        | there's little use in "Day 69". It would help if I could see
        | "Day 69 (June 21 2023)" instead.
        
       | thomasfromcdnjs wrote:
       | Might as well make a Twitter account! or get AutoGPT to do it.
        
       | bestcoder69 wrote:
       | It can draw a penis fyi
        
       | MH15 wrote:
        | Would be useful if the prompts used to generate the drawing
        | code were included on the site.
        
         | abrichr wrote:
         | They appear to be here:
         | 
         | https://github.com/adamkdean/gpt-unicorn/blob/master/src/lib...
          | { role: 'system', content: `You are a helpful assistant that
          | generates SVG drawings. You respond only with SVG. You do not
          | respond with text.` },
          | { role: 'user', content: `Draw a unicorn in SVG format.
          | Dimensions: 500x500. Respond ONLY with a single SVG string.
          | Do not respond with conversation or codeblocks.` }
        
           | dmix wrote:
           | Sadly it outputs raw svg code so you have to save it locally
           | as .svg to see it. Or just insert it into an HTML page via
           | devtools if you're lazy like me.
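One low-effort way to do that: write the raw SVG string to a .svg file (browsers open it directly) or wrap it in a bare HTML page. A sketch, with a placeholder string standing in for the model's actual output:

```python
# Placeholder SVG standing in for GPT-4's raw output.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="500" height="500">'
    '<circle cx="250" cy="250" r="100" fill="pink"/>'
    "</svg>"
)

# Option 1: save as .svg and open it in a browser.
with open("unicorn.svg", "w") as f:
    f.write(svg)

# Option 2: embed it in a minimal HTML page.
with open("unicorn.html", "w") as f:
    f.write(f"<!DOCTYPE html><html><body>{svg}</body></html>")
```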
        
           | bee_rider wrote:
           | "You are a helpful assistant" seems like it is always
           | included in these sort of prompts. I wonder if it really
           | helps...
        
             | LeoPanthera wrote:
             | It's quite funny to tell it that it is an unhelpful
             | assistant. During the first few responses it is amusingly
             | obstinate.
             | 
             | It always seems to revert back to "helpful assistant" after
             | a few messages, whatever the prompt says.
        
             | Guillaume86 wrote:
              | It's too generic I think, my prompt immediately gave me a
              | better result than the ones in his post:
             | You are a SVG expert, when asked by the user to draw
             | something, you reply to the best of your ability with SVG
             | code that satisfies the request.
        
       | ShamelessC wrote:
        | As noted in the paper that inspired this:
       | GPT-4's image generation capabilities were severely diminished by
       | the instruction/safety-tuning process. Unfortunately this means
       | the currently available model from the API won't be very capable
       | - certainly not as capable as the early version of GPT-4 that
       | Microsoft had access to.
       | 
       | edit: I'm specifically referring to the "image generation by
       | trickery (e.g. SVG)" technique being diminished. Other tasks were
       | diminished as well though - is my understanding.
        
         | og_kalu wrote:
          | It's not just image generation that the RLHF worsens.
          | Calibration (confidence in solving a question relative to the
          | ability to actually solve it) went from excellent to
          | nonexistent, and you can see from the report that the base
          | model performed better on a number of tests. Basically a
          | dumber model.
        
           | tbalsam wrote:
           | Not dumber. More biased.
           | 
           | Important distinction, especially if we're looking to push
           | back out towards the Pareto Frontier of the problem.
           | 
            | RLHF is still very much in its infancy and does not come
            | anywhere near an optimal bias-variance tradeoff, in my
            | personal experience.
        
             | og_kalu wrote:
              | No, dumber. Sure, more biased too if you want, but also
              | dumber. OpenAI has indicated as much.
        
               | psychphysic wrote:
               | Also generally less creative and insightful.
               | 
               | "No I won't do it" becomes a good option no matter what
               | if you turn safety too high.
        
             | ShamelessC wrote:
             | My understanding is that OpenAI did indeed find diminished
             | capability across a range of tasks after doing RLHF. You're
             | correct to question this though - as I believe the opposite
             | was true of GPT-3 where it improved certain tasks.
             | 
             | The benefits from a business perspective were still clear
             | however, and of course the instruction-tuned GPT-4 model
             | still outperformed GPT-3, in general.
             | 
             | There are probably some weird edge cases and nuances that
             | I'm missing - and I'd be happy to be corrected.
        
           | arthurcolle wrote:
            | Are you saying this specifically about the GPT-4 API
            | endpoint compared to the idealized GPT-4 described in the
            | paper?
        
             | og_kalu wrote:
              | Yes, the public API (or paid ChatGPT) vs the base model
              | from the paper.
        
       | Varqu wrote:
        | Is anyone else also getting tired of seeing the "GPT" prefix /
        | suffix in the names of 90% of new AI-related products?
        
         | mustacheemperor wrote:
         | Given this is a process specifically to evaluate the changing
         | performance of GPT-4 over time, it seems appropriate.
        
         | squeaky-clean wrote:
          | This isn't a new AI product. It's a (seemingly auto-updating)
          | blog entry about GPT-4.
        
       | ansk wrote:
       | This is a great rorschach test. Show these four images to someone
       | hyping AI, and if they see evidence of a growing/emerging
       | intelligence, you can diagnose them as being wholly unqualified
       | to comment on anything related to AI.
        
         | syntaxing wrote:
         | I don't get it, wouldn't something like HuggingGPT be able to
         | command stable diffusion to do this? Just because GPT can't do
         | this natively doesn't mean it's not possible with the right
         | framework?
        
           | ansk wrote:
           | These images were all generated by an identical model. The
           | fact that this individual has convinced themself that the
           | model is improving indicates that they don't understand how
           | these models are trained and deployed. Furthermore, any
           | conclusions reached on such limited data reveal more about
           | one's predisposed opinions than anything about the nature of
           | the data. Show this person an ink blot and they very well may
           | see an image of a superintelligent AGI.
        
       | einpoklum wrote:
       | Perhaps you should ask it to draw you a sheep.
        
       | dangond wrote:
        | > The idea behind GPT Unicorn is quite simple: every day, GPT-4
        | > will be asked to draw a unicorn in SVG format. This daily
        | > interaction with the model will allow us to observe changes
        | > in the model over time, as reflected in the output.
        | 
        | Is it useful to do this every day? Correct me if I'm wrong, but
        | my understanding is that OpenAI does not update the models
        | available in production incrementally on a day-to-day basis.
        
         | sacred_numbers wrote:
         | They do update the model in the background, although I'm not
         | sure how often or how much they update it. To avoid issues with
         | this practice they offer gpt-4-0314 which says this in the
         | documentation:
         | 
         | "Snapshot of gpt-4 from March 14th 2023. Unlike gpt-4, this
         | model will not receive updates, and will only be supported for
         | a three month period ending on June 14th 2023."
         | 
         | Unfortunately this experiment is using the frozen snapshot
         | model gpt-4-0314 instead of the unfrozen gpt-4 or gpt-4-32k
         | models, so any differences are literally 100% noise. This would
         | be a somewhat interesting experiment if someone were to use an
         | unfrozen model, though. I do appreciate the author for
         | captioning the images with the exact model they used for
         | generation so that this bug could be caught quickly.
         | 
         | [0]https://platform.openai.com/docs/models/gpt-4
        
         | charcircuit wrote:
        | Similarly, the quality of the model can't be judged from a
        | single sample. These end up canceling out.
        
       | sp0rk wrote:
       | Did you generate a bunch all at once before starting to get some
       | idea of what the natural variance looks like? I would think it's
       | important to verify some level of progression over time, because
       | with the current four it seems entirely possible that the
       | examples could have all been generated at the same time with no
       | changes to the model.
        
         | gwern wrote:
         | Also unclear if he's sampling at temp=0. Looks like he doesn't
         | set a temp? https://github.com/adamkdean/gpt-
         | unicorn/blob/8ad76ec7161682... So not sure what he's really
         | doing.
        
         | ratg13 wrote:
          | Aren't they using the March 14 model like the general public?
          | 
          | It's frozen in time; there are no updates to it.
          | 
          | All of these will be drawn using the same model until they
          | push a new update, or you switch to a different GPT.
          | 
          | But I think they already proved the point that the generation
          | is random enough that it would be extremely difficult to
          | track progress this way.
        
           | williamstein wrote:
           | GPT's output is by default somewhat random. If you ask the
           | same exact question several times, you'll potentially get
           | several different answers. Each successive word in the output
           | is chosen from a distribution of possibilities -- that
           | distribution is fixed, but that actual sample chosen from the
           | distribution is not fixed. See, e.g.,
           | https://platform.openai.com/docs/api-
           | reference/completions/c...
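A toy sketch of how temperature shapes that per-token distribution (simplified; real decoding also layers on top-p and other sampling controls):

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Sample a token index from softmax(logits / temperature)."""
    # Low temperature sharpens the distribution toward the argmax
    # (near-deterministic); high temperature flattens it (more random).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index from the resulting categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

With a very low temperature the highest-logit token wins essentially every draw; at temperature 1.0 the lower-probability tokens get sampled regularly, which is why identical prompts can yield different SVGs.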
        
       | startupsfail wrote:
        | Sampling a single noisy output from a model that doesn't update
        | that often hardly supports the claim of "daily exploration".
        
       | dang wrote:
       | The unicorn example is discussed at length in Bubeck's recent
       | talk:
       | 
       | https://www.youtube.com/watch?v=qbIk7-JPB2c#t=22m6s
        
       | dmix wrote:
        | Why would the model change over time when asked the same
        | question? Just its generation dataset for generating similar
        | images? Or is this just tracking GPT's explicit model
        | improvements over time?
        
         | pps wrote:
         | "GPT 5 Will be Released 'Incrementally' - 5 Points from
         | Brockman Statement" -
         | https://www.youtube.com/watch?v=1NAmLp5i4Ps
        
           | atleastoptimal wrote:
           | gpt-4-0314 is a snapshot model and won't be updated, they
           | shouldn't use that for this experiment.
        
         | tbalsam wrote:
          | The models seem to have been changing in the background,
          | though, as another commenter pointed out... having a
          | variance-calibration baseline for humans would be great too.
          | :'))))
        
       | m3kw9 wrote:
       | Are they banking on OpenAI updating their model every day, or
       | just prompting the same thing everyday wishing for a different
       | outcome?
        
         | qumpis wrote:
         | In the "sparks of AGI" paper, authors noted that the unicorn
         | shape degrees as more "alignment" is injected to to. If openai
         | adjust the model (say by training more), the picture should
         | reflect it. If they make the model be more "aligned", it should
         | reflect as well.
         | 
         | So I'd guess the answer is the former.
        
       | atleastoptimal wrote:
        | If GPT-4 is updated with recent web training data, the fact
        | that people are bringing much more attention to the "draw a
        | unicorn" task magnifies the chance that someone will have
        | posted a perfect version of an SVG unicorn, leading the model
        | to leverage that rather than demonstrate what I imagine is the
        | aim of this experiment: GPT-4's capacity to extrapolate.
        | 
        | EDIT: Also, it makes no sense to constantly retry it every day
        | on the gpt-4-0314 model, since OpenAI specified that it is a
        | snapshot model that will not be updated.
        
       ___________________________________________________________________
       (page generated 2023-04-13 23:00 UTC)