[HN Gopher] GPT Unicorn: A Daily Exploration of GPT-4's Image Ge...
___________________________________________________________________
GPT Unicorn: A Daily Exploration of GPT-4's Image Generation Capabilities

Author : imdsm
Score  : 51 points
Date   : 2023-04-13 20:40 UTC (2 hours ago)

(HTM) web link (adamkdean.co.uk)
(TXT) w3m dump (adamkdean.co.uk)

| dr_dshiv wrote:
| \\ \\ \\ ,__, \\ (oo)____
| (__) )\\ ||--|| *
|
| "Draw an ASCII unicorn" (GPT4)

| Mystery-Machine wrote:
| Would be great if these days had dates as well. Otherwise,
| there's little use for "Day 69". It would help to see
| "Day 69 (June 21 2023)".

| thomasfromcdnjs wrote:
| Might as well make a Twitter account! Or get AutoGPT to do it.

| bestcoder69 wrote:
| It can draw a penis fyi

| MH15 wrote:
| Would be useful if the prompts used to generate the drawing code
| were included in the site.

| abrichr wrote:
| They appear to be here:
|
| https://github.com/adamkdean/gpt-unicorn/blob/master/src/lib...
|
|     { role: 'system', content: `You are a helpful assistant that
|     generates SVG drawings. You respond only with SVG. You do not
|     respond with text.` },
|     { role: 'user', content: `Draw a unicorn in SVG format.
|     Dimensions: 500x500. Respond ONLY with a single SVG string.
|     Do not respond with conversation or codeblocks.` }

| dmix wrote:
| Sadly it outputs raw SVG code, so you have to save it locally
| as .svg to see it. Or just insert it into an HTML page via
| devtools if you're lazy like me.

| bee_rider wrote:
| "You are a helpful assistant" seems like it is always
| included in these sorts of prompts. I wonder if it really
| helps...

| LeoPanthera wrote:
| It's quite funny to tell it that it is an unhelpful
| assistant. During the first few responses it is amusingly
| obstinate.
|
| It always seems to revert back to "helpful assistant" after
| a few messages, whatever the prompt says.
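For anyone who wants to try the prompt abrichr quotes above, here is a rough sketch of the request it implies. It uses the two messages from the thread and the public chat completions endpoint. The function names and the explicit `temperature` argument are my own assumptions for illustration; the thread does not show how the gpt-unicorn code actually makes the call.

```javascript
// Sketch only: function names and the explicit `temperature` argument are
// illustrative assumptions, not code taken from the gpt-unicorn repository.
const SYSTEM_PROMPT =
  "You are a helpful assistant that generates SVG drawings. " +
  "You respond only with SVG. You do not respond with text.";
const USER_PROMPT =
  "Draw a unicorn in SVG format. Dimensions: 500x500. " +
  "Respond ONLY with a single SVG string. " +
  "Do not respond with conversation or codeblocks.";

// Build the JSON body for a chat completion request.
function buildUnicornRequest(model, temperature) {
  return {
    model,
    temperature,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: USER_PROMPT },
    ],
  };
}

// Send the request; needs Node 18+ (global fetch) and an API key.
async function requestUnicorn(apiKey, model = "gpt-4-0314", temperature = 0) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildUnicornRequest(model, temperature)),
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content; // the raw SVG string
}
```

The string that comes back is raw SVG, so as dmix says it still has to be written to a `.svg` file or pasted into a page to be viewed. Setting `temperature` to 0 should make repeated runs more alike, though I would not count on them being byte-for-byte identical.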

| Guillaume86 wrote:
| It's too generic, I think. My prompt immediately gave me a
| better result than the ones in his post:
|
|     You are a SVG expert, when asked by the user to draw
|     something, you reply to the best of your ability with SVG
|     code that satisfies the request.

| ShamelessC wrote:
| As noted in the paper that inspired this, GPT-4's image
| generation capabilities were severely diminished by
| the instruction/safety-tuning process. Unfortunately this means
| the currently available model from the API won't be very capable
| - certainly not as capable as the early version of GPT-4 that
| Microsoft had access to.
|
| edit: I'm specifically referring to the "image generation by
| trickery (e.g. SVG)" technique being diminished. Other tasks were
| diminished as well, though - that is my understanding.

| og_kalu wrote:
| It's not just image generation that RLHF worsens.
| Calibration (confidence in solving a question in relation to
| ability to solve that problem) went from excellent to
| nonexistent, and you can see from the report that the base model
| performed better on a number of tests. Basically a dumber
| model.

| tbalsam wrote:
| Not dumber. More biased.
|
| Important distinction, especially if we're looking to push
| back out towards the Pareto Frontier of the problem.
|
| RLHF is still very much in its infancy and does not maximize
| the bias-variance tradeoff by a long shot, in my personal
| experience.

| og_kalu wrote:
| No, dumber. Sure, more biased too if you want, but also
| dumber. OpenAI have indicated as much.

| psychphysic wrote:
| Also generally less creative and insightful.
|
| "No I won't do it" becomes a good option no matter what
| if you turn safety too high.

| ShamelessC wrote:
| My understanding is that OpenAI did indeed find diminished
| capability across a range of tasks after doing RLHF. You're
| correct to question this though - as I believe the opposite
| was true of GPT-3, where it improved certain tasks.
|
| The benefits from a business perspective were still clear,
| however, and of course the instruction-tuned GPT-4 model
| still outperformed GPT-3, in general.
|
| There are probably some weird edge cases and nuances that
| I'm missing - and I'd be happy to be corrected.

| arthurcolle wrote:
| Are you saying this specifically for the GPT-4 API endpoint
| compared to the idealized GPT-4 described in the paper?

| og_kalu wrote:
| Yes, the public API (or paid ChatGPT) vs. the base model
| from the paper.

| Varqu wrote:
| Is anyone else also getting tired of seeing the "GPT" prefix / suffix
| in the name of 90% of new AI-related products?

| mustacheemperor wrote:
| Given this is a process specifically to evaluate the changing
| performance of GPT-4 over time, it seems appropriate.

| squeaky-clean wrote:
| This isn't a new AI product. It's a (seemingly auto-updating)
| blog entry about GPT-4.

| ansk wrote:
| This is a great Rorschach test. Show these four images to someone
| hyping AI, and if they see evidence of a growing/emerging
| intelligence, you can diagnose them as being wholly unqualified
| to comment on anything related to AI.

| syntaxing wrote:
| I don't get it, wouldn't something like HuggingGPT be able to
| command Stable Diffusion to do this? Just because GPT can't do
| this natively doesn't mean it's not possible with the right
| framework?

| ansk wrote:
| These images were all generated by an identical model. The
| fact that this individual has convinced themself that the
| model is improving indicates that they don't understand how
| these models are trained and deployed. Furthermore, any
| conclusions reached on such limited data reveal more about
| one's predisposed opinions than anything about the nature of
| the data. Show this person an ink blot and they very well may
| see an image of a superintelligent AGI.

| einpoklum wrote:
| Perhaps you should ask it to draw you a sheep.
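ansk's point above, that one unchanged model can still produce visibly different drawings, is easy to see with a toy sampler. The sketch below is not how GPT-4 works internally; the token names and logit values are invented for illustration. It only shows that a fixed distribution sampled repeatedly gives varying output, while greedy selection (temperature 0) gives the same pick every time.

```javascript
// Toy illustration only: one fixed set of logits stands in for a frozen model.
// Token names and logit values are made up for this sketch.

// Convert logits to probabilities at a given temperature (> 0).
function softmax(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick one index. `rand` is injectable so the behaviour can be checked.
function sampleIndex(logits, temperature, rand = Math.random) {
  // Temperature 0 is treated as greedy decoding: always the largest logit.
  if (temperature === 0) return logits.indexOf(Math.max(...logits));
  const probs = softmax(logits, temperature);
  let r = rand();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

const tokens = ["<circle>", "<rect>", "<path>"];
const logits = [2.0, 1.5, 0.5];

// Same "model", ten draws: the sequence usually differs from run to run.
const sampled = Array.from({ length: 10 }, () => tokens[sampleIndex(logits, 1.0)]);
console.log("temperature 1:", sampled.join(" "));

// Greedy decoding: the same token every time.
const greedy = Array.from({ length: 10 }, () => tokens[sampleIndex(logits, 0)]);
console.log("temperature 0:", greedy.join(" "));
```

Four daily drawings from a frozen model are four draws of this kind, so differences between them say nothing about whether the model changed.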

| dangond wrote:
| > The idea behind GPT Unicorn is quite simple: every day, GPT-4
| > will be asked to draw a unicorn in SVG format. This daily
| > interaction with the model will allow us to observe changes in
| > the model over time, as reflected in the output.
|
| Is it useful to do this every day? Correct me if I'm wrong, but
| my understanding is that OpenAI does not update the models
| available in production incrementally on a day-to-day basis.

| sacred_numbers wrote:
| They do update the model in the background, although I'm not
| sure how often or how much they update it. To avoid issues with
| this practice they offer gpt-4-0314, which says this in the
| documentation:
|
| "Snapshot of gpt-4 from March 14th 2023. Unlike gpt-4, this
| model will not receive updates, and will only be supported for
| a three month period ending on June 14th 2023."
|
| Unfortunately this experiment is using the frozen snapshot
| model gpt-4-0314 instead of the unfrozen gpt-4 or gpt-4-32k
| models, so any differences are literally 100% noise. This would
| be a somewhat interesting experiment if someone were to use an
| unfrozen model, though. I do appreciate the author for
| captioning the images with the exact model they used for
| generation so that this bug could be caught quickly.
|
| [0] https://platform.openai.com/docs/models/gpt-4

| charcircuit wrote:
| Similarly the quality of the model can't be judged with a
| single sample. These end up canceling out.

| sp0rk wrote:
| Did you generate a bunch all at once before starting, to get some
| idea of what the natural variance looks like? I would think it's
| important to verify some level of progression over time, because
| with the current four it seems entirely possible that the
| examples could have all been generated at the same time with no
| changes to the model.

| gwern wrote:
| Also unclear if he's sampling at temp=0. Looks like he doesn't
| set a temp?
| https://github.com/adamkdean/gpt-unicorn/blob/8ad76ec7161682...
| So not sure what he's really doing.

| ratg13 wrote:
| Aren't they using the March 14 model like the general public?
|
| It's frozen in time; there are no updates to it.
|
| All of these will be drawn using the same model until they push
| a new update, or you switch to a different GPT.
|
| But I already think they proved the point that the generation
| is random enough that it would be extremely difficult to track
| progress this way.

| williamstein wrote:
| GPT's output is by default somewhat random. If you ask the
| same exact question several times, you'll potentially get
| several different answers. Each successive word in the output
| is chosen from a distribution of possibilities -- that
| distribution is fixed, but the actual sample chosen from the
| distribution is not fixed. See, e.g.,
| https://platform.openai.com/docs/api-reference/completions/c...

| startupsfail wrote:
| Sampling a single noisy sample from a model that doesn't update
| that often is hardly correlated with the claim of "Daily
| exploration".

| dang wrote:
| The unicorn example is discussed at length in Bubeck's recent
| talk:
|
| https://www.youtube.com/watch?v=qbIk7-JPB2c#t=22m6s

| dmix wrote:
| Why would the model change over time when asking the same
| question? Just its generation dataset for generating similar
| images? Or is this just tracking GPT's explicit model
| improvements over time?

| pps wrote:
| "GPT 5 Will be Released 'Incrementally' - 5 Points from
| Brockman Statement" -
| https://www.youtube.com/watch?v=1NAmLp5i4Ps

| atleastoptimal wrote:
| gpt-4-0314 is a snapshot model and won't be updated; they
| shouldn't use that for this experiment.

| tbalsam wrote:
| The models seem to have been changing in the background, though
| as another commenter pointed out.... having a
| variance-calibration baseline for humans would be great too.
| :'))))

| m3kw9 wrote:
| Are they banking on OpenAI updating their model every day, or
| just prompting the same thing every day wishing for a different
| outcome?

| qumpis wrote:
| In the "sparks of AGI" paper, the authors noted that the unicorn
| shape degrades as more "alignment" is injected into the model.
| If OpenAI adjusts the model (say by training more), the picture
| should reflect it. If they make the model more "aligned", it
| should reflect that as well.
|
| So I'd guess the answer is the former.

| atleastoptimal wrote:
| If GPT-4 updates based on recent web training data, the fact
| that people are bringing much more attention to the "draw a
| unicorn" task magnifies the chance that someone will have posted
| a perfect version of an SVG unicorn, leading the model to
| reproduce that instead of showing what I imagine this experiment
| is meant to test: GPT-4's capacity to extrapolate.
|
| EDIT: Also it makes no sense to constantly retry it every day on
| the gpt-4-0314 model, since OpenAI specified that it is a
| snapshot model that will not be updated.
___________________________________________________________________
(page generated 2023-04-13 23:00 UTC)