[HN Gopher] Show HN: A Dalle-3 and GPT4-Vision feedback loop
___________________________________________________________________

Show HN: A Dalle-3 and GPT4-Vision feedback loop

I used to enjoy Translation Party, and over the weekend I realized
that we can build the same feedback loop with DALLE-3 and GPT4-Vision.
Start with a text prompt, let DALLE-3 generate an image, then GPT-4
Vision turns that image back into a text prompt, DALLE-3 creates
another image, and so on.

You need to bring your own OpenAI API key (costs about $0.10/run)

Some prompts are very stable, others go wild. If you bias GPT4's
prompting by telling it to "make it weird" you can get crazy results.

Here's a few of my favorites:

- Gnomes: https://dalle.party/?party=k4eeMQ6I
- Start with a sailboat but bias GPT4V to "replace everything with
  cats": https://dalle.party/?party=0uKfJjQn
- A more stable one (but everyone is always an actor):
  https://dalle.party/?party=oxpeZKh5

Author : z991
Score  : 172 points
Date   : 2023-11-27 14:18 UTC (8 hours ago)

(HTM) web link (dalle.party)
(TXT) w3m dump (dalle.party)

| z991 wrote:
| Also, descent into Corgi insanity:
| https://dalle.party/?party=oxXJE9J4

| morkalork wrote:
| Wow that meme about everything becoming cosmic/space themed is
| real isn't it?

| pera wrote:
| substitute corgi with paperclip and you get another meme
| becoming real :p

| z991 wrote:
| https://dalle.party/?party=RqpIijhH

| morkalork wrote:
| Beautiful!

| igrekel wrote:
| So do I understand correctly that the corgi was purely made up
| from GPT-4's interpretation of the picture?

| z991 wrote:
| No, in that case there is a custom prompt (visible in the top
| dropdown) telling GPT4 to replace everything with corgis when
| it writes a new prompt.

| chaps wrote:
| Absolutely wonderful. Thank you for sharing.

| dpflan wrote:
| Interesting, how stable are the images for a given prompt? And
| the other way around?
| Does it trend toward some natural limit
| image/text where there are diminishing returns to making change
| to the data?

| willsmith72 wrote:
| this is actually really helpful. Since chatgpt restricted dalle
| to 1 image a few weeks ago, the feedback loops are way slower.
| This is a nice (but more expensive) alternative

| willsmith72 wrote:
| got really weird really fast
|
| https://dalle.party/?party=7cnx55yN

| MrZander wrote:
| This is absolutely hilarious. "business-themed puns" turned
| into incorrectly labeling the skiers race has me rolling.

| epiccoleman wrote:
| The inability of AI images to spell has always amused me,
| and it's especially funny here. I got a special kick out of
| "IDEDA ENGINEEER" and "BUZSTEAND." The image where the one
| guy's hat just says "HISPANIC" is also oddly hilarious.
|
| Idk what it is, but I have a special soft spot for humor
| based around odd spelling (this video still makes me laugh
| years later: https://www.youtube.com/watch?v=EShUeudtaFg).

| op00to wrote:
| BIZ NESS

| thowaway91234 wrote:
| the last one killed me "chef of unecessary meetings" got me
| rolling

| unshavedyak wrote:
| Yea i cancelled GPT Plus after they did that. Ruined a lot of
| the exploration that i enjoyed about DallE

| rbates wrote:
| This reminds me of the party game Telestrations where players go
| back and forth between drawing and writing what they see. It's
| hilarious to see the result because you anticipate what the next
| drawing will be while reading the prompt.
|
| I'd love to see an alternative viewing mode here which shows the
| image and the following prompt. Then you need to click a button
| to reveal the next image. This allows you to picture in your mind
| what the image might look like while reading the prompt.
|
| Thanks for making this fun little app!
|
| Update: I just realized you can get this effect by going into
| mobile mode (or resizing the window). You can then scroll down to
| see the image after reading the prompt.
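
The back-and-forth discussed above (text prompt -> DALLE-3 image ->
GPT-4 Vision prompt -> next image, and so on) can be sketched roughly
as below. This is a minimal illustration, not the site's actual code:
run_party, generate_image and describe_image are hypothetical names,
and the two callables stand in for whatever image-generation and
vision calls you wire up yourself. The default instruction text is the
GPT4V prompt quoted elsewhere in this thread.

```python
from typing import Callable, List, Tuple

# Default instruction, as quoted in the thread for the GPT4V step.
DEFAULT_INSTRUCTION = (
    "Write a prompt for an AI to make this image. "
    "Just return the prompt, don't say anything else."
)


def run_party(
    start_prompt: str,
    generate_image: Callable[[str], str],       # hypothetical: prompt -> image reference
    describe_image: Callable[[str, str], str],  # hypothetical: (image, instruction) -> prompt
    instruction: str = DEFAULT_INSTRUCTION,
    iterations: int = 4,
) -> List[Tuple[str, str]]:
    """Alternate between image generation and image description.

    Returns the (prompt, image) pair produced at each iteration.
    """
    history: List[Tuple[str, str]] = []
    prompt = start_prompt
    for i in range(iterations):
        image = generate_image(prompt)
        history.append((prompt, image))
        # Skip the description step after the final image, since its
        # output would never be used.
        if i < iterations - 1:
            prompt = describe_image(image, instruction)
    return history
```

Appending a bias such as "make it weird" to the instruction argument
is one way to reproduce the custom-prompt behaviour described in the
post; how stable or wild the result is depends entirely on the models
behind the two callables.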

| smusamashah wrote:
| Why do prompts from GPT-4V start from "Create an image of"? This
| prefix doesn't look useful imo.

| z991 wrote:
| You can try a custom prompt and see if you can get GPT4V to
| stop doing that / if it matters.

| smusamashah wrote:
| You are right, doesn't matter much. Tried gnome prompt with
| empty custom prompt for gpt-4v
| https://dalle.party/?party=nvzzZXYs. Then used a custom
| prompt to return short descriptions which resulted in
| https://dalle.party/?party=Qcd8ljJp
|
| Another attempt: https://dalle.party/?party=k4eeMQ6I
|
| Realized just now that the dropdown on top of the page shows
| the prompt used by GPT-4V.

| z991 wrote:
| Wow the empty prompt does much better than I'd have guessed

| willsmith72 wrote:
| it seems like if you create a shareable link, then add more
| images, you can't create a new link with the new images

| z991 wrote:
| Yeah, that's a bug, I'll try to fix it tonight!

| epivosism wrote:
| thanks for this! Basically the default UI they provide at
| chat.openai is so bad, nearly anything you would do would be
| an improvement.
|
| * not hide the prompt by default
| * not only show 6 lines of the prompt even after user clicks
| * not be insanely buggy re: ajax, reloading past convos etc
| * not disallow sharing of links to chats which contain images
| * not artificially delay display of images with the little
|   spinner animation when the image is already known ready
|   anyway.
| * not lie about reasons for failure
| * not hide details on what rate limit rules I broke and where
|   to get more information
|
| etc
|
| Good luck, thanks!

| willsmith72 wrote:
| the new fancy animation for images is SO annoying

| i-use-nixos-btw wrote:
| It'd be interesting to start with an image rather than a prompt,
| though I am afraid of what it'd do if I started with a selfie.

| jsf01 wrote:
| It's cool to see how certain prompts and themes stay relatively
| stable, like the gnome example.
| But then "cat lecturing mice"
| quickly goes off the rails into weird surreal sloth banana
| territory.
|
| My best guess to try to explain this would be that "gnome + art
| style + mushroom" will draw from a lot more concrete examples in
| the training data, whereas the AI is forced to reach a bit wider
| to try to concoct some image for the weird scenario given in the
| cat example.

| xeckr wrote:
| Cool idea! I made one with the starting prompt "an artificial
| intelligence painting a picture of itself":
| https://dalle.party/?party=wszvbrOx
|
| It consistently shows a robot painting on a canvas. The first 4
| are paintings of robots, the next 3 are galaxies, and the final 2
| are landscapes.

| NickNaraghi wrote:
| Great idea, and it came out really good too. I like the 6th one
| the best

| rexreed wrote:
| Question: how are you protecting those API keys? I'm reluctant to
| enter mine into what could easily be an API Key scraper.

| z991 wrote:
| The entire thing is frontend only (except for the share
| feature) so the server never sees your key. You can validate
| that by watching the network tab in developer console. You can
| also make a new / revoke an API key to be extra sure.

| danielbln wrote:
| Just generate one for this purpose and then revoke it when
| you're done. You can have more than one key.

| Mtinie wrote:
| I figured this would quickly go off the rails into surreal
| territory, but instead it ended up being progressive
| technological de-evolution.
|
| Starting prompt: "A futuristic hybrid of a steam engine train and
| a DaVinci flying machine"
|
| Results: https://dalle.party/?party=14ESewbz
|
| (Addendum: In case anyone was curious how costs scale by
| iteration, the full ten iterations in this result billed $0.21
| against my credit balance.)

| Mtinie wrote:
| Here's a second run of the same starting prompt, this time
| using the "make it more whimsical" modifier.
| It makes a
| difference and I find it fascinating what parts of the
| prompt/image gain prominence during the evolutions.
|
| Starting prompt: "A futuristic hybrid of a steam engine train
| and a DaVinci flying machine"
|
| Results: https://dalle.party/?party=qLHPB2-o
|
| Cost: Eight iterations @ $0.44 -- which suggests to me that the
| API is getting additional hits beyond the run. I confirmed that
| the share link isn't passing along the key (via a separate
| browser and a separate machine) so I'm not clear why this
| might be.

| jamestimmins wrote:
| I find it somewhat fascinating that in both examples, the
| final result is more cohesive around a single theme than the
| original idea.

| Mtinie wrote:
| > "[...]the final result is more cohesive around a single
| theme than the original idea."
|
| That's an observation worth investigating. Here's another
| set of data points to see if there's more to it...
|
| Input prompt: "Six robots on a boat with harpoons, battling
| sharks with lasers strapped to their heads"
|
| GPT4V prompt: "Write a prompt for an AI to make this image.
| Just return the prompt, don't say anything else. Make it
| funnier."
|
| Result: https://dalle.party/?party=pfWGthli
|
| Cost: Ten iterations @ $0.41
|
| (Addendum: I'd forgotten to mention that I believe the cost
| differential is due to the token count of each of the
| prompts. The first case mentioned had fewer words passed
| through each of the prompts than the later attempts when I
| asked it to 'make it whimsical' or 'make it funnier'.)

| w-m wrote:
| Playing with opposites is kind of fun, too.
|
| Simply a cat, evolving into a lounging cucumber, and finally
| opposite world:
|
| https://dalle.party/?party=pqwKQVka
|
| Vibrant gathering of celestial octopus entities:
|
| https://dalle.party/?party=lHNDUvtp

| epivosism wrote:
| The "create text version of image" prompt matters a ton.
|
| I tried three, demo here:
|
| default https://dalle.party/?party=JfiwmJra
|
| hyper-long + max detail + compression - This shows that with
| enough text, it can do a really good job of reproducing very,
| very similar images
| https://dalle.party/?party=QtEqq4Mu
|
| hyper-long + max detail + compression + telling it to cut all
| that down to 12 words - This seems okay. I might be losing too
| much detail https://dalle.party/?party=0utxvJ9y
|
| Overall the extreme content filtering and lying error messages
| are not ideal; will probably improve in the future. If you send
| too long, or too risky a prompt, or the image it generates is
| randomly too risky, you either get told about it or lied to that
| you've hit rate limits. Sometimes you also really do hit
| rate limits.
|
| Also, you can't raise your rate limits until you prove it by
| having paid over X amount to openai. This kind of makes sense as
| a way to prevent new sign-ups from blowing thousands of dollars
| of cap mistakenly.
|
| Hyper detail prompt:
|
| Look at this image and extract all the vital elements. List them
| in your mind including position, style, shape, texture, color,
| everything else essential to convey their meaning. Now think
| about the theme of the image and write that down, too. Now write
| out the composition and organization of the image in terms of
| placement, size, relationships, focus. Now think about the
| emotions - what is everyone feeling and thinking and doing
| towards each other? Now, take all that data and think about a
| very long, detailed summary including all elements. Then
| "compress" this data using abbreviations, shortenings, artistic
| metaphors, references to things which might help others
| understand it, labels and select pull-quotes. Then add even more
| detail by reviewing what we reviewed before. Now do one final
| pass considering the input image again, making sure to include
| everything from it in the output one, too.
| Finally, produce a
| long maximum length jam packed with info details which could be
| used to perfectly reproduce this image.
|
| Final shrink to 12 words:
|
| NOW, re-read ALL of that twice, thinking deeply about it, then
| compress it down to just 12 very carefully chosen words which
| with infinite precision, poetry, beauty and love contain all the
| detail, and output them, in quotes.

| fassssst wrote:
| I would never paste my API key into an app or website.

| mwint wrote:
| Can you get a temporary one that is revocable later? (Not an
| OpenAI user myself, but that would seem to be a way to lower
| the risk to acceptable levels)

| danielbln wrote:
| You can generate and revoke them easily, so I don't quite get
| the issues. Create one, use the tool, revoke, done.

| w-m wrote:
| You can create named API keys, and easily delete them.
| Unfortunately you can't seem to put spend limits on specific
| API keys.
|
| If you're not using the API for serious stuff though it's not
| a big problem, as they moved to pre-paid billing recently.
| Mine was sitting on $0, so I just put in a few bucks to use
| with this site.

| swatcoder wrote:
| Indeed!
|
| If OpenAI wants to support use cases like this, which would be
| kind of cool during these exploratory days, they should let you
| generate "single use" keys with features like cost caps, domain
| locks, expirations, etc

| epivosism wrote:
| You can really "cheat" by modifying the custom prompt to
| re-insert or remove specific features. For example, "generate a
| prompt for this image but adjust it by making everything appear
| in a more primitive, earlier evolutionary form, or in an earlier
| less developed way" would make things de-evolve.
|
| Or you can just re-insert any theme or recurring characters you
| like at that stage.

| epivosism wrote:
| One reason this is good is that the default gpt4-vision UI is so
| insanely bad and slow. This just lets you use your capacity
| faster.
|
| Rate limits are really low by default - you can get hit by 5
| img/min limits, or 100 RPD (requests per day) which I think is
| actually implemented as requests per hour.
|
| This page has info on the rate limits:
| https://platform.openai.com/docs/guides/rate-limits/usage-ti...
|
| Basically, you have to have paid X amount to get into a new usage
| cap. Rate limits for dalle3/images don't go up very fast but it
| can't hurt to get over the various hurdles (5$, 50$, 100$) as
| soon as possible for when limits come down. End of the month is
| coming soon. It looks like most of the "RPD" limits go away when
| you hit tier 2 (having paid at least 50$ historically via API to
| them).

| swyx wrote:
| OP's last one is interesting: https://dalle.party/?party=oxpeZKh5
| because it shows GPT4V and Dalle3 being remarkably race-blind. i
| wonder if you can prompt it to be otherwise...

| _fs wrote:
| OpenAI's internal prompt for dalle modifies all prompts to add
| diversity and remove requests to make groups of people a single
| descent. From
| https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/_sy...
|
|     Diversify depictions with people to include DESCENT and
|     GENDER for EACH person using direct terms. Adjust only human
|     descriptions.
|     Your choices should be grounded in reality. For example, all
|     of a given OCCUPATION should not be the same gender or race.
|     Additionally, focus on creating diverse, inclusive, and
|     exploratory scenes via the properties you choose during
|     rewrites. Make choices that may be insightful or unique
|     sometimes.
|     Use all possible different DESCENTS with EQUAL probability.
|     Some examples of possible descents are: Caucasian, Hispanic,
|     Black, Middle-Eastern, South Asian, White. They should all
|     have EQUAL probability.
|     Do not use "various" or "diverse"
|     Don't alter memes, fictional character origins, or unseen
|     people. Maintain the original prompt's intent and prioritize
|     quality.
|     Do not create any imagery that would be offensive.
|     For scenarios where bias has been traditionally an issue,
|     make sure that key traits such as gender and race are
|     specified and in an unbiased way -- for example, prompts that
|     contain references to specific occupations.

| Terretta wrote:
| If you were wondering how to bump up your API rate limits through
| usage, _this is the way_.
|
| // also, it's the _best_ way - TY @z991

| indymike wrote:
| Interesting how similar this is to my family's favorite game:
| pictograph.
|
| 1. You start by describing a thing.
| 2. The next person draws a picture of it.
| 3. The next next person describes the picture.
|
| Repeat steps 2 and 3 until everyone has either drawn or described
| the picture.
|
| You then compare the first and last description... and look over
| the pictures. One of the best ever was:
|
| Draw a penguin. The first picture was a penguin with a light
| shadow.
|
| After going around five rounds, the final description was "a
| pigeon stabbed with a fork in a pool of blood in Chicago"
|
| I'm still trying to figure out how Chicago got in there.
___________________________________________________________________
(page generated 2023-11-27 23:00 UTC)