[HN Gopher] Show HN: A Dalle-3 and GPT4-Vision feedback loop
       ___________________________________________________________________
        
       Show HN: A Dalle-3 and GPT4-Vision feedback loop
        
       I used to enjoy Translation Party, and over the weekend I realized
       that we can build the same feedback loop with DALLE-3 and
       GPT4-Vision. Start with a text prompt, let DALLE-3 generate an
       image, then GPT-4 Vision turns that image back into a text prompt,
       DALLE-3 creates another image, and so on.  You need to bring your
       own OpenAI API key (costs about $0.10/run)  Some prompts are very
       stable, others go wild. If you bias GPT4's prompting by telling it
       to "make it weird" you can get crazy results.  Here's a few of my
       favorites:  - Gnomes: https://dalle.party/?party=k4eeMQ6I  - Start
       with a sailboat but bias GPT4V to "replace everything with cats":
       https://dalle.party/?party=0uKfJjQn  - A more stable one (but
       everyone is always an actor): https://dalle.party/?party=oxpeZKh5
        
       Author : z991
       Score  : 172 points
       Date   : 2023-11-27 14:18 UTC (8 hours ago)
        
 (HTM) web link (dalle.party)
 (TXT) w3m dump (dalle.party)
        
       | z991 wrote:
       | Also, descent into Corgi insanity:
       | https://dalle.party/?party=oxXJE9J4
        
         | morkalork wrote:
         | Wow that meme about everything becoming cosmic/space themed is
         | real isn't it?
        
           | pera wrote:
           | substitute corgi with paperclip and you get another meme
           | becoming real :p
        
             | z991 wrote:
             | https://dalle.party/?party=RqpIijhH
        
               | morkalork wrote:
               | Beautiful!
        
         | igrekel wrote:
         | So do I understand correctly that the corgi was purely made up
         | from GPT-4's interpretation of the picture?
        
           | z991 wrote:
           | No, in that case there is a custom prompt (visible in the top
           | dropdown) telling GPT4 to replace everything with corgis when
           | it writes a new prompt.
        
         | chaps wrote:
         | Absolutely wonderful. Thank you for sharing.
        
       | dpflan wrote:
       | Interesting, how stable are the images for a given prompt? And
       | the other way around? Does it trend toward some natural limit
       | image/text where there are diminishing returns to making change
       | to the data?
        
       | willsmith72 wrote:
       | this is actually really helpful. Since chatgpt restricted dalle
       | to 1 image a few weeks ago, the feedback loops are way slower.
       | This is a nice (but more expensive) alternative
        
         | willsmith72 wrote:
         | got really weird really fast
         | 
         | https://dalle.party/?party=7cnx55yN
        
           | MrZander wrote:
           | This is absolutely hilarious. "business-themed puns" turned
           | into incorrectly labeling the skiers race has me rolling.
        
             | epiccoleman wrote:
             | The inability of AI images to spell has always amused me,
             | and it's especially funny here. I got a special kick out
             | "IDEDA ENGINEEER" and "BUZSTEAND." The image where the one
             | guy's hat just says "HISPANIC" is also oddly hilarious.
             | 
             | Idk what it is, but I have a special soft spot for humor
             | based around odd spelling (this video still makes me laugh
             | years later: https://www.youtube.com/watch?v=EShUeudtaFg).
        
             | op00to wrote:
             | BIZ NESS
        
           | thowaway91234 wrote:
           | the last one killed me "chef of unecessary meetings" got me
           | rolling
        
         | unshavedyak wrote:
         | Yea i cancelled GPT Plus after they did that. Ruined a lot of
         | the exploration that i enjoyed about DallE
        
       | rbates wrote:
       | This reminds me of the party game Telestrations where players go
       | back and forth between drawing and writing what they see. It's
       | hilarious to see the result because you anticipate what the next
       | drawing will be while reading the prompt.
       | 
       | I'd love to see an alternative viewing mode here which shows the
       | image and the following prompt. Then you need to click a button
       | to reveal the next image. This allows you to picture in your mind
       | what the image might like while reading the prompt.
       | 
       | Thanks for making this fun little app!
       | 
       | Update: I just realized you can get this effect by going into
       | mobile mode (or resizing the window). You can then scroll down to
       | see the image after reading the prompt.
        
       | smusamashah wrote:
       | Why do prompts from GPT-4V start from "Create an image of"? This
       | prefix doesn't look useful imo.
        
         | z991 wrote:
         | You can try a custom prompt and see if you can get GPT4V to
         | stop doing that / if it matters.
        
           | smusamashah wrote:
           | You are right, doesn't matter much. Tried gnome prompt with
           | empty custom prompt for gpt-4v
           | https://dalle.party/?party=nvzzZXYs. Then used a custom
           | prompt to return short descriptions which resulted in
           | https://dalle.party/?party=Qcd8ljJp
           | 
           | Another attempt: https://dalle.party/?party=k4eeMQ6I
           | 
           | Realized just now that the dropdown on top of the page shows
           | the prompt used by GPT-4V.
        
             | z991 wrote:
             | Wow the empty prompt does much better than I'd have guessed
        
       | willsmith72 wrote:
       | it seems like if you create a shareable link, then add more
       | images, you can't create a new link with the new images
        
         | z991 wrote:
         | Yeah, that's a bug, I'll try to fix it tonight!
        
           | epivosism wrote:
           | thanks for this! Basically the default UI they provide at
           | chat.openai is so bad, nearly anything you would do would be
           | an improvement.
           | 
           | * not hide the prompt by default * not only show 6 lines of
           | the prompt even after user clicks * not be insanely buggy re:
           | ajax, reloading past convos etc * not disallow sharing of
           | links to chats which contain images * not artificially delay
           | display of images with the little spinner animation when the
           | image is already known ready anyway. * not lie about reasons
           | for failure * not hide details on what rate limit rules I
           | broke and where to get more information
           | 
           | etc
           | 
           | Good luck, thanks!
        
             | willsmith72 wrote:
             | the new fancy animation for images is SO annoying
        
       | i-use-nixos-btw wrote:
       | It'd be interesting to start with an image rather than a prompt,
       | though I am afraid of what it'd do if I started with a selfie.
        
       | jsf01 wrote:
       | It's cool to see how certain prompts and themes stay relatively
       | stable, like the gnome example. But then "cat lecturing mice"
       | quickly goes off the rails into weird surreal sloth banana
       | territory.
       | 
       | My best guess to try to explain this would be that "gnome + art
       | style + mushroom" will draw from a lot more concrete examples in
       | the training data, whereas the AI is forced to reach a bit wider
       | to try to concoct some image for the weird scenario given in the
       | cat example.
        
       | xeckr wrote:
       | Cool idea! I made one with the starting prompt "an artificial
       | intelligence painting a picture of itself":
       | https://dalle.party/?party=wszvbrOx
       | 
       | It consistently shows a robot painting on a canvas. The first 4
       | are paintings of robots, the next 3 are galaxies, and the final 2
       | are landscapes.
        
         | NickNaraghi wrote:
         | Great idea, and it came out really good too. I like the 6th one
         | the best
        
       | rexreed wrote:
       | Question: how are you protecting those API keys? I'm reluctant to
       | enter mine into what could easily be an API Key scraper.
        
         | z991 wrote:
         | The entire thing is frontend only (except for the share
         | feature) so the server never sees your key. You can validate
         | that by watching the network tab in developer console. You can
         | also make a new / revoke an API key to be extra sure.
        
         | danielbln wrote:
         | Just generate one for this purpose and then revoke it when
         | you're done. You can have more than one key.
        
       | Mtinie wrote:
       | I figured this would quickly go off the rails into surreal
       | territory, but instead it ended up being progressive
       | technological de-evolution.
       | 
       | Starting prompt: "A futuristic hybrid of a steam engine train and
       | a DaVinci flying machine"
       | 
       | Results: https://dalle.party/?party=14ESewbz
       | 
       | (Addendum: In case anyone was curious how costs scale by
       | iteration, the full ten iterations in this result billed $0.21
       | against my credit balance.)
        
         | Mtinie wrote:
         | Here's a second run of the same starting prompt, this time
         | using the "make it more whimsical" modifier. It makes a
         | difference and I find it fascinating what parts of the
         | prompt/image gain prominence during the evolutions.
         | 
         | Starting prompt: "A futuristic hybrid of a steam engine train
         | and a DaVinci flying machine"
         | 
         | Results: https://dalle.party/?party=qLHPB2-o
         | 
         | Cost: Eight iterations @ $0.44 -- which suggests to me that the
         | API is getting additional hits beyond the run. I confirmed that
         | the share link isn't passing along the key (via a separate
         | browser and a separate machine) so I'm not clear why this is
         | might be.
        
           | jamestimmins wrote:
           | I find it somewhat fascinating that in both examples, the
           | final result is more cohesive around a single them than the
           | original idea.
        
             | Mtinie wrote:
             | > "[...]the final result is more cohesive around a single
             | them than the original idea."
             | 
             | That's an observation worth investigating. Here's another
             | set of data points to see if there's more to it...
             | 
             | Input prompt: "Six robots on a boat with harpoons, battling
             | sharks with lasers strapped to their heads"
             | 
             | GPT4V prompt: "Write a prompt for an AI to make this image.
             | Just return the prompt, don't say anything else. Make it
             | funnier."
             | 
             | Result: https://dalle.party/?party=pfWGthli
             | 
             | Cost: Ten iterations @ $0.41
             | 
             | (Addendum: I'd forgotten to mention that I believe the cost
             | differential is due to the token count of each of the
             | prompts. The first case mentioned had less words passed
             | through each of the prompts than the later attempts when I
             | asked it to 'make it whimsical' or 'make it funnier'.)
        
       | w-m wrote:
       | Playing with opposites is kind of fun, too.
       | 
       | Simply a cat, evolving into a lounging cucumber, and finally
       | opposite world:
       | 
       | https://dalle.party/?party=pqwKQVka
       | 
       | Vibrant gathering of celestial octopus entities:
       | 
       | https://dalle.party/?party=lHNDUvtp
        
       | epivosism wrote:
       | The "create text version of image" prompt matters a ton.
       | 
       | I tried three, demo here:
       | 
       | default                 https://dalle.party/?party=JfiwmJra
       | 
       | hyper-long + max detail + compression - This shows that with
       | enough text, it can do a really good job of reproducing very,
       | very similar images
       | https://dalle.party/?party=QtEqq4Mu
       | 
       | hyper-long + max detail + compression + telling it to cut all
       | that down to 12 words - This seems okay. I might be losing too
       | much detail                 https://dalle.party/?party=0utxvJ9y
       | 
       | Overall the extreme content filtering and lying error messages
       | are not ideal; will probably improve in the future. If you send
       | too long, or too risky a prompt, or the image it generates is
       | randomly too risky, you either get told about it or lied to that
       | you've hit rate limits. Sometimes you also really do hit
       | ratelimits.
       | 
       | Also, you can't raise your rate limits until you prove it by
       | having paid over X amount to openai. This kind of makes sense as
       | a way to prevent new sign-ups from blowing thousands of dollars
       | of cap mistakenly.
       | 
       | Hyper detail prompt:
       | 
       | Look at this image and extract all the vital elements. List them
       | in your mind including position, style, shape, texture, color,
       | everything else essential to convey their meaning. Now think
       | about the theme of the image and write that down, too. Now write
       | out the composition and organization of the image in terms of
       | placement, size, relationships, focus. Now think about the
       | emotions - what is everyone feeling and thinking and doing
       | towards each other? Now, take all that data and think about a
       | very long, detailed summary including all elements. Then
       | "compress" this data using abbreviations, shortenings, artistic
       | metaphors, references to things which might help others
       | understand it, labels and select pull-quotes. Then add even more
       | detail by reviewing what we reviewed before. Now do one final
       | pass considering the input image again, making sure to include
       | everything from it in the output one, too. Finally, produce a
       | long maximum length jam packed with info details which could be
       | used to perfectly reproduce this image.
       | 
       | Final shrink to 12 words:
       | 
       | NOW, re-read ALL of that twice, thinking deeply about it, then
       | compress it down to just 12 very carefully chosen words which
       | with infinite precision, poetry, beauty and love contain all the
       | detail, and output them, in quotes.
        
       | fassssst wrote:
       | I would never paste my API key into an app or website.
        
         | mwint wrote:
         | Can you get a temporary one that is revocable later? (Not an
         | OpenAI user myself, but that would seem to be a way to lower
         | the risk to acceptable levels)
        
           | danielbln wrote:
           | You can generate and revoke them easily, so I don't quite get
           | the issues. Create one, use the tool, revoke, done.
        
           | w-m wrote:
           | You can create named API keys, and easily delete them.
           | Unfortunately you can't seem to put spend limits on specific
           | API keys.
           | 
           | If you're not using the API for serious stuff though it's not
           | a big problem, as they moved to pre-paid billing recently.
           | Mine was sitting on $0, so I just put in a few bucks to use
           | with this site.
        
         | swatcoder wrote:
         | Indeed!
         | 
         | If OpenAI wants to support use cases like this, which would be
         | kind of cool during these exploratory days, they should let you
         | generate "single use" keys with features like cost caps, domain
         | locks, expirations, etc
        
       | epivosism wrote:
       | You can really "cheat" by modifying the custom prompt to re-
       | insert or remove specific features. For example, "generate a
       | prompt for this image but adjust it by making everything appear
       | in a more primitive, earlier evolutionary form, or in an earlier
       | less developed way" would make things de-evolve.
       | 
       | Or you can just re-insert any theme or recurring characters you
       | like at that stage.
        
       | epivosism wrote:
       | One reason this is good is that the default gpt4-vision UI is so
       | insanely bad and slow. This just lets you use your capacity
       | faster.
       | 
       | Rate limits are really low by default - you can get hit by 5
       | img/min limits, or 100 RPD (requests per day) which I think is
       | actually implemented as requests per hour.
       | 
       | This page has info on the rate limits:
       | https://platform.openai.com/docs/guides/rate-limits/usage-ti...
       | 
       | Basically, you have to have paid X amount to get into a new usage
       | cap. Rate limits for dalle3/images don't go up very fast but it
       | can't hurt to get over the various hurdles (5$, 50$, 100$) as
       | soon as possible for when limits come down. End of the month is
       | coming soon. It looks like most of the "RPD" limits go away when
       | you hit tier 2 (having paid at least 50$ historically via API to
       | them).
        
       | swyx wrote:
       | OP's last one is interesting: https://dalle.party/?party=oxpeZKh5
       | because it shows GPT4V and Dalle3 being remarkably race-blind. i
       | wonder if you can prompt it to be other wise...
        
         | _fs wrote:
         | openais internal prompt for dalle modifies all prompts to add
         | diversity and remove requests to make groups of people a single
         | descent. From https://github.com/spdustin/ChatGPT-
         | AutoExpert/blob/main/_sy...                   Diversify
         | depictions with people to include DESCENT and GENDER for EACH
         | person using direct terms. Adjust only human descriptions.
         | Your choices should be grounded in reality. For example, all of
         | a given OCCUPATION should not be the same gender or race.
         | Additionally, focus on creating diverse, inclusive, and
         | exploratory scenes via the properties you choose during
         | rewrites. Make choices that may be insightful or unique
         | sometimes.              Use all possible different DESCENTS
         | with EQUAL probability. Some examples of possible descents are:
         | Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White.
         | They should all have EQUAL probability.              Do not use
         | "various" or "diverse"              Don't alter memes,
         | fictional character origins, or unseen people. Maintain the
         | original prompt's intent and prioritize quality.
         | Do not create any imagery that would be offensive.
         | For scenarios where bias has been traditionally an issue, make
         | sure that key traits such as gender and race are specified and
         | in an unbiased way -- for example, prompts that contain
         | references to specific occupations.
        
       | Terretta wrote:
       | If you were wondering how to bump up your API rate limits through
       | usage, _this is the way_.
       | 
       | // also, it's the _best_ way - TY @z991
        
       | indymike wrote:
       | Interesting how similar this is to my family's favorite game:
       | pictograph.
       | 
       | 1. You start by describing a thing. 2. The next person draws a
       | picture of it. 3. The next next person describes the picture.
       | repeat steps 2 and 3 until everyone has either drawn or described
       | the picture.
       | 
       | You then compare the first and last description... and look over
       | the pictures. One of the best ever was:
       | 
       | Draw a penguin. The first picture was a penguin with a light
       | shadow.
       | 
       | After going around five rounds, the final description was "a
       | pidgeon stabbed with a fork in a pool of blood in Chicago"
       | 
       | I'm still trying to figure out how Chicago got in there.
        
       ___________________________________________________________________
       (page generated 2023-11-27 23:00 UTC)