[HN Gopher] Q* Hypothesis: Enhancing Reasoning, Rewards, and Syn...
       ___________________________________________________________________
        
       Q* Hypothesis: Enhancing Reasoning, Rewards, and Synthetic Data
        
       Author : Jimmc414
       Score  : 83 points
       Date   : 2023-11-24 19:02 UTC (3 hours ago)
        
 (HTM) web link (www.interconnects.ai)
 (TXT) w3m dump (www.interconnects.ai)
        
       | romesc wrote:
       | Sure A* is awesome, but taking the "star" and immediately
       | attributing it to A* is probably a bridge too far.
       | 
        | Q* or any X*, for that matter, is extremely common notation for
        | the optimal function under certain assumptions (usually about the
        | cost / reward structure).
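        | 
        | For reference, the textbook convention (standard RL notation, not
        | anything OpenAI-specific) just marks optimal objects with a star:
        | 
        |   \pi^* = \arg\max_\pi \mathbb{E}_\pi\big[\textstyle\sum_t \gamma^t r_t\big],
        |   \qquad V^*(s) = \max_a Q^*(s, a)
        | 
        | i.e. the star means "optimal under the given cost / reward
        | structure", whatever letter sits in front of it.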
        
         | tunesmith wrote:
          | Yeah, I just saw the video from that researcher (later an OpenAI
          | researcher?) who talked about it back in 2016... not that I
         | understood much, but it definitely seemed that Q* was a
         | generalization of the Q algorithm described on the previous
         | slide. The optimum something across all somethings.
        
           | maaaaattttt wrote:
            | If you have the chance, I would be quite interested in a
            | link to the video, or alternatively the name of the
            | researcher you mention.
        
           | resource0x wrote:
           | LeCun: Please ignore the deluge of complete nonsense about
           | Q*. https://twitter.com/ylecun/status/1728126868342145481
        
       | Zolde wrote:
       | It will be nice to see the breakthroughs resulting from what
       | people _believed_ Q* to have been.
        
         | erikaww wrote:
          | Certainly more things to throw at the wall! Excited to see the
         | "accidental" progress
        
         | bschne wrote:
         | I love this take. Reminds me of how the Mechanical Turk
         | apparently indirectly inspired someone to build a weaving
         | machine b/c "how hard could it be if machines can play chess"
         | -- https://x.com/gordonbrander/status/1385245747071787008?s=20
        
       | spicyusername wrote:
       | I have trouble believing this isn't just a sneaky marketing
       | campaign.
        
         | dmix wrote:
          | Nothing OpenAI has released product-wise (ChatGPT, DALL-E) has
         | required 'marketing'. The value speaks for itself. People
         | raving about it on twitter, telling their friends/coworkers,
         | and journos documenting their explorations is more than enough.
         | 
         | If this was an extremely competitive market that'd be more
         | plausible. But they enjoy some pretty serious dominance and are
         | struggling to handle the growth they already have with GPT.
         | 
         | If Q* is real, you likely wouldn't _need_ to hype up something
          | that has the potential to solve math / logic problems without
          | having seen the problem/solution beforehand. Something that
         | novel would be hugely valuable and generate demand naturally.
        
           | djvdq wrote:
            | Of course they are doing PR stunts to keep the media talking
            | about them.
           | 
           | Remember Altman saying that they shouldn't release GPT-2
           | because of it being too dangerous? It's the same thing with
           | this Q* thing.
        
             | FeepingCreature wrote:
             | Because it could be used to generate spam, yes, and he was
             | right about that.
             | 
             | And to set a precedent that models should be released
             | cautiously, and he was right about that too, and it is to
             | our detriment that we don't take that more seriously.
        
             | dmix wrote:
              | Board member Helen Toner accused Sam/OpenAI of releasing
              | GPT too early; there were people who wanted to keep it
              | locked away over those concerns, which largely haven't come
             | true (a lot of people don't understand how spam detection
             | works and overrate the impact of deepfakes).
             | 
              | Companies have competing interests and personalities.
             | That's normal. But there is no indication that GPT was held
             | back for marketing.
        
           | lawlessone wrote:
           | >The value speaks for itself.
           | 
           | What is that though? I've seen a lot of tools created for it.
           | Custom AI Characters. Things that let you have an LLM read a
            | DB, etc. But I haven't seen much in the way of customer-facing
            | things.
        
             | dharmab wrote:
             | It's pretty good for customer support agent tools. Feed the
             | LLM your company's knowledgebase and give it the context of
             | the support chat/email/call transcript, and it suggests
             | solutions to the agent.
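              | 
              | The wiring is simple. A rough sketch of the pattern
              | (the `kb_search` retriever and the model name are
              | placeholders of mine, not any vendor's actual product):
              | 
              |   from openai import OpenAI
              | 
              |   client = OpenAI()
              | 
              |   def suggest_reply(transcript, kb_search):
              |       # kb_search: hypothetical retriever over the
              |       # company knowledgebase
              |       context = "\n\n".join(
              |           kb_search(transcript, top_k=3))
              |       resp = client.chat.completions.create(
              |           model="gpt-4",
              |           messages=[
              |               {"role": "system",
              |                "content": "Suggest a support reply "
              |                           "using only this "
              |                           "knowledgebase:\n" + context},
              |               {"role": "user", "content": transcript},
              |           ],
              |       )
              |       return resp.choices[0].message.content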
        
             | dist-epoch wrote:
             | > Satya: Microsoft has over a million paying Github Copilot
             | users
             | 
             | https://www.zdnet.com/article/microsoft-has-over-a-
             | million-p...
        
             | janalsncm wrote:
              | > But I haven't seen much in the way of customer-facing
              | things.
             | 
             | How about ChatGPT? It's a game changer. It has allowed me
             | to learn Rust extremely quickly since I can just ask it
             | direct questions about my code. And I don't worry about
             | hallucinations since the compiler is always there to "fact
             | check".
             | 
             | I'm pretty bearish on OpenAI wrappers. Low effort, zero
             | moat. But that's largely irrelevant to the value of OpenAI
             | products themselves.
        
           | ghostzilla wrote:
           | > People raving about it on twitter
           | 
            | For the most part, usage of GenAI has been sharing output on
            | social media. It is mind-blowingly fascinating, but the
            | utility of it is far, far behind.
        
         | bhhaskin wrote:
          | I agree. The only thing that matters is results.
        
         | YetAnotherNick wrote:
          | I have trouble believing the whole ousting of Sam Altman was
          | planned for this. But yeah, someone might be smart enough to
         | feed wrong info to the press after the whole saga was over.
        
       | ben_w wrote:
       | I definitely need to blog more. A* search with a neural network
       | as the heuristic function seemed like a good idea to
       | investigate... a month or two ago, and I never got around to it.
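        | 
        | The skeleton is tiny either way; the only interesting part is the
        | heuristic. A rough, untested sketch (`neighbors` and `heuristic`
        | are whatever you plug in, e.g. a net's forward pass):
        | 
        |   import heapq, itertools
        | 
        |   def a_star(start, goal, neighbors, heuristic):
        |       tie = itertools.count()  # tie-break so nodes never compare
        |       frontier = [(heuristic(start, goal), next(tie),
        |                    0, start, [start])]
        |       best_g = {start: 0}
        |       while frontier:
        |           _, _, g, node, path = heapq.heappop(frontier)
        |           if node == goal:
        |               return path
        |           for nxt, cost in neighbors(node):
        |               g2 = g + cost
        |               if g2 < best_g.get(nxt, float("inf")):
        |                   best_g[nxt] = g2
        |                   heapq.heappush(
        |                       frontier,
        |                       (g2 + heuristic(nxt, goal), next(tie),
        |                        g2, nxt, path + [nxt]))
        |       return None
        | 
        | Caveat: a learned heuristic won't generally be admissible, so you
        | lose A*'s optimality guarantee and get best-effort search instead.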
        
       | haltist wrote:
       | I have an idea for a great AI project and it's about finding the
       | first logical inconsistency in an argument about a formal system
       | like an LLM. I think if OpenAI can deliver that then I will
       | believe they have achieved AGI.
       | 
       | I am a techno-optimist and I believe this is possible and all I
       | need is a lot of money. I think $80B would be more than
       | sufficient. I will be awaiting a reply from other techno-
        | optimists like Marc Andreessen and those who are techno-optimist
       | adjacent like millionaires and billionaires that read HN
       | comments.
        
       | adamnemecek wrote:
        | RL and A* are both approaches to dynamic programming, so this
        | would not be surprising.
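        | 
        | The shared core is the Bellman backup. A toy illustration (value
        | iteration on a finite MDP; the encoding here is mine):
        | 
        |   def value_iteration(states, actions, P, R,
        |                       gamma=0.9, tol=1e-6):
        |       # P[s][a] -> list of (prob, next_state)
        |       # R[s][a] -> immediate reward
        |       V = {s: 0.0 for s in states}
        |       while True:
        |           delta = 0.0
        |           for s in states:
        |               v = max(R[s][a]
        |                       + gamma * sum(p * V[s2]
        |                                     for p, s2 in P[s][a])
        |                       for a in actions)
        |               delta = max(delta, abs(v - V[s]))
        |               V[s] = v
        |           if delta < tol:
        |               return V
        | 
        | Q-learning replaces the known model P, R with samples of the same
        | backup; A* can be read as heuristic-guided DP on the deterministic
        | shortest-path special case.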
        
       | jbrisson wrote:
       | Imho, in order to reach AGI you have to get out of the LLM space.
       | It has to be something else. Something close to biological
        | plausibility.
        
         | bob1029 wrote:
          | I think big parts of the answer include time-domain, multi-
          | agent, and iterative concepts.
         | 
         | Language is about communication of information _between_
         | parties. One instance of an LLM doing one-shot inference is not
         | leveraging much of this. Only first-order semantics can really
         | be explored. There is a limit to what can be communicated in a
         | context of _any_ size if you only get one shot at it. Change
         | over time is a critical part of our reality.
         | 
          | Imagine if your agent could determine that it has been thinking
          | about something for too long and adapt its strategy
          | automatically: escalate to a higher-param model, adapt the
          | context, etc.
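          | 
          | Hand-wavy sketch of the control loop I mean (`ask` and the
          | model ladder are placeholders):
          | 
          |   import time
          | 
          |   def solve(task, ask, budget_s=30.0,
          |             models=("small", "medium", "large")):
          |       # try cheap models first; escalate when one stalls
          |       # past its time budget or returns nothing useful
          |       for model in models:
          |           start = time.monotonic()
          |           answer = ask(model, task, deadline=budget_s)
          |           within = time.monotonic() - start <= budget_s
          |           if answer is not None and within:
          |               return answer
          |       return None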
         | 
         | Perhaps we aren't seeking total AGI/ASI either (aka inventing
         | new physics). From a business standpoint, it seems like we
         | mostly have what we need now. The next ~3 months are going to
         | be a hurricane in our shop.
        
         | hackinthebochs wrote:
         | LLMs as we currently understand them won't reach AGI. But AGI
         | will very likely have an LLM as a component. What is language
         | but a way to represent arbitrary structure? Of course that's
         | relevant to AGI.
        
         | valine wrote:
         | Covering an airplane in feathers isn't going to make it fly
          | faster. Biological plausibility is a red herring imho.
        
           | foooorsyth wrote:
           | The training space is more important. I don't think a general
           | intelligence will spawn from text corpuses. A person only
           | able to consume text to learn would be considered severely
           | disabled.
           | 
           | A significant part of intelligence comes from existence in
           | meatspace and the ability to manipulate and observe that
           | meatspace. A two year old learns much faster with much less
           | data than any LLM.
        
             | valine wrote:
             | We already have multimodal models that take both images and
             | text as input. The bulk of the training for these models
             | was in text, not images. This shouldn't be surprising. Text
             | is a great way of abstractly and efficiently representing
             | reality. Of course those patterns are useful for making
             | sense of other modalities.
             | 
             | Beyond modeling the world, text is also a great way to
             | model human thought and reason. People like to explain
             | their thought process in writing. LLMs already pick up on
             | and mimic chain of thought well.
             | 
             | Contained within large datasets is crystallized thought,
             | and efficient descriptions of reality that have proven
             | useful for processing modalities beyond text. To me that
             | seems like a great foundation for AGI.
        
         | orbital-decay wrote:
         | Definitions, again. OpenAI defines AGI as highly autonomous
          | agents that can replace humans in most economically important
          | jobs. Those don't need to look or function like humans.
        
       | kelseyfrog wrote:
        | A* is a red herring based on availability bias.
       | 
        | Q* is already a thing: it's the optimal action-value function,
        | characterized by the Bellman optimality equation.
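        | 
        | In the usual textbook form:
        | 
        |   Q^*(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
        |       \big[ r(s, a) + \gamma \max_{a'} Q^*(s', a') \big]
        | 
        | i.e. Q* is the fixed point of the Bellman optimality operator,
        | and acting greedily with respect to it is optimal.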
        
         | bertil wrote:
         | Are you saying that the Bellman equations already use the
         | notation Q*, or are you saying that those equations (I'm not as
          | familiar as I should be, sorry) are the obvious connection
          | behind the incoherent ramblings from Reuters?
         | 
         | Because having similar acronyms or notations used for multiple
         | contexts that end up collapsing with cross-pollination of ideas
         | is far too frequent these days. I once made a dictionary of
         | terms used in A/B testing / Feature Flags / DevOps / Statistics
         | / Econometrics, and _most_ keywords had multiple, incompatible
          | meanings depending on the exact context, all somewhat
          | relevant to A/B testing. Every reader came out of it so
         | defeated, like language itself was broken...
        
           | ElectricalUnion wrote:
           | Can you link this dictionary here or is it proprietary?
        
           | tnecniv wrote:
           | Q* is an incredibly common notation for the above version of
            | the Bellman equation. I think it's stupid to call an
            | algorithm Q* for the same reason it's stupid to read too much
            | into this: it's an incredibly nondescript name.
        
           | kelseyfrog wrote:
           | I'm saying that everyone already uses that notation including
           | OpenAI[1].
           | 
           | 1.
           | https://spinningup.openai.com/en/latest/algorithms/ddpg.html
        
       | janalsncm wrote:
       | Is it possible they were referring to this research they
       | published in May?
       | 
       | https://openai.com/research/improving-mathematical-reasoning...
        
       | fizx wrote:
       | The most likely hypothesis I've seen for Q*:
       | 
       | https://twitter.com/alexgraveley/status/1727777592088867059
        
       | urbandw311er wrote:
       | See also https://news.ycombinator.com/item?id=38407741
        
       ___________________________________________________________________
       (page generated 2023-11-24 23:00 UTC)