[HN Gopher] Q* Hypothesis: Enhancing Reasoning, Rewards, and Syn...
___________________________________________________________________
 
Q* Hypothesis: Enhancing Reasoning, Rewards, and Synthetic Data
 
Author : Jimmc414
Score  : 83 points
Date   : 2023-11-24 19:02 UTC (3 hours ago)
 
(HTM) web link (www.interconnects.ai)
(TXT) w3m dump (www.interconnects.ai)
 
| romesc wrote:
| Sure, A* is awesome, but taking the "star" and immediately
| attributing it to A* is probably a bridge too far.
|
| Q*, or any X* for that matter, is extremely common notation for
| the optimal function under certain assumptions (usually a cost /
| reward structure).
| tunesmith wrote:
| Yeah, I just saw the video from that researcher (later an OpenAI
| researcher?) who talked about it back in 2016... not that I
| understood much, but it definitely seemed that Q* was a
| generalization of the Q algorithm described on the previous
| slide. The optimum something across all somethings.
| maaaaattttt wrote:
| If you have the possibility, I would be quite interested in a
| link to the video, or alternatively the name of the researcher
| you mention.
| resource0x wrote:
| LeCun: Please ignore the deluge of complete nonsense about Q*.
| https://twitter.com/ylecun/status/1728126868342145481
| Zolde wrote:
| It will be nice to see the breakthroughs resulting from what
| people _believed_ Q* to have been.
| erikaww wrote:
| Certainly more things to throw at the wall! Excited to see the
| "accidental" progress.
| bschne wrote:
| I love this take. Reminds me of how the Mechanical Turk
| apparently indirectly inspired someone to build a weaving
| machine b/c "how hard could it be if machines can play chess" --
| https://x.com/gordonbrander/status/1385245747071787008?s=20
| spicyusername wrote:
| I have trouble believing this isn't just a sneaky marketing
| campaign.
| dmix wrote:
| Nothing OpenAI has released product-wise (ChatGPT, Dall-E) has
| required 'marketing'. The value speaks for itself. People raving
| about it on Twitter, telling their friends/coworkers, and
| journos documenting their explorations is more than enough.
|
| If this were an extremely competitive market, that'd be more
| plausible. But they enjoy some pretty serious dominance and are
| struggling to handle the growth they already have with GPT.
|
| If Q* is real, you likely wouldn't _need_ to hype up something
| that has the potential to solve math / logic problems without
| having seen the problem/solution beforehand. Something that
| novel would be hugely valuable and generate demand naturally.
| djvdq wrote:
| Of course they are doing PR stunts to keep media talking about
| them.
|
| Remember Altman saying that they shouldn't release GPT-2 because
| it was too dangerous? It's the same thing with this Q* thing.
| FeepingCreature wrote:
| Because it could be used to generate spam, yes, and he was right
| about that.
|
| And to set a precedent that models should be released
| cautiously. He was right about that too, and it is to our
| detriment that we don't take it more seriously.
| dmix wrote:
| Board member Helen Toner accused Sam/OpenAI of releasing GPT too
| early; there were people who wanted to keep it locked away over
| those concerns, which largely haven't come true (a lot of people
| don't understand how spam detection works and overrate the
| impact of deepfakes).
|
| Companies have competing interests and personalities. That's
| normal. But there is no indication that GPT was held back for
| marketing.
| lawlessone wrote:
| > The value speaks for itself.
|
| What is that, though? I've seen a lot of tools created for it:
| custom AI characters, things that let you have an LLM read a DB,
| etc. But I haven't seen much in the way of customer-facing
| things.
| dharmab wrote:
| It's pretty good for customer support agent tools. Feed the LLM
| your company's knowledgebase and give it the context of the
| support chat/email/call transcript, and it suggests solutions to
| the agent (see the sketch below).
| dist-epoch wrote:
| > Satya: Microsoft has over a million paying Github Copilot
| users
|
| https://www.zdnet.com/article/microsoft-has-over-a-million-p...
| janalsncm wrote:
| > But I haven't seen much in the way of customer-facing things.
|
| How about ChatGPT? It's a game changer. It has allowed me to
| learn Rust extremely quickly, since I can just ask it direct
| questions about my code. And I don't worry about hallucinations,
| since the compiler is always there to "fact check".
|
| I'm pretty bearish on OpenAI wrappers. Low effort, zero moat.
| But that's largely irrelevant to the value of OpenAI products
| themselves.
| ghostzilla wrote:
| > People raving about it on Twitter
|
| For the most part, usage of GenAI has been sharing output on
| social media. It is mind-blowingly fascinating, but the utility
| of it is far, far behind.
| bhhaskin wrote:
| I agree. The only thing that matters is results.
| YetAnotherNick wrote:
| I have trouble believing the whole ousting of Sam Altman was
| planned for this. But yeah, someone might be smart enough to
| feed wrong info to the press after the whole saga was over.
| ben_w wrote:
| I definitely need to blog more. A* search with a neural network
| as the heuristic function seemed like a good idea to
| investigate... a month or two ago, and I never got around to it.
| (A sketch of the idea follows below.)
| haltist wrote:
| I have an idea for a great AI project: finding the first logical
| inconsistency in an argument about a formal system like an LLM.
| I think if OpenAI can deliver that, then I will believe they
| have achieved AGI.
|
| I am a techno-optimist and I believe this is possible; all I
| need is a lot of money. I think $80B would be more than
| sufficient. I will be awaiting a reply from other techno-
| optimists like Marc Andreessen and those who are techno-optimist
| adjacent, like the millionaires and billionaires who read HN
| comments.
| adamnemecek wrote:
| RL and A* are both approaches to dynamic programming, so this
| would not be surprising.
| jbrisson wrote:
| Imho, in order to reach AGI you have to get out of the LLM
| space. It has to be something else, something closer to
| biological plausibility.
| bob1029 wrote:
| I think big parts of the answer include time-domain, multi-agent
| and iterative concepts.
|
| Language is about communication of information _between_
| parties. One instance of an LLM doing one-shot inference is not
| leveraging much of this. Only first-order semantics can really
| be explored. There is a limit to what can be communicated in a
| context of _any_ size if you only get one shot at it. Change
| over time is a critical part of our reality.
|
| Imagine if your agent could determine that it has been thinking
| about something for too long and adapt its strategy
| automatically: escalate to a higher-param model, adapt the
| context, etc. (see the sketch below).
|
| Perhaps we aren't seeking total AGI/ASI either (aka inventing
| new physics). From a business standpoint, it seems like we
| mostly have what we need now. The next ~3 months are going to be
| a hurricane in our shop.
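A minimal sketch of the self-monitoring escalation loop bob1029
imagines, in Python. The model names and the ask_model / is_solved
helpers are hypothetical placeholders, not anything from the thread
or the article:

    import time

    # Hypothetical model ladder, cheapest first.
    MODELS = ["small-model", "medium-model", "large-model"]

    def solve(task: str, budget_s: float = 30.0):
        """Try each model in turn, escalating when the thinking
        budget for the current model runs out."""
        answer = None
        for model in MODELS:
            deadline = time.monotonic() + budget_s
            while time.monotonic() < deadline:
                # ask_model / is_solved are hypothetical stand-ins
                # for an LLM call and a task-specific success check.
                answer = ask_model(model, task, previous=answer)
                if is_solved(task, answer):
                    return answer
            # Budget exhausted: fall through to a bigger model.
        return answer  # best effort after all escalations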
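And a minimal sketch of the idea ben_w mentions: ordinary A*,
written so that the heuristic h is just a callable, meaning a
trained network's scoring function can be dropped in. The graph
interface (neighbors, cost) is a placeholder assumption:

    import heapq
    from itertools import count

    def a_star(start, goal, neighbors, cost, h):
        """neighbors(n) -> iterable of successor nodes,
        cost(a, b) -> edge cost, h(n) -> estimated cost to goal
        (this is where a learned heuristic would plug in)."""
        tie = count()  # tie-breaker so the heap never compares nodes
        frontier = [(h(start), 0, next(tie), start, [start])]
        best_g = {}
        while frontier:
            f, g, _, node, path = heapq.heappop(frontier)
            if node == goal:
                return path
            if best_g.get(node, float("inf")) <= g:
                continue  # already expanded via a cheaper route
            best_g[node] = g
            for nxt in neighbors(node):
                g2 = g + cost(node, nxt)
                heapq.heappush(
                    frontier,
                    (g2 + h(nxt), g2, next(tie), nxt, path + [nxt]))
        return None  # goal unreachable

One caveat: a learned heuristic is generally not admissible, so
unlike textbook A* the returned path is not guaranteed optimal.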
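Finally, a minimal sketch of the support-agent pattern dharmab
describes, assuming the openai v1 Python client; search_knowledgebase
is a hypothetical retrieval helper standing in for whatever keyword
or vector search the knowledgebase supports:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def suggest_reply(transcript: str) -> str:
        # Hypothetical helper: return the top knowledgebase
        # articles relevant to this ticket.
        articles = search_knowledgebase(transcript, top_k=3)
        context = "\n\n".join(a.body for a in articles)
        response = client.chat.completions.create(
            model="gpt-4",  # any capable chat model
            messages=[
                {"role": "system",
                 "content": "You assist a human support agent. "
                            "Suggest a solution using only these "
                            "articles:\n\n" + context},
                {"role": "user", "content": transcript},
            ],
        )
        return response.choices[0].message.content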
| hackinthebochs wrote:
| LLMs as we currently understand them won't reach AGI. But AGI
| will very likely have an LLM as a component. What is language
| but a way to represent arbitrary structure? Of course that's
| relevant to AGI.
| valine wrote:
| Covering an airplane in feathers isn't going to make it fly
| faster. Biological plausibility is a red herring, imho.
| foooorsyth wrote:
| The training space is more important. I don't think a general
| intelligence will spawn from text corpora. A person only able to
| consume text to learn would be considered severely disabled.
|
| A significant part of intelligence comes from existence in
| meatspace and the ability to manipulate and observe that
| meatspace. A two-year-old learns much faster, with much less
| data, than any LLM.
| valine wrote:
| We already have multimodal models that take both images and text
| as input. The bulk of the training for these models was in text,
| not images. This shouldn't be surprising: text is a great way of
| abstractly and efficiently representing reality. Of course those
| patterns are useful for making sense of other modalities.
|
| Beyond modeling the world, text is also a great way to model
| human thought and reason. People like to explain their thought
| process in writing. LLMs already pick up on and mimic chain of
| thought well.
|
| Contained within large datasets is crystallized thought, and
| efficient descriptions of reality that have proven useful for
| processing modalities beyond text. To me that seems like a great
| foundation for AGI.
| orbital-decay wrote:
| Definitions, again. OpenAI defines AGI as highly autonomous
| agents that can replace humans in most of the economically
| important jobs. Those don't need to look or function like
| humans.
| kelseyfrog wrote:
| A* is a red herring based on availability bias.
|
| Q* is already a thing, and it's the Bellman equation describing
| the optimal action-value function (see the equation below).
| bertil wrote:
| Are you saying that the Bellman equations already use the
| notation Q*, or are you saying that those equations (I'm not as
| familiar as I should be, sorry) are the obvious connection
| behind the incoherent ramblings from Reuters?
|
| Because having similar acronyms or notations used in multiple
| contexts that end up collapsing with cross-pollination of ideas
| is far too frequent these days. I once made a dictionary of
| terms used in A/B testing / feature flags / DevOps / statistics
| / econometrics, and _most_ keywords had multiple, incompatible
| accepted meanings depending on the exact context, all somewhat
| relevant to A/B testing. Every reader came out of it defeated,
| like language itself was broken...
| ElectricalUnion wrote:
| Can you link this dictionary here, or is it proprietary?
| tnecniv wrote:
| Q* is an incredibly common notation for the above version of the
| Bellman equation. I think it's stupid to call an algorithm Q*
| for the same reason it's stupid to read too much into this: it's
| an incredibly nondescript name.
| kelseyfrog wrote:
| I'm saying that everyone already uses that notation, including
| OpenAI [1].
|
| 1. https://spinningup.openai.com/en/latest/algorithms/ddpg.html
| janalsncm wrote:
| Is it possible they were referring to this research they
| published in May?
|
| https://openai.com/research/improving-mathematical-reasoning...
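For reference, the Q* that kelseyfrog and tnecniv are pointing at is
the optimal action-value function, the fixed point of the Bellman
optimality equation (standard RL notation: r is the reward, gamma
the discount factor, P the transition distribution):

    Q^*(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
                \left[ r(s, a) + \gamma \max_{a'} Q^*(s', a') \right]

Tabular Q-learning converges to this fixed point under standard
conditions; the notation by itself says nothing about a new
algorithm.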
| fizx wrote:
| The most likely hypothesis I've seen for Q*:
|
| https://twitter.com/alexgraveley/status/1727777592088867059
| urbandw311er wrote:
| See also https://news.ycombinator.com/item?id=38407741
___________________________________________________________________
(page generated 2023-11-24 23:00 UTC)