[HN Gopher] TypeChat ___________________________________________________________________ TypeChat Author : DanRosenwasser Score : 228 points Date : 2023-07-20 16:41 UTC (6 hours ago) (HTM) web link (microsoft.github.io) (TXT) w3m dump (microsoft.github.io) | paxys wrote: | I swear I think of something and Anders Hejlsberg builds it. | | Structured requests and responses are 100% the next evolution of | LLMs. People are already getting tired of chatbots. Being able to | plug in any backend without worrying about text parsing and | prompts will be amazing. | sidnb13 wrote: | Maybe worth looking into: | https://news.ycombinator.com/item?id=36750083 | _the_inflator wrote: | This as a dynamic mapper in a backend layer can be huge. | | For example, try to keep up with (frequent) API payload changes | around a consumer in Java. We implemented a NodeJS layer just | to stay sane. (Banking, huge JSON payloads, backends in Java) | | Mapping is really something LLMs could shine at. | tylerrobinson wrote: | It could shine, or it could be an absolute disaster. | | Code/functionality archeology is already insanely hard in | orgs with old codebases. Imagine the facepalming that Future | You will have when you see that the way the system works is | some sort of nondeterministic translation layer that | magically connects two APIs where versions are allowed to | fluctuate. | unshavedyak wrote: | > Structured requests and responses are 100% the next evolution | of LLMs. People are already getting tired of chatbots. Being | able to plug in any backend without worrying about text parsing | and prompts will be amazing. | | Yup, a general desire of mine is to locally run an LLM which | has actionable interfaces that i provide. Things like "check | time", "check calendar", "send message to user" and so on. | | TypeChat seems to be in the right area. 
I can imagine an extra | layer of "fit this JSON input to a possible action, if any" and | etc. | | I see a neat hybrid future where a bot (LLM/etc) works to glue | layers of real code together. Sometimes part of ingestion, | tagging, etc - sometimes part of responding to input, etc. | | All around this is a super interesting area to me but frankly, | everything is moving so fast i haven't concerned myself with | diving too deep in it yet. Lots of smart people are working on | it so i feel the need to let the dust settle a bit. But i think | we're already there to have my "dream home interface" working. | sdwr wrote: | I was thinking about this yesterday. ChatGPT really is good | enough to act as a proper virtual assistant / home manager, | with enough toggles exposed. | 9dev wrote: | ChatGPT isn't the limiting factor here, a good way to | expose the toggles is. I recently tried to expose our | company CRM to employees by means of a Teams bot they could | ask for stuff in natural language (like "send an invite | link to newlead@example.org" or "how many MAUs did | customer Foo have in June"), but while I almost got there, | communicating an ever-growing set of actionable commands | (with an arbitrary number of arguments) to the model was | more complex than I thought. | unshavedyak wrote: | Care to share what made it complex? My comment above was | most likely ignorant, but my general thought was to write | some header prompt about available actions that the LLM | could map to, and then ask it if a given input text | matches to a pre-defined action. Much like what TypeChat | does. | | Does this sound similar enough to what you were doing? | Was there something difficult in this that you could | explain? | | Aside from being completely hand-wavey in my hypothetical | guess-timated implementation, i had figured the most | difficult part would be piping complex actions together. 
| "Remind me tomorrow about any events i have on my | calendar" would be a conditional action based on lookups, | etc - so order of operations would also have to be parsed | somehow. I suspect a looping "thinking" mechanism would | be necessary, and while i know that's not a novel idea i | am unsure if i would nonetheless have to reinvent it in | my own tech for the way i wanted to deploy. | J_Shelby_J wrote: | https://github.com/ShelbyJenkins/LLM-OpenAPI-minifier | | I have a working solution to exposing the toggles. | | I'm integrating it into the bot I have in the other repo. | | Goal is you point to an openapi spec and then GPT can | choose and run functions. Basically Siri but with access | to any API. | ianzakalwe wrote: | I am not sure why this exists, maybe I am missing something, and | it does not seem like there is much value past "hey check this | out this is possible" | phillipcarter wrote: | I'd love to see a robust study on the effectiveness of this and | several other ways to coax a structured response out: | | - Lots of examples / prompt engineering techniques | | - MS Guidance | | - TypeChat | | - OpenAI functions (the model itself is tuned to do this, a key | differentiator) | | - ...others? | 33a wrote: | Looks like it just runs the LLM in a loop until it spits out | something that type checks, prompting with the error message. | | This is a cute idea and it looks like it should work, but I could | see this getting expensive with larger models and input prompts. | Probably not a fix for all scenarios. | osaariki wrote: | I'm not familiar with how TypeChat works, but Guidance [1] is | another similar project that can actually integrate into the | token sampling to enforce formats. | | [1]: https://github.com/microsoft/guidance | behnamoh wrote: | except that guidance is defunct and is not maintained | anymore. | J_Shelby_J wrote: | It's logit bias. You don't even need another library to do | this. You can do it with three lines of python. 
| | Here's an example of one of my implementations of logit bias. | | https://github.com/ShelbyJenkins/shelby-as-a-service/blob/74... | babyshake wrote: | At least with OpenAI, wouldn't it be better if under the hood | it was using the new function call feature? | akavi wrote: | Typescript's type system is much more expressive than the one | the function call feature makes available. | | I imagine closing the loop (using the TS compiler to restrict | token output weights) is in the works, though it's probably | not totally trivial. You'd need: | | * An incremental TS compiler that could report "valid" or | "valid prefix" (ie, valid as long as the next token is not | EOF) | | * The ability to backtrack the model | | Idk how hard either piece is. | rezonant wrote: | For the TS compiler: If you took each generation step, | closed any partial JSON objects (ie close any open `{`), | checked that it was valid JSON and then validated it using | a deep version of Partial<T>, that should do the trick. | SkyPuncher wrote: | I suspect most products are concerned about product-market fit | first; then they can wrangle costs down. | | There's also a good assumption that models will be improving | structured output as the market is demanding it. | dvt wrote: | This is my hot take: we're slowly entering the "tooling" phase of | AI, where people realize there's no real value generation here, | but people are so heavily invested in AI, that money is still | being pumped into building stuff (and of course, it's one of the | best ways to guarantee your academic paper gets published). I | mean, LangChain is kind of a joke and they raised $10M seed lol. | | DeFi/crypto went through this phase 2 years ago. Mark my words, | it's going to end up being this weird limbo for a few years where | people will slowly realize that AI is a feature, not a product. | And that its applicability is limited and that it won't save the | world. 
It won't be able to self-drive cars due to all the edge | cases, it won't be able to perform surgeries because it might | kill people, etc. | | I keep mentioning that even the most useful AI tools (Copilot, | etc.) are marginally useful at best. At the very best it saves me | a few clicks on Google, but the agents are not "intelligent" in | the least. We went through a similar bubble a few years ago with | chatbots[1]. These days, no one cares about them. "The metaverse" | was much more short-lived, but the same herd mentality applies. | "It's the next big thing" until it isn't. | | [1] https://venturebeat.com/business/facebook-opens-its-messenge... | [deleted] | rvz wrote: | Someone should just get this working on Llama 2 instead of | OpenAI.com [0] | | All this is, is just talking to an AI model sitting on someone | else's server. | | [0] | https://github.com/microsoft/TypeChat/blob/main/src/model.ts... | joelmgallant wrote: | The most recent gpt4all (https://github.com/nomic-ai/gpt4all) | includes a local server compatible with the OpenAI API -- this could | be a useful start! | DanRosenwasser wrote: | Hi there! I'm one of the people working on TypeChat and I just | want to say that we definitely welcome experimentation on | things like this. We've actually been experimenting with | running Llama 2 ourselves. Like you said, to get a model | working with TypeChat all you really need is to provide a | completion function. So give it a shot! | jensneuse wrote: | This looks quite similar to how we're using OpenAI functions and | zod (JSON Schema) to have OpenAI answer with JSON and interact | with our custom functions to answer a prompt: | https://wundergraph.com/blog/return_json_from_openai | joefreeman wrote: | > It's unfortunately easy to get a response that includes { | "name": "grande latte" } type Item = { | name: string; ... size?: string; | | I'm not really following how this would avoid `name: "grande | latte"`? 
| | But then the example response: "size": 16 | | > This is pretty great! | | Is it? It's not even returning the type being asked for? | | I'm guessing this is more of a typo in the example, because | otherwise this seems cool. | mynameisvlad wrote: | I feel like that's just a documentation bug. I'm guessing they | changed from number of ounces to canonical size late in the | drafting of the announcement and forgot to change the output | value to match. | | There would be no way for a system to map "grande" to 16 based | on the code provided, and 16 does not seem to be used anywhere | else. | DanRosenwasser wrote: | Whoops - thanks for catching this. Earlier iterations of this | blog post used a different schema where `size` had been | accidentally specified as a `number`. While we changed the | schema, we hadn't re-run the prompt. It should be fixed now! | graypegg wrote: | Their example here is really weak overall IMO. Like more than | just that typo. You also probably wouldn't want a "name" string | field anyway. Like there's nothing stopping you from receiving | { name: "the brown one", size: "the | espresso cup", ... } | | Like that's just as bad as parsing the original string. You | probably want big string union types for each one of those | representing whatever known values you want, so the LLM can try | and match them. | | But now why would you want that to be locked into the type | syntax? You probably want something more like Zod where you can | use some runtime data to build up those union types. | | You also want restrictions on the types too, like quantity | should be a positive, non-fractional integer. Of course you can | just validate the JSON values afterwards, but now the user gets | two kinds of errors. One from the LLM which is fluent and human | sounding, and the other which is a weird technical "oops! You | provided a value that is too large for quantity" error. | | The type syntax seems like the wrong place to describe this | stuff. 
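graypegg's suggestion can be sketched in plain TypeScript without any schema library: closed string unions for the known values, plus a small runtime check for the constraints the type syntax can't express. The menu values and names below are hypothetical, not from TypeChat's example:

```typescript
// Closed unions so the LLM can't hand back "the brown one" as a name.
type Size = "short" | "tall" | "grande" | "venti";
type CoffeeName = "latte" | "cappuccino" | "americano";

interface LineItem {
  name: CoffeeName;
  size?: Size;
  quantity: number; // must be a positive integer -- checked at runtime below
}

const NAMES: readonly string[] = ["latte", "cappuccino", "americano"];
const SIZES: readonly string[] = ["short", "tall", "grande", "venti"];

// Returns the parsed item, or null if the LLM's JSON violates the schema.
function validateLineItem(raw: unknown): LineItem | null {
  if (typeof raw !== "object" || raw === null) return null;
  const o = raw as Record<string, unknown>;
  if (typeof o.name !== "string" || !NAMES.includes(o.name)) return null;
  if (o.size !== undefined &&
      (typeof o.size !== "string" || !SIZES.includes(o.size))) return null;
  if (typeof o.quantity !== "number" ||
      !Number.isInteger(o.quantity) || o.quantity <= 0) return null;
  return o as unknown as LineItem;
}
```

A library like Zod expresses the same checks more declaratively and can build the unions from runtime data, which is graypegg's point about not locking them into the type syntax.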
| hirsin wrote: | The rest of the paragraph discusses "what happens when it | ignores type?", so I think that's where they were going with | that? | verdverm wrote: | I don't see the value add here. | | Here's the core of the message sent to the LLM: | https://github.com/microsoft/TypeChat/blob/main/src/typechat... | | You are basically getting a fixed prompt to return structured | data with a small amount of automation and vendor lockin. All | these LLM libraries are just crappy APIs to the underlying API. | It is trivial to write a script that does the same and will be | much more flexible as models and user needs evolve. | | As an example, think about how you could change the prompt or use | python classes instead. How much work would this be using a | library like this versus something that lifts the API calls and | text templating to the user like: https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/llm... | ofslidingfeet wrote: | Getting these models to reliably return a consistent structure | without frequent human intervention and/or having to account | for the personal moral opinions of big tech CEOs is not | trivial, no. | whimsicalism wrote: | Yes, as the abstractions get better it becomes easier to code | useful things. | politelemon wrote: | Pretty much all the LLM libraries I'm seeing are like this. | They boil down to a request to the LLM to do something in a | certain way. I've noticed under complex conditions, they stop | listening and start reverting to their 'default' behavior. | | But that said it still feels like using a library is the right | thing to do... so I'm still watching this space to see what | matures and emerges as a good-enough approach. | bwestergard wrote: | The value is in: | | 1. Running the typescript type checker against what is returned | by the LLM. | | 2. 
If there are type errors, combining those into a "repair | prompt" that will (it is assumed) have a higher likelihood of | eliciting an LLM output that type checks. | | 3. Gracefully handling the cases where the heuristic in #2 | fails. | | https://github.com/microsoft/TypeChat/blob/main/src/typechat... | | In my experience experimenting with the same basic idea, the | heuristic in #2 works surprisingly well for relatively simple | types (i.e. records and arrays not nested too deeply, limited | use of type variables). It turns out that prompting LLMs to | return values inhabiting relatively simple types can be used to | create useful applications. Since that is valuable, this | library is valuable inasmuch as it eliminates the need to hand | roll this request pattern, and provides a standardized | integration with the typescript codebase. | verdverm wrote: | these are trivial steps you can add in any script, as your | link demonstrates. | | Why would I want to add all this extra stuff just for that? | The opaque retry until it returns valid JSON? That sounds | like it will make for many pleasant support cases or issues | | Personally, I have found investing more effort in the actual | prompt engineering improves success rates and reduces the | need to retry with an appended error message. Especially | helpful are input/output pairs (i.e. few-shot) and while we | haven't tried it yet, I imagine fine-tuning and distillation | would improve the situation even more | bwestergard wrote: | There are many subtleties to invoking the typescript type | checker from node. It's nice to have support for that from | the team that maintains the type checker. 
| BoorishBears wrote: | Here's a project that does that better imo: | | https://github.com/dzhng/zod-gpt | | And by better I mean doesn't tie you to OpenAI for no good | reason | LordDragonfang wrote: | I don't know where all you people work that your employer | would prefer a random git repo (that has no support and no | guarantee of updates) over a solution from _Microsoft_. | (Alternatively: that you have so much free time that you'd | prefer to fiddle with your own validation code instead of | writing your actual app) | | Open source solutions are great, but having a first-party | solution is _also a good thing_. | BoorishBears wrote: | I don't know which employer is hiring the people who make | logical leaps like this but I thank them for their | sacrifice. | | At the end of the day the repo I linked is grokkable with | about 10 minutes of effort, and has simple demonstrable | usefulness by letting you swap out the LLM you're | calling. | | Both are experimental open source libraries in an | experimental space. | TechBro8615 wrote: | Where's the vendor lock-in? This is an open source library and | the file you linked to even includes configs for two vendors: | ChatGPT and Bard. | nfw2 wrote: | It's essentially prompt engineering as a service with some | basic quality-control features thrown in. | | Sure, your engineers could implement it themselves, but don't | they have better things to do? | arc9693 wrote: | TL;DR: It's asking ChatGPT to format response according to a | schema. | bottlepalm wrote: | How does no voice assistant (Apple, Google, Amazon, Microsoft) | integrate LLMs into their service yet, and how has OpenAI not | released their own voice assistant? | | Also like RSS, if there were some standard URL websites exposed | for AI interaction, using this TypeChat to expose the interfaces, | we'd be well on our way here. | 9dev wrote: | Seriously, it feels like there's some collusion going on behind | the scenes. 
This is the most obvious use case for the | technology, but none of the big vendors have explored it. | jomohke wrote: | It takes a while to develop a product, and the world only | woke up to them mere months ago | zitterbewegung wrote: | Microsoft is doing that to replace Cortana in Windows 11 | COGlory wrote: | Willow, and the Willow Inference Server have the option to | use Vicuna with speech input and TTS | dbish wrote: | OpenAI is pretty likely working on their own (see Karpathy's | "Building a kind of JARVIS @ OpenAI"), and Microsoft of | course is doing an integration or reinterpretation of Cortana | with OpenAI's LLMs (since they are incapable of building their | own models nowadays it seems - "Why do we have Microsoft | Research at all?"-S.N.), but there's a lot less value in voice-driven | LLMs than there is in actually being able to perform | actions. Take Alexa for example, you need a system that can | handle smart home control in a predictable, debuggable, way | otherwise people would get annoyed. I definitely think you can | do this, but the current systems as built (Alexa, Siri, and to | a lesser extent Cortana) all have a bunch of hooks and APIs | being used by years and years of rules and software built atop | less powerful models. They need to both maintain the current | quality and improve on it while swapping out major parts of | their system in order to make this work, which takes time. | | Not to mention that none of these assistants actually make any | money, they all lose money really, and are only worthwhile to | big companies with other ways to make cash or drive other parts | of their business (phones, shopping, whatever), so there's less | incentive for a startup to do it. 
| | I worked on both Cortana and Alexa in the past, thought a lot | about trying to build a new version of them from the ground up with | the LLM advancements, and while the tech was all straightforward | and I even had some new ideas for use cases that are enabled now, I | could not figure out a business model that would work (and | hence, working on something completely different now). | sandkoan wrote: | Relevant: Built this which generalizes to arbitrary regex | patterns / context free grammars with 100% adherence and is | model-agnostic -- https://news.ycombinator.com/item?id=36750083 | davrous wrote: | This is a fantastic concept! It's going to be super useful to map | users' intent to API / code in a super reliable way. | Zaheer wrote: | It's not super clear how this differs from another recently | released library from Microsoft: Guidance | (https://github.com/microsoft/guidance). | | They both seem to aim to solve the problem of getting typed, | valid responses back from LLMs | DanRosenwasser wrote: | One of the key things that we've focused on with TypeChat is | not just that it acts as a specification for retrieving | structured data (i.e. JSON), but that the structure is actually | valid - that it's well-typed based on your type definitions. | | The thing to keep in mind with these different libraries is | that they are not necessarily perfect substitutes for each | other. They often serve different use-cases, or can be combined | in various ways -- possibly using the techniques directly and | independent of the libraries themselves. | trafnar wrote: | It's not clear to me how they ensure the responses will be valid | JSON, are they just asking for it, then parsing the result with | error checking? | davnicwil wrote: | seems like they run the generated response through the | typescript type checker, and if it fails, retry using the error | message as a further hint to the LLM, until it succeeds. 
| anonzzzies wrote: | I would expect that, if it doesn't do that even, why | bother... that is also trivial to do anyway. | [deleted] | verdverm wrote: | also some very basic prompt engineering | esafak wrote: | Yes. | https://github.com/microsoft/TypeChat/blob/main/src/typechat... | mahalex wrote: | So, it's a thing that appends "please format your response as the | following JSON" to the prompt, then validates the actual | response against the schema, all in a "while (true)" loop | (literally) until it succeeds. This unbelievable achievement is the | work of seven people (authors of the blog post). | | Honestly, this is getting beyond embarrassing. How is this the | world we live in? | jlnho wrote: | It's because not everyone can be as gifted as you. | | I think the (arguably very prototypical) implementation is not | what's interesting here. It's the concept itself. Natural | language may soon become the default interface for most of the | computing people do on a day to day basis, and tools like these | will make it easier to create new applications in this space. | Edes wrote: | I'm gonna love trying to figure out what query gets the | support chatbot to pair me with an actual human so that I can | solve something that's off script | lsh123 wrote: | Hm... so how do we know that the actual values in the produced | json are correct??? | siva7 wrote: | One of the authors is Anders Hejlsberg, the guy behind C# and | Delphi | mahalex wrote: | That's what makes it even more embarrassing. | katamaster818 wrote: | Hang on, so this is doing runtime validation of an object against | a typescript type definition? Can this be shipped as a standalone | library/feature? This would be absolutely game changing for | validating api response payloads, etc. in typescript codebases. | tehsauce wrote: | maybe this function? | | https://github.com/microsoft/TypeChat/blob/4d34a5005c67bc494... 
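The core trick in that kind of function is to type-check the candidate JSON by compiling a tiny in-memory program. A rough standalone sketch using the public TypeScript compiler API (the function and file names here are mine, not TypeChat's):

```typescript
import * as ts from "typescript";

// Type-check a JSON value against a TypeScript type by compiling a tiny
// in-memory program: `<schema>; const value: <TypeName> = <json>;`.
// Returns the list of type-error messages (empty means it validated).
function typecheckJson(schema: string, typeName: string, json: string): string[] {
  const fileName = "check.ts";
  const source = `${schema}\nconst value: ${typeName} = ${json};`;
  const options: ts.CompilerOptions = { strict: true, noEmit: true, skipLibCheck: true };
  // Wrap the default host so our virtual file shadows the filesystem.
  const host = ts.createCompilerHost(options);
  const defaultGetSourceFile = host.getSourceFile.bind(host);
  host.getSourceFile = (name, languageVersion) =>
    name === fileName
      ? ts.createSourceFile(name, source, languageVersion)
      : defaultGetSourceFile(name, languageVersion);
  const program = ts.createProgram([fileName], options, host);
  return ts.getPreEmitDiagnostics(program).map((d) =>
    ts.flattenDiagnosticMessageText(d.messageText, "\n")
  );
}
```

On katamaster818's performance question: most of the cost is in creating a fresh Program per call, so reusing one (e.g. via the language service API) is the usual mitigation.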
| katamaster818 wrote: | yup, just found that, super neat, I am 100% interested in | using this for other runtime validation... | | It's interesting because I've always been under the | impression the TS team was against the use of types at | runtime (that's why projects like | https://github.com/nonara/ts-patch exist), but now they're | doing it themselves with this project... | | I wonder what the performance overhead of starting up an | instance of tsc in memory is? Is this suitable for low | latency situations? Lots of testing to do... | robbie-c wrote: | This is funny, I have something pretty similar in my code, except | it's using Zod for runtime typechecking, and I convert Zod | schemas to JSON schemas and send that to gpt-3.5 as a function | call. I would expect that using TypeScript's output is better for | recovering from errors than Zod's output, so I can | definitely see the advantage of this. | bestcoder69 wrote: | Why this instead of GPT Functions? | verdverm wrote: | it's basically the same thing, but uses a more concise spec for | writing the schema (typescript vs jsonschema) | | In the end, both methods try to coax the model into returning a | JSON object, one method can be used with any model, the other | is tied to a specific, ever-changing vendor API | | Why would one choose to only support "OpenAI" and nothing else? | yanis_t wrote: | TL;DR: This is ChatGPT + TypeScript. | | I'm totally happy to be able to receive structured queries, but | I'm also not 100% sure TypeScript is the right tool, it seems to | be overkill. I mean obviously you don't need the power of TS | with all its enums, generics, etc. 
| | Plus given that it will run multiple queries in a loop, it might | end up very expensive for it to abide by your custom-made complex | type | garrett_makes wrote: | I built and released something really similar to this (but | smaller scope) for Laravel PHP this week: | https://github.com/adrenallen/ai-agents-laravel | | My take on this is, it should be easy for an engineer to spin up | a new "bot" with a given LLM. There's a lot of boring work around | translating your functions into something ChatGPT understands, | then dealing with the response and parsing it back again. | | With systems like these you can just focus on writing the actual | PHP code, adding a few clear comments, and then the bot can | immediately use your code like a tool in whatever task you give | it. | | Another benefit to things like this, is that it makes it much | easier for code to be shared. If someone writes a function, you | could pull it into a new bot and immediately use it. It | eliminates the layer of "converting this for the LLM to use and | understand", which I think is pretty cool and makes building so | much quicker! | | None of this is perfect yet, but I think this is the direction | everything will go so that we can start to leverage each other's | code better. Think about how we use package managers in coding | today, I want a package manager for AI specific tooling. Just | install the "get the weather" library, add it to my bot, and now | it can get the weather. | ameyab wrote: | Here's a relevant paper that folks may find interesting: | <snip>Semantic Interpreter leverages an Analysis-Retrieval prompt | construction method with LLMs for program synthesis, translating | natural language user utterances to ODSL programs that can be | transpiled to application APIs and then executed.</snip> | | https://arxiv.org/abs/2306.03460 ___________________________________________________________________ (page generated 2023-07-20 23:00 UTC)