[HN Gopher] TypeChat
       ___________________________________________________________________
        
       TypeChat
        
       Author : DanRosenwasser
       Score  : 228 points
       Date   : 2023-07-20 16:41 UTC (6 hours ago)
        
 (HTM) web link (microsoft.github.io)
 (TXT) w3m dump (microsoft.github.io)
        
       | paxys wrote:
       | I swear I think of something and Anders Hejlsberg builds it.
       | 
       | Structured requests and responses are 100% the next evolution of
       | LLMs. People are already getting tired of chatbots. Being able to
       | plug in any backend without worrying about text parsing and
       | prompts will be amazing.
        
         | sidnb13 wrote:
         | Maybe worth looking into:
         | https://news.ycombinator.com/item?id=36750083
        
         | _the_inflator wrote:
         | This as a dynamic mapper in a backend layer can be huge.
         | 
         | For example, try to keep up with (frequent) API payload changes
         | around a consumer in Java. We implemented a NodeJS layer just
         | to stay sane. (Banking, huge JSON payloads, backends in Java)
         | 
          | Mapping is really something where LLMs could shine.
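          | 
          | As a sketch of the idea (purely illustrative; the type and
          | helper are made up):
          | 
          |     // Have the model translate an upstream payload into
          |     // our stable shape; the type definition doubles as
          |     // documentation for the LLM.
          |     function mappingPrompt(payload: unknown): string {
          |         return [
          |             "Map the JSON payload below onto this type:",
          |             "type Customer = { id: string;",
          |             "  fullName: string; balanceCents: number }",
          |             `Payload: ${JSON.stringify(payload)}`,
          |             "Respond with JSON only.",
          |         ].join("\n");
          |     }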
        
           | tylerrobinson wrote:
           | It could shine, or it could be an absolute disaster.
           | 
           | Code/functionality archeology is already insanely hard in
           | orgs with old codebases. Imagine the facepalming that Future
           | You will have when you see that the way the system works is
           | some sort of nondeterministic translation layer that
           | magically connects two APIs where versions are allowed to
           | fluctuate.
        
         | unshavedyak wrote:
         | > Structured requests and responses are 100% the next evolution
         | of LLMs. People are already getting tired of chatbots. Being
         | able to plug in any backend without worrying about text parsing
         | and prompts will be amazing.
         | 
          | Yup, a general desire of mine is to locally run an LLM which
          | has actionable interfaces that I provide. Things like "check
          | time", "check calendar", "send message to user", etc.
          | 
          | TypeChat seems to be in the right area. I can imagine an
          | extra layer of "fit this JSON input to a possible action, if
          | any", etc.
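          | 
          | Something like a tagged union could describe that action
          | space (a sketch; the names are made up):
          | 
          |     type Action =
          |         | { kind: "checkTime" }
          |         | { kind: "checkCalendar"; date: string }
          |         | { kind: "sendMessage"; to: string; body: string }
          |         // fall-through for input that fits no action
          |         | { kind: "unknown"; text: string };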
         | 
         | I see a neat hybrid future where a bot (LLM/etc) works to glue
         | layers of real code together. Sometimes part of ingestion,
         | tagging, etc - sometimes part of responding to input, etc.
         | 
          | All around, this is a super interesting area to me, but
          | frankly everything is moving so fast that I haven't let
          | myself dive too deep into it yet. Lots of smart people are
          | working on it, so I feel the need to let the dust settle a
          | bit. But I think we're already at the point where my "dream
          | home interface" could work.
        
           | sdwr wrote:
           | I was thinking about this yesterday. ChatGPT really is good
           | enough to act as a proper virtual assistant / home manager,
           | with enough toggles exposed.
        
             | 9dev wrote:
              | ChatGPT isn't the limiting factor here; a good way to
              | expose the toggles is. I recently tried to expose our
             | company CRM to employees by means of a Teams bot they could
              | ask for stuff in natural language (like "send an invite
              | link to newlead@example.org" or "how many MAUs did
              | customer Foo have in June"), but while I almost got there,
             | communicating an ever-growing set of actionable commands
             | (with an arbitrary number of arguments) to the model was
             | more complex than I thought.
        
               | unshavedyak wrote:
               | Care to share what made it complex? My comment above was
               | most likely ignorant, but my general thought was to write
               | some header prompt about available actions that the LLM
                | could map to, and then ask it whether a given input
                | text matches a pre-defined action. Much like what
                | TypeChat does.
               | 
               | Does this sound similar enough to what you were doing?
               | Was there something difficult in this that you could
               | explain?
               | 
                | Aside from my hypothetical implementation being
                | completely hand-wavy, I had figured the most difficult
                | part would be piping complex actions together. "Remind
                | me tomorrow about any events I have on my calendar"
                | would be a conditional action based on lookups, etc.,
                | so order of operations would also have to be parsed
                | somehow. I suspect a looping "thinking" mechanism
                | would be necessary, and while I know that's not a
                | novel idea, I'm unsure whether I would nonetheless
                | have to reinvent it for the way I wanted to deploy.
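                | 
                | Concretely, the mapping half might be little more
                | than this (a sketch reusing the Action shape from my
                | comment above; lookupCalendar and send are
                | hypothetical stubs):
                | 
                |     function dispatch(action: Action): string {
                |         switch (action.kind) {
                |             case "checkTime":
                |                 return new Date().toISOString();
                |             case "checkCalendar":
                |                 return lookupCalendar(action.date);
                |             case "sendMessage":
                |                 return send(action.to, action.body);
                |             case "unknown":
                |                 return "no matching action";
                |         }
                |     }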
        
               | J_Shelby_J wrote:
               | https://github.com/ShelbyJenkins/LLM-OpenAPI-minifier
               | 
               | I have a working solution to exposing the toggles.
               | 
               | I'm integrating it into the bot I have in the other repo.
               | 
                | The goal is that you point it at an OpenAPI spec and
                | then GPT can choose and run functions. Basically Siri,
                | but with access to any API.
        
       | ianzakalwe wrote:
        | I am not sure why this exists. Maybe I am missing something,
        | but there does not seem to be much value past "hey, check out
        | what's possible".
        
       | phillipcarter wrote:
       | I'd love to see a robust study on the effectiveness of this and
       | several other ways to coax a structured response out:
       | 
       | - Lots of examples / prompt engineering techniques
       | 
        | - MS Guidance
       | 
       | - TypeChat
       | 
       | - OpenAI functions (the model itself is tuned to do this, a key
       | differentiator)
       | 
       | - ...others?
        
       | 33a wrote:
       | Looks like it just runs the LLM in a loop until it spits out
       | something that type checks, prompting with the error message.
       | 
       | This is a cute idea and it looks like it should work, but I could
       | see this getting expensive with larger models and input prompts.
       | Probably not a fix for all scenarios.
        
         | osaariki wrote:
         | I'm not familiar with how TypeChat works, but Guidance [1] is
         | another similar project that can actually integrate into the
         | token sampling to enforce formats.
         | 
         | [1]: https://github.com/microsoft/guidance
        
           | behnamoh wrote:
            | Except that Guidance is defunct and not maintained
            | anymore.
        
           | J_Shelby_J wrote:
           | It's logit bias. You don't even need another library to do
            | this. You can do it with three lines of Python.
           | 
           | Here's an example of one of my implementations of logit bias.
           | 
           | https://github.com/ShelbyJenkins/shelby-as-a-
           | service/blob/74...
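            | 
            | With the OpenAI chat API that's roughly the following
            | (TypeScript here; the token ID is illustrative, since
            | real IDs depend on the model's tokenizer):
            | 
            |     import OpenAI from "openai";
            | 
            |     const openai = new OpenAI();
            |     const completion = await openai.chat.completions.create({
            |         model: "gpt-3.5-turbo",
            |         messages: [{ role: "user", content: "Reply in JSON." }],
            |         // token id -> bias in [-100, 100]; e.g. boost a
            |         // brace-like token so output starts as JSON
            |         logit_bias: { "90": 10 },
            |     });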
        
         | babyshake wrote:
         | At least with OpenAI, wouldn't it be better if under the hood
         | it was using the new function call feature?
        
           | akavi wrote:
            | TypeScript's type system is much more expressive than the one
           | the function call feature makes available.
           | 
           | I imagine closing the loop (using the TS compiler to restrict
           | token output weights) is in the works, though it's probably
           | not totally trivial. You'd need:
           | 
           | * An incremental TS compiler that could report "valid" or
           | "valid prefix" (ie, valid as long as the next token is not
           | EOF)
           | 
           | * The ability to backtrack the model
           | 
            | Idk how hard either piece is.
        
             | rezonant wrote:
             | For the TS compiler: If you took each generation step,
             | closed any partial JSON objects (ie close any open `{`),
             | checked that it was valid JSON and then validated it using
             | a deep version of Partial<T>, that should do the trick.
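              | 
              | A rough sketch of the brace-closing step (simplified:
              | it ignores string escapes and trailing commas):
              | 
              |     function closePartialJson(prefix: string): string {
              |         const closers: string[] = [];
              |         let inString = false;
              |         for (const ch of prefix) {
              |             if (inString) {
              |                 if (ch === '"') inString = false;
              |             } else if (ch === '"') inString = true;
              |             else if (ch === "{") closers.push("}");
              |             else if (ch === "[") closers.push("]");
              |             else if (ch === "}" || ch === "]") closers.pop();
              |         }
              |         return prefix + (inString ? '"' : "") +
              |             closers.reverse().join("");
              |     }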
        
         | SkyPuncher wrote:
          | I suspect most products are more concerned with finding
          | product-market fit first; then they can wrangle costs down.
          | 
          | It's also a fair assumption that models will get better at
          | structured output, since the market is demanding it.
        
       | dvt wrote:
        | This is my hot take: we're slowly entering the "tooling" phase
        | of AI, where people realize there's no real value generation
        | here, but people are so heavily invested in AI that money is
        | still being pumped into building stuff (and of course, it's one
        | of the best ways to guarantee your academic paper gets
        | published). I mean, LangChain is kind of a joke and they raised
        | a $10M seed lol.
       | 
       | DeFi/crypto went through this phase 2 years ago. Mark my words,
       | it's going to end up being this weird limbo for a few years where
       | people will slowly realize that AI is a feature, not a product.
       | And that its applicability is limited and that it won't save the
       | world. It won't be able to self-drive cars due to all the edge
       | cases, it won't be able to perform surgeries because it might
       | kill people, etc.
       | 
       | I keep mentioning that even the most useful AI tools (Copilot,
       | etc.) are marginally useful at best. At the very best it saves me
       | a few clicks on Google, but the agents are not "intelligent" in
       | the least. We went through a similar bubble a few years ago with
       | chatbots[1]. These days, no one cares about them. "The metaverse"
       | was much more short-lived, but the same herd mentality applies.
       | "It's the next big thing" until it isn't.
       | 
       | [1] https://venturebeat.com/business/facebook-opens-its-
       | messenge...
        
       | [deleted]
        
       | rvz wrote:
       | Someone should just get this working on Llama 2 instead of
       | OpenAI.com [0]
       | 
        | All this does is talk to an AI model sitting on someone
        | else's server.
       | 
       | [0]
       | https://github.com/microsoft/TypeChat/blob/main/src/model.ts...
        
         | joelmgallant wrote:
          | The most recent gpt4all (https://github.com/nomic-ai/gpt4all)
          | includes a local server compatible with the OpenAI API --
          | this could be a useful start!
        
         | DanRosenwasser wrote:
         | Hi there! I'm one of the people working on TypeChat and I just
         | want to say that we definitely welcome experimentation on
         | things like this. We've actually been experimenting with
         | running Llama 2 ourselves. Like you said, to get a model
         | working with TypeChat all you really need is to provide a
         | completion function. So give it a shot!
        
       | jensneuse wrote:
        | This looks quite similar to how we're using OpenAI functions and
       | zod (JSON Schema) to have OpenAI answer with JSON and interact
       | with our custom functions to answer a prompt:
       | https://wundergraph.com/blog/return_json_from_openai
        
       | joefreeman wrote:
        | > It's unfortunately easy to get a response that includes
        | { "name": "grande latte" }
        | 
        |     type Item = {
        |         name: string;
        |         ...
        |         size?: string;
        |     }
        | 
        | I'm not really following how this would avoid `name: "grande
        | latte"`?
        | 
        | But then the example response:
        | 
        |     "size": 16
       | 
       | > This is pretty great!
       | 
       | Is it? It's not even returning the type being asked for?
       | 
       | I'm guessing this is more of a typo in the example, because
       | otherwise this seems cool.
        
         | mynameisvlad wrote:
         | I feel like that's just a documentation bug. I'm guessing they
         | changed from number of ounces to canonical size late in the
         | drafting of the announcement and forgot to change the output
         | value to match.
         | 
         | There would be no way for a system to map "grande" to 16 based
         | on the code provided, and 16 does not seem to be used anywhere
         | else.
        
         | DanRosenwasser wrote:
         | Whoops - thanks for catching this. Earlier iterations of this
          | blog post used a different schema where `size` had been
         | accidentally specified as a `number`. While we changed the
         | schema, we hadn't re-run the prompt. It should be fixed now!
        
         | graypegg wrote:
          | Their example here is really weak overall IMO, beyond just
          | that typo. You also probably wouldn't want a "name" string
          | field anyway. There's nothing stopping you from receiving
          | 
          |     {
          |         name: "the brown one",
          |         size: "the espresso cup",
          |         ...
          |     }
          | 
          | That's just as bad as parsing the original string. You
         | probably want big string union types for each one of those
         | representing whatever known values you want, so the LLM can try
         | and match them.
         | 
         | But now why would you want that to be locked into the type
         | syntax? You probably want something more like Zod where you can
         | use some runtime data to build up those union types.
         | 
         | You also want restrictions on the types too, like quantity
         | should be a positive, non-fractional integer. Of course you can
         | just validate the JSON values afterwards, but now the user gets
         | two kinds of errors. One from the LLM which is fluent and human
         | sounding, and the other which is a weird technical "oops! You
         | provided a value that is too large for quantity" error.
         | 
         | The type syntax seems like the wrong place to describe this
         | stuff.
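          | 
          | With Zod, for instance, the allowed values can come from
          | runtime data (a sketch):
          | 
          |     import { z } from "zod";
          | 
          |     // Build the unions from live data (today's menu)
          |     // rather than hard-coding them into the type syntax.
          |     const menu = ["latte", "americano", "espresso"] as const;
          |     const Order = z.object({
          |         name: z.enum(menu),
          |         size: z.enum(["short", "tall", "grande", "venti"]),
          |         quantity: z.number().int().positive(),
          |     });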
        
         | hirsin wrote:
          | The rest of the paragraph discusses "what happens when it
          | ignores the type?", so I think that's where they were going
          | with that.
        
       | verdverm wrote:
       | I don't see the value add here.
       | 
       | Here's the core of the message sent to the LLM:
       | https://github.com/microsoft/TypeChat/blob/main/src/typechat...
       | 
        | You are basically getting a fixed prompt to return structured
        | data, with a small amount of automation and vendor lock-in.
        | All these LLM libraries are just crappy APIs on top of the
        | underlying API.
       | It is trivial to write a script that does the same and will be
       | much more flexible as models and user needs evolve.
       | 
       | As an example, think about how you could change the prompt or use
        | Python classes instead. How much work would this be using a
       | library like this versus something that lifts the API calls and
       | text templating to the user like: https://github.com/hofstadter-
       | io/hof/blob/_dev/flow/chat/llm...
        
         | ofslidingfeet wrote:
         | Getting these models to reliably return a consistent structure
         | without frequent human intervention and/or having to account
         | for the personal moral opinions of big tech CEOs is not
         | trivial, no.
        
         | whimsicalism wrote:
          | Yes, as the abstractions get better, it becomes easier to
          | build useful things.
        
         | politelemon wrote:
         | Pretty much all the LLM libraries I'm seeing are like this.
         | They boil down to a request to the LLM to do something in a
         | certain way. I've noticed under complex conditions, they stop
         | listening and start reverting to their 'default' behavior.
         | 
         | But that said it still feels like using a library is the right
         | thing to do... so I'm still watching this space to see what
         | matures and emerges as a good-enough approach.
        
         | bwestergard wrote:
         | The value is in:
         | 
         | 1. Running the typescript type checker against what is returned
         | by the LLM.
         | 
         | 2. If there are type errors, combining those into a "repair
         | prompt" that will (it is assumed) have a higher likelihood of
         | eliciting an LLM output that type checks.
         | 
         | 3. Gracefully handling the cases where the heuristic in #2
         | fails.
         | 
         | https://github.com/microsoft/TypeChat/blob/main/src/typechat...
         | 
         | In my experience experimenting with the same basic idea, the
         | heuristic in #2 works surprisingly well for relatively simple
         | types (i.e. records and arrays not nested too deeply, limited
         | use of type variables). It turns out that prompting LLMs to
         | return values inhabiting relatively simple types can be used to
          | create useful applications. Since that is valuable, this
          | library is valuable inasmuch as it eliminates the need to
          | hand-roll this request pattern and provides a standardized
          | integration with a TypeScript codebase.
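          | 
          | A minimal sketch of that request pattern (hypothetical
          | helper names, not TypeChat's actual API):
          | 
          |     async function requestTyped<T>(
          |         complete: (p: string) => Promise<string>,
          |         validate: (s: string) => { ok: boolean; error?: string },
          |         prompt: string,
          |         maxRepairs = 1
          |     ): Promise<T> {
          |         let current = prompt;
          |         for (let i = 0; i <= maxRepairs; i++) {
          |             const response = await complete(current);
          |             const result = validate(response);
          |             if (result.ok) return JSON.parse(response) as T;
          |             // feed the type errors back as a "repair prompt"
          |             current = prompt + "\n" + response +
          |                 "\nThe JSON above is invalid because:\n" +
          |                 result.error +
          |                 "\nRespond only with corrected JSON.";
          |         }
          |         throw new Error("no well-typed response");
          |     }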
        
           | verdverm wrote:
           | these are trivial steps you can add in any script, as your
           | link demonstrates.
           | 
           | Why would I want to add all this extra stuff just for that?
           | The opaque retry until it returns valid JSON? That sounds
           | like it will make for many pleasant support cases or issues
           | 
           | Personally, I have found investing more effort in the actual
           | prompt engineering improves success rates and reduces the
           | need to retry with an appended error message. Especially
           | helpful are input/output pairs (i.e. few-shot) and while we
           | haven't tried it yet, I imagine fine-tuning and distillation
           | would improve the situation even more
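            | 
            | For example, a few-shot prefix along these lines (the
            | schema and examples are made up):
            | 
            |     function fewShot(userInput: string): string {
            |         return [
            |             'Translate requests into Order JSON.',
            |             'User: "two tall lattes"',
            |             'JSON: {"name":"latte","size":"tall","qty":2}',
            |             'User: "a grande mocha"',
            |             'JSON: {"name":"mocha","size":"grande","qty":1}',
            |             `User: "${userInput}"`,
            |             'JSON:',
            |         ].join("\n");
            |     }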
        
             | bwestergard wrote:
              | There are many subtleties to invoking the TypeScript
              | type checker from Node. It's nice to have support for
              | that from the team that maintains the type checker.
        
           | BoorishBears wrote:
           | Here's a project that does that better imo:
           | 
           | https://github.com/dzhng/zod-gpt
           | 
           | And by better I mean doesn't tie you to OpenAI for no good
           | reason
        
             | LordDragonfang wrote:
             | I don't know where all you people work that your employer
             | would prefer a random git repo (that has no support and no
             | guarantee of updates) over a solution from _Microsoft_.
             | (Alternatively: that you have so much free time that you 'd
             | prefer to fiddle with your own validation code instead of
             | writing your actual app)
             | 
             | Open source solutions are great, but having a first-party
             | solution is _also a good thing_.
        
               | BoorishBears wrote:
               | I don't know which employer is hiring the people who make
               | logical leaps like this but I thank them for their
               | sacrifice.
               | 
               | At the end of the day the repo I linked is grokkable with
               | about 10 minutes of effort, and has simple demonstrable
               | usefulness by letting you swap out the LLM you're
               | calling.
               | 
               | Both are experimental open source libraries in an
               | experimental space.
        
         | TechBro8615 wrote:
         | Where's the vendor lock-in? This is an open source library and
         | the file you linked to even includes configs for two vendors:
         | ChatGPT and Bard.
        
         | nfw2 wrote:
         | It's essentially prompt engineering as a service with some
         | basic quality-control features thrown in.
         | 
         | Sure, your engineers could implement it themselves, but don't
         | they have better things to do?
        
       | arc9693 wrote:
        | TL;DR: It's asking ChatGPT to format its response according to
        | a schema.
        
       | bottlepalm wrote:
        | How has no voice assistant (Apple, Google, Amazon, Microsoft)
        | integrated LLMs into its service yet, and how has OpenAI not
        | released its own voice assistant?
       | 
        | Also, like RSS, if there were some standard URL websites
        | exposed for AI interaction, using something like TypeChat to
        | expose the interfaces, we'd be well on our way here.
        
         | 9dev wrote:
         | Seriously, it feels like there's some collusion going on behind
         | the scenes. This is the most obvious use case for the
         | technology, but none of the big vendors have explored it.
        
           | jomohke wrote:
            | It takes a while to develop a product, and the world only
            | woke up to them mere months ago.
        
         | zitterbewegung wrote:
          | Microsoft is doing that to replace Cortana in Windows 11.
        
         | COGlory wrote:
          | Willow and the Willow Inference Server have the option to
          | use Vicuna with speech input and TTS.
        
         | dbish wrote:
          | OpenAI is pretty likely working on their own (see Karpathy's
          | "building a kind of JARVIS @ OpenAI"), and Microsoft of
          | course is doing an integration or reinterpretation of
          | Cortana with OpenAI's LLMs (since they seem incapable of
          | building their own models nowadays - "Why do we have
          | Microsoft Research at all?" -S.N.), but there's a lot less
          | value in a voice-driven LLM than there is in actually being
          | able to perform actions. Take Alexa, for example: you need a
          | system that can handle smart home control in a predictable,
          | debuggable way, otherwise people would get annoyed. I
          | definitely think you can do this, but the current system as
          | built (and others like Siri and, to a lesser extent,
          | Cortana) all have a bunch of hooks and APIs being used by
          | years and years of rules and software built atop less
          | powerful models. They need to both maintain the current
          | quality and improve on it while swapping out major parts of
          | their system in order to make this work, which takes time.
         | 
          | Not to mention that none of these assistants actually make
          | money; they all lose money really, and are only worthwhile to
         | big companies with other ways to make cash or drive other parts
         | of their business (phones, shopping, whatever), so there's less
         | incentive for a startup to do it.
         | 
          | I worked on both Cortana and Alexa in the past and thought a
          | lot about trying to build a new version of them from the
          | ground up with the LLM advancements. While the tech was all
          | straightforward, and I even had some new ideas for use cases
          | that are enabled now, I could not figure out a business
          | model that would work (and hence, I'm working on something
          | completely different now).
        
       | sandkoan wrote:
        | Relevant: I built this, which generalizes to arbitrary regex
        | patterns / context-free grammars with 100% adherence and is
        | model-agnostic -- https://news.ycombinator.com/item?id=36750083
        
       | davrous wrote:
       | This is a fantastic concept! It's going to be super useful to map
       | users' intent to API / code in a super reliable way.
        
       | Zaheer wrote:
       | It's not super clear how this differs from another recently
       | released library from Microsoft: Guidance
       | (https://github.com/microsoft/guidance).
       | 
       | They both seem to aim to solve the problem of getting typed,
        | valid responses back from LLMs.
        
         | DanRosenwasser wrote:
         | One of the key things that we've focused on with TypeChat is
         | not just that it acts as a specification for retrieving
         | structured data (i.e. JSON), but that the structure is actually
         | valid - that it's well-typed based on your type definitions.
         | 
         | The thing to keep in mind with these different libraries is
         | that they are not necessarily perfect substitutes for each
         | other. They often serve different use-cases, or can be combined
         | in various ways -- possibly using the techniques directly and
         | independent of the libraries themselves.
        
       | trafnar wrote:
        | It's not clear to me how they ensure the responses will be
        | valid JSON. Are they just asking for it, then parsing the
        | result with error checking?
        
         | davnicwil wrote:
          | Seems like they run the generated response through the
          | TypeScript type checker and, if it fails, retry using the
          | error message as a further hint to the LLM, until it
          | succeeds.
        
           | anonzzzies wrote:
            | I would expect that; if it doesn't even do that, why
            | bother... That is also trivial to do anyway.
        
             | [deleted]
        
           | verdverm wrote:
           | also some very basic prompt engineering
        
         | esafak wrote:
         | Yes.
         | https://github.com/microsoft/TypeChat/blob/main/src/typechat...
        
       | mahalex wrote:
        | So, it's a thing that appends "please format your response as
        | the following JSON" to the prompt, then validates the actual
        | response against the schema, all in a "while (true)" loop
        | (literally) until it succeeds. This unbelievable achievement
        | is the work of seven people (the authors of the blog post).
       | 
       | Honestly, this is getting beyond embarrassing. How is this the
       | world we live in?
        
         | jlnho wrote:
         | It's because not everyone can be as gifted as you.
         | 
         | I think the (arguably very prototypical) implementation is not
         | what's interesting here. It's the concept itself. Natural
         | language may soon become the default interface for most of the
         | computing people do on a day to day basis, and tools like these
         | will make it easier to create new applications in this space.
        
           | Edes wrote:
            | I'm gonna love trying to figure out what query gets the
            | support chatbot to pair me with an actual human so that I
            | can solve something that's off-script.
        
         | lsh123 wrote:
          | Hm... so how do we know that the actual values in the
          | produced JSON are correct???
        
         | siva7 wrote:
          | One of the authors is Anders Hejlsberg, the guy behind C#
          | and Delphi.
        
           | mahalex wrote:
           | That's what makes it even more embarrassing.
        
       | katamaster818 wrote:
       | Hang on, so this is doing runtime validation of an object against
        | a TypeScript type definition? Can this be shipped as a standalone
        | library/feature? This would be absolutely game-changing for
       | validating api response payloads, etc. in typescript codebases.
        
         | tehsauce wrote:
          | Maybe this function?
         | 
         | https://github.com/microsoft/TypeChat/blob/4d34a5005c67bc494...
        
           | katamaster818 wrote:
            | Yup, just found that. Super neat. I'm 100% interested in
            | using this for other runtime validation...
           | 
           | It's interesting because I've always been under the
           | impression the TS team was against the use of types at
           | runtime (that's why projects like
           | https://github.com/nonara/ts-patch exist), but now they're
           | doing it themselves with this project...
           | 
            | I wonder what the performance overhead of starting up an
            | instance of tsc in memory is. Is this suitable for
            | low-latency situations? Lots of testing to do...
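            | 
            | For reference, a rough sketch of checking a JSON value
            | against a type via the compiler API (not necessarily how
            | TypeChat does it):
            | 
            |     import ts from "typescript";
            | 
            |     // Compile a tiny in-memory module that assigns the
            |     // JSON to the target type, then collect diagnostics.
            |     function typeErrors(json: string, schema: string,
            |                         typeName: string): string[] {
            |         const src = schema + "\nconst value: " +
            |             typeName + " = " + json + ";";
            |         const file = "check.ts";
            |         const opts: ts.CompilerOptions = {
            |             strict: true,
            |             noEmit: true,
            |         };
            |         const host = ts.createCompilerHost(opts);
            |         const getSrc = host.getSourceFile.bind(host);
            |         host.getSourceFile = (name, lang, ...rest) =>
            |             name === file
            |                 ? ts.createSourceFile(name, src, lang)
            |                 : getSrc(name, lang, ...rest);
            |         const program = ts.createProgram([file], opts, host);
            |         return ts.getPreEmitDiagnostics(program).map((d) =>
            |             ts.flattenDiagnosticMessageText(d.messageText,
            |                 "\n"));
            |     }
            | 
            | Creating a whole Program per call presumably isn't cheap;
            | caching the host or using the language service APIs should
            | amortize it.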
        
       | robbie-c wrote:
       | This is funny, I have something pretty similar in my code, except
       | it's using Zod for runtime typechecking, and I convert Zod
       | schemas to json schemas and send that to gpt-3.5 as a function
       | call. I would expect that using TypeScript's output is better for
       | recovering from errors than with Zod's output, so I can
       | definitely see the advantage of this.
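        | 
        | For comparison, that pattern looks roughly like this (a
        | sketch using the zod and zod-to-json-schema packages; the
        | names are illustrative):
        | 
        |     import { z } from "zod";
        |     import { zodToJsonSchema } from "zod-to-json-schema";
        | 
        |     // One Zod schema serves both as the function definition
        |     // sent to the model and as the runtime validator.
        |     const Order = z.object({
        |         name: z.string(),
        |         quantity: z.number().int().positive(),
        |     });
        | 
        |     const functionDef = {
        |         name: "submit_order",
        |         description: "Record a coffee order",
        |         parameters: zodToJsonSchema(Order),
        |     };
        | 
        |     // validate the model's function-call arguments
        |     function parseArgs(args: string) {
        |         const r = Order.safeParse(JSON.parse(args));
        |         if (!r.success) throw new Error(r.error.message);
        |         return r.data;
        |     }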
        
       | bestcoder69 wrote:
       | Why this instead of GPT Functions?
        
         | verdverm wrote:
          | It's basically the same thing, but uses a more concise spec
          | for writing the schema (TypeScript vs. JSON Schema).
          | 
          | In the end, both methods try to coax the model into
          | returning a JSON object; one method can be used with any
          | model, the other is tied to a specific, ever-changing vendor
          | API.
         | 
         | Why would one choose to only support "OpenAI" and nothing else?
        
       | yanis_t wrote:
       | TL;DR: This is ChatGPT + TypeScript.
       | 
        | I'm totally happy to be able to receive structured queries,
        | but I'm also not 100% sure TypeScript is the right tool; it
        | seems to be overkill. I mean, obviously you don't need the
        | power of TS with all its enums, generics, etc.
        | 
        | Plus, given that it will run multiple queries in a loop, it
        | might end up very expensive for it to abide by your
        | custom-made complex type.
        
       | garrett_makes wrote:
        | I built and released something really similar to this (but
        | with a smaller scope) for Laravel PHP this week:
       | https://github.com/adrenallen/ai-agents-laravel
       | 
        | My take on this is that it should be easy for an engineer to
        | spin up a new "bot" with a given LLM. There's a lot of boring
        | work around
       | translating your functions into something ChatGPT understands,
       | then dealing with the response and parsing it back again.
       | 
       | With systems like these you can just focus on writing the actual
       | PHP code, adding a few clear comments, and then the bot can
       | immediately use your code like a tool in whatever task you give
       | it.
       | 
        | Another benefit to things like this is that it makes it much
       | easier for code to be shared. If someone writes a function, you
       | could pull it into a new bot and immediately use it. It
       | eliminates the layer of "converting this for the LLM to use and
       | understand", which I think is pretty cool and makes building so
       | much quicker!
       | 
        | None of this is perfect yet, but I think this is the direction
        | everything will go so that we can start to leverage each
        | other's code better. Think about how we use package managers
        | in coding today: I want a package manager for AI-specific
        | tooling. Just install the "get the weather" library, add it to
        | my bot, and now it can get the weather.
        
       | ameyab wrote:
       | Here's a relevant paper that folks may find interesting:
       | <snip>Semantic Interpreter leverages an Analysis-Retrieval prompt
       | construction method with LLMs for program synthesis, translating
       | natural language user utterances to ODSL programs that can be
       | transpiled to application APIs and then executed.</snip>
       | 
       | https://arxiv.org/abs/2306.03460
        
       ___________________________________________________________________
       (page generated 2023-07-20 23:00 UTC)