[HN Gopher] Native JSON Output from GPT-4
       ___________________________________________________________________
        
       Native JSON Output from GPT-4
        
       Author : yonom
       Score  : 246 points
       Date   : 2023-06-14 19:07 UTC (3 hours ago)
        
 (HTM) web link (yonom.substack.com)
 (TXT) w3m dump (yonom.substack.com)
        
       | aecorredor wrote:
       | Newbie in machine learning here. It's crazy that this is the top
       | post just today. I've been doing the intro to deep learning
       | course from MIT this week, mainly because I have a ton of JSON
       | files that are already classified, and want to train a model that
       | can generate new JSON data by taking classification tags as
       | input.
       | 
       | So naturally this post is exciting. My main unknown right now is
       | figuring out which model to train my data on. An RNN, a GAN, a
       | diffusion model?
        
         | ilaksh wrote:
          | Did you read the article? To do it with OpenAI you would just
          | put a few output examples in the prompt and then give it a
          | function that takes the class, with output parameters that
          | correspond to the JSON format you want, or just a string
          | containing JSON.
         | 
          | You could also fine-tune an LLM like Falcon-7b, but that's
          | probably not necessary and has nothing to do with OpenAI.
         | 
         | You might also look into the OpenAI Embedding API as a third
         | option.
         | 
         | I would try the first option though.
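          | 
          | Roughly like this (untested sketch, assuming the openai
          | Python package and the new 0613 models; the function name
          | and schema are made up):
          | 
          |     import json
          |     import openai
          | 
          |     functions = [{
          |         "name": "emit_record",
          |         "description": "Generate a JSON record for a tag",
          |         "parameters": {
          |             "type": "object",
          |             "properties": {
          |                 "tag": {"type": "string"},
          |                 "record": {"type": "string"},
          |             },
          |             "required": ["tag", "record"],
          |         },
          |     }]
          | 
          |     resp = openai.ChatCompletion.create(
          |         model="gpt-3.5-turbo-0613",
          |         messages=[{"role": "user", "content": "tag: sports"}],
          |         functions=functions,
          |         # force the model to call this exact function
          |         function_call={"name": "emit_record"},
          |     )
          |     args = json.loads(resp["choices"][0]["message"]
          |                       ["function_call"]["arguments"])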
        
       | chaxor wrote:
       | Is there a decent way of converting to a structure with a very
       | constrained vocabulary? For example, given some input text,
       | converting it to something like {"OID-189": "QQID-378",
       | "OID-478":"QQID-678"}. Where OID and QQID dictionaries can be
       | e.g. millions of different items defined by a description. The
       | rules for mapping could be essentially what looks closest in
       | semantic space to the descriptions given in a dictionary.
       | 
        | I know this should be solvable with local LLMs and BERT cosine
        | similarity (it isn't exactly that, but it's a start on the
        | idea), but is there a way to do this with decoder models rather
        | than encoder models plus other logic?
        
       | 037 wrote:
       | I'm wondering if introducing a system message like "convert the
       | resulting json to yaml and return the yaml only" would adversely
       | affect the optimization done for these models. The reason is that
       | yaml uses significantly fewer tokens compared to json. For the
       | output, where data type specification or adding comments may not
       | be necessary, this could be beneficial. From my understanding,
       | specifying functions in json now uses fewer tokens, but I believe
       | the response still consumes the usual amount of tokens.
        
         | lbeurerkellner wrote:
          | I think one should not underestimate the impact the output
          | format can have on downstream performance. From a modelling
         | perspective it is unclear whether asking/fine-tuning the model
         | to generate JSON (or YAML) output is really lossless with
         | respect to the raw reasoning powers of the model (e.g. it may
         | perform worse on tasks when asked/trained to always respond in
         | JSON).
         | 
         | I am sure they ran tests on this internally, but I wonder what
         | the concrete effects are, especially comparing different output
         | formats like JSON, YAML, different function calling conventions
         | and/or forms of tool discovery.
        
       | imranq wrote:
        | Wouldn't this be possible with a solution like Guidance where you
       | have a pre structured JSON format ready to go and all you need is
       | text: https://github.com/microsoft/guidance
        
       | swyx wrote:
        | i think people are underestimating the potential here for agent
        | building - it is now a lot easier for GPT4 to call other models,
       | or itself. while i was taking notes for our emergency pod
       | yesterday (https://www.latent.space/p/function-agents) we had
       | this interesting debate with Simon Willison on just how many
       | functions will be supplied to this API. Simon thinks it will be
       | "deep" rather than "wide" - eg a few functions that do many
       | things, rather than many functions that do few things. I think i
       | agree.
       | 
       | you can now trivially make GPT4 decide whether to call itself
       | again, or to proceed to the next stage. it feels like the first
       | XOR circuit from which we can compose a "transistor", from which
       | we can compose a new kind of CPU.
        
         | ilaksh wrote:
         | The thing is the relevant context often depends on what it's
         | trying to do. You can give it a lot of context in 16k but if
         | there are too many different types of things then I think it
         | will be confused or at least have less capacity for the actual
         | selected task.
         | 
         | So what I am thinking is that some functions might just be like
         | gateways into a second menu level. So instead of just edit_file
         | with the filename and new source, maybe only
         | select_files_for_edit is available at the top level. In that
          | case I can ensure it doesn't overwrite an existing file and
          | lose important stuff that was already in there, by providing
          | the requested files' existing contents along with the
          | function allowing the file edit.
        
           | naiv wrote:
           | I think big context only makes sense for document analysis.
           | 
           | For programming you want to keep it slim. Just like you
           | should keep your controllers and classes slim.
           | 
           | Also people with 32k access report very very long response
           | times of up to multiple minutes which is not feasible if you
           | only want a smaller change or analysis.
        
         | jonplackett wrote:
         | It was already quite easy to get GPT-4 to output json. You just
         | append 'reply in json with this format' and it does a really
         | good job.
         | 
          | GPT-3.5 was very haphazard though and needs extensive
          | babysitting and reminding, so if this makes GPT-3.5 better
          | then it's useful - it does have an annoying disclaimer though
          | that 'it may not reply with valid json' so we'll still have to
          | do some sense checks on the output.
         | 
         | I have been using this to make a few 'choose your own
         | adventure' type games and I can see there's a TONNE of
         | potential useful things.
        
           | reallymental wrote:
            | Is there any publicly available resource to replicate your
            | work? I would love to just find the right kind of
            | "incantation" for the gpt-3.5-t or gpt-4 to output a
            | meaningful story arc etc.
           | 
           | Any examples of your work would be greatly helpful as well!
        
             | SamPatt wrote:
             | I'm not the person you're asking, but I built a site that
             | allows you to generate fiction if you have an OpenAI API
             | key. You can see the prompts sent in console, and it's all
             | open source:
             | 
             | https://havewords.ai/
        
           | ignite wrote:
           | > You just append 'reply in json with this format' and it
           | does a really good job.
           | 
           | It does an ok job. Except when it doesn't. Definitely misses
           | a lot of the time, sometimes on prompts that succeeded on
           | previous runs.
        
           | bradly wrote:
            | I could not get GPT-4 to reliably not give some sort of text
            | response, even if it was just a simple "Sure" followed by the
            | JSON.
        
             | rytill wrote:
             | Did you try using the API and providing a very clear system
             | message followed by several examples that were pure JSON?
        
           | cwxm wrote:
           | even with gpt 4, it hallucinates enough that it's not
           | reliable, forgetting to open/close brackets and quotes. This
           | sounds like it'd be a big improvement.
        
             | ztratar wrote:
             | Nah, this was solved by most teams a while ago.
        
             | jonplackett wrote:
             | Not that it matters now but just doing something like this
             | works 99% of the time or more with 4 and 90% with 3.5.
             | 
             | It is VERY IMPORTANT that you respond in valid JSON ONLY.
             | Nothing before or after. Make sure to escape all strings.
             | Use this format:
             | 
             | {"some_variable": [describe the variable purpose]}
        
               | SamPatt wrote:
               | 99% of the time is still super frustrating when it fails,
               | if you're using it in a consumer facing app. You have to
               | clean up the output to avoid getting an error. If it goes
               | from 99% to 100% JSON that is a big deal for me, much
               | simpler.
        
               | jonplackett wrote:
               | Except it says in the small print to expect invalid JSON
               | occasionally, so you have to write your error handling
               | code either way
        
               | davepeck wrote:
               | Yup. Is there a good/forgiving "drunken JSON parser"
               | library that people like to use? Feels like it would be a
               | useful (and separable) piece?
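                | 
                | In the meantime, something minimal like this covers
                | the usual failure modes (a homegrown sketch, not a
                | real library):
                | 
                |     import json, re
                | 
                |     def loads_forgiving(text):
                |         # strip markdown code fences, if any
                |         text = re.sub(r"^```(?:json)?|```$", "",
                |                       text.strip()).strip()
                |         # keep only the outermost {...} span
                |         start, end = text.find("{"), text.rfind("}")
                |         if start != -1 and end > start:
                |             text = text[start:end + 1]
                |         # drop trailing commas before } or ]
                |         text = re.sub(r",\s*([}\]])", r"\1", text)
                |         return json.loads(text)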
        
         | minimaxir wrote:
          | "Trivial" is misleading. From OpenAI's docs and demos, the full
          | ReAct workflow is an order of magnitude more difficult than
          | typical ChatGPT API usage, with a new set of constraints (e.g.
          | schema definitions).
          | 
          | Even OpenAI's notebook demo has error handling workflows,
          | which were actually necessary since ChatGPT returned
          | incorrectly formatted output.
        
           | cjonas wrote:
            | Maybe trivial isn't the right word, but it's still very
            | straightforward to get something basic, yet really
            | powerful...
           | 
           | ReAct Setup Prompt (goal + available actions) -> Agent
           | "ReAction" -> Parse & Execute Action -> Send Action Response
           | (success or error) -> Agent "ReAction" -> repeat
           | 
           | As long as each action has proper validation and returns
           | meaningful error messages, you don't need to even change the
           | control flow. The agent will typically understand what went
           | wrong, and attempt to correct it in the next "ReAction".
           | 
           | I've been refactoring some agents to use "functions" and so
           | far it seems to be a HUGE improvement in reliability vs the
            | "Return JSON matching this format" approach. Most impactful
            | is the fact that "3.5-turbo" will now reliably return
            | JSON (before you'd be forced to use GPT-4 for a ReAct-style
            | agent of modest complexity).
           | 
           | My agents also seem to be better at following other
           | instructions now that the noise of the response format is
           | gone (of course it's still there, but in a way it has been
           | specifically trained on). This could also just be a result of
           | the improvements to the system prompt though.
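            | 
            | The whole control flow fits in a few lines (rough untested
            | sketch; "impls" is a hypothetical dict mapping function
            | names to plain Python callables):
            | 
            |     import json, openai
            | 
            |     def run_agent(messages, functions, impls):
            |         for _ in range(10):  # cap the loop
            |             resp = openai.ChatCompletion.create(
            |                 model="gpt-3.5-turbo-0613",
            |                 messages=messages, functions=functions)
            |             msg = resp["choices"][0]["message"]
            |             messages.append(msg)
            |             call = msg.get("function_call")
            |             if not call:
            |                 return msg["content"]  # agent is done
            |             try:
            |                 args = json.loads(call["arguments"])
            |                 result = impls[call["name"]](**args)
            |             except Exception as e:
            |                 # meaningful errors let it self-correct
            |                 result = {"error": str(e)}
            |             messages.append({"role": "function",
            |                              "name": call["name"],
            |                              "content": json.dumps(result)})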
        
             | [deleted]
        
         | lbeurerkellner wrote:
         | It's interesting to think about this form of computation (LLM +
         | function call) in terms of circuitry. It is still unclear to me
         | however, if the sequential form of reasoning imposed by a
         | sequence of chat messages is the right model here. LLM decoding
         | and also more high-level "reasoning algorithms" like tree of
         | thought are not that linear.
         | 
         | Ever since we started working on LMQL, the overarching vision
         | all along was to get to a form of language model programming,
         | where LLM calls are just the smallest primitive of the "text
         | computer" you are running on. It will be interesting to see
         | what kind of patterns emerge, now that the smallest primitive
         | becomes more robust and reliable, at least in terms of the
         | interface.
        
         | moneywoes wrote:
         | Wow your brand is huge. Crazy growth. i wonder how much these
         | subtle mentions on forums help
        
           | TeMPOraL wrote:
            | They're the only commenter on HN I've noticed who keeps
            | writing "smol" instead of "small", and is associated with projects
           | with "smol" in their name. Surely I'm not the only one who
           | missed it being a meme around 2015 or sth., and finds this
           | word/use jarring - and therefore very attention-grabbing?
           | Wonder how much that helps with marketing.
           | 
           | This is meant with no negative intentions. It's just that
           | 'swyx was, in my mind, "that HN-er that does AI and keeps
           | saying 'smol'" for far longer than I was aware of
           | latent.space articles/podcasts.
        
         | ftxbro wrote:
         | > "you can now trivially make GPT4 decide whether to call
         | itself again, or to proceed to the next stage."
         | 
         | Does this mean the GPT-4 API is now publicly available, or is
         | there still a waitlist? If there's a waitlist and you literally
         | are not allowed to use it no matter how much you are willing to
         | pay then it seems like it's hard to call that trivial.
        
           | bayesianbot wrote:
           | "With these updates, we'll be inviting many more people from
           | the waitlist to try GPT-4 over the coming weeks, with the
           | intent to remove the waitlist entirely with this model. Thank
           | you to everyone who has been patiently waiting, we are
           | excited to see what you build with GPT-4!"
           | 
           | https://openai.com/blog/function-calling-and-other-api-
           | updat...
        
           | Tostino wrote:
           | Not GP, but it's still the latter...i've been (im)patiently
           | waiting.
           | 
           | From their blog post the other day: With these updates, we'll
           | be inviting many more people from the waitlist to try GPT-4
           | over the coming weeks, with the intent to remove the waitlist
           | entirely with this model. Thank you to everyone who has been
           | patiently waiting, we are excited to see what you build with
           | GPT-4!
        
             | londons_explore wrote:
             | If you put contact info in your HN profile - especially an
             | email address that matches one you use to login to openai,
             | someone will probably give you access...
             | 
              | Anyone with access can share it with any other user via the
              | 'invite to organisation' feature. Obviously that allows the
              | invited person to make requests billed to the inviter, but
              | since most experiments cost only a few cents that doesn't
              | really matter much in practice.
        
         | majormajor wrote:
         | GPT-4 was already a massive improvement on 3.5 in terms of
         | replying consistently in a certain JSON structure - I often
         | don't even need to give examples, just a sentence describing
         | the format.
         | 
         | It's great to see they're making it even better, but where I'm
         | currently hitting the limit still in GPT-4 for "shelling out"
         | is about it being truly "creative" or "introspective" about "do
         | I need to ask for clarifications" or "can I find a truly novel
          | way around this task" type of things vs "here's a possible but
         | half-baked sequence I'm going to follow".
        
         | babyshake wrote:
         | What would be an example where there needs to be an arbitrary
         | level of recursive ability for GPT4 to call itself?
        
       | iamflimflam1 wrote:
       | It's pretty interesting how the work they've been doing on
       | plugins has fed into this.
       | 
       | I suspect that they've managed to get a lot of good training data
       | by calling the APIs provided by plugins and detecting when it's
       | gone wrong from bad request responses.
        
       | irthomasthomas wrote:
       | It's a shame they couldn't use yaml, instead. I compared them and
       | yaml uses about 20% fewer tokens. However, I can understand
       | accuracy, derived from frequency, being more important than token
       | budget.
        
         | IshKebab wrote:
         | I would imagine JSON is easier for a LLM to understand (and for
         | humans!) because it doesn't rely on indentation and confusing
         | syntax for lists, strings etc.
        
         | nasir wrote:
          | It's a lot more straightforward to use JSON programmatically
          | than YAML.
        
           | TeMPOraL wrote:
           | It really shouldn't be, though. I.e. not unless you're
           | parsing or emitting it ad-hoc, for example by assuming that
            | an expression like:
            | 
            |     "{" + $someKey + ":" + $someValue + "}"
           | 
           | produces a valid JSON. It does - sometimes - and then it's
           | indeed easier to work with. It'll also blow up in your face.
           | Using JSON the right way - via a proper parser and serializer
           | - should be identical to using YAML or any other equivalent
           | format.
        
         | AdrienBrault wrote:
         | I think YAML actually uses more tokens than JSON without
         | indents, especially with deep data. For example "," being a
         | single token makes JSON quite compact.
         | 
         | You can compare JSON and YAML on
         | https://platform.openai.com/tokenizer
        
       | rank0 wrote:
       | OpenAI integration is going to be a goldmine for criminals in the
       | future.
       | 
       | Everyone and their momma is gonna start passing poorly
       | validated/sanitized client input to shared sessions of a non-
       | deterministic function.
       | 
       | I love the future!
        
       | zyang wrote:
       | Is it possible to fine-tune with custom data to output JSON?
        
         | edwin wrote:
         | That's not the current OpenAI recipe. Their expectation is that
         | your custom data will be retrieved via a function/plugin and
         | then be subsequently processed by a chat model.
         | 
         | Only the older completion models (davinci, curie, babbage, ada)
          | are available for fine-tuning.
        
       | jamesmcintyre wrote:
       | In the openai blog post they mention "Convert "Who are my top ten
       | customers this month?" to an internal API call" but I'm assuming
       | they mean gpt will respond with structured json (we define via
       | schema in function prompt) that we can use to more easily
        | programmatically make that api call?
       | 
       | I could be confused but I'm interpreting this function calling as
       | "a way to define structured input and selection of function and
       | then structured output" but not the actual ability to send it
       | arbitrary code to execute.
       | 
       | Still amazing, just wanting to see if I'm wrong on this.
        
         | williamcotton wrote:
         | This does not execute code!
        
           | jamesmcintyre wrote:
           | Ok, yea this makes sense. Also for others curious of the flow
           | here's a video walkthrough I just skimmed through:
           | https://www.youtube.com/watch?v=91VVM6MNVlk
        
       | smallerfish wrote:
        | I will experiment with this at the weekend. One thing I found
       | useful with supplying a json schema in the prompt was that I
       | could supply inline comments and tell it when to leave a field
       | null, etc. I found that much more reliable than describing these
       | nuances elsewhere in the prompt. Presumably I can't do this with
       | functions, but maybe I'll be able to work around it in the prompt
       | (particularly now that I have more room to play with.)
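        | 
        | (One guess: per-property "description" fields in the schema
        | might be the closest equivalent of those inline comments -
        | untested, and the field names here are made up:)
        | 
        |     schema = {
        |         "type": "object",
        |         "properties": {
        |             "email": {
        |                 "type": ["string", "null"],
        |                 "description": "Leave null unless an email "
        |                                "appears verbatim in the text.",
        |             },
        |         },
        |         "required": ["email"],
        |     }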
        
       | loughnane wrote:
       | Just this morning I wrote a JSON object. I told GPT to turn it
       | into a schema. I tweaked that and then gave a list of terms for
       | which I wanted GPT to populate the schema accordingly.
       | 
       | It worked pretty well without any functions, but I did feel like
       | I was missing something because I was ready to be explicit and
       | there wasn't any way for me to tell that to GPT.
       | 
       | I look forward to trying this out.
        
       | mritchie712 wrote:
        | Glad we didn't get too far into adopting something like
        | Guardrails. This sort of kills its main value prop for OpenAI.
       | 
       | https://shreyar.github.io/guardrails/
        
         | Blahah wrote:
         | Luckily it's for LLMs, not openai
        
         | swyx wrote:
         | i mean only at the most superficial level. she has a ton of
        | other validators that arent superseded (eg SQL is validated by
         | branching the database - we discussed on our pod
         | https://www.latent.space/p/guaranteed-quality-and-structure)
        
           | mritchie712 wrote:
           | yeah, listened to the pod (that's how I found out about
           | guardrails!).
           | 
           | fair point, I should have said: "value prop for our use
           | case"... the thing I was most interested in was how well
           | Guardrails structured output.
        
       | Kiro wrote:
       | Can I use this to make it reliably output code (say JavaScript)?
       | I haven't managed to do it with just prompt engineering as it
       | will still add explanations, apologies and do other unwanted
       | things like splitting the code into two files as markdown.
        
         | minimaxir wrote:
         | Here's a demo of some system prompt engineering which resulted
         | in better results for the older ChatGPT:
         | https://github.com/minimaxir/simpleaichat/blob/main/examples...
         | 
          | Coincidentally, the new gpt-3.5-turbo-0613 model also has
         | better system prompt guidance: for the demo above and some
         | further prompt tweaking, it's possible to get ChatGPT to output
         | code super reliably.
        
         | williamcotton wrote:
         | Here's an approach to return just JavaScript:
         | 
         | https://github.com/williamcotton/transynthetical-engine
         | 
         | The key is the addition of few-shot exemplars.
        
         | sanxiyn wrote:
         | Not this, but using the token selection restriction approach,
         | you can let LLM produce output that conforms to arbitrary
         | formal grammar completely reliably. JavaScript, Python,
         | whatever.
        
       | Xen9 wrote:
       | Marvin Minsky was so damn far ahead of his time with Society of
       | Mind.
       | 
       | Engineering of cognitively advanced multiagent systems will
       | become the area of research of this century / multiple decades.
       | 
       | GPT-GPT > GPT-API in terms of power.
       | 
        | The space of possible combinations of GPT multiagents goes
        | beyond imagination, since even GPT-4 alone does.
       | 
       | Multiagent systems are best modeled with signal theory, graph
       | theory and cognitive science.
       | 
       | Of course "programming" will also play a role, in sense of
       | abstractions and creation of systems of / for thought.
       | 
       | Signal theory will be a significant approach for thinking about
       | embedded agency.
       | 
       | Complex multiagent systems approach us.
        
       | edwin wrote:
       | For those who want to test out the LLM as API idea, we are
        | building a turnkey prompt-to-API product. Here's Simon's recipe
       | maker deployed in a minute:
       | https://preview.promptjoy.com/apis/1AgCy9 . Public preview to
       | make and test your own API: https://preview.promptjoy.com
        
         | yonom wrote:
         | This is cool! Are you using one-shot learning under the hood
         | with a user provided example?
        
           | edwin wrote:
           | BTW: Here's a more performant version (fewer tokens)
           | https://preview.promptjoy.com/apis/jNqCA2 that uses a smaller
           | example but will still generate pretty good results.
        
           | edwin wrote:
           | Thanks. We find few-shot learning to be more effective
           | overall. So we are generating additional examples from the
           | provided example.
        
       | darepublic wrote:
       | I have been using gpt4 to translate natural language to JSON
       | already. And on v4 ( not v3) it hasn't returned any malformed
       | JSON iirc
        
         | yonom wrote:
          | - If the only reason you're using v4 over v3.5 is to generate
          | JSON, you can now use this API and downgrade for faster and
          | cheaper API calls.
          | 
          | - Malicious user input may break your JSON (by asking GPT to
          | include comments around the JSON, as another user suggested);
          | this may or may not be an issue (e.g. if one user can
          | influence other users' experience).
        
         | nocsi wrote:
         | What if you ask it to include comments in the JSON explaining
         | its choices
        
       | courseofaction wrote:
       | Nice to have an endpoint which takes care of this. I've been
       | doing this manually, it's a fairly simple process:
       | 
       | * Add "Output your response in json format, with the fields 'x',
       | which indicates 'x_explanation', 'z', which indicates
       | 'z_explanation' (...)" etc. GPT-4 does this fairly reliably.
       | 
       | * Validate the response, repeat if malformed.
       | 
       | * Bam, you've got a json.
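        | 
        | In code, that loop is tiny (a sketch, untested; model and
        | retry count are arbitrary):
        | 
        |     import json, openai
        | 
        |     def ask_json(prompt, retries=3):
        |         msgs = [{"role": "user", "content": prompt +
        |                  "\nOutput your response in json format."}]
        |         for _ in range(retries):
        |             resp = openai.ChatCompletion.create(
        |                 model="gpt-4", messages=msgs)
        |             text = resp["choices"][0]["message"]["content"]
        |             try:
        |                 return json.loads(text)  # validate
        |             except json.JSONDecodeError:
        |                 continue  # malformed: repeat the request
        |         raise ValueError("no valid JSON after retries")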
       | 
       | I wonder if they've implemented this endpoint with validation and
       | carefully crafted prompts on the base model, or if this is
       | specifically fine-tuned.
        
         | 037 wrote:
         | It appears to be fine-tuning:
         | 
         | "These models have been fine-tuned to both detect when a
         | function needs to be called (depending on the user's input) and
         | to respond with JSON that adheres to the function signature."
         | 
         | https://openai.com/blog/function-calling-and-other-api-updat...
        
       | wskish wrote:
       | here is code (with several examples) that takes it a couple steps
       | further by validating the output json and pydantic model and
       | providing feedback to the llm model when it gets either of those
       | wrong:
       | 
       | https://github.com/jiggy-ai/pydantic-chatcompletion/blob/mas...
        
       | sublinear wrote:
       | > The process is simple enough that you can let non-technical
       | people build something like this via a no-code interface. No-code
       | tools can leverage this to let their users define "backend"
       | functionality.
       | 
        | > Early prototypes of software can use simple prompts like this one
       | to become interactive. Running an LLM every time someone clicks
       | on a button is expensive and slow in production, but _probably
       | still ~10x cheaper to produce than code._
       | 
       | Hah wow... no. Definitely not.
        
       | social_ism wrote:
       | [dead]
        
       | thorum wrote:
       | The JSON schema not counting toward token usage is huge, that
       | will really help reduce costs.
        
         | minimaxir wrote:
         | That is up in the air and needs more testing. Field
         | descriptions, for example, are important but extraneous input
         | that would be tokenized and count in the costs.
         | 
         | At the least for ChatGPT, input token costs were cut by 25% so
         | it evens out.
        
         | stavros wrote:
         | > Under the hood, functions are injected into the system
         | message in a syntax the model has been trained on. This means
         | functions count against the model's context limit and are
         | billed as input tokens. If running into context limits, we
         | suggest limiting the number of functions or the length of
         | documentation you provide for function parameters.
        
         | yonom wrote:
         | I believe functions do count in some way toward the token
         | usage; but it seems to be in a more efficient way than pasting
         | raw JSON schemas into the prompt. Nevertheless, the token usage
         | seems to be far lower than previous alternatives, which is
         | awesome!
        
       | adultSwim wrote:
       | _Running an LLM every time someone clicks on a button is
       | expensive and slow in production, but probably still ~10x cheaper
       | to produce than code._
        
         | edwin wrote:
         | New techniques like semantic caching will help. This is the
         | modern era's version of building a performant social graph.
        
           | daralthus wrote:
           | What's semantic caching?
        
             | edwin wrote:
             | With LLMs, the inputs are highly variable so exact match
             | caching is generally less useful. Semantic caching groups
             | similar inputs and returns relevant results accordingly. So
             | {"dish":"spaghetti bolognese"} and {"dish":"spaghetti with
             | meat sauce"} could return the same cached result.
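              | 
              | In miniature (a sketch; embed() stands in for whatever
              | embedding model you use):
              | 
              |     import numpy as np
              | 
              |     cache = []  # (embedding, response) pairs
              | 
              |     def lookup(text, threshold=0.95):
              |         q = embed(text)
              |         for emb, response in cache:
              |             sim = np.dot(emb, q) / (
              |                 np.linalg.norm(emb) * np.linalg.norm(q))
              |             if sim >= threshold:
              |                 return response  # near-duplicate input
              |         return None  # miss: call the LLM, then cache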
        
               | m3kw9 wrote:
               | Or store as sentence embedding and calculate the vector
               | distance, but creates many edge cases
        
       | minimaxir wrote:
       | After reading the docs for the new ChatGPT function calling
       | yesterday, it's structured and/or typed data for GPT input or
       | output that's the key feature of these new models. The ReAct flow
       | of tool selection that it provides is secondary.
       | 
        | As this post notes, you don't even need the full flow of
       | passing a function result back to the model: getting structured
       | data from ChatGPT in itself has a lot of fun and practical use
       | cases. You could coax previous versions of ChatGPT to "output
       | results as JSON" with a system prompt but in practice results are
       | mixed, although even with this finetuned model the docs warn that
       | there still could be parsing errors.
       | 
       | OpenAI's demo for function calling is not a Hello World, to put
       | it mildly: https://github.com/openai/openai-
       | cookbook/blob/main/examples...
        
         | tornato7 wrote:
         | IIRC, there's a way to "force" LLMs to output proper JSON by
         | adding some logic to the top token selection. I.e. in the
         | randomness function (which OpenAI calls temperature) you'd
          | never choose a next token that results in broken JSON. The only
          | reason it would fail is if the output exceeds the token
         | limit. I wonder if OpenAI is doing something like this.
        
           | ManuelKiessling wrote:
           | Note that you don't necessarily need to have the AI output
           | any JSON at all -- simply have it answer when being asked for
           | the value to a specific JSON key, and handle the JSON
           | structure part in your hallucinations-free own code:
           | https://github.com/manuelkiessling/php-ai-tool-bridge
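            | 
            | The idea in miniature (Python rather than PHP; untested
            | sketch):
            | 
            |     import openai
            | 
            |     def fill_record(context, keys):
            |         record = {}
            |         for key in keys:
            |             resp = openai.ChatCompletion.create(
            |                 model="gpt-3.5-turbo",
            |                 messages=[{"role": "user", "content":
            |                     context + "\nReply with ONLY the value "
            |                     "for '" + key + "', nothing else."}])
            |             record[key] = (resp["choices"][0]["message"]
            |                            ["content"].strip())
            |         # the JSON structure lives in your code, not the LLM
            |         return record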
        
             | naiv wrote:
             | Thanks for sharing!
        
           | woodrowbarlow wrote:
           | the linked article hypothesizes:
           | 
           | > I assume OpenAI's implementation works conceptually similar
           | to jsonformer, where the token selection algorithm is changed
           | from "choose the token with the highest logit" to "choose the
           | token with the highest logit which is valid for the schema".
        
           | senko wrote:
           | It would seem not, as the official documentation mentions the
           | arguments may be hallucinated or _be a malformed JSON_.
           | 
            | (except if the meaning is that the JSON syntax is valid but
            | may not conform to the schema; they're unclear on that).
        
             | sanxiyn wrote:
             | For various reasons, token selection may be implemented as
             | upweighting/downweighting instead of outright ban of
             | invalid tokens. (Maybe it helps training?) Then the model
             | could generate malformed JSON. I think it is premature to
             | infer from "can generate malformed JSON" that OpenAI is not
             | using token selection restriction.
        
           | sanxiyn wrote:
           | Note that this (token selection restriction) is even
           | available on OpenAI API as logit_bias.
        
             | newhouseb wrote:
             | But only for the whole generation. So if you want to
             | constrain things one token at a time (as you would to force
             | things to follow a grammar) you have to make fresh calls
             | and only request one token which makes things more or less
             | impractical if you want true guarantees. A few months ago I
             | built this anyway to suss out how much more expensive it
             | was [1]
             | 
             | [1] https://github.com/newhouseb/clownfish#so-how-do-i-use-
             | this-...
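              | 
              | For the curious, one constrained step looks roughly like
              | this (sketch; a bias of 100 effectively restricts
              | sampling to the listed token ids):
              | 
              |     import openai
              | 
              |     def next_token(prompt, allowed_token_ids):
              |         resp = openai.Completion.create(
              |             model="text-davinci-003",
              |             prompt=prompt,
              |             max_tokens=1,  # one token per call
              |             logit_bias={str(t): 100
              |                         for t in allowed_token_ids})
              |         return resp["choices"][0]["text"]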
        
           | have_faith wrote:
           | How would a tweaked temp enforce a non broken output exactly?
        
             | isoprophlex wrote:
             | Not traditional temperature, maybe the parent worded it
             | somewhat obtusely. Anyway, to disambiguate...
             | 
             | I think it works something like this: You let something
             | akin to a json parser run with the output sampler. First
             | token must be either '{' or '['; then if you see [ has the
             | highest probability, you select that. Ignore all other
             | tokens, even those with high probability.
             | 
             | Second token must be ... and so on and so on.
             | 
             | Guarantee for non-broken (or at least parseable) json
        
             | sanxiyn wrote:
             | It's not temperature, but sampling. Output of LLM is
             | probabilistic distribution over tokens. To get concrete
             | tokens, you sample from that distribution. Unfortunately,
             | OpenAI API does not expose the distribution. You only get
             | the sampled tokens.
             | 
             | As an example, on the link JSON schema is defined such that
             | recipe ingredient unit is one of
             | grams/ml/cups/pieces/teaspoons. LLM may output the
             | distribution grams(30%), cups(30%), pounds(40%). Sampling
             | the best token "pounds" would generate an invalid document.
             | Instead, you can use the schema to filter tokens and sample
             | from the filtered distribution, which is grams(50%),
             | cups(50%).
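              | 
              | That filtering step, concretely (sketch):
              | 
              |     import numpy as np
              | 
              |     probs = np.array([0.3, 0.3, 0.4])  # grams/cups/pounds
              |     valid = np.array([1.0, 1.0, 0.0])  # schema bans pounds
              | 
              |     masked = probs * valid
              |     masked /= masked.sum()  # grams(50%), cups(50%)
              |     token = np.random.choice(len(masked), p=masked)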
        
         | behnamoh wrote:
         | What's the implication of this new change for Microsoft
         | Guidance, LMQL, Langchain, etc.? It looks like much of their
         | functionality (controlling model output) just became obsolete.
         | Am I missing something?
        
           | [deleted]
        
           | lbeurerkellner wrote:
           | If anything this removes a major roadblock for
           | libraries/languages that want to employ LLM calls as a
           | primitive, no? Although, I fear the vendor lock-in
            | intensifies here, also given how restrictive and specific
            | the Chat API is.
           | 
           | Either way, as part of the LMQL team, I am actually pretty
           | excited about this, also with respect to what we want to
           | build going forward. This makes language model programming
           | much easier.
        
             | koboll wrote:
             | `Although, I fear the vendor lock-in intensifies here, also
             | given how restrictive and specific the Chat API.`
             | 
             | Eh, would be pretty easy to write a wrapper that takes a
             | functions-like JSON Schema object and interpolates it into
             | a traditional "You MUST return ONLY JSON in the following
             | format:" prompt snippet.
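              | 
              | Something like (sketch):
              | 
              |     import json
              | 
              |     def schema_prompt(user_prompt, schema):
              |         return (user_prompt +
              |             "\n\nYou MUST return ONLY JSON in the "
              |             "following format:\n" +
              |             json.dumps(schema, indent=2))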
        
             | londons_explore wrote:
             | > Although, I fear the vendor lock-in intensifies here,
             | 
             | The openAI API is super simple - any other vendor is free
             | to copy it, and I'm sure many will.
        
       | m3kw9 wrote:
        | It works pretty well. You define a few "functions" and enter a
        | description of what each does; when the user prompts, it will
        | understand the prompt and tell you which "function" it likely
        | wants, which is just the function name. I feel like this is a
        | new way to program, a sort of fuzzy logic type of programming
        
         | Sai_ wrote:
         | > fuzzy logic
         | 
         | Yes and no. While the choice of which function to call is
         | dependent on an llm, ultimately, you control the function
         | itself whose output is deterministic.
         | 
         | Even today, given an api, people can choose to call or not call
         | based on some factor. We don't call this fuzzy logic. E.g.,
         | people can decide to sell or buy stock through an api based on
         | some internal calculations - doesn't make the system "fuzzy".
        
       | jonplackett wrote:
       | This is useful, but for me at least, GPT-4 is unusable because it
        | sometimes takes 30+ seconds to reply to even basic queries.
        
         | m3kw9 wrote:
         | Also the rate limit is pretty bad if you want to release any
         | type of app
        
       | emilsedgh wrote:
       | Building agents that use advanced API's was not really practical
       | until now. Things like Langchain's Structured Agents worked
       | somewhat reliably, but due to the massive token count it was so
       | slow, the experience was _never_ going to be useful.
       | 
        | Thanks to this change, the speed at which our agent processes
        | results has improved 5-6 times, and it actually does a pretty
        | good job of keeping to the schema.
       | 
       | One problem that is not resolved yet is that it still
        | hallucinates a lot of attributes. For example, we have a tool
        | that allows it to create contacts in the user's CRM. I ask it to:
       | 
        | "Create contacts for the top 3 Barcelona players."
        | 
        | It creates a structure like this:
       | 
       | 1. Lionel Messi - Email: lionel.messi@barcelona.com - Phone
       | Number: +1234567890 - Tags: Player, Barcelona
       | 
       | 2. Gerard Pique - Email: gerard.pique@barcelona.com - Phone
       | Number: +1234567891 - Tags: Player, Barcelona
       | 
       | 3. Marc-Andre ter Stegen - Email: marc-terstegen@barcelona.com -
       | Phone Number: +1234567892 - Tags: Player, Barcelona
       | 
       | And you can see it hallucinated email addresses and phone
       | numbers.
        
         | 037 wrote:
         | I would never rely on an LLM as a source of such information,
         | just as I wouldn't trust the general knowledge of a human being
         | used as a database. Does your workflow include a step for
         | information search? With the new json features, it should be
         | easy to instruct it to perform a search or directly feed it the
         | right pages to parse.
        
         | pluijzer wrote:
        | ChatGPT can be useful for many things, but you should really
        | not use it if you want to retrieve factual data. This might
        | partly be resolved by querying the internet like Bing does, but
        | purely on the language model side these hallucinations are just
        | an unavoidable part of it.
        
           | Spivak wrote:
            | Yep, it's _always_ _always_ have it write the code / query /
            | function / whatever you need, which you then parse and use
            | to retrieve the data from an external system.
        
       | dang wrote:
       | Recent and related:
       | 
       |  _Function calling and other API updates_ -
       | https://news.ycombinator.com/item?id=36313348 - June 2023 (154
       | comments)
        
         | minimaxir wrote:
         | IMO this isn't a dupe and shouldn't be penalized as a result.
        
           | dang wrote:
           | It's certainly not a dupe. It looks like a follow-up though.
           | No?
        
             | minimaxir wrote:
             | More a very timely but practical demo.
        
               | dang wrote:
               | Ok, thanks!
        
       | EGreg wrote:
       | Actually I'm looking to take GPT-4 output and create file formats
       | like keynote presentations, or pptx. Is that currently possible
       | with some tools?
        
         | yonom wrote:
          | I would recommend creating a simplified JSON schema for the
          | slides (say, presentation is an array of slides, each slide has
          | a title, body, optional image, optional diagram, each diagram
          | is one of pie, table, ...). Then use a library to generate the
          | pptx file from the content generated.
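          | 
          | For the last step, python-pptx is one option (rough sketch,
          | untested):
          | 
          |     from pptx import Presentation
          | 
          |     def build_deck(slides, path="deck.pptx"):
          |         prs = Presentation()
          |         layout = prs.slide_layouts[1]  # title + content
          |         for s in slides:  # the GPT-generated JSON
          |             slide = prs.slides.add_slide(layout)
          |             slide.shapes.title.text = s["title"]
          |             slide.placeholders[1].text = s["body"]
          |         prs.save(path)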
        
           | EGreg wrote:
           | Library? What library?
           | 
           | It seems to me that a Transformer should excel at
           | Transforming, say, text into pptx or pdf or HTML with CSS
           | etc.
           | 
           | Why don't they train it on that? So I don't have to sit there
           | with manually written libraries. It can easily transform HTML
           | to XML or text bullet points so why not the other formats?
        
             | yonom wrote:
             | I don't think the name "Transformer" is meant in the sense
             | of "transforming between file formats".
             | 
             | My intuition is that LLMs tend to be good at things human
             | brains are good at (e.g. reasoning), and bad at things
             | human brains are bad at (e.g. math, writing pptx binary
             | files from scratch, ...).
             | 
             | Eventually, we might get LLMs that can open PowerPoint and
             | quickly design the whole presentation using a virtual mouse
             | and keyboard but we're not there yet.
        
               | EGreg wrote:
                | It's just XML. They can produce HTML and transform
                | Python into PHP etc.
                | 
                | So why not? It's easy for them, no?
        
         | stevenhuang wrote:
         | apparently pandoc also supports pptx
         | 
         | so you can tell GPT4 to output markdown, then use pandoc to
         | convert that markdown to pptx or pdf.
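          | 
          | e.g.:
          | 
          |     pandoc slides.md -o slides.pptx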
        
           | edwin wrote:
           | Here you go: https://preview.promptjoy.com/apis/m7oCyL
        
       ___________________________________________________________________
       (page generated 2023-06-14 23:00 UTC)