[HN Gopher] A Journey building a fast JSON parser and full JSONPath ___________________________________________________________________ A Journey building a fast JSON parser and full JSONPath Author : atomicnature Score : 105 points Date : 2023-10-12 06:36 UTC (14 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | tomthe wrote: | I like the "Simple Encoding Notation" (SEN) of the underlying | library: https://github.com/ohler55/ojg/blob/develop/sen.md | | " A valid example of a SEN document is: | | { one: 1 two: 2 array: [a b c] yes: true } " | koito17 wrote: | An interesting observation: if you move the colon to the | opposite side then you get valid EDN data! | | {:one 1 :two 2 :array [a b c] :yes true} | | cf. https://github.com/edn-format/edn | | Likewise, commas are considered whitespace. They are sometimes | added to make lengthy maps easier to read. | kubanczyk wrote: | > Which is the same as the following JSON: { | "one": 1, "two": 2, "array": ["a", "b", "c"], | "yes": true } | | That example also caught my attention, but in a bad way. It | looks just like a comeback of one of the worst ideas of YAML. | | My immediate question would be what's the JSON for this SEN | I've crafted: { array: [string1 string2 | "true" true True TRUE yes y] } | | For more fun, there's a single problematic entry here, can you | spot it?: 1.20.4 1.204.4 1.20 | 1.204 1.20.0 1.20.00 1.20-rc2 | | Or, level expert, there's exactly one problem here as well: | 0a1f 0bfd 0c0c 0d01 0e02 | tomthe wrote: | Thank you for thinking more deeply about this than I did! But | I do not see a problem in your first example, only `true` is | the boolean true (according to my browser and the linked | definition on https://www.json.org) | | I don't get your other examples, can you explain? I assumed | that 1.20.4 is not a valid SEN entry, because it starts with | a digit but is not a number.
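The version-string puzzle above comes down to bare tokens being typed by their content. A minimal sketch of a number-first reading of bare tokens (an illustration of the ambiguity, not ojg's actual tokenizer):

```go
package main

import (
	"fmt"
	"strconv"
)

// classify mimics what a SEN-style reader might do with a bare token:
// try to read it as a number first, and fall back to a string.
func classify(token string) string {
	if f, err := strconv.ParseFloat(token, 64); err == nil {
		return fmt.Sprintf("number %v", f)
	}
	return fmt.Sprintf("string %q", token)
}

func main() {
	for _, tok := range []string{"1.20.4", "1.20", "1.20-rc2", "0e02"} {
		fmt.Printf("%-8s -> %s\n", tok, classify(tok))
	}
}
```

Under this reading, "1.20.4" and "1.20-rc2" survive as strings, but "1.20" becomes the float 1.2 (the trailing zero is gone) and "0e02" becomes the float 0, which is the trap the two puzzles are pointing at.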
| jarym wrote: | I'm not following: ` { array: [string1 string2 "true" true | True TRUE yes y] } ` Doesn't look like a valid SEN or JSON. | The `y`, `yes`, `True`, `TRUE` aren't valid | keywords/variables/consts and `string1` and `string2` look | like variable references which aren't something SEN or JSON | support. The closest valid thing I can imagine is: | | ` { array: ["string1" "string2" "true" true "True" "TRUE" | "yes" "y"] } ` | mjpa86 wrote: | Aren't they implied strings? If "[a b c]" is an array of 3 | strings, "a", "b" and "c", then True is a string "True". | That's the problem. | jarym wrote: | I must be missing why you think they're implied strings - | I don't see that in the spec. What I do see is: | | "Strings can also be delimited with a single quote | character which allows for a string to be either "abc" or | 'abc'." | | There's no mention of having a string without a | delimiter. | ReleaseCandidat wrote: | The example below is this: | | > array: [a b c] | jarym wrote: | ohhh I see it now, that looks like a recipe for... | issues. | pjc50 wrote: | Let me guess: 0e02 is interpreted as floating point? | k_process wrote: | Ditto 1.20, and when interpreted as floating point the | trailing zero loses significance. So as a version this is | indistinguishable from 1.2 | lazyasciiart wrote: | Am I missing something about the definition of "tokenStart"? It | can be 'letter' or three other characters: but all those other | characters (and more) are already in the definition of | 'letter'? | pjc50 wrote: | See the comment upthread about S-expressions, but... given that | this doesn't have a marker for "atom", which it badly needs, | isn't it strictly worse than S-expressions? | ithkuil wrote: | Reminder of recent efforts at standardizing JSONPath: | https://datatracker.ietf.org/wg/jsonpath/about/ | baz00 wrote: | Is JSON XML yet? Nearly!
| | I'm going to invent Baz's 11th law of computing here: any data | format that isn't XML will evolve into a badly specified version | of XML over time. | kevingadd wrote: | With respect for the pain everyone has suffered through due to | XML... at this point I prefer XML with a good schema to JSON | any day, even if it's more verbose and more awkward to hand-edit. | It's just so much easier to validate it or generate code | to handle it, and you get things like XSLT or XPath if you want | them. | Deukhoofd wrote: | I mean, you can use JSON Schema as well to have similar | functionality to XML Schema. | znpy wrote: | That's exactly the point being made: json is becoming xml. | tgv wrote: | The point also feels like passive-aggressively ignoring | the reason why people use JSON and not XML. | w23j wrote: | Can you name some of these reasons? Or give me a link? | Honest question! | alpaca128 wrote: | One reason would be massively reduced syntax overhead and | better readability. I've seen plenty of XML files where | XML syntax makes up more than 50% of the file's content, | and trying to read the actual content is tedious. Now | JSON isn't ideal either - technically you could get rid | of all commas, colons, and the quotes around most keys - | but I sure prefer `{"foo": "some \"stuff\""}` over | something like `<foo><![CDATA[some <stuff>]]></foo>` | w23j wrote: | I agree, I would prefer JSON (or YAML) for example for | configuration files. That is, for stuff that humans | actually read. I was thinking about using JSON/XML as a | data exchange format between computers, because the | context of this discussion has revolved around things like | JSON/XML-Schema, JSON/XPath and SOAP/OpenAPI. There is a | large trend to replace XML with JSON as the data format for | inter-machine communication, and it is confusing to me. | tgv wrote: | XML is too unwieldy for human consumption.
Editing it is | error-prone, and those schema-directed editors are even | worse, because everything requires clicking and clicking | and clicking. | | For machine-to-machine communication, it's very well | suited, but most data is simple enough, and the XML | libraries I've used tended to be --let's say-- over-engineered, | while there are no hoops to jump through when | you want to parse JSON. | | And one thing I always disliked about XML was the CDATA | section: it makes the message even harder to read, and | it's not like you're going to use that binary data | unparsed/unchecked. | | XML just tried to formalize data transfer and description | prematurely, which made it rigid and not even | sufficiently powerful. I must say that XSLT and XPath | were great additions, though. | eviks wrote: | It's unreadable | Devasta wrote: | Honestly, for a lot of people, they use JSON because that's | what they have always used; XML's heyday was like 15 years | ago, you could be a very senior engineer now and have | never touched XML. | w23j wrote: | I haven't looked at JSON Schema in detail so please correct | me if I am wrong, but I had the impression that the JSON | Schema specification is still largely unfinished and | evolving. That means you need to know which version the | tool you use supports. And when I was looking for JSON | Schema validators for Java all I found were projects on | GitHub, which often were abandoned and referred the user to | another GitHub project which was also abandoned. There does | not seem to be support from an established project or | vendor. | | Compare that to XML where we have a plethora of established | tools (Woodstox, JAXB, etc.). | | What I have trouble understanding, which everybody else | just seems to accept as obvious, is why one would take on | these problems? Is JSON Schema more powerful than XML | Schema? Does the use of JSON have advantages over using | XML?
When we are talking about a client program calling a | server API with JSON/XML, why do we care about the format | of data exchanged? What advantages does JSON have in this | case in contrast to XML (or for that matter a binary format | like Protocol Buffers)? Isn't this the most boring part of | the application, which you would want to just get out of | the way and work? What are the advantages of JSON over XML | that would lead me to deal with the problems of evolving | specifications and unreliable tooling? | | (And just to repeat, since everybody seems to have a | different opinion about this than me, I must be missing | something and really would like to learn what!) | pydry wrote: | All schema languages are a bit like that. You can almost | always add another layer on top of the validation and | screw down the validation a bit harder. The strictest | validation will only be achievable using a Turing-complete | language. | | OpenAPI is probably used a bit more than JSON Schema, but | it's contextually limited to APIs (which, to be fair, is | mostly what JSON is used for). | w23j wrote: | I probably phrased my question poorly. Why would I use a | tool which is not or poorly maintained for a probably | already outdated version of a spec, when I can use | something else that has been used for years by countless | companies in production? The advantages must be huge. | And I don't know what they are. | | OpenAPI is another example. There are threads on Hacker | News about generating code from OpenAPI specs. These | always seem to say "oh, yes don't use tool X, use tool Y, | it does not have that problem, although it also doesn't | support Z". The consensus seems to be to not generate | code from an OpenAPI specification but to just use it as | documentation, since all generators are more or less | broken. Contrast that with for example JAXB (which is not | an exact replacement, I know), which has been battle-tested | for years.
| pydry wrote: | I've used jsonschema and it was fine. I didn't think it | was poorly maintained. By contrast, most XML | libraries I've used had a myriad of broken edge cases and | security vulnerabilities brought on by their | overcomplication and the maintainers' inability to keep | up. | | >The consensus seems to be to not generate code from an | OpenAPI specification but to just use it as | documentation, since all generators are more or less | broken. | | OpenAPI still functions just fine as a means of | documentation and validation. | | I'm allergic to all forms of code generation, to be | honest. If there is an equivalent of XML in this I | imagine it's even more horrendous. I can just imagine | chasing down compiler errors indirectly caused by an XML | switch not set _shudder_. | | >Contrast that with for example JAXB | | JAXB looks like a bolt-on to work around XML's | deficiencies. There's no need to marshal JSON to special | funky data structures in your code because lists and | hashmaps are already built in. You can just use those. An | equivalent doesn't need to exist. | | For schema validation, I think XML has, what, 3 ways of | doing it? DTDs? XMLSchema? And now JAXB does a bit of | that on the side too? Does that sound like a healthy | ecosystem to you? Because it sounds like absolute dogshit | to me. | Deukhoofd wrote: | > I'm allergic to all forms of code generation, to be | honest. If there is an equivalent of XML in this I | imagine it's even more horrendous. I can just imagine | chasing down compiler errors indirectly caused by an XML | switch not set shudder. | | WSDL comes to mind | w23j wrote: | I see. Thanks for taking the time to reply! | Deukhoofd wrote: | > That means you need to know which version the tool you | use supports | | Honestly, the same issue with versioning has been my | primary issue with XML Schemas in the past.
XSD 1.1 for | example came out over a decade ago, but is still very | badly supported in most tooling I tried out. | | > When we are talking about a client program calling a | server API with JSON/XML, why do we care about the format | of data exchanged? | | We shouldn't care much, beyond debuggability (can a | developer easily see what's going on), (de)serialization | speed, and bandwidth use. JSON and Protobuf tend to be a | decent chunk smaller than XML, JSON is a bit easier to | read, and Protobuf is faster to (de)serialize. This means | they should generally be preferred. | | In the case of a client program calling a server API I'd | personally have the server do the required validation on | a deserialized object, instead of doing so through a | schema. This is generally easier to work on for all | developers in my team, and gets around all the issues | with tooling. The only real reason I use schemas is when | I'm writing a file by hand, and want autocompletion and | basic validations. In that case versioning and tooling | issues are completely in my control. | Traubenfuchs wrote: | As someone who greatly enjoyed the rigidity of SOAP/XML, which | made proper architectural planning and careful deprecation | mandatory, I wonder where we went so wrong. I feel like it's | all connected to the impreciseness and typelessness of | JavaScript. SOAP/XML to generate well-defined client and server | entry points in Java is how things should be done, and SoapUI | was a pleasure to use. | Devasta wrote: | Honestly, I think a big reason is that Stack Overflow didn't | exist at XML's peak, so you had people generating XML by | concatenation, to predictably disastrous results. | | One of the first XSLT transforms I was ever given to maintain | generated XML by the same method. | <xsl:text>&lt;PRICE&gt;</xsl:text><xsl:value-of | select="PRICE"/><xsl:text>&lt;/PRICE&gt;</xsl:text> and so | on.
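The decode-then-validate approach described above (do the required validation on the deserialized object instead of through a schema) can be sketched in a few lines of Go; the `User` type and its rules are invented for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical request payload; the names are made up
// for illustration, not taken from any real API.
type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// validate holds, in plain code, the rules a schema would otherwise express.
func (u User) validate() error {
	if u.Name == "" {
		return fmt.Errorf("name is required")
	}
	if u.Age < 0 || u.Age > 150 {
		return fmt.Errorf("age %d out of range", u.Age)
	}
	return nil
}

// decodeUser unmarshals and then validates, so callers get one error path.
func decodeUser(data []byte) (User, error) {
	var u User
	if err := json.Unmarshal(data, &u); err != nil {
		return u, err
	}
	return u, u.validate()
}

func main() {
	_, err := decodeUser([]byte(`{"name": "", "age": 200}`))
	fmt.Println(err) // validation fails in ordinary Go, no schema tooling involved
}
```

The trade-off is the one named in the thread: the rules live in one service's code rather than in a document other teams can consume.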
| pjc50 wrote: | > made proper architectural planning and careful deprecation | mandatory | | That's why it never caught on. | | The ability of JSON/JavaScript to tape together kinda-working | solutions _before and instead of_ any kind of specification | work is hugely powerful, because it allows iterating on the | requirements by having actual users use the app. | touisteur wrote: | I mean, I've always found this enlightening, when hearing | JSON is 'simple': | https://seriot.ch/projects/parsing_json.html | aidos wrote: | The S stands for Simple | | http://harmful.cat-v.org/software/xml/soap/simple | PhilipRoman wrote: | Thanks for sharing this, somehow I missed this while | reading cat-v. Definitely applicable to a couple of other | technologies too... | another2another wrote: | Oh, that was a good read. | | I lived through all that, and can totally understand why | people turned away in disgust and agreed on REST instead. | usrusr wrote: | In my experience SOAP was near-universally used as an RPC | encoding, where the schema was whatever types the exposed API | defined and no-one gave the tiniest anything about the data | representation on the wire. If you insisted on schema-first | SOAP, people looked at you as if you had fallen through a | dimensional gate from an alternative-history parallel | universe full of Zeppelins and domesticated dinosaurs. JSON, | on the other hand, came riding on that REST wave, where the | data models on the wire were given more consideration than | just being an outcome of the serialization process, best never | looked at. Some people even considered idempotency more than | just a funny sequence of letters. No, I'm not surprised at all | that the SOAP mindset disappeared.
(But SoapUI was really a pleasure | to use, spent an ungodly amount of hours staring at that | thing, never in anger) | nine_k wrote: | I'd say there must exist a more ancient law, stating that a | representation of s-expressions is reinvented whenever a need | arises for a generic data format. | | S-expressions are the most direct representation of a tree: | (root node node ...). Trees are everywhere, they represent any | nested structure; lists are logically a subset of trees. | | XML is a tree. It has the weird "attribute" node types, a | legacy of SGML text markup notation. JSON is a tree, obviously. | So are Protobuf, Thrift, etc. They all could be serialized as | s-expressions. | | Now, a schema that describes a tree is also a tree. Hence XML | Schema, JSONSchema, etc. | | More, an abstract program that describes a transformation of a | tree is also a tree; this produces homoiconic languages, from | XSLT to Lisps. | | There is nothing special about XML; it's just a particular case | of a generic law. | baz00 wrote: | Completely agree on all points. But there is something | special about XML: everyone has failed to make something | better. | alpaca128 wrote: | If you said nothing better became an industry standard I | could see your point, but how exactly is XML better than | s-expressions? Or, if you want something less generalized, | KDL (which is roughly XML with 90% less syntax overhead)? | baz00 wrote: | XML has a superset of defined functionality: standardised | schemas, transformations, and querying. The same is not true | for s-expressions. | | I've not looked at KDL before but a quick scan suggests | it's interesting. I will look into it. | jerf wrote: | XML has a lot more defined structure than s-expressions.
| S-expressions make cute demos when people just take some | chunk of data and blast out a conversion to drop into the | conversation and hold it up as a standard, but it's not a | fair comparison to take something actually defined and | then splat out an undefined ad-hoc format on the spur of | the moment. Of course the latter looks awesome by | comparison; the example was literally structured to look | awesome in this exact context. | | When you read the s-expression alternatives proposed to | XML with an eye to "How would I actually code against | this? How would I actually convince multiple people to | use the _exact_ same standard as me? How do I support | _all_ the use cases of interest to me?" they completely | fall apart. They're _too_ simple. The very fact I have to | use the plural for _s-expression alternatives_ since no | two of them are ever _quite_ the same says quite a bit. | | When you need that structure, XML is actually a very good | choice; the error people made was using it when they | didn't need that structure. Note how much of the | complaint about using XML, even in this very | conversation, is (quite correctly!) "what do I do with | all these extra structural elements?" If you don't have a | clear answer to that, don't use XML. If you do, don't jam | it into s-exprs or JSON either, you end up with an even | worse mess. | alpaca128 wrote: | > How would I actually convince multiple people to use | the exact same standard as me? | | The same way you agree on an XML schema? I don't know if | I quite understand what you want to say - as I see it, | both are tree-structured formats which means they both | can represent the same information, just that | s-expressions are less verbose but XML has more existing | tooling for defining & validating a structure. Though the | latter is more an aspect of the ecosystem than the format | itself. | dragonwriter wrote: | > Completely agree on all points.
But there is something | special about XML: everyone has failed to make something | better. | | XML's decline from its peak of adoption means lots of people | working with data disagree with you. | hardware2win wrote: | Why focus on s-exprs then? | | Every data format will eventually evolve into a tree | nine_k wrote: | S-exprs are just the simplest. | cxr wrote: | > I'd say there must exist a more ancient law, stating that a | representation of s-expressions is reinvented whenever a need | arises for a generic data format. | | That more ancient law would be Greenspun's tenth rule, FYI--or a | corollary to it, at least. | | The law proposed here (as Baz's 11th law) was intended to be | a humorous and obvious pastiche crafted with Greenspun's quip | in mind, with the idea being that the reader would be in on | the joke (being already familiar with it). | | 1. <https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule> | tannhaeuser wrote: | JSON can be parsed using SGML [1], by instructing SGML to | interpret JSON tokens such as colons, quotation marks, and | curly braces as markup. The underlying technique for custom | lightweight markup is called SHORTREF and can be applied to | markdown etc. as well. | | So considering XML is subsetted from SGML, I guess the answer | is closer to yes than thought. | | Though probably it's worth citing the following quote from that | paper: | | > _If the sweet spot for XML and SGML is marking up "prose | documents", the sweet spot for JSON is collections of atomic | values._ | | [1]: | https://www.balisage.net/Proceedings/vol17/html/Walsh01/Bali... | lifthrasiir wrote: | > JSON can be parsed using SGML, [...]. So considering XML is | subsetted from SGML, I guess the answer is closer to yes than | thought. | | In other words, SGML was way more powerful than what we | actually needed. Of course, we have the benefit of | hindsight, though.
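nine_k's claim upthread that JSON-style trees "all could be serialized as s-expressions" is easy to demonstrate with a toy printer (a sketch, not any standardized s-expression encoding):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sexpr renders a decoded-JSON-style value (maps, slices, scalars)
// as a parenthesized tree. Map keys are sorted so output is stable.
func sexpr(v interface{}) string {
	switch t := v.(type) {
	case map[string]interface{}:
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		parts := make([]string, 0, len(keys))
		for _, k := range keys {
			parts = append(parts, fmt.Sprintf("(%s %s)", k, sexpr(t[k])))
		}
		return "(" + strings.Join(parts, " ") + ")"
	case []interface{}:
		parts := make([]string, len(t))
		for i, e := range t {
			parts[i] = sexpr(e)
		}
		return "(" + strings.Join(parts, " ") + ")"
	default:
		return fmt.Sprintf("%v", t)
	}
}

func main() {
	doc := map[string]interface{}{
		"one":   1,
		"array": []interface{}{"a", "b", "c"},
	}
	fmt.Println(sexpr(doc)) // ((array (a b c)) (one 1))
}
```

Note it also reproduces jerf's objection: without quoting rules or an atom marker, strings, symbols, and numbers are indistinguishable in the output.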
| tannhaeuser wrote: | > _SGML was way too powerful_ | | The widespread use of markdown and other lightweight markup | rather than rigid XML-style fully tagged markup for | authoring tells otherwise though. And so does the continued | use of HTML chock full of SGMLisms such as tag inference | and attribute shortforms that weren't included in the XML | subset/profile when XML (XHTML) was created to replace | HTML. | | So while XML isn't used as an authoring format on the web | (nor as a delivery format), it's still useful as a canonical | archival format I guess. | lifthrasiir wrote: | SGML is a meta-language, unlike every other example in | your reply, so the prevalence of such semi-structured | languages (including SGML applications) doesn't justify | SGML itself. Even HTML is not exactly an SGML application | (except for HTML 4), and to my knowledge implementing | HTML with a generic SGML implementation was rarely done. | So the fact that SGML is a near superset of both JSON and | XML doesn't mean much. | dgellow wrote: | XML has other abominations such as XSLT. | baz00 wrote: | I'd definitely rather write XSLT than YAML festering in the | same pot as go-template. | strken wrote: | People say this, and yet XML's origins as a markup language | make it baffling as a data format. No sane human being should | choose a data format with such confusion between properties | that no user knows whether to go with <Foo> | <Shininess>HIGH</Shininess> <Luck>7</Luck> | </Foo> | | or <Foo shininess="HIGH" luck="7" /> | | and yet countless thousands decided to do just that, for | reasons that are totally inexplicable to me. | | Obviously as a markup language this is fine; as a _data format_ | it's bizarre, since the division between attribute vs child | doesn't match most in-memory data structures. | pydry wrote: | Yeah, it's a weird attitude. XML died out because it was an | overcomplicated design-by-committee mess.
Quite apart from | the fact that it wouldn't map cleanly to lists and | hashmaps, necessitating a query language, it also led to | embarrassing debacles like the billion laughs vulnerability - a | problem in the very core of XML. | | With some niche exceptions where it has clung on, XML | basically died. It's time to move on. The fact that we do | similar sorts of stuff with JSON, like data transformations | and schema validation, does not, in any way, shape or form, | invalidate its flaws. | baz00 wrote: | XML is fine. | | The overcomplicated mess was the WS-* garbage. | smikhanov wrote: | > no user knows | | The described problem literally doesn't exist in XML. Your | XML-validating editor will check your document against the | schema and will not allow for an attribute where the sub-element | is required and vice versa. | tyingq wrote: | I believe they mean for designing the schema in the first | place. Meaning the impedance match between JSON and their | chosen language is usually more natural. | tannhaeuser wrote: | I'm not disagreeing, but the reason XML was used as a data | format is that it has native support in browsers (remember | XML was created as a simplified SGML subset for eventually | replacing HTML), the idea being that you can display service | payloads via simple stylesheet applications or element | replacement/decoration rather than having to rely on | JavaScript or another Turing-complete environment for arbitrary | scripting, which was seen as having no place as a central | technique in classic document-oriented browsing. | | JSON only became popular because of similar opportunistic | effects (i.e. being already part of the stack via eval()). If | you look at how typical non-JS backends such as Java or .NET | deal with service request/response data, there's absolutely | no advantage for either JSON or XML - both are represented as | class/structure and (de-)serialized via binding frameworks | and annotations.
| strken wrote: | There's no particular machine advantage to any human-readable | format over an equivalent binary format, sure. | However, if you look at human-"readable" formats that | predate XML (like HL7[0]) you can appreciate the advantages | of a tree-like structure with labelled fields when it comes | to human comprehension. I think XML is often difficult for | humans to read, and certainly to write, and since this is | the only reason to use either language it's an important | factor. | | I guess you could argue we should all use Protocol Buffers, | pickle, Thrift, etc.[1] and only switch to JSON for | debugging. I wouldn't disagree. Protobuf is apparently | faster than JSON in the browser. | | [0] See https://www.interfaceware.com/hl7-message-structure | for an example message | | [1] I missed CORBA and spent the early years of my | professional life trying not to touch the SOAP, just in | case I dropped it | nrclark wrote: | JSON does have one advantage over XML: it maps cleanly onto | primitive types in Python and many other languages. XML | attributes don't really have an unambiguous way to be | represented using list and map primitives (other than maybe | an "everything is a map" model, which sucks from a | usability perspective). | aforwardslash wrote: | I beg to differ. JSON only provides a subset of commonly | available data types (quick example: show me a proper 64-bit | int, a proper date type or a proper money type). And | "everything is a map" is pretty much how Python works, | but they prefer to call it dicts. I could go on and | explain how JSON is evolving to have exactly all the | problems of XML without any of the advantages, and how | people keep reinventing the wheel (pun intended for | Python fans) ignoring why XML is the way it is (and it is | quite a bit more robust than anything JSON). XML's biggest | defect was verbosity, especially in an HTTP 1.0 context. | With HTTP 1.1 (so nowadays legacy tech), most of these | problems disappear.
I know, parsing of JSON is quite simple - the | reason is that the format is lacking. | the8472 wrote: | > since the division between attribute vs child doesn't match | most in-memory data structures. | | vtables are attributes for pointers. hypergraphs (as used in | some tagging systems) have attributes on everything, | including attributes. CBOR has optional type-tags on its | items. | baz00 wrote: | Actually, you should never use attributes in XML at all to | represent data. Your first example is correct. | | Everyone is just confused because people who didn't know this | designed HTML. But also everyone is confused because HTML and | XML aren't necessarily related other than some parentage in | SGML. | tannhaeuser wrote: | Nope. In markup, _attributes_ are for "metadata", that is, | anything not rendered to the reader/user, as opposed to | (element) _content_. The entire purpose of markup is to | provide a rich text format, via decorating plain text, usable | from any text editor. Data exchange, or any other | application where there is no concept of "rendering to the | user", is not a primary application for markup. | | If anything, what's wrong with HTML in this respect is that | JavaScript and CSS can be put inline into content when | these should always go into attributes and/or external | resources linked via src/href attributes. And this flaw | shows indeed where HTML deviates from SGML proper: when the | style and script elements were introduced, their "content" | needed to be put into SGML comment tags <!-- and --> such | that browsers wouldn't render JavaScript and CSS as text | content. I mean, who came up with this brain-dead design? | | But CSS is a lost cause anyway. What does it tell you about | its designers that they thought, starting with a markup | language having pretty intense syntactic constructs | already, to tunnel _yet another_ item=value syntax in | regular markup attributes?
Like replacing <h2 | bgcolor=black> by <h2 style="background-color: black"> and | then claiming attributes are for "behavior" or whatever | nonsense after the fact. Whoever came up with this clearly | wasn't a CompSci person. And the syntactic proliferation in | CSS got completely out of hand, for the simple reason | that HTML evolution was locked down while W3C was focussed | on XML/XHTML for over a decade, while the CSS spec process | was lenient. | Communitivity wrote: | I haven't used XML in a long while, but there was a trick I | had when I designed schemas, back when I did use XML all the | time. Use an attribute if the data is a primitive String, | number, or boolean. Break into multiple attributes if the | data is structured but has only one level and has few | children. Otherwise use an element. The three rules are | simple, but produce schemas easy to read, easy to maintain, | and easy to implement against. One code smell is if you start | winding up with tons of attributes on one element. That may | mean you should break the logical concept that element | represents into multiple concepts, have those concepts be | nested elements, each with its related attributes. | OnlyMortal wrote: | With the origins in SGML in the early 90s, there were some | basic editors for manual creation. | | I suspect the popularity was due to the SAX parser and | "interop" between C++ and Java. | | To me, coming from ObjC++, JSON is just a serialised | dictionary. | heresie-dabord wrote: | Corollary: The number (N) of ad hoc support tools needed to do | any serious work with a given mark-up language is proportional | to the naivety (Y) of the implementation. | baz00 wrote: | I like this one a lot. | crabmusket wrote: | I see this take often and I think it's pretty bad. JSON (data | format) and XML (markup format) are very different. Building | tools for JSON doesn't change that in any way.
| | And it turns out that both JSON and XML are used for data | interchange, and when people have data interchange problems, | they build tooling to help solve those problems (like schema | validation). That doesn't make JSON "like XML", it just means | they're discovering the same problem and solving it for the | format they're using. | deepakarora3 wrote: | Nice work! I see that this is for processing/parsing large | data sets where documents do not conform to a fixed structure, | and is for the Go language. | | I made something similar in Java - unify-jdocs - | https://github.com/americanexpress/unify-jdocs - though this is | not for parsing - it is more for reading and writing when the | structure of the document is known - read and write any JSONPath | in one line of code, and use model documents to define the | structure of the data document (instead of using JSONSchema, which | I found very unwieldy to use) - no POJOs or model classes - along | with many other features. Posting here as the topic is relevant | and it may help people in the Java world. We have used it | intensively within Amex for a very large, complex project and it | has worked great for us. | latchkey wrote: | We all know the builtin Go JSON parser is slow. | | How about doing comparisons against other implementations? | | Like this one: https://github.com/json-iterator/go | | Update: found this outdated repo: | https://github.com/ohler55/compare-go-json | pstuart wrote: | Slightly tangential, but Go's JSON handling has long had room for | improvement, and it looks like there's going to be a serious | overhaul of its capabilities and implementation: | https://github.com/golang/go/discussions/63397 -- I'm looking | forward to seeing this land. ___________________________________________________________________ (page generated 2023-10-12 21:01 UTC)