[HN Gopher] A Journey building a fast JSON parser and full JSONPath
       ___________________________________________________________________
        
       A Journey building a fast JSON parser and full JSONPath
        
       Author : atomicnature
       Score  : 105 points
       Date   : 2023-10-12 06:36 UTC (14 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tomthe wrote:
       | I like the "Simple Encoding Notation" (SEN) of the underlying
       | library: https://github.com/ohler55/ojg/blob/develop/sen.md
       | 
       | " A valid example of a SEN document is:
       | 
       | { one: 1 two: 2 array: [a b c] yes: true } "
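To make the quoted grammar concrete, here is a minimal sketch of a parser for just this subset of SEN (maps, arrays, numbers, true/false/null, quoted and bare strings). It is illustrative only and is not ojg's actual implementation; the real grammar in sen.md covers comments, escapes, and more:

```python
import re

def parse_sen(text):
    """Parse a tiny SEN subset. Bare tokens that are not numbers or
    keywords fall back to strings, which is the behaviour debated below."""
    # Split into structural tokens, quoted strings, and bare tokens.
    tokens = re.findall(r'[{}\[\]:]|"[^"]*"|[^\s{}\[\]:]+', text)
    pos = 0

    def atom(tok):
        if tok == "true":
            return True
        if tok == "false":
            return False
        if tok == "null":
            return None
        if tok.startswith('"'):
            return tok[1:-1]
        try:
            return int(tok)
        except ValueError:
            try:
                return float(tok)
            except ValueError:
                return tok  # bare token falls back to a string

    def value():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "{":
            obj = {}
            while tokens[pos] != "}":
                key = atom(tokens[pos]); pos += 1
                assert tokens[pos] == ":"; pos += 1
                obj[key] = value()
            pos += 1
            return obj
        if tok == "[":
            arr = []
            while tokens[pos] != "]":
                arr.append(value())
            pos += 1
            return arr
        return atom(tok)

    return value()

print(parse_sen("{ one: 1 two: 2 array: [a b c] yes: true }"))
# {'one': 1, 'two': 2, 'array': ['a', 'b', 'c'], 'yes': True}
```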
        
         | koito17 wrote:
         | An interesting observation: if you move the colon to the
         | opposite side, you get valid EDN data!
         | 
         | {:one 1 :two 2 :array [a b c] :yes true}
         | 
         | cf. https://github.com/edn-format/edn
         | 
         | Likewise, commas are considered whitespace. They are sometimes
         | added to make lengthy maps easier to read.
        
         | kubanczyk wrote:
         | > Which is the same as the following JSON:
         | 
         |       {
         |         "one": 1,
         |         "two": 2,
         |         "array": ["a", "b", "c"],
         |         "yes": true
         |       }
         | 
         | That example also caught my attention, but in a bad way. It
         | looks just like a comeback of one of the worst ideas of YAML.
         | 
         | My immediate question would be: what's the JSON for this SEN
         | I've crafted?
         | 
         |       { array: [string1 string2 "true" true True TRUE yes y] }
         | 
         | For more fun, there's a single problematic entry here, can you
         | spot it?
         | 
         |       1.20.4   1.204.4   1.20   1.204   1.20.0   1.20.00   1.20-rc2
         | 
         | Or, expert level, there's exactly one problem here as well:
         | 
         |       0a1f   0bfd   0c0c   0d01   0e02
        
           | tomthe wrote:
           | Thank you for thinking more deeply about this than I did! But
           | I do not see a problem in your first example; only the bare
           | token true is an actual boolean true (according to my browser
           | and the linked definition on https://www.json.org).
           | 
           | I don't get your other examples, can you explain? I assumed
           | that 1.20.4 is not a valid SEN entry, because it starts with
           | a digit but is not a number.
        
           | jarym wrote:
           | I'm not following: `{ array: [string1 string2 "true" true
           | True TRUE yes y] }` doesn't look like valid SEN or JSON. The
           | `y`, `yes`, `True`, and `TRUE` aren't valid
           | keywords/variables/consts, and `string1` and `string2` look
           | like variable references, which aren't something SEN or JSON
           | supports. The closest valid thing I can imagine is:
           | 
           | ` { array: ["string1" "string2" "true" true "True" "TRUE"
           | "yes" "y"] } `
        
             | mjpa86 wrote:
             | Aren't they implied strings? If "[a b c]" is an array of 3
             | strings, "a", "b" and "c", then True is a string "True".
             | That's the problem.
        
               | jarym wrote:
               | I must be missing why you think they're implied strings -
               | I don't see that in the spec. What I do see is:
               | 
               | "Strings can also be delimited with a single quote
               | character which allows for a string to be either "abc" or
               | 'abc'."
               | 
               | There's no mention of having a string without a
               | delimiter.
        
               | ReleaseCandidat wrote:
               | The example below is this:
               | 
               | > array: [a b c]
        
               | jarym wrote:
               | ohhh I see it now, that looks like a recipe for...
               | issues.
        
           | pjc50 wrote:
           | Let me guess: 0e02 is interpreted as floating point?
        
             | k_process wrote:
             | Ditto 1.20, and when interpreted as floating point the
             | trailing zero loses significance. So as a version this is
             | indistinguishable from 1.2
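The ambiguity is easy to reproduce with any lenient "try number first" tokenizer; here Python's float is used as a stand-in (this is not ojg's actual number grammar):

```python
# Version-like strings fed to a tokenizer that tries numbers first:
# some survive as strings, others silently become numbers.
entries = ["1.20.4", "1.20", "1.204", "1.20-rc2", "0e02", "0a1f"]
for e in entries:
    try:
        print(e, "->", float(e))   # parsed as a number
    except ValueError:
        print(e, "-> string")      # falls back to a bare string
```

"1.20" becomes the float 1.2 (the trailing zero is gone) and "0e02" is scientific notation for 0.0, while their neighbours stay strings.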
        
         | lazyasciiart wrote:
         | Am I missing something about the definition of "tokenStart"? It
         | can be 'letter' or three other characters: but all those other
         | characters (and more) are already in the definition of
         | 'letter'?
        
         | pjc50 wrote:
         | See the comment upthread about S-expressions, but given that
         | this doesn't have a marker for "atom", which it badly needs,
         | isn't it strictly worse than S-expressions?
        
       | ithkuil wrote:
       | reminder of recent efforts at standardizing JSONPath:
       | https://datatracker.ietf.org/wg/jsonpath/about/
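For a flavour of what the working group is standardizing, here is a toy evaluator for a tiny JSONPath subset (the root `$`, dotted child names, and numeric subscripts). The spec covers far more (wildcards, slices, filter expressions); the function name and document are illustrative:

```python
import re

def jsonpath_get(doc, path):
    """Evaluate a tiny JSONPath subset, e.g. "$.store.book[0].title"."""
    cur = doc
    # Each match is either a child name (.name) or an index ([n]).
    for name, index in re.findall(r'\.([A-Za-z_]\w*)|\[(\d+)\]', path):
        cur = cur[name] if name else cur[int(index)]
    return cur

doc = {"store": {"book": [{"title": "Sense and Sensibility"}]}}
print(jsonpath_get(doc, "$.store.book[0].title"))
# Sense and Sensibility
```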
        
       | baz00 wrote:
       | Is JSON XML yet? Nearly!
       | 
       | I'm going to invent Baz's 11th law of computing here: any data
       | format that isn't XML will evolve into a badly specified version
       | of XML over time.
        
         | kevingadd wrote:
         | With respect for the pain everyone has suffered through due to
         | XML... at this point I prefer XML with a good schema to JSON
         | any day, even if it's more verbose and more awkward to hand-
         | edit. It's just so much easier to validate it or generate code
         | to handle it, and you get things like XSLT or XPath if you want
         | them.
        
           | Deukhoofd wrote:
           | I mean, you can use JSON Schema as well to have similar
           | functionality to XML Schema.
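As a sketch of the kind of constraints JSON Schema expresses, here is a hand-rolled check for a tiny subset of its keywords (`type`, `required`, `properties`). Real validators (e.g. the third-party jsonschema package) implement the full specification; this is only illustrative:

```python
schema = {
    "type": "object",
    "required": ["one", "array"],
    "properties": {
        "one": {"type": "number"},
        "array": {"type": "array"},
    },
}

# Map JSON Schema type names to Python types for the toy check.
TYPES = {"object": dict, "array": list, "number": (int, float), "string": str}

def check(instance, schema):
    """Validate `instance` against the tiny keyword subset above."""
    if "type" in schema and not isinstance(instance, TYPES[schema["type"]]):
        return False
    if isinstance(instance, dict):
        if any(k not in instance for k in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not check(instance[key], sub):
                return False
    return True

print(check({"one": 1, "array": ["a", "b"]}, schema))  # True
print(check({"one": "1"}, schema))                     # False
```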
        
             | znpy wrote:
             | That's exactly the point being made: json is becoming xml.
        
               | tgv wrote:
               | The point also feels like passive-aggressively ignoring
               | the reason why people use JSON and not XML.
        
               | w23j wrote:
               | Can you name some of these reasons? Or give me link?
               | Honest question!
        
               | alpaca128 wrote:
               | One reason would be massively reduced syntax overhead and
               | better readability. I've seen plenty of XML files where
               | XML syntax makes up more than 50% of the file's content,
               | and trying to read the actual content is tedious. Now
               | JSON isn't ideal either - technically you could get rid
               | of all commas, colons, and the quotes around most keys -
               | but I sure prefer `{"foo": "some \"stuff\""}` over
               | something like `<foo><![CDATA[some <stuff>]]></foo>`
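The two escaping styles can be put side by side with the stdlib (illustrative values):

```python
import json
import xml.etree.ElementTree as ET

# JSON escapes with backslashes inside the string literal...
print(json.dumps({"foo": 'some "stuff"'}))
# {"foo": "some \"stuff\""}

# ...while XML must entity-escape markup characters in element text.
foo = ET.Element("foo")
foo.text = "some <stuff>"
print(ET.tostring(foo, encoding="unicode"))
# <foo>some &lt;stuff&gt;</foo>
```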
        
               | w23j wrote:
               | I agree, I would prefer JSON (or YAML) for example for
               | configuration files. That is for stuff that humans
               | actually read. I was thinking about using JSON/XML as a
               | data exchange format between computers, because the
               | context of this discussion has revolved around things
               | like JSON/XML Schema, JSON/XPath, and SOAP/OpenAPI. There
               | is a large trend to replace XML with JSON as the data
               | format for inter-machine communication, and it is
               | confusing to me.
        
               | tgv wrote:
               | XML is too unwieldy for human consumption. Editing it is
               | error-prone, and those schema-directed editors are even
               | worse, because everything requires clicking and clicking
               | and clicking.
               | 
               | For machine-to-machine communication, it's very well
               | suited, but most data is simple enough, and the XML
               | libraries I've used tended to be --let's say-- over-
               | engineered, while there are no hoops to jump through when
               | you want to parse JSON.
               | 
               | And one thing I always disliked about XML was the CDATA
               | section: it makes the message even harder to read, and
               | it's not like you're going to use that binary data
               | unparsed/unchecked.
               | 
               | XML just tried to formalize data transfer and description
               | prematurely, which made it rigid and not even
               | sufficiently powerful. I must say that XSLT and XPath
               | were great additions, though.
        
               | eviks wrote:
               | It's unreadable
        
               | Devasta wrote:
               | Honestly, a lot of people use JSON because that's what
               | they have always used; XML's heyday was like 15 years
               | ago, so you could be a very senior engineer now and have
               | never touched XML.
        
             | w23j wrote:
             | I haven't looked at JSON Schema in detail so please correct
             | me if I am wrong, but I had the impression that the JSON
             | Schema specification is still largely unfinished and
             | evolving. That means you need to know which version the
             | tool you use supports. And when I was looking for JSON
             | Schema validators for Java all I found were projects on
             | GitHub, which often were abandoned and referred the user to
             | another GitHub project which was also abandoned. There does
             | not seem to be support from an established project or
             | vendor.
             | 
             | Compare that to XML where we have a plethora of established
             | tools (Woodstox, JAXB, etc.).
             | 
             | What I have trouble understanding, and what everybody else
             | just seems to accept as obvious, is why one would take on
             | these problems? Is JSON Schema more powerful than XML
             | Schema? Does the use of JSON have advantages over using
             | XML? When we are talking about a client program calling a
             | server API with JSON/XML, why do we care about the format
             | of data exchanged? What advantages does JSON have in this
             | case in contrast to XML (or for that matter a binary format
             | like Protocol Buffers)? Isn't this the most boring part of
             | the application, which you would want to just get out of
             | the way and work? What are the advantages of JSON over XML
             | that would lead me to deal with the problems of evolving
             | specifications and unreliable tooling?
             | 
             | (And just to repeat, since everybody seems to have a
             | different opinion about this than me, I must be missing
             | something and really would like to learn what!)
        
               | pydry wrote:
               | All schema languages are a bit like that. You can almost
               | always add another layer on top of the validation and
               | screw down the validation a bit harder. The strictest
               | validation will only be achievable using a turing
               | complete language.
               | 
               | OpenAPI is probably used a bit more than json schema, but
               | it's contextually limited to APIs (which, to be fair, is
               | mostly what JSON is used for).
        
               | w23j wrote:
               | I probably phrased my question poorly. Why would I use a
               | tool that is unmaintained or poorly maintained, for a
               | probably already outdated version of a spec, when I can
               | use something else that has been used in production for
               | years by countless companies? The advantages must be
               | huge, and I don't know what they are.
               | 
               | OpenAPI is another example. There are threads on hacker
               | news about generating code from OpenAPI specs. These
               | always seem to say "oh, yes don't use tool X, use tool Y
               | it does not have that problem, although it also doesn't
               | support Z". The consensus seems to be to not generate
               | code from an OpenAPI specification but to just use it as
               | documentation, since all generators are more or less
               | broken. Contrast that with for example JAXB (which is not
               | an exact replacement I know), which has been battle
               | tested for years.
        
               | pydry wrote:
               | I've used jsonschema and it was fine. I didn't think it
               | was poorly maintained. By contrast, most XML libraries
               | I've used had a myriad of broken edge cases and security
               | vulnerabilities, brought on by XML's overcomplication and
               | the maintainers' inability to keep up.
               | 
               | >The consensus seems to be to not generate code from an
               | OpenAPI specification but to just use it as
               | documentation, since all generators are more or less
               | broken.
               | 
               | OpenAPI still functions just fine as a means of
               | documentation and validation.
               | 
               | I'm allergic to all forms of code generation, to be
               | honest. If there is an equivalent of XML in this I
               | imagine it's even more horrendous. I can just imagine
               | chasing down compiler errors indirectly caused by an XML
               | switch not set _shudder_.
               | 
               | >Contrast that with for example JAXB
               | 
               | JAXB looks like a bolt on to work around XML's
               | deficiencies. There's no need to marshal JSON to special
               | funky data structures in your code because lists and
               | hashmaps are already built in. You can just use those. An
               | equivalent doesn't need to exist.
               | 
               | For schema validation, I think XML has, what, 3 ways of
               | doing it? DTDs? XMLSchema? And now JAXB does a bit of
               | that on the side too? Does that sound like a healthy
               | ecosystem to you? Because it sounds like absolute dogshit
               | to me.
        
               | Deukhoofd wrote:
               | > I'm allergic to all forms of code generation, to be
               | honest. If there is an equivalent of XML in this I
               | imagine it's even more horrendous. I can just imagine
               | chasing down compiler errors indirectly caused by an XML
               | switch not set shudder.
               | 
               | WSDL comes to mind
        
               | w23j wrote:
               | I see. Thanks for taking the time to reply!
        
               | Deukhoofd wrote:
               | > That means you need to know which version the tool you
               | use supports
               | 
               | Honestly the same issue with versioning has been my
               | primary issue with XML Schemas in the past. XSD 1.1 for
               | example came out over a decade ago, but is still very
               | badly supported in most tooling I tried out.
               | 
               | > When we are talking about a client program calling a
               | server API with JSON/XML, why do we care about the format
               | of data exchanged?
               | 
               | We shouldn't care much, beyond debuggability (can a
               | developer easily see what's going on), (de)serialization
               | speed, and bandwidth use. JSON and protobuf tend to be a
               | decent chunk smaller than XML, JSON is a bit easier to
               | read, and Protobuf is faster to (de)serialize. This means
               | they should generally be preferred.
               | 
               | In the case of a client program calling a server API I'd
               | personally have the server do the required validation on
               | a deserialized object, instead of doing so through a
               | schema. This is generally easier to work on for all
               | developers in my team, and gets around all the issues
               | with tooling. The only real reason I use schemas is when
               | I'm writing a file by hand, and want autocompletion and
               | basic validations. In that case versioning and tooling
               | issues are completely in my control.
        
         | Traubenfuchs wrote:
         | As someone who greatly enjoyed the rigidity of SOAP/xml, which
         | made proper architectural planning and careful deprecation
         | mandatory, I wonder where we went so wrong. I feel like it's
         | all connected to the impreciseness and typelessness of
         | JavaScript. SOAP/xml to generate well defined client and server
         | entry points in Java is how things should be done and SoapUI
         | was a pleasure to use.
        
           | Devasta wrote:
           | Honestly, I think a big reason is that Stack Overflow didn't
           | exist at XMLs peak, so you had people generating XML by
           | concatenation, to predictably disastrous results.
           | 
           | One of the first XSLT transforms I was ever given to maintain
           | generated XML by the same method:
           | 
           |       <xsl:text>&lt;PRICE&gt;</xsl:text>
           |       <xsl:value-of select="PRICE"/>
           |       <xsl:text>&lt;/PRICE&gt;</xsl:text>
           | 
           | and so on.
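The failure mode of building markup by string concatenation is easy to demonstrate with Python's stdlib (a stand-in here, not XSLT): the tree-building API escapes markup characters, while concatenation does not:

```python
import xml.etree.ElementTree as ET

price = 'a < b "special"'   # hypothetical value containing markup characters

# String concatenation: ill-formed XML as soon as the value has a "<".
concat = "<PRICE>" + price + "</PRICE>"

# Building the tree with an API escapes automatically.
node = ET.Element("PRICE")
node.text = price
print(ET.tostring(node, encoding="unicode"))
# <PRICE>a &lt; b "special"</PRICE>

# The concatenated version does not even parse back:
try:
    ET.fromstring(concat)
except ET.ParseError as e:
    print("concat is ill-formed:", e)
```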
        
           | pjc50 wrote:
           | > made proper architectural planning and careful deprecation
           | mandatory
           | 
           | That's why it never caught on.
           | 
           | The ability of JSON/Javascript to tape together kinda-working
           | solutions _before and instead of_ any kind of specification
           | works is hugely powerful, because it allows iterating on the
           | requirements by having actual users use the app.
        
             | touisteur wrote:
             | I mean I've always found this enlightening, when hearing
             | json is 'simple':
             | https://seriot.ch/projects/parsing_json.html
        
           | aidos wrote:
           | The S stands for Simple
           | 
           | http://harmful.cat-v.org/software/xml/soap/simple
        
             | PhilipRoman wrote:
             | Thanks for sharing this, somehow I missed this while
             | reading cat-v. Definitely applicable to a couple of other
             | technologies too...
        
             | another2another wrote:
             | Oh that was a good read.
             | 
             | I lived through all that, and can totally understand why
             | people turned away in disgust and agreed on REST instead.
        
           | usrusr wrote:
           | In my experience SOAP was near-universally used as an RPC
           | encoding, where the schema was whatever types the exposed API
           | defined and no-one gave the tiniest anything about the data
           | representation on the wire. If you insisted on schema first
           | SOAP, people looked at you as if you had fallen through a
           | dimensional gate from an alternative history parallel
           | universe full of Zeppelins and domesticated dinosaurs. JSON
           | on the other hand came riding on that REST wave, where the
           | data models on the wire were given more consideration than
           | just an outcome of the serialization process best never looked
           | at. Some people even considered idempotency more than just a
           | funny sequence of letters. No, I'm not surprised at all the
           | SOAP mindset disappeared. (But SoapUI was really a pleasure
           | to use, spent an ungodly amount of hours staring at that
           | thing, never in anger)
        
         | nine_k wrote:
         | I'd say there must exist a more ancient law, stating that a
         | representation of s-expressions is reinvented whenever a need
         | arises for a generic data format.
         | 
         | S-expressions are the most direct representation of a tree:
         | (root node node ...). Trees are everywhere, they represent any
         | nested structure; lists are logically a subset of trees.
         | 
         | XML is a tree. It has the weird "attribute" node types, a
         | legacy of SGML text markup notation. JSON is a tree, obviously.
         | So is protobuf, thrift, etc. They all could be serialized as
         | s-expressions.
         | 
         | Now, a schema that describes a tree is also a tree. Hence XML
         | Schema, JSON Schema, etc.
         | 
         | More, an abstract program that describes a transformation of a
         | tree is also a tree; this produces homoiconic languages, from
         | XSLT to Lisps.
         | 
         | There is nothing special about XML; it's just a particular case
         | of a generic law.
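The "everything serializes as an s-expression" point takes only a few lines of Python (an ad-hoc encoding for illustration, not any standard):

```python
def to_sexpr(value):
    """Serialize nested dicts/lists/scalars as an s-expression:
    a dict entry becomes (key value), a list becomes (v1 v2 ...)."""
    if isinstance(value, dict):
        items = " ".join(f"({k} {to_sexpr(v)})" for k, v in value.items())
        return f"({items})"
    if isinstance(value, list):
        return "(" + " ".join(to_sexpr(v) for v in value) + ")"
    return str(value)

print(to_sexpr({"one": 1, "array": ["a", "b", "c"], "yes": True}))
# ((one 1) (array (a b c)) (yes True))
```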
        
           | baz00 wrote:
           | Completely agree on all points. But there is something
           | special about XML: everyone has failed to make something
           | better.
        
             | alpaca128 wrote:
             | If you said nothing better became an industry standard I
             | could see your point, but how exactly is XML better than
             | s-expressions? Or, if you want something less generalized,
             | KDL (which is roughly XML with 90% less syntax overhead)?
        
               | baz00 wrote:
               | XML has standardised schemas, transformations, and query
               | languages defined on top of it. The same is not true for
               | s-expressions.
               | 
               | I've not looked at KDL before but a quick scan suggests
               | it's interesting. I will look into it.
        
               | jerf wrote:
               | XML has a lot more defined structure than s-expressions.
               | S-expressions make cute demos when people just take some
               | chunk of data and blast out a conversion to drop into the
               | conversation and hold it up as a standard, but it's not a
               | fair comparison to take something actually defined and
               | then splat out an undefined ad-hoc format in the spur of
               | the moment. Of course the latter looks awesome by
               | comparison; the example was literally structured to look
               | awesome in this exact context.
               | 
               | When you read the s-expression alternatives proposed to
               | XML with an eye to "How would I actually code against
               | this? How would I actually convince multiple people to
               | use the _exact_ same standard as me? How do I support
               | _all_ the use cases of interest to me? " they completely
               | fall apart. They're _too_ simple. The very fact that I
               | have to use the plural, _s-expression alternatives_,
               | since no two of them are ever _quite_ the same says quite
               | a bit.
               | 
               | When you need that structure, XML is actually a very good
               | choice; the error people made was using it when they
               | didn't need that structure. Note how much of the
               | complaint about using XML, even in this very
               | conversation, is (quite correctly!) "what do I do with
               | all these extra structural elements?" If you don't have a
               | clear answer to that, don't use XML. If you do, don't jam
               | it into s-exprs or JSON either, you end up with an even
               | worse mess.
        
               | alpaca128 wrote:
               | > How would I actually convince multiple people to use
               | the exact same standard as me?
               | 
               | The same way you agree on an XML schema? I don't know if
               | I quite understand what you want to say - as I see it
               | both are tree structured formats which means they both
               | can represent the same information, just that
               | s-expressions are less verbose but XML has more existing
               | tooling for defining & validating a structure. Though the
               | latter is more an aspect of the ecosystem than the format
               | itself.
        
             | dragonwriter wrote:
             | > Completely agree on all points. But there is something
             | special about XML: everyone has failed to make something
             | better.
             | 
             | XML's decline from its peak of adoption means lots of
             | people working with data disagree with you.
        
           | hardware2win wrote:
           | Why focus on s expr then?
           | 
           | Every data format will eventually evolve into a tree
        
             | nine_k wrote:
             | S-exprs are just the simplest.
        
           | cxr wrote:
           | > I'd say there must exist a more ancient law, stating that a
           | representation of s-expressions is reinvented whenever a need
           | arises for a generic data format.
           | 
           | That more ancient law would be Greenspun's tenth rule [1], or
           | a corollary to it, at least.
           | 
           | The law proposed here (as Baz's 11th law) was intended to be
           | a humorous and obvious pastiche crafted with Greenspun's quip
           | in mind, with the idea being that the reader would be in on
           | the joke (being already familiar with it).
           | 
           | 1. <https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule>
        
         | tannhaeuser wrote:
         | JSON can be parsed using SGML [1], by instructing SGML to
         | interpret JSON tokens such as colons, quotation marks, and
         | curly braces as markup. The underlying technique for custom
         | lightweight markup is called SHORTREF and can be applied to
         | markdown etc. as well.
         | 
         | So considering XML is subsetted from SGML, I guess the answer
         | is closer to yes than thought.
         | 
         | Though probably it's worth citing the following quote from that
         | paper:
         | 
         | > _If the sweet spot for XML and SGML is marking up "prose
         | documents", the sweet spot for JSON is collections of atomic
         | values._
         | 
         | [1]:
         | https://www.balisage.net/Proceedings/vol17/html/Walsh01/Bali...
        
           | lifthrasiir wrote:
           | > JSON can be parsed using SGML, [...]. So considering XML is
           | subsetted from SGML, I guess the answer is closer to yes than
           | thought.
           | 
           | In other words, SGML was far more powerful than what we
           | actually needed. Of course, we say that with the benefit of
           | hindsight.
        
             | tannhaeuser wrote:
             | > _SGML was way too powerful_
             | 
             | The widespread use of markdown and other lightweight markup
             | rather than rigid XML-style fully tagged markup for
             | authoring suggests otherwise, though. And so does the
             | continued use of HTML, chock full of SGMLisms such as tag
             | inference and attribute shortforms that weren't included in
             | the XML
             | subset/profile when XML (XHTML) was created to replace
             | HTML.
             | 
             | So while XML isn't used as an authoring format on the web
             | (nor as delivery format), it's still useful as canonical
             | archival format I guess.
        
               | lifthrasiir wrote:
               | SGML is a meta-language unlike every other example in
               | your reply, so the prevalence of such semi-structured
               | languages (including SGML applications) doesn't justify
               | SGML itself. Even HTML is not exactly an SGML application
               | (except for HTML 4), and to my knowledge implementing
               | HTML with a generic SGML implementation was rarely done.
               | So the fact that SGML is a near superset of both JSON and
               | XML doesn't mean much.
        
         | dgellow wrote:
         | XML has other abominations such as XSLT.
        
           | baz00 wrote:
           | I'd definitely rather write XSLT than YAML festering in the
           | same pot as go-template.
        
         | strken wrote:
         | People say this, and yet XML's origins as a markup language
         | make it baffling as a data format. No sane human being should
         | choose a data format with such confusion between properties
         | that no user knows whether to go with
         | 
         |       <Foo>
         |         <Shininess>HIGH</Shininess>
         |         <Luck>7</Luck>
         |       </Foo>
         | 
         | or
         | 
         |       <Foo shininess="HIGH" luck="7" />
         | 
         | and yet countless thousands decided to do just that, for
         | reasons that are totally inexplicable to me.
         | 
         | Obviously as a markup language this is fine; as a _data
         | format_ it's bizarre, since the division between attribute and
         | child doesn't match most in-memory data structures.
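Both shapes parse fine; the pain is that every consumer must know which one the producer chose. With Python's stdlib as an illustration:

```python
import xml.etree.ElementTree as ET

as_children = ET.fromstring(
    "<Foo><Shininess>HIGH</Shininess><Luck>7</Luck></Foo>")
as_attrs = ET.fromstring('<Foo shininess="HIGH" luck="7" />')

# Same information, two different access paths.
print(as_children.find("Luck").text)   # 7
print(as_attrs.get("luck"))            # 7
```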
        
           | pydry wrote:
           | Yeah, it's a weird attitude. XML died out because it was an
           | overcomplicated design-by-committee mess. Quite apart from
           | the fact that this meant it wouldn't map cleanly to lists
           | and hashmaps, necessitating a query language, it also led to
           | embarrassing debacles like the billion laughs vulnerability -
           | a problem in the very core of XML.
           | 
           | With some niche exceptions where it has clung on, XML
           | basically died. It's time to move on. The fact that we do
           | similar sorts of stuff with JSON like data transformations
           | and schema validation does not, in any way, shape or form,
           | invalidate its flaws.
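For reference, the billion-laughs attack nests entity expansions so a tiny document explodes during parsing. A scaled-down version (three levels of three, instead of the classic ten levels of ten) shows a default stdlib parser expanding it; this is exactly why libraries like defusedxml exist:

```python
import xml.etree.ElementTree as ET

# Scaled-down billion-laughs: &c; -> 3x &b; -> 9x &a; -> 9 copies of "ha".
doc = """<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;">
]>
<root>&c;</root>"""

text = ET.fromstring(doc).text
print(len(text))  # 18 characters from a five-line document
```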
        
             | baz00 wrote:
             | XML is fine.
             | 
             | The overcomplicated mess was the WS-* garbage.
        
           | smikhanov wrote:
           | > no user knows
           | 
           | The described problem literally doesn't exist in XML. Your
           | XML-validating editor will check your document against the
           | schema and will not allow for an attribute where the sub-
           | element is required and vice versa.
        
             | tyingq wrote:
             | I believe they mean for designing the schema in the first
             | place. Meaning the impedance match between JSON and their
             | chosen language is usually more natural.
        
           | tannhaeuser wrote:
           | I'm not disagreeing but the reason XML was used as data
           | format is that it has native support in browsers (remember
           | XML was created as a simplified SGML subset for eventually
           | replacing HTML), the idea being that you can display service
           | payloads via simple stylesheet applications or element
           | replacement/decoration rather than having to rely on
           | JavaScript or other Turing-complete environment for arbitrary
           | scripting which was seen as having no place as a central
           | technique in classic document-oriented browsing.
           | 
           | JSON only became popular because of similar opportunistic
           | effects (i.e. it was already part of the stack via eval()).
           | If you look at how typical non-JS backends such as Java or
           | .NET deal with service request/response data, there's
           | absolutely no advantage to either JSON or XML - both are
           | represented as classes/structures and (de-)serialized via
           | binding frameworks and annotations.
        
             | strken wrote:
             | There's no particular machine advantage to any human-
             | readable format over an equivalent binary format, sure.
             | However, if you look at human-"readable" formats that
             | predate XML (like HL7[0]) you can appreciate the advantages
             | of a tree-like structure with labelled fields when it comes
             | to human comprehension. I think XML is often difficult for
              | humans to read, and certainly to write, and since
              | readability is the only reason to use either format,
              | that's an important factor.
             | 
             | I guess you could argue we should all use Protocol Buffers,
             | pickle, Thrift, etc.[1] and only switch to JSON for
             | debugging. I wouldn't disagree. Protobuf is apparently
             | faster than JSON in the browser.
             | 
             | [0] See https://www.interfaceware.com/hl7-message-structure
             | for an example message
             | 
             | [1] I missed Corba and spent the early years of my
             | professional life trying not to touch the SOAP, just in
             | case I dropped it
        
             | nrclark wrote:
             | JSON does have one advantage over XML: it maps cleanly onto
             | primitive types in Python and many other languages. XML
             | attributes don't really have an unambiguous way to be
             | represented using list and map primitives (other than maybe
             | an "everything is a map" model, which sucks from a
             | usability perspective).
        
               | aforwardslash wrote:
               | I beg to differ. JSON only provides a subset of commonly
               | available data types (quick example: show me a proper 64
               | bit int, a proper date type or a proper money type). And
               | "everything is a map" is pretty much how python works,
               | but they prefer to call it dicts. I could go on and
               | explain how JSON is evolving to have exactly all the
               | problems of xml without any of the advantages, and how
               | people keep reinventing the wheel (pun intended for
               | python fans) ignoring why xml is the way it is (and it is
               | quite more robust than anything json). Xml biggest defect
               | was verbosity, specially in a http 1.0 context. With http
               | 1.1 (so nowadays, legacy tech) , most of these problems
               | disappear. I know, parsing of json is quite simple - the
               | reason is the format is lacking.
        
           | the8472 wrote:
           | > since the division between attribute vs child doesn't match
           | most in-memory data structures.
           | 
           | Vtables are attributes for pointers. Hypergraphs (as used
           | in some tagging systems) have attributes on everything,
           | including attributes. CBOR has optional type tags on its
           | items.
        
           | baz00 wrote:
           | Actually you should never use attributes in XML at all to
           | represent data. Your first example is correct.
           | 
           | Everyone is just confused because people who didn't know this
           | designed HTML. But also everyone is confused because HTML and
           | XML aren't necessarily related other than some parentage in
           | SGML.
        
             | tannhaeuser wrote:
             | Nope. In markup, _attributes_ are for "metadata", that
             | is, anything not rendered to the reader/user, as opposed
             | to (element) _content_. The entire purpose of markup is
             | to provide a rich text format by decorating plain text
             | that stays usable from any text editor. Data exchange, or
             | any other application where there is no concept of
             | "rendering to the user", is not a primary application of
             | markup.
             | 
             | If anything, what's wrong with HTML in this respect is that
             | JavaScript and CSS can be put inline into content when
             | these should always go into attributes and/or external
             | resources linked via src/href attributes. And this flaw
             | shows indeed where HTML deviates from SGML proper: when the
             | style and script elements were introduced, their "content"
             | needed to be put into SGML comment tags <!-- and --> such
             | that browsers wouldn't render JavaScript and CSS as text
             | content. I mean, who came up with this brain-dead design?
             | 
             | But CSS is a lost cause anyway. What does it tell you
             | about its designers that, starting with a markup language
             | that already has pretty intense syntactic constructs,
             | they thought to tunnel _yet another_ item=value syntax
             | inside regular markup attributes? Like replacing <h2
             | bgcolor=black> with <h2 style="background-color: black">
             | and then claiming attributes are for "behavior" or
             | whatever nonsense after the fact. Whoever came up with
             | this clearly wasn't a CompSci person. And the syntactic
             | proliferation in CSS got completely out of hand, for the
             | simple reason that HTML evolution was locked down while
             | the W3C was focused on XML/XHTML for over a decade, while
             | the CSS spec process was lenient.
        
           | Communitivity wrote:
           | I haven't used XML in a long while, but there was a trick I
           | had when I designed schemas, back when I did use XML all the
           | time. Use an attribute if the data is a primitive String,
           | number, or boolean. Break into multiple attributes if the
           | data is structured but has only one level and has few
           | children. Otherwise use an element. The three rules are
           | simple, but produce schemas easy to read, easy to maintain,
           | and easy to implement against. One code smell is winding
           | up with tons of attributes on one element. That may mean
           | you should break the logical concept the element represents
           | into multiple concepts and make those concepts nested
           | elements, each with its own related attributes.
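           | 
           | A made-up schema following those rules might look like:

```xml
<!-- primitives as attributes, structured data as nested elements -->
<order id="42" currency="USD">
  <customer name="Alice" tier="gold"/>
  <items>
    <item sku="A-1" qty="2"/>
    <item sku="B-7" qty="1"/>
  </items>
</order>
```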
        
           | OnlyMortal wrote:
           | With its origins in SGML in the early 90s, there were some
           | basic editors for manual creation.
           | 
           | I suspect the popularity was due to the SAX parser and
           | "interop" between C++ and Java.
           | 
           | To me, coming from ObjC++, JSON is just a serialized
           | dictionary.
        
         | heresie-dabord wrote:
         | Corollary: The number (N) of ad hoc support tools needed to do
         | any serious work with a given mark-up language is proportional
         | to the naivety of the implementation (Y).
        
           | baz00 wrote:
           | I like this one a lot.
        
         | crabmusket wrote:
         | I see this take often and I think it's pretty bad. JSON (data
         | format) and XML (markup format) are very different. Building
         | tools for JSON doesn't change that in any way.
         | 
         | And it turns out that both JSON and XML are used for data
         | interchange, and when people have data interchange problems,
         | they build tooling to help solve those problems (like schema
         | validation). That doesn't make JSON "like XML", it just means
         | they're discovering the same problem and solving it for the
         | format they're using.
        
       | deepakarora3 wrote:
       | Nice work! I see that this is for processing/parsing large
       | data sets where documents do not conform to a fixed structure,
       | and that it targets the Go language.
       | 
       | I made something similar in Java - unify-jdocs -
       | https://github.com/americanexpress/unify-jdocs - though it is
       | not for parsing; it is more for reading and writing when the
       | structure of the document is known. You can read and write any
       | JSONPath in one line of code and use model documents to define
       | the structure of the data document (instead of JSONSchema,
       | which I found very unwieldy to use) - no POJOs or model
       | classes - along with many other features. Posting here as the
       | topic is relevant and it may help people in the Java world. We
       | have used it intensively within Amex for a very large, complex
       | project and it has worked great for us.
        
       | latchkey wrote:
       | We all know the built-in Go JSON parser is slow.
       | 
       | How about doing comparisons against other implementations?
       | 
       | Like this one: https://github.com/json-iterator/go
       | 
       | Update: found this outdated repo:
       | https://github.com/ohler55/compare-go-json
        
       | pstuart wrote:
       | Slightly tangential, but Go's JSON handling has long had room for
       | improvement and it looks like there's going to be a serious
       | overhaul of its capabilities and implementation:
       | https://github.com/golang/go/discussions/63397 -- I'm looking
       | forward to seeing this land.
        
       ___________________________________________________________________
       (page generated 2023-10-12 21:01 UTC)