[HN Gopher] Goal: Pass all 4259065 tests in sqllogictest in 1 week
       ___________________________________________________________________
        
       Goal: Pass all 4259065 tests in sqllogictest in 1 week
        
       Author : luu
       Score  : 158 points
       Date   : 2022-10-03 16:56 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | iLoveOncall wrote:
       | No dependencies but the very first step relies on a dependency?
       | It's not because you copy-paste it that it's not a dependency...
        
         | eatonphil wrote:
         | He also used a compiler, and a standard library, and a
         | computer.
        
           | iLoveOncall wrote:
           | There's a bit of a difference between taking away the only
           | truly hard part of the project and using the standard
           | library.
        
             | eatonphil wrote:
             | The SQL grammar is already defined.
             | 
             | He didn't invent SQL. He had to take the grammar from
             | somewhere.
             | 
             | You wanted him to handwrite it copied from the ANSI SQL
             | paper?
             | 
             | Or to just think up all the possible grammars and ignore
             | the real paper?
             | 
             | How would that be different or better than this?
        
         | lelandbatey wrote:
         | The first dependency is a "dependency", but in the same way
         | that an RFC you have to read with your eyes is a dependency.
         | Except in this case, the dependency is a definition, in formal
         | language (extended Backus-Naur form, a.k.a. "EBNF"), of the
         | SQL language, as defined in ISO/IEC 9075:2016[0] (the 2016
         | version of the SQL language specification). They then use this
         | language definition as the input to a program which generates
         | parsers (called a parser-generator[1]), so that they can
         | quickly get to where they have a library which they can use to
         | parse/validate/inspect the SQL being given to them.
         | 
         | Copying the BNF "from an external source" is a smart move,
         | since it means they don't have to do lots of busy work slowly
         | reading and transcribing the specification; someone's already
         | done that step, so why would anyone expect the author to waste
         | time?
         | 
         | Using a parser generator is also a smart move since they exist
         | already and are used all over the place (nobody hand-writes
         | parsers for large languages; that's just a needless source of
         | tedium and bugs). The code that's spit out of the parser
         | generator is novel; that's newly created code which isn't taken
         | from someone else's Github/other repo.
         | 
         | Ultimately, I don't see how any of what the author's done
         | constitutes "[relying] on a dependency" given that they're not
         | using anyone else's Zig source code in their compiled binary,
         | they're writing lots of code for themselves to use, just very
         | quickly, and with powerful tools.
         | 
         | [0] - https://en.wikipedia.org/wiki/SQL:2016
         | 
         | [1] - https://en.wikipedia.org/wiki/Compiler-compiler
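
As a rough illustration of the grammar-to-parser workflow described in the
comment above (this is not the author's Zig code): a minimal Python sketch in
which a toy grammar, written as plain data, drives a generic recursive-descent
matcher. The grammar, the rule names, and the `tokenize`, `parse` and
`parse_select` helpers are all hypothetical, and the real SQL grammar is far
larger than this.

```python
import re

# Toy grammar as data (hypothetical, nowhere near real SQL).
# Each rule maps to ordered alternatives; each alternative is a
# sequence of rule names or literal tokens.
GRAMMAR = {
    "select": [["SELECT", "columns", "FROM", "name"]],
    "columns": [["name", ",", "columns"], ["name"]],
}
KEYWORDS = {"SELECT", "FROM"}


def tokenize(text):
    """Split input into identifiers/keywords and commas."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|,", text)


def parse(rule, tokens, pos=0):
    """Return (tree, next_pos), or None if `rule` does not match at `pos`."""
    if rule == "name":  # built-in terminal: any non-keyword identifier
        if (pos < len(tokens) and tokens[pos].isidentifier()
                and tokens[pos].upper() not in KEYWORDS):
            return ("name", tokens[pos]), pos + 1
        return None
    if rule not in GRAMMAR:  # literal token such as "SELECT" or ","
        if pos < len(tokens) and tokens[pos].upper() == rule:
            return rule, pos + 1
        return None
    for alternative in GRAMMAR[rule]:  # try alternatives in order
        children, cur = [], pos
        for part in alternative:
            result = parse(part, tokens, cur)
            if result is None:
                break
            child, cur = result
            children.append(child)
        else:  # every part of this alternative matched
            return (rule, children), cur
    return None


def parse_select(text):
    """Parse a whole statement; None if it fails or leaves tokens unread."""
    tokens = tokenize(text)
    result = parse("select", tokens)
    if result is None or result[1] != len(tokens):
        return None
    return result[0]
```

Calling `parse_select("SELECT a, b FROM t")` yields a nested tuple tree, and
extending the language means editing `GRAMMAR` rather than the matcher, which
is the appeal of the approach.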
        
           | eatonphil wrote:
           | > nobody hand-writes parsers for large languages; that's just
           | a needless source of tedium and bugs
           | 
           | That is absolutely not true! In fact, most major programming
           | language implementations use handwritten parsers [0].
           | 
           | That said:
           | 
           | > Using a parser generator is also a smart move since they
           | exist already
           | 
           | Jamie _wrote_ the parser generator here too. So it's all the
           | more "from scratch".
           | 
           | [0] https://notes.eatonphil.com/parser-generators-vs-
           | handwritten...
        
       | spullara wrote:
       | So does anyone else agree that there are actually 10x (or more)
       | developers? This is a pretty good example of how one works.
        
         | sangnoir wrote:
         | 10x developers exist, but their prevalence is greatly
         | overstated. Also, churning out code at 10x doesn't make one a
         | 10x developer (this could even hamper everyone else).
         | 
         | I also propose that moniker be banned from being self-applied;
         | it is, in fact, a smell test: if you encounter a colleague
         | calling themselves a 10x developer, start interviewing
         | immediately. No good will come out of that.
        
           | Jochim wrote:
           | My last boss was a 10x developer. Fantastically smart guy,
           | socially awkward but nice, really looked out for his team but
           | an absolute nightmare to work with technically.
           | 
           | The majority of the team couldn't understand his code. We'd
           | have newly hired senior developers just leave rather than
           | deal with it.
           | 
           | He'd rolled his own code generator for our data model that
           | did everything from model generation to the web controllers.
           | 
           | The result was that while he could pump out work quickly,
           | what would've otherwise been a quick fix for a graduate
           | developer now required a deep understanding of a complex
           | system.
           | 
           | This had the effect of turning what would have otherwise been
           | a team of 1-2x developers into a team of 0.2-0.5x devs with a
           | retention problem.
        
             | tempxyz wrote:
             | How is he 10x if no one understands his code? An expert at
             | quick and dirty?
        
               | Jochim wrote:
               | wolf550e summed it up pretty much perfectly.
               | 
               | There was a great deal of thought put into it and he
               | could extend and modify the output really quickly.
               | 
               | The complexity of the system basically made it so that
               | what would otherwise have been a simple task achievable
               | by a graduate required a deep understanding to carry out.
        
               | wolf550e wrote:
               | Imagine that instead of developing an app in a popular
               | programming language, someone implements an idiosyncratic
               | domain specific language suitable for the kind of app
               | they need to build, and then builds the app using that.
               | The result would work and maybe even let them be very
               | productive churning out more features of the kind that
               | were envisioned when the DSL was developed. If they need
               | to extend or fix the DSL, as the original author, they
               | can. Someone else will need to learn the DSL before they
               | can do any work on the app.
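
To make the code-generator / DSL idea in this subthread concrete, here is a
minimal Python sketch. It assumes a hypothetical data model described as a
plain dictionary and a hypothetical `generate_class` helper that emits class
source from it; it is not the system described in the comments, only an
illustration of why such a tool is fast for its author and opaque to others.

```python
# Hypothetical mini "DSL": the data model is described as plain data.
MODEL = {"User": {"id": "int", "name": "str"}}


def generate_class(name, fields):
    """Emit Python source for a class whose constructor takes typed fields."""
    args = ", ".join(f"{field}: {type_}" for field, type_ in fields.items())
    lines = [f"class {name}:", f"    def __init__(self, {args}):"]
    lines += [f"        self.{field} = {field}" for field in fields]
    return "\n".join(lines) + "\n"


# Generate the source, then compile it into a usable class.
source = generate_class("User", MODEL["User"])
namespace = {}
exec(source, namespace)
user = namespace["User"](1, "ada")
```

Adding a field is a one-line change to `MODEL`, but changing how classes are
built means understanding the generator itself, which is the trade-off the
comments describe.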
        
               | [deleted]
        
             | kuroguro wrote:
             | > He'd rolled his own code generator for our data model
             | that did everything from model generation to the web
             | controllers.
             | 
             | Heh, I've actually done that... twice. Luckily it was a
             | team of 1 and I wouldn't expect anyone else to understand
             | my mess. The code generation was extra ugly since I planned
             | to get rid of it eventually to craft out smaller details.
             | It was great at doing repetitive work in bulk. Not sure if
             | it was actually faster but at least it was less boring
             | doing things that way.
        
             | sangnoir wrote:
             | As a junior, I worked with a senior who was considered
             | "10x" and working with him was a pain: he decided the rules
             | didn't apply to him, and management tolerated his repeated
             | violations of conventions instituted to make our dynamic-
             | language codebase manageable.
             | 
             | Anytime he "improved" a module, no one else could maintain
             | it as that would entail additional rule-breaking, which was
             | _verboten_ for mortals, so only he could maintain code he
             | touched. Combined with the fact that he didn't add any
             | tests, the net result was he was slowly and surely
             | subverting the codebase into his personal, brittle domain
             | that no one else could change. He was slowing everyone else
             | down, but all management was looking at was his velocity at
             | closing bugs or rolling out new features while creating
             | tech-debt. His boastful personality was just the icing on
             | the cake.
        
         | moritonal wrote:
         | This is kind of an unhealthy attitude?
         | 
         | @jamii seems super talented, but his bio says "in the past I've
         | built database engines, query planners, compilers, developer
         | tools and interfaces for [a...] myriad [of] consulting and
         | personal research projects.", along with his repos being
         | related to SQL parsers, or literal text-editors working purely
         | on string manipulation. He is also sponsored to spend 100% of
         | his time doing exactly this.
         | 
         | What I mean is that he is almost definitely a 10x dev at writing
         | SQL parsers. But ask him to write a shader that renders a neat
         | waterbed material and he'd be likely a 0.8x dev? The overlap
         | between experience and context is key.
        
           | blowski wrote:
           | I agree that I think this can be an unhealthy attitude. A lot
           | of us are working on projects where the biggest challenge is
           | convincing the Product Manager to provide more than one
           | sentence in the brief.
           | 
           | That said, I can't think of any technical domain where I
           | could do this, even if provided with all the tests up front.
        
           | vsareto wrote:
           | "10x" mostly functions as a reputation badge. It's not a
           | realistic metric for performance.
           | 
           | It's something you get from other people. There's not a good
           | test to figure out if you're 10x better than some randomly
           | picked average developer.
        
           | spullara wrote:
           | When people say they don't exist they mean that they don't
           | exist generally, not just when testing someone on a new
           | domain.
        
             | bcrosby95 wrote:
             | I don't know anyone that would say that 10x developers
             | don't exist at all. It's too easy to bring up someone like
             | John Carmack or many foundational people in computer
             | science that invented algorithms us layfolk could never
             | imagine.
             | 
             | My experience with the phrase is people mean finding a
             | "diamond in the rough" who can code circles around anyone
             | else. It's not about finding a Norvig or Carmack, it's
             | about finding a fresh graduate that you can stick on a
             | problem and they will be bountifully productive.
             | 
             | It's basically a manager's wet dream: extremely productive
             | but cheap. In my experience real 10x people appear to be
             | the opposite: seemingly slow but incredibly expensive.
             | Everyone I actually consider 10x makes millions. And of
             | those that are friends, they didn't really reach that 10x
             | stage until their 30s or 40s.
        
         | xhrpost wrote:
         | Of course it's impressive but this isn't the sort of work the
         | average dev ends up doing day to day. We spend our time digging
         | into dependencies (this had zero) both internal and external,
         | interfacing with stakeholders and looking up business logic for
         | a change. All those async tasks ultimately add up to
         | significant headwinds even for the best "10x" dev.
        
         | stocknoob wrote:
         | It's amusing that people even question the existence of 10x
         | developers.
         | 
         | What fraction of devs could even complete this, let alone in
         | merely 10x the time?
        
           | viraptor wrote:
           | The fraction of devs that regularly deal with databases and
           | parsing. There are no 10x devs. There are devs with long
           | experience in a specific category. The 10x idea is kind of
           | stupid in terms of companies looking for them - it just means
           | "we want people successfully trained somewhere else".
        
             | subroutine wrote:
             | I agree there are probably no 10x devs. This person took 7
             | days to (almost) complete this task (which they
             | cherry-picked themselves), suggesting the average 1x dev with
             | experience in this domain would take 70 days.
             | 
             | I think it would be more reasonable to call someone a
             | 3-sigma dev (someone 3 standard deviations above the mean).
             | These would exist because that's how stats work.
        
               | ZephyrBlu wrote:
               | I've always read 10x as an order of magnitude better, not
               | necessarily 10x faster. 3 sigma is probably better
               | terminology though.
        
               | rcxdude wrote:
               | It's worth pointing out that in the origin of the "10x
               | developer" term, it's relative to the worst performing
               | devs, not the average.
               | 
               | Also, you're not guaranteed to have an example 3 standard
               | deviations above the mean. It strongly depends on your
               | distribution and sample size.
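
The sample-size point above can be checked with a little arithmetic. The
sketch below assumes, purely for illustration, that skill is normally
distributed (the comment notes the result depends strongly on the
distribution); the function names are hypothetical. Under that assumption the
chance of a single draw landing more than 3 standard deviations above the mean
is roughly 0.13%, so small teams will usually contain no such person.

```python
import math


def prob_above_sigma(k):
    """Upper-tail probability P(Z > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def prob_at_least_one(k, n):
    """Chance that at least one of n independent normal draws exceeds k sigma."""
    return 1.0 - (1.0 - prob_above_sigma(k)) ** n


# How likely is at least one "3-sigma dev" in a group of n people?
for n in (10, 100, 1000, 10000):
    print(n, round(prob_at_least_one(3, n), 3))
```

With a heavier-tailed distribution the numbers would change, which is exactly
the caveat being raised.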
        
               | subroutine wrote:
               | I think the prevailing current definition is wrt. the
               | average dev. The worst dev could be arbitrarily bad
               | (suggesting the average dev could also be 10x).
               | 
               | You're right about the sufficient sample size.
        
           | cowmoo728 wrote:
           | It's like questioning the existence of 10x NBA players or 10x
           | chess players. The top super GMs are basically 10x better
           | than most other GMs, who are themselves 10x better than most
           | IMs. It seems strange that programming would be one of the
           | fields that doesn't have a similar distribution of skill.
           | 
           | I think the actual pushback of the 10x programmer idea is
           | that it's more often used to bully regular programmers into
           | working longer hours, rather than actually identifying top
           | performing programmers.
        
             | sophacles wrote:
             | There's also the part where "developer" is more akin to
             | "athlete" than "nba player". There's lots of different
             | types of developer, just like there's lots of different
             | types of athlete. A 10x NBA player will certainly not be a
             | 10x Olympic Swimmer also, more likely .1X. Part of the
             | problem is 10x developer gets talked about like they are
             | going to be 10x athlete at NBA and Swimming and Golf and,
             | and, and... that's what doesn't exist.
        
       | samsquire wrote:
       | I handrolled a very very basic SQL parser in my toy database
       | hash-db
       | 
       | https://GitHub.com/samsquire/hash-db
       | 
       | It's a distributed DynamoDB-style key-value, SQL, and Cypher graph
       | database.
       | 
       | I feel if you want to get a project moving forward for something
       | as large as a database, you can get something rudimentary working
       | and extend the parser when you need those features.
       | 
       | SQL-wise it supports joins, WHERE clauses, and rudimentary full-text
       | search. It uses Rockset's converged indexes for ease of query
       | generation.
       | 
       | If you're interested in queries then you should read this blog
       | post. https://rockset.com/blog/converged-indexing-the-secret-
       | sauce...
       | 
       | The database is partly multimodel with document storage and SQL
       | and graph Cypher querying but I am yet to get all the models to
       | be mutually queryable. The document storage is queryable by SQL
       | but graphs aren't queryable by SQL or as a document.
        
       | apetresc wrote:
       | > So parsing the bnf is kind of a mess, but I only have to parse
       | this one bnf and not bnfs in general so I just mashed in a bunch
       | of special cases.
       | 
       | Surely at that point it would've been a lot cleaner and more
       | practical to just _edit_ the one file you need to parse, to
       | remove the weird line breaks, etc., rather than building special
       | cases into your parser to work around those lines? What am I
       | missing?
        
         | ruuda wrote:
         | The input is 9139 lines long, each anomaly probably occurs
         | dozens of times.
        
         | Forge36 wrote:
         | It's possible the files can't be edited within this project. I
         | had a similar experience writing a code parsing engine.
         | Sometimes it's best documented as code debt and the rewrite can
         | be done at a later time.
        
       | kris-s wrote:
       | What a cool project, I bet they learned a ton doing this.
        
       | ok_dad wrote:
       | What's the `scc` tool I see used there?
        
         | ok_dad wrote:
         | Here's a new relevant post for anyone looking:
         | 
         | "Processing 40 TB of code from ~10M projects with a server and
         | Go for $100 (2019)"
         | 
         | <https://news.ycombinator.com/item?id=33072846>
        
         | shadycuz wrote:
         | I'm interested as well.
        
         | lifthrasiir wrote:
         | Most likely: https://github.com/boyter/scc
        
           | okasaki wrote:
           | 7 ways of installing it, but no deb or rpm. Is this the
           | wonderful future of FOSS where developers don't bother
           | working with distribution maintainers anymore?
        
             | ok_dad wrote:
             | Someday someone will make a hyper-package-manager to be
             | able to manage their packages installed via the thousands
             | of package-managers out there today. Then, several other
             | hyper-package-managers will be developed to cover the cases
             | the first didn't cover. Then comes the hyper-hyper-package-
             | managers...
        
             | duped wrote:
             | This has been the norm for a while. No one wants to work
             | with distro maintainers because their model is incompatible
             | with how people build and distribute their software.
             | 
             | You either get a curl sh, a tarball, or a wrapper around
             | either of those that pretends to be a .deb or .rpm.
        
             | dec0dedab0de wrote:
             | Hasn't it always been rare for developers to maintain
             | distro-specific packages? That's why distros have package
             | maintainers, they also modify the layout and default
             | configurations and whatnot to be consistent with the rest
             | of the distro.
        
               | rcxdude wrote:
               | Yeah, official debs/rpms are a thing but often completely
               | independent of any distribution's packaging efforts (and
               | they often have very different priorities).
        
             | mperham wrote:
             | I've distributed a lot of software and DEB/RPM has to be
             | the worst. I'd suggest those distros improve on their
             | developer ergonomics if they want to stay relevant. 100% of
             | my customers use Docker images these days as it is much
             | much easier to use.
        
               | okasaki wrote:
               | I guess that's for web stuff? You wouldn't distribute
               | 'cloc' and similar in a docker image.
               | 
               | One hopes.
        
               | ArchOversight wrote:
               | It's the easiest way to distribute software in a way that
               | is controlled by the author of the software and where the
               | author can reasonably control all of the dependencies
               | installed.
               | 
               | This way you are providing a one-stop shop that can
               | easily be run. I have all kinds of tools that are docker
               | containers because it's simpler to not have to worry about
               | all kinds of library mismatches or locations of shared
               | libraries, and instead ship a minimal docker container.
        
               | ok_dad wrote:
               | I will refer you to the first tool I thought to Google
               | with "docker image for X":
               | 
               | https://hub.docker.com/r/stedolan/jq
               | 
               | Yes, it's despicable.
        
               | okasaki wrote:
               | 10M+ pulls >:O
               | 
               | https://www.youtube.com/watch?v=umDr0mPuyQc
        
       | mdaniel wrote:
       | > I lost a lot of time in the morning to segfaults in the zig
       | compiler. (https://github.com/jamii/hytradboi-jam-2022#day-4)
       | 
       | I bet the zig project would be interested in the sha of the tree
       | that blows up their compiler
        
         | puffoflogic wrote:
         | I bet they wouldn't. They're well aware their compiler often
         | runs code inside if(false) blocks (in certain positions) and
         | they just don't see this as important. Moving fast is more
         | important. (Where exactly they're moving to is not quite
         | clear.)
        
           | an_ko wrote:
           | > their compiler often runs code inside if(false) blocks (in
           | certain positions)
           | 
           | Do you have an example? Sounds like a catastrophic edge case.
        
       | bsima wrote:
       | And I would have done it too, if it weren't for that meddling
       | parser
        
       | chubot wrote:
       | I would have liked to have learned more about how the query
       | planner and evaluator work! There was almost nothing about that?
       | Just the tests magically moving from 0% to 95%.
       | 
       | e.g. What table and value representation was used?
       | 
       | FWIW I suspect using a LALR(1) parser in Zig on the sqlite
       | grammar would have saved some time and gotten past the parsing
       | headache.
       | 
       | The sqllogictest comes directly from sqlite, so it seems like the
       | parsing problem is mostly "port from C to Zig" (which are very
       | similar metalanguages, or I guess meta- meta- languages in this
       | case :) )
       | 
       | Lemon is apparently a mini-yacc, just for sqlite's grammar, and
       | is about 7K lines of C code, with no deps:
       | https://sqlite.org/src/doc/trunk/doc/lemon.html
        
       ___________________________________________________________________
       (page generated 2022-10-03 23:00 UTC)