[HN Gopher] Goal: Pass all 4259065 tests in sqllogictest in 1 week
___________________________________________________________________
Author : luu
Score : 158 points
Date : 2022-10-03 16:56 UTC (6 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| iLoveOncall wrote:
| No dependencies but the very first step relies on a dependency? Copy-pasting it doesn't make it any less of a dependency...
| eatonphil wrote:
| He also used a compiler, and a standard library, and a computer.
| iLoveOncall wrote:
| There's a bit of a difference between taking away the only truly hard part of the project and using the standard library.
| eatonphil wrote:
| The SQL grammar is already defined.
|
| He didn't invent SQL. He had to take the grammar from somewhere.
|
| You wanted him to handwrite it copied from the ANSI SQL paper?
|
| Or to just think up all the possible grammars and ignore the real paper?
|
| How would that be different or better than this?
| lelandbatey wrote:
| The first dependency is a "dependency", but in the same way that an RFC you have to read with your eyes is a dependency. Except in this case, the dependency is a definition, in formal language (Extended Backus-Naur Form, a.k.a. "EBNF"), of the SQL language, as defined in ISO/IEC 9075:2016[0] (the 2016 version of the SQL language specification). They then use this language definition as the input to a program which generates parsers (called a parser-generator[1]), so that they can quickly get to where they have a library which they can use to parse/validate/inspect the SQL being given to them.
|
| Copying the BNF "from an external source" is a smart move, since it means they don't have to do lots of busy work slowly reading and transcribing the specification; someone's already done that step, so why would anyone expect the author to waste time?
|
| Using a parser generator is also a smart move since they exist already and are used all over the place (nobody hand-writes parsers for large languages; that's just a needless source of tedium and bugs). The code that's spit out of the parser generator is novel; that's newly created code which isn't taken from someone else's GitHub/other repo.
|
| Ultimately, I don't see how any of what the author's done constitutes "[relying] on a dependency" given that they're not using anyone else's Zig source code in their compiled binary; they're writing lots of code for themselves to use, just very quickly, and with powerful tools.
|
| [0] - https://en.wikipedia.org/wiki/SQL:2016
|
| [1] - https://en.wikipedia.org/wiki/Compiler-compiler
| eatonphil wrote:
| > nobody hand-writes parsers for large languages; that's just a needless source of tedium and bugs
|
| That is absolutely not true! In fact, most major programming language implementations use handwritten parsers [0].
|
| That said:
|
| > Using a parser generator is also a smart move since they exist already
|
| Jamie _wrote_ the parser generator here too. So it's all the more "from scratch".
|
| [0] https://notes.eatonphil.com/parser-generators-vs-handwritten...
| spullara wrote:
| So does anyone else agree that there are actually 10x (or more) developers? This is a pretty good example of how one works.
| sangnoir wrote:
| 10x developers exist, but their prevalence is greatly overstated. Also, churning out code at 10x doesn't make one a 10x developer (this could even hamper everyone else).
|
| I also propose that the moniker be banned from being self-applied; it is, in fact, a smell test: if you encounter a colleague calling themselves a 10x developer, start interviewing immediately. No good will come out of that.
| Jochim wrote:
| My last boss was a 10x developer.
| Fantastically smart guy, socially awkward but nice, really looked out for his team but an absolute nightmare to work with technically.
|
| The majority of the team couldn't understand his code. We'd have newly hired senior developers just leave rather than deal with it.
|
| He'd rolled his own code generator for our data model that did everything from model generation to the web controllers.
|
| The result was that while he could pump out work quickly, what would've otherwise been a quick fix for a graduate developer now required a deep understanding of a complex system.
|
| This had the effect of turning what would have otherwise been a team of 1-2x developers into a team of 0.2-0.5x devs with a retention problem.
| tempxyz wrote:
| How is he 10x if no one understands his code? An expert at quick and dirty?
| Jochim wrote:
| wolf550e summed it up pretty much perfectly.
|
| There was a great deal of thought put into it and he could extend and modify the output really quickly.
|
| The complexity of the system basically made it so that what would otherwise have been a simple task achievable by a graduate required a deep understanding to carry out.
| wolf550e wrote:
| Imagine that instead of developing an app in a popular programming language, someone implements an idiosyncratic domain specific language suitable for the kind of app they need to build, and then builds the app using that. The result would work and maybe even let them be very productive churning out more features of the kind that were envisioned when the DSL was developed. If they need to extend or fix the DSL, as the original author, they can. Someone else will need to learn the DSL before they can do any work on the app.
| [deleted]
| kuroguro wrote:
| > He'd rolled his own code generator for our data model that did everything from model generation to the web controllers.
|
| Heh, I've actually done that... twice.
| Luckily it was a team of 1 and I wouldn't expect anyone else to understand my mess. The code generation was extra ugly since I planned to get rid of it eventually to craft out smaller details. It was great at doing repetitive work in bulk. Not sure if it was actually faster but at least it was less boring doing things that way.
| sangnoir wrote:
| As a junior, I worked with a senior who was considered "10x" and working with him was a pain: he decided the rules didn't apply to him, and management tolerated his repeated violations of conventions instituted to make our dynamic-language codebase manageable.
|
| Anytime he "improved" a module, no one else could maintain it as that would entail additional rule-breaking, which was _verboten_ for mortals, so only he could maintain code he touched. Combined with the fact that he didn't add any tests, the net result was that he was slowly and surely subverting the codebase into his personal, brittle domain that no one else could change. He was slowing everyone else down, but all management was looking at was his velocity at closing bugs or rolling out new features while creating tech-debt. His boastful personality was just the icing on the cake.
| moritonal wrote:
| This is kind of an unhealthy attitude?
|
| @jamii seems super talented, but his bio says "in the past I've built database engines, query planners, compilers, developer tools and interfaces for [a...] myriad [of] consulting and personal research projects.", along with his repos being related to SQL parsers, or literal text-editors working purely on string manipulation. He is also sponsored to spend 100% of his time doing exactly this.
|
| What I mean is that he is almost definitely a 10x dev at writing SQL parsers. But ask him to write a shader that renders a neat waterbed material and he'd likely be a 0.8x dev? The overlap between experience and context is key.
| blowski wrote:
| I agree that this can be an unhealthy attitude. A lot of us are working on projects where the biggest challenge is convincing the Product Manager to provide more than one sentence in the brief.
|
| That said, I can't think of any technical domain where I could do this, even if provided with all the tests up front.
| vsareto wrote:
| "10x" mostly functions as a reputation badge. It's not a realistic metric for performance.
|
| It's something you get from other people. There's not a good test to figure out if you're 10x better than some randomly picked average developer.
| spullara wrote:
| When people say they don't exist, they mean that they don't exist generally, not just when testing someone on a new domain.
| bcrosby95 wrote:
| I don't know anyone that would say that 10x developers don't exist at all. It's too easy to bring up someone like John Carmack or many foundational people in computer science that invented algorithms us layfolk could never imagine.
|
| My experience with the phrase is people mean finding a "diamond in the rough" who can code circles around anyone else. It's not about finding a Norvig or Carmack, it's about finding a fresh graduate that you can stick on a problem and they will be bountifully productive.
|
| It's basically a manager's wet dream: extremely productive but cheap. In my experience real 10x people appear to be the opposite: seemingly slow but incredibly expensive. Everyone I actually consider 10x makes millions. And of those that are friends, they didn't really reach that 10x stage until their 30s or 40s.
| xhrpost wrote:
| Of course it's impressive but this isn't the sort of work the average dev ends up doing day to day. We spend our time digging into dependencies (this had zero) both internal and external, interfacing with stakeholders and looking up business logic for a change.
| All those async tasks ultimately add up to significant headwinds even for the best "10x" dev.
| stocknoob wrote:
| It's amusing that people even question the existence of 10x developers.
|
| What fraction of devs could even complete this, let alone in merely 10x the time?
| viraptor wrote:
| The fraction of devs that regularly deal with databases and parsing. There are no 10x devs. There are devs with long experience in a specific category. The 10x idea is kind of stupid in terms of companies looking for them - it just means "we want people successfully trained somewhere else".
| subroutine wrote:
| I agree there are probably no 10x devs. This person took 7 days to (almost) complete this task (which they cherry-picked for themselves), suggesting the average 1x dev with experience in this domain would take 70 days.
|
| I think it would be more reasonable to call someone a 3-sigma dev (someone 3 standard deviations above the mean; these would exist because that's how stats work).
| ZephyrBlu wrote:
| I've always read 10x as an order of magnitude better, not necessarily 10x faster. 3 sigma is probably better terminology though.
| rcxdude wrote:
| It's worth pointing out that in the origin of the "10x developer" term, it's relative to the worst performing devs, not the average.
|
| Also, you're not guaranteed to have an example 3 standard deviations above the mean. It strongly depends on your distribution and sample size.
| subroutine wrote:
| I think the prevailing current definition is wrt. the average dev. The worst dev could be arbitrarily bad (suggesting the average dev could also be 10x).
|
| You're right about the sufficient sample size.
| cowmoo728 wrote:
| It's like questioning the existence of 10x NBA players or 10x chess players. The top super GMs are basically 10x better than most other GMs, who are themselves 10x better than most IMs.
| It seems strange that programming would be one of the fields that doesn't have a similar distribution of skill.
|
| I think the actual pushback against the 10x programmer idea is that it's more often used to bully regular programmers into working longer hours, rather than actually identifying top performing programmers.
| sophacles wrote:
| There's also the part where "developer" is more akin to "athlete" than "NBA player". There's lots of different types of developer, just like there's lots of different types of athlete. A 10x NBA player will certainly not be a 10x Olympic swimmer also, more likely .1x. Part of the problem is 10x developer gets talked about like they are going to be a 10x athlete at NBA and swimming and golf and, and, and... that's what doesn't exist.
| samsquire wrote:
| I handrolled a very very basic SQL parser in my toy database hash-db
|
| https://GitHub.com/samsquire/hash-db
|
| It's a distributed DynamoDB-style key-value, SQL and Cypher graph database.
|
| I feel if you want to get a project moving forward for something as large as a database, you can get something rudimentary working and extend the parser when you need those features.
|
| SQL-wise it supports joins, WHERE clauses and rudimentary full-text search. It uses Rockset's converged indexes for ease of query generation.
|
| If you're interested in queries then you should read this blog post. https://rockset.com/blog/converged-indexing-the-secret-sauce...
|
| The database is partly multimodel with document storage and SQL and graph Cypher querying but I am yet to get all the models to be mutually queryable. The document storage is queryable by SQL but graphs aren't queryable by SQL or as a document.
| apetresc wrote:
| > So parsing the bnf is kind of a mess, but I only have to parse this one bnf and not bnfs in general so I just mashed in a bunch of special cases.
|
| Surely at that point it would've been a lot cleaner and more practical to just _edit_ the one file you need to parse, to remove the weird line breaks, etc., rather than building special cases into your parser to work around those lines? What am I missing?
| ruuda wrote:
| The input is 9139 lines long; each anomaly probably occurs dozens of times.
| Forge36 wrote:
| It's possible the files can't be edited within this project. I had a similar experience writing a code parsing engine. Sometimes it's best documented as code debt and the rewrite can be done at a later time.
| kris-s wrote:
| What a cool project, I bet they learned a ton doing this.
| ok_dad wrote:
| What's the `scc` tool I see used there?
| ok_dad wrote:
| Here's a new relevant post for anyone looking:
|
| "Processing 40 TB of code from ~10M projects with a server and Go for $100 (2019)"
|
| <https://news.ycombinator.com/item?id=33072846>
| shadycuz wrote:
| I'm interested as well.
| lifthrasiir wrote:
| Most likely: https://github.com/boyter/scc
| okasaki wrote:
| 7 ways of installing it, but no deb or rpm. Is this the wonderful future of FOSS where developers don't bother working with distribution maintainers anymore?
| ok_dad wrote:
| Someday someone will make a hyper-package-manager to be able to manage their packages installed via the thousands of package managers out there today. Then, several other hyper-package-managers will be developed to cover the cases the first didn't cover. Then come the hyper-hyper-package-managers...
| duped wrote:
| This has been the norm for a while. No one wants to work with distro maintainers because their model is incompatible with how people build and distribute their software.
|
| You either get a curl sh, a tarball, or a wrapper around either of those that pretends to be a .deb or .rpm.
| dec0dedab0de wrote:
| Hasn't it always been rare for developers to maintain distro specific packages?
| That's why distros have package maintainers; they also modify the layout and default configurations and whatnot to be consistent with the rest of the distro.
| rcxdude wrote:
| Yeah, official debs/rpms are a thing but often completely independent of any distribution's packaging efforts (and they often have very different priorities).
| mperham wrote:
| I've distributed a lot of software and DEB/RPM has to be the worst. I'd suggest those distros improve on their developer ergonomics if they want to stay relevant. 100% of my customers use Docker images these days as it is much much easier to use.
| okasaki wrote:
| I guess that's for web stuff? You wouldn't distribute 'cloc' and similar in a docker image.
|
| One hopes.
| ArchOversight wrote:
| It's the easiest way to distribute software in a way that is controlled by the author of the software and where the author can reasonably control all of the dependencies installed.
|
| This way you are providing a one-stop shop that can easily be run. I have all kinds of tools that are docker containers because it's simpler to not have to worry about all kinds of library mismatches or locations of shared libraries, and instead ship a minimal docker container.
| ok_dad wrote:
| I will refer you to the first tool I thought to Google with "docker image for X":
|
| https://hub.docker.com/r/stedolan/jq
|
| Yes, it's despicable.
| okasaki wrote:
| 10M+ pulls >:O
|
| https://www.youtube.com/watch?v=umDr0mPuyQc
| mdaniel wrote:
| > I lost a lot of time in the morning to segfaults in the zig compiler. (https://github.com/jamii/hytradboi-jam-2022#day-4)
|
| I bet the zig project would be interested in the sha of the tree that blows up their compiler
| puffoflogic wrote:
| I bet they wouldn't. They're well aware their compiler often runs code inside if(false) blocks (in certain positions) and they just don't see this as important. Moving fast is more important.
| (Where exactly they're moving to is not quite clear.)
| an_ko wrote:
| > their compiler often runs code inside if(false) blocks (in certain positions)
|
| Do you have an example? Sounds like a catastrophic edge case.
| bsima wrote:
| And I would have done it too, if it weren't for that meddling parser
| chubot wrote:
| I would have liked to have learned more about how the query planner and evaluator work! There was almost nothing about that? Just the tests magically moving from 0% to 95%.
|
| e.g. What table and value representation was used?
|
| FWIW I suspect using a LALR(1) parser in Zig on the sqlite grammar would have saved some time and gotten past the parsing headache.
|
| The sqllogictest comes directly from sqlite, so it seems like the parsing problem is mostly "port from C to Zig" (which are very similar metalanguages, or I guess meta-meta-languages in this case :) )
|
| Lemon is apparently a mini-yacc, just for sqlite's grammar, and is about 7K lines of C code, with no deps: https://sqlite.org/src/doc/trunk/doc/lemon.html
___________________________________________________________________
(page generated 2022-10-03 23:00 UTC)