[HN Gopher] PRQL - A proposal for a better SQL ___________________________________________________________________ PRQL - A proposal for a better SQL Author : maximilianroos Score : 324 points Date : 2022-01-24 17:24 UTC (5 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | eatonphil wrote: | I'm really excited about languages that build on or are compiled | to SQL, in the long-term (because I think it will take a very | long time to build adoption). | | The ones that particularly excite me are shorthands for SQL, even | though their heavy use of symbols may be a detriment. One | particular use case is in easily defining static authorization | policy-queries that are backed by database data plus and have | request variables injected during evaluation. | | I am not very excited by datalog/prolog-based languages because I | think logic languages are too unnatural to ever go mainstream. | But I'd be excited to be wrong or for logic languages to become | more friendly. | | Here are some others I'm watching. * | https://github.com/mrumkovskis/tresql * | https://www.htsql.org/doc/overview.html * | https://github.com/cytosm/cytosm | larodi wrote: | in a way SQL is Prolog and all reasoning for improvement of SQL | should start from Prolog, because is where SQL started from. | the expressive power of both languages is theoretically the | same, even though SQL is much more comprehensible. but then | again - certain complex task turn SQL in difficult-to- | comprehend series of nested declarative operations on algebraic | sets. | skissane wrote: | How does this compare to QUEL? | | Or Tutorial D? | ggrothendieck wrote: | The link refers to dplyr not being able to use databases but | actually there is a database backend for it in package dbplyr. | See https://dbplyr.tidyverse.org/ | divan wrote: | Obligatory reading https://www.scattered- | thoughts.net/writing/against-sql/ | lxe wrote: | Nice. Why OCaml though? I think using a more conventional | language to construct queries could yield more adoption. It also | seems that ORMs kinda of exist to tackle a similar issue, at | least in part. | HellsMaddy wrote: | I don't see anything dealing with OCaml here, other than it | being listed as an inspiration. | tpoacher wrote: | meanwhile I'm still stuck trying to make prolog work xD | blintz wrote: | This is a nice idea, especially given all the work people have | done recently to make in-language querying nicer (Spark comes to | mind). | | My only gripe is the 'auto-generated' column names for | aggregates. This seems like a recipe for disaster - what if there | is already (as there almost certainly will be) named | "sum_gross_cost"? The behavior also just seems rather unexpected | and implicit. My suggestion would be simple syntax that lets you | optionally give a name to a particular aggregate column: | ... filter gross_cost > 0 aggregate by:[title, | country] [ average salary, sum | gross_salary, average gross_cost, let | sum_gc = sum gross_cost, count, ] | sort sum_gc | | While it might seem a little uglier, it seems much more | sustainable in the long run. If this is really too gross, I'd | advocate some token other than underscore that is reserved for | aggregation variables; perhaps `sum@gross_cost` or | `sum#gross_cost`. | maximilianroos wrote: | Definitely -- giving the option of naming them is great. | | I'm not sure whether we should force naming? When I'm writing a | query often I'm fine with something auto-generated when | starting out. | JimmyRuska wrote: | Wow, between this and Malloy , there's a lot of great ideas. | | It'd be great if ideas from sparql/datalog/datomic were also | picked up. It's easier to write recursive queries. | jacktheturtle wrote: | Do people still write SQL? | didip wrote: | Of course, and it is vastly better than using an ORM. | [deleted] | mrweasel wrote: | You kinda have to. Assuming that you're using an ORM, you still | need to understand how it translate to SQL, and help it do the | translation correctly. | | Personally I've seen developer use the Django ORM, and create | application with terrible performance. Tweaking the queries, | you can help guide the ORM to generate better SQL, which in | turn will affect your performance greatly. | | We're currently facing a problem with a custom who have an | application with terrible performance/scaling issues. The | entire thing is very database heavy, but interaction is done | solely via Hibernate. I have nothing against Hibernate, it's a | fine ORM, but you need to understand it well enough that you | can guide it towards better queries (Which sometimes involve | actually writing SQL). At some point you need to decide if your | time isn't better spend learning SQL directly, as that via | always provide you with better access to the functionality | provided by the database. | onlyrealcuzzo wrote: | It's regularly listed in the top 5 most commonly used | programming languages in the world [= | MrBuddyCasino wrote: | Do people still use ORMs? | edgyquant wrote: | Of course. Large companies like Pepsi have teams of analysts | that only write SQL. I applied for a programming job there a | long time ago and didn't follow up when they explained in the | interview that's the only language they used. | badhombres wrote: | Absolutely. If I use a SQL db for my applications (I'm a | software dev for context), I generally write raw SQL vs using | an ORM. I find the long term issues of an ORM to not be worth | investing and understanding SQL. | | I'm also not having to learn a new library, in addition to the | standard DB connection libraries, ~if~ when I switch a language | or platform for some project. | drpotato wrote: | Yes, quite a lot I would imagine. I use it extensively at work | and similarly sql heavy software companies in the past. That | being said, I've also worked at places where they've avoided it | like the plague - largely because few people were competent at | it - and were moving away from relational DBs due to scale. | roveo wrote: | In the world of data analytics/BI are many people whose main | job is writing SQL. | extrapickles wrote: | I prefer to write SQL as most alternatives ether have runtime | surprises or require more roundtrips to the database. I mostly | work on line-of-business software, so if I was doing simple | CRUD apps I might have a different opinion. | slaymaker1907 wrote: | This seems very similar to Kusto/KQL. | cryptonector wrote: | The most important dialect of SQL we should get is SQL-with-no- | literal-values so we can force the use of query parameters, and | then not have SQLi. | BeefWellington wrote: | This is another in a series of these kinds of proposals that look | excellent on first glance for perhaps the 75% case but start | getting syntactically messy when I want to customize the | resultset returned. | | On the surface, they're always neat but when you start to dig | into how you'd implement something in an RDBMS, it begins to fall | apart. | | Let's look at the example syntax: from | employees filter country = "USA" | # Each line transforms the previous result. gross_salary | = salary + payroll_tax # This _adds_ a column / | variable. gross_cost = gross_salary + healthcare_cost | # Variable can use other variables. filter gross_cost > 0 | aggregate split:[title, country] [ # Split are the | columns to group by. average salary, | # These are the calcs to run on the groups. sum | salary, average gross_salary, sum | gross_salary, average gross_cost, sum | gross_cost, count, ] sort | sum_gross_cost # Uses the auto- | generated column name. filter count > 200 take 20 | | Where in here is it clearly stated which fields are returned? In | the original SQL it's right up front but here it's buried into | the "aggregate" function, and I'm not clear that this isn't an | oversight. | | Another example that speaks to the "how do I implement this" side | of the equation: from employees filter | country = "USA" # Each line transforms | the previous result. gross_salary = salary + payroll_tax | # This _adds_ a column / variable. gross_cost = | gross_salary + healthcare_cost # Variable can use other | variables. filter gross_cost > 0 | | Does this mean that the database must scan all records of the | employee table in order to return the result before moving to the | next step in the query? Must I index all fields? If not, how does | a query planner prepare for this scenario? | | The major tradeoff you make in most ORMs is exactly this: You | lose out on being able to be explicit about how many queries are | sent to the DB (and in many cases how efficient those queries | are). Now this would become a language feature? What do I gain | for that loss? | | I'm not saying that SQL Syntax is perfect; far from it. I'm not | seeing how this is an improvement. | | I think if you want traction though, a proof of concept using an | existing RDBMS would go a long way into providing evidence that | this will work and is sufficiently thought out to deal with even | the basics of what existing SQL databases have to. Query planning | is hard, especially if you want it to be fast. | drittich wrote: | I don't think it's falling apart at all. Personally, I would | require that what columns get returned be explicit (optionally | with a * type syntax that you have to enable - the defaults | should be safe and * has its risks). For one thing, you don't | necessarily want to return all the columns you have aggregated. | E.g., you may be running the equivalent of a HAVING clause on | an aggregate column, so don't need the value returned. | | "Each line transforms the previous result." - I assume this is | referring to the order that transpilation happens, so you can | read it top to bottom and understand the flow easily. | | One thing I would like to see is how a recursive CTE might | look. | tomtheelder wrote: | > Where in here is it clearly stated which fields are returned? | In the original SQL it's right up front but here it's buried | into the "aggregate" function, and I'm not clear that this | isn't an oversight. | | It's in the aggregate portion, like you said. Other example | queries have a select portion. Why does it matter that it's not | in the leading position like SQL? | | > Does this mean that the database must scan all records of the | employee table in order to return the result before moving to | the next step in the query? Must I index all fields? If not, | how does a query planner prepare for this scenario? | | No, they are just describing how the statement is supposed to | be interpreted by a human. I think you can basically just | shuffle all the filter statements to the end and keep it | logically equivalent. | | This is a proposal for a "transpiles to SQL" language. So long | as that transpliation is predictable, you cannot run into the | sort of issues you are describing. | BeefWellington wrote: | > It's in the aggregate portion, like you said. Other example | queries have a select portion. Why does it matter that it's | not in the leading position like SQL? | | I don't mind it not being in the leading position. The author | provided a _very simple_ query and in that case it 's not | immediately apparent what fields to expect the resultset to | contain when returned to the consumer. | | This is a troubleshooting issue more than anything else. IMO | placing the "selected fields" into the very centre of the | query is distracting and obfuscates what is happening. | | > This is a proposal for a "transpiles to SQL" language. So | long as that transpliation is predictable, you cannot run | into the sort of issues you are describing. | | I think a good test of whether any transpiled language works | well is to look at whether it could work on its own as a | language. See: Typescript. | tfehring wrote: | Very cool! A couple questions/suggestions off the top of my head: | | 1. Did you consider using a keyword like `let` for column | declarations, e.g. `let gross_salary = salary + payroll_tax` | instead of just `gross_salary = salary + payroll_tax`? It's nice | to be able to scan for keywords along the left side of the | window, even if it's a bit more verbose. | | 2. How does it handle the pattern where you create two moderately | complex CTEs or subqueries (maybe aggregated to different levels | of granularity) and then join them to each other? I always found | that pattern awkward to deal with in dplyr - you have to either | assign one of the "subquery" results to a separate dataframe or | parenthesize that logic in the middle of a bigger pipeline. Maybe | table-returning functions would be a clean way to handle this? | maximilianroos wrote: | > 2. How does it handle the pattern where you create two | moderately complex CTEs or subqueries (maybe aggregated to | different levels of granularity) and then join them to each | other? I always found that pattern awkward to deal with in | dplyr - you have to either assign one of the "subquery" results | to a separate dataframe or parenthesize that logic in the | middle of a bigger pipeline. Maybe table-returning functions | would be a clean way to handle this? | | I don't have an example on the Readme, but I was thinking of | something like (toy example): table | newest_employees = ( from employees sort tenure | take 50 ) from newest_employees join | salary [id] select [name, salary] | | Or were you thinking something more sophisticated? I'm keen to | get difficult examples! | | Edit: formatting | twic wrote: | My gut reaction is that if we have "from first" then maybe we | should have to "to last": from employees | sort tenure take 50 as newest_employees | from newest_employees join salary [id] select | [name, salary] | mcdonje wrote: | When you add in the ability to reference different tables | like that to the piping syntax, it starts to remind me of the | M query language: https://docs.microsoft.com/en- | us/powerquery-m/quick-tour-of-... | | There, each variable can be referenced by downstream steps. | Generally, the prior step is referenced. Without table | variables, your language implicitly pipes the most recent | one. With table references, you can explicitly pipe any prior | one. That way, you can reference multiple prior steps for a | join step. | | I haven't thought through that fully, so there may be gotchas | in compiling such an approach down to SQL, but you can | already do something similar in SQL CTEs anyway, so it should | probably work. | maximilianroos wrote: | Thanks! | | > Did you consider using a keyword like `let` for column | declarations | | Yeah, the current design for that is not nice. Good point re | the keyword scanning. I actually listed `let` as an option in | the notes section. Kusto uses `extend`; dplyr uses `mutate`; | pandas uses `assign`. | | I opened an issue here: https://github.com/max- | sixty/prql/issues/2 | maximilianroos wrote: | I've added the `let` keyword given a few people commented on | this. | maximilianroos wrote: | I wrote this over the holidays, because I find SQL wonderfully | elegant in its function, but really frustrating in its form. | | Let me know any feedback -- as you can see it's still at the | proposal stage. If it gains some traction I'll write an | implementation. | roberto wrote: | This looks great! Clear docs and rationale, and the syntax is | well thought. I'm definitely following this. | xi wrote: | Maybe you'd like to check FunSQL.jl, my library for | compositional construction of SQL queries. It also follows | algebraic approach and covers many analytical features of SQL | including aggregates/window functions, recursive queries and | correlated subqueries/lateral joins. One thing where it differs | from dlpyr and similar packages is how it separates aggregation | from grouping (by modeling GROUP BY with a _universal_ | aggregate function). | maximilianroos wrote: | This is awesome! I'll add a link to it on PRQL. | | I guess the biggest difference between FunSQL (and similarly | dbplyr) and PRQL is that the former needs a Julia (or R) | runtime to run. | | I really respect the library and keen to see how it develops. | clarkevans wrote: | Writing an alternative syntax is straight forward. Perhaps | prototype PRQL using xi's excellent FunSQL backend? This | way it's working out of the gate. Once syntax+semantics are | pinned, writing another backend in the language of your | choice would then be easier. Getting the backend correct is | non-trivial work, and xi has done this already. Besides, we | need a sandbox syntax anyway, so it might be fun to | collaborate. | glogla wrote: | This is cool! I love it. | | Few notes: | | 1) SQL also allows you to define windowing reusably, like this | select sum(blah) over window_abc, | avg(blah) over window_abc from table_xyz window | window_abc as (partition by x order by y) | | so that second example could be written somewhat less | repetitively but it wouldn't change the whole point. | | 2) Sadly my main pain point with SQL for ETL is not possible to | solve with a transpiler - SQL has exactly one target so doing | things like "I want these records to go to a table A and those | records go to table B" is not possible with one query. | | 3) It would be cool to see how this does typically annoying and | repetitive cases from analytics / data warehousing world. I'm | thinking like SCD1/2 implementation. But I don't even know if | mutation is there yet. | | 4) I would recommend investing in one canonical formatter, like | Go has. So that there isn't infinite number of ways the same | query could be formatted for people to argue over preference. | | EDIT: | | 5) Since this seems to be focused on analytics (by the choice of | queries and Snowflake in examples), I want to highlight that | someone suggested to use TPC-H (or TPC-DS) queries as a | benchmark. It does sound like a good idea. | adamrezich wrote: | I like this a lot, and I eagerly anticipate an early version of a | transpiler to play with! | hyperpallium2 wrote: | \tangent What are today's data transformation needs (differ from | Codd's inspiration?), how does relational algebra serve them, and | design an "SQL" around that. There's got to be a 10x leapfrog in | benefit for _some_ niche in there, and that 's the gateway to | adoption. | tristanz wrote: | To get 10x I think you need to wed it to solving broader | workflow challenges, like dbt does today. | nassimsoftware wrote: | It would be definitely interesting to have a TypeScript of some | sort but for SQL. So a more practical and prettier syntaxe like | what I'm seeing here that compiles to SQL queries. | eatonphil wrote: | TypeScript is more verbose than JavaScript. While I love to use | TypeScript I don't think I'd categorize it as prettier than | JavaScript. And practical... well if you mean it is more | maintainable then yes but if you mean faster to write then no. | | I don't want a more verbose SQL I want a less verbose SQL! | nassimsoftware wrote: | I meant more the aspect that with TypeScript people would | prefer to write in it and then compile to JavaScript because | there is a benefit. With a PRQL with an SQL compilation | target we would reap the benefit of a more practical and | better syntax. In both cases they bring benefits but not in | the same way. | dirslashls wrote: | Go to https://sqlframes.com/demo and in the code editor enter | the following and execute (this example is taken from the first | example on PRQL github page). It generates SQL, but it also | computes and displays the results within the browser (though | the data set below gives no results). | | const employees = SQL.values([{ title: 'Developer', country: | 'USA', salary: 120, payroll_tax: 20, healthcare_cost: 6 }]); | employees.schemaName = 'employees'; const { groupBy, where: { | gt, eq, and }, agg: { count, sum, avg } } = SQL; return employe | es.pdf(SQL.script('[salary]+[payroll_tax]').as('gross_salary'), | SQL.script('[gross_salary]+[healthcare_cost]').as('gross_cost') | ) .fdf(and(gt('gross_cost',0),eq('country','USA'))) | .gdf(groupBy('title','country') | ,avg('salary').as('average_salary') | ,sum('salary').as('sum_salary') | ,avg('gross_salary').as('average_gross_salary') | ,sum('gross_salary').as('sum_gross_salary') | ,avg('gross_cost').as('average_gross_cost') | ,sum('gross_cost').as('sum_gross_cost') ,count().as('count')) | .having(gt('count',200)) .orderBy('sum_gross_cost'); | gibsonf1 wrote: | SPARQL. Representing human information in relational tables goes | against how people actually think and use information. We humans | think in tremendous numbers of nested hierarchies, and recursive | hierarchy traversal is a nightmare in relational databases. A | graph is the structure for data that works best, is most | efficient, and actually reflects how things are connected in our | brains. | mindcrime wrote: | I'm a big fan of SPARQL, but the one thing that would concern | me about trying to use it outside of the SemWeb context is | simply that it assumes data is stored in <S,P,O> triples. | Legacy databases by and large are not, so you need an adapter | to bridge the representations. And while I know some exist, I | haven't really used them and am not sure about the performance | impact. | mst wrote: | You can get quite far mapping the triple concept to (PK, | column, value) or (PK, FK, related-row) and transpiling from | there. | | (I played around with this some years back, not to the point | where anything came out of it worthy of publishing, but | enough to be pleasantly surprised how far 'quite far' turned | out to be in practice) | fuzzieozzie wrote: | The time of SQL alternatives | https://news.ycombinator.com/item?id=30053860 | maximilianroos wrote: | Yup, I posted that yesterday too. I think Malloy is really | interesting -- compile to SQL but give more integrations to the | DB, like schema-during-development. It has a proper team, led | by Lloyd Tabb. | 999900000999 wrote: | >Compatible -- PRQL transpiles to SQL, so it can be used with any | database that uses SQL. Where possible PRQL can unify syntax | across databases. PRQL should allow for a gradual onramp -- it | should be practical to mix SQL into a PRQL query where PRQL | doesn't yet have an implementation. | | Awesome. | | I hate SQL so much, I know for personal projects this is gold. I | imagine actually using it at work might draw some questions | though | hyperpallium2 wrote: | nice name | peoplefromibiza wrote: | This is the kind of things that a well crafted DSL can solve, if | the language you use supports macros. | | PRQL looks very similar to Ecto, the Elixir Query DSL | | https://hexdocs.pm/ecto/Ecto.Query.html | magicalhippo wrote: | Now this is actually nice, unlike the other suggestion posted | today[1]. | | Maybe I'm just too used to non-standard extensions of our | database but the SQL example could, at least for our db, be | rewritten as SELECT TOP 20 title, | country, AVG(salary) AS average_salary, | SUM(salary) AS sum_salary, AVG(gross_salary) AS | average_gross_salary, SUM(gross_salary) AS | sum_gross_salary, AVG(gross_cost) AS | average_gross_cost, SUM(gross_cost) AS | sum_gross_cost, COUNT(*) as count FROM ( | SELECT title, country, | salary, (salary + payroll_tax) AS gross_salary, | (salary + payroll_tax + healthcare_cost) AS gross_cost | FROM employees WHERE country = 'USA' ) emp | WHERE gross_cost > 0 GROUP BY title, country | ORDER BY sum_gross_cost HAVING count > 200 | | This cuts down the repetition a lot, and can also help the | optimizer in certain cases. Could do another nesting to get rid | of the HAVING if needed. | | Still, think the PRQL looks very nice, especially with a "let" | keyword as mentioned in another thread here. | | [1]: https://news.ycombinator.com/item?id=30053860 | ako wrote: | With a CTE it would read a bit more like prql: | with usa_employees as ( SELECT title, | country, salary, (salary + | payroll_tax) AS gross_salary, | (salary + payroll_tax + healthcare_cost) AS gross_cost | FROM employees WHERE country = 'USA' AND | (salary + payroll_tax + healthcare_cost) > 0 ) | select title, country, AVG(salary) | AS average_salary, SUM(salary) AS | sum_salary, AVG(gross_salary) AS | average_gross_salary, SUM(gross_salary) AS | sum_gross_salary, AVG(gross_cost) AS | average_gross_cost, SUM(gross_cost) AS | sum_gross_cost, COUNT(*) as emp_count from | usa_employees group by title, country having | count(*) > 200 order by sum_gross_cost limit 3 | | Readability is pretty similar to prql. It would really help in | SQL if you could refer to column aliases so you don't have to | repeat the expression. | magicalhippo wrote: | > With a CTE | | The DB we use supports those, I just learned about them too | late so keep forgetting they exist :( | | > It would really help in SQL if you could refer to column | aliases so you don't have to repeat the expression. | | The DB we use supports that, so in your CTE you could write | AND gross_cost > 0 | | We do that all the time, which will be a pain now that we're | migrating to a different DB server which doesn't. | jsyolo wrote: | what expressions are being repeated here? | oblio wrote: | > (salary + payroll_tax) AS gross_salary, | | > (salary + payroll_tax + healthcare_cost) AS gross_cost | | > AND (salary + payroll_tax + healthcare_cost) > 0 | | And his is a simple example. | mmsimanga wrote: | Sybase IQ allows you to use the column alias anywhere else in | the query. | gmfawcett wrote: | Not all database systems can optimize queries well over CTE | boundaries. I believe this is still true for PostgreSQL (no | longer true, see below -- it was true a few years ago). So | there's a potential performance hit for (the otherwise | excellent advice of) writing with CTE's. | Rovanion wrote: | IRC tells me this has been fixed now. | gmfawcett wrote: | Awesome news! thank you for sharing this. I found this | post which confirms IRC and suggests it was an | improvement in PG 12: | | https://paquier.xyz/postgresql-2/postgres-12-with- | materializ... | | Today is a great day to have been wrong on the Internet. | :) | correct-me-plz wrote: | Snowflake lets you refer to column aliases, and it's great! | | There's the slight issue of shadowing of table column names, | which they resolve by preferring columns to aliases if both | are named the same. So sometimes my aliases end up prefixed | with underscores, but that's not a big deal. | dvasdekis wrote: | Column aliases would have saved me hundreds of hours over the | course of my career. Sorely missing from standard SQL, and | would make the need for PRQL less acute. | iblaine wrote: | The syntax seems similar to Apache Pig. Both are declarative and | primarily built to be a procedural form of SQL. | pmontra wrote: | First, kudos because it takes courage to take on SQL in this way. | | Second, this kind of reversed SQL (filter-first, select-last) is | much easier to reason about than the original and keep in mind | that I prefer to code complex queries in SQL than to build or | translate them in the ORM of the project I'm working on. | | Maybe a transpiler is an inevitable first step but I think that | any SQL replacement should be itself the target of ORMs and run | directly in the database CLI tools (psql / mysql ...) or IDEs | (pgAdmin, MySQLAdmin, ...). What's the long term plan of the | project? | dragonwriter wrote: | > Second, this kind of reversed SQL (filter-first, select-last) | is much easier to reason about than the original | | Given that SQL clauses tend to be unambiguously terminated by | the start of the next clause or the end of the statement, it | surprises me that no engine has gone to accepting otherwise | standard(-ish, as much as real DB vendor dialects are) SQL but | without a mandated order of clauses. | | And then combine that with dev tools that allow easy | rearrangement of clauses, perhaps based on configured | preferences so that you don't even see the original if its not | your preferred order, so that "Bob likes old-school SELECT FROM | WHERE GROUP BY and Alice likes FROM WHERE GROUP BY SELECT" | isn't a problem. | maximilianroos wrote: | Thanks! | | I agree that integrating with the DB would allow much more from | a lang. But PRQL is a bet that languages which _start_ there | (e.g Kusto) get lost because it requires changing DB, which is | really hard. I worry EdgeDB may hit this issue too (but I 'm | really hoping it works, and they have an excellent team). | | As I think you're suggesting -- you could imagine a language | starting out as a transpiler, and then over time DBs working | with it directly, cutting out some of the impediment mismatch. | | Malloy [1] is another point in space -- it targets existing DBs | through SQL queries but can also ask for schemas etc while | developing. | | [1] https://github.com/looker-open-source/malloy | _dain_ wrote: | Does this let you write functions that take tables as parameters | and return tables? It always seemed weird to me that SQL doesn't | let you do this. | akdor1154 wrote: | Looks really nice, i've been scribbling away in a little notebook | all the things i would do in "akdor's dream sql", and what you | have here hits pretty much exactly. | | Wondering about generic use of `let` - you have let for col | defns, but `func` for functions and a TODO for tables/CTEs - | could/should `let` do the lot? (Like another commenter posted, | this is how MS's M language, used in PowerQuery in PowerBI and | Excel works). Could enable an escape from point-free for entire | queries if taken to extreme generality, not sure if that's a good | thing, maybe it could be? | | Bikeshedding: even with some OCaml/F# experience, i find `f x y` | harder to read than `f(x, y)`. | maximilianroos wrote: | Thanks! | | At the moment `let` is used to add a column as part of an | existing pipeline. [1] | | `func` is the start of new expressions / pipelines. And I just | added a proposal for `table = `, which would be the same. | | Does that make sense? Very open to more feedback... | | [1] I just added `let` based on feedback here, it's better than | it was, but not perfect, as it can be confused for a new | pipeline given its use in other langs. | nextaccountic wrote: | I also saw a proposal for a better syntax for SQL, called BQL, | that was a strict superset but allowed for better modularity | | http://intelligiblebabble.com/a-better-query-language-bql-la... | | It had this github repo https://github.com/lelandrichardson/BQL | but never went anywhere | | I hope PRQL has a better fate! unfortunately, by deviating from | SQL lexical conventions (using :, using [], etc) we lose the | ability to copy-paste from sql code elswhere. | | I want a better SQL, but I also want some compatibility. Like | typescript is for javascript. | spullara wrote: | SQL could get a lot better by just adopting the ordering of | operations like they did with LINQ: | | https://docs.microsoft.com/en-us/dotnet/csharp/programming-g... | tester756 wrote: | Funnily enough LINQ Query syntax is really uncommon and | everybody uses method syntax | | var list = new List<int>{1,2,3} | | var extracted = list | | ....................Where(x => x > 1) | | ....................Select(x => $"my number: {x}) | | ....................ToList(); | aloisdg wrote: | Because the original felt odd in C#. | louthy wrote: | > everybody | | Not everybody | betimsl wrote: | We barely learned SQL for all these years and the guy wants to | change it now. Thank you but no thank you. | | JK. Cool concept and hopefully it catches. | eximius wrote: | > A line-break doesn't created a pipeline in a few cases: ... > | When the following line is a new statement, by starting with a | keyword such as func. | | This feels like it could cause compatibility issues in the | future. | jcdreads wrote: | I like that everyone is trying to make something like SQL that | reads more naturally to them. More alternatives is good! SQL is a | widely accepted standard, and has strictly defined and super | broadly accepted semantics. | | As someone who has written quite a few half-baked-for-general-use | but fit-for-purpose SQL generator utilities over the years, I'll | suggest that if you intend for a novel syntax to be a general SQL | replacement then being isomorphic to SQL would massively increase | usefulness and uptake: | | 1. novel syntax to SQL; check! Now novel syntax works with all | the databases! | | 2. any valid SQL to novel syntax; a bit harder, but I'd start by | using a SQL parser like https://github.com/pganalyze/libpg_query | and translating the resulting AST into the novel syntax. | | 3. novel syntax to SQL back to novel syntax is idempotent; a nice | side effect is a validator/formatter for "novel syntax" | | 4. SQL to novel syntax back to SQL is idempotent; a nice side | effect is a validator/formatter for SQL, which would be awesome. | (See also https://go.dev/blog/gofmt, which is where I learned | this "round trip as formatter" trick.) | | I don't mean for this to sound negative, and I know that 2, 3, | and 4 are kind of hard. Thank you for building prql! | Nican wrote: | Also worth looking at KQL: https://docs.microsoft.com/en- | us/azure/data-explorer/kusto/q... | leokennis wrote: | When you said "KQL" I thought you meant "Kibana Query Language" | but I guess every letter of the alphabet followed by QL is | already taken twice over... | CarVac wrote: | I was reading the criticism and suggestions and wondering what | they were talking about, it seemed to read perfectly fine... | | Then I saw that the project had already been changed in response! | Impressive. | erezsh wrote: | It seems people here are really interested in alternatives to | SQL. So perhaps you'd also like to have a look at | https://github.com/erezsh/Preql | | (Same name, same goal, different approach, and already working) | dgudkov wrote: | >PRQL is intended to be a modern, simple, declarative language | for transforming data | | It's not declarative. It's functional. | | I believe that the approach that is followed by PRQL is more | practical than SQL. We've implemented a similar approach in our | visual ETL tool for non-technical people (https://easymorph.com) | and it works wonderfully. Other cool things you can do with this | approach (and can't with SQL): | | * Modify existing columns without re-selecting the whole dataset | | * Loops (iterations) | | * Conditional IF/THEN/ELSE branching as a workflow statement | | * Exceptions & error handling | beagle3 wrote: | shakti / K / kdb+ implements "real SQL", which is concise but | readable, and could give you a few ideas. Here's a copy-paste | from https://shakti.sh/ under document/sql.d (cannot deep link, | unfortunately). The most most magical aspects are automatic joins | - both left joins and "foreign key chase" joins. The fk-chase | joins, in particular, should be part of _every_ query language, | and can possibly be added in a backward compatible way to | existing SQL implementations. | | example: TPC-H National Market Share Query 8 | http://www.qdpma.com/tpch/TPCH100_Query_plans.html what market | share does supplier.nation BRAZIL have by order.year for | order.customer.nation.region AMERICA and part.type STEEL? | | real: select revenue avg supplier.nation=`BRAZIL by order.year | from t where order.customer.nation.region=`AMERICA, | part.type=`STEEL | | ansi: select o_year,sum(case when nation = 'BRAZIL' then revenue | else 0 end) / sum(revenue) as mkt_share from ( select | extract(year from o_orderdate) as o_year, revenue, n2.n_name as | nation from t,part,supplier,orders,customer,nation n1,nation | n2,region where p_partkey = l_partkey and s_suppkey = l_suppkey | and l_orderkey = o_orderkey and o_custkey = c_custkey and | c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and | r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and | o_orderdate between date '1995-01-01' and date '1996-12-31' and | p_type = 'STEEL') as all_nations group by o_year order by o_year; | maxwelljoslyn wrote: | Thanks for the tip. That automatic "foreign key chasing" looks | phenomenal. Byebye, much of that big tedious chunk in the | middle of your 2nd example... Wish I had that for more of the | SQL I write. | lijogdfljk wrote: | I wonder how this compares in practice to EdgeQL, | https://www.edgedb.com/showcase/edgeql | | Offhand i thought PRQL seemed easier to reason about, but | something about EdgeQL seems better to me.. though i can't | describe it. | maximilianroos wrote: | I see EdgeQL as an excellent replacement for SQL in OLTP | settings -- it has great language integration and a unified | relational & typing approach. (Please correct me if this is | mistaken though). | | I wrote the PRQL proposal for analytical / OLAP queries, where | the pipeline of transformations are more important, and | relations and typing are relatively less important. | mongol wrote: | I think it would be worthwhile to develop a shorthand of the same | thing, suitable for use on the command prompt. Something using | symbols as synonyms for keywords. Less eligble but more useful in | a future when shell tools understand this syntax. | nurettin wrote: | To be fair, lateral joins (cross/outer apply in mssql) can help | with name aliasing and table functions give sql some reusability. | I think the main pain points for sql are pivots and window | functions. | | A lot of the time you just want to transpose your result, but you | have to choose an aggregate and handle null cases to force pivot | to work the way you want it. | | And a lot of the time you want to aggregate a window but keep the | ids of the row so you avoid the having keyword altogether and go | for row_number and dense_rank to get your aggregate results. | | If I were to write a query language, I would discard group by and | having and make it easier to apply transpose and window | functions. | bchammer wrote: | This looks like a turing-complete language if you add function. | One benefit or current SQL is that it can (and is) often | rewritten to be executed more efficiently. This requires a | simpler and declarative model - far from a programming language. | | If you want simple 'forward declaration' take a look at the SQL | WITH clause. | hardwaregeek wrote: | I've always wondered why there aren't query languages that | embrace algebraic data types and pattern matching. Seems like an | obvious fit to me. There's many times where you'd want to model a | table that has either this scheme or that schema. | jlokier wrote: | They can work well. In the project I'm working on the database | uses algebraic datatype keys (i.e. tags and tag-dependent | columns) to make the database faster and smaller than an | equivalent relational schema, but the database is used via API | rather than via a query language. | jayd16 wrote: | Check out C# and LINQ. They pull it off with anonymous types. | haolez wrote: | This actually looks like an improvement (and I like SQL). This | feels closer to non-programmers, contrary to some other SQL | "competitors" like that query language from InfluxDB. | thanatos519 wrote: | It definitely scans better than 'Flux' from InfluxDB2. | | One thing I like about Flux is the ability to split streams and | return multiple distinct aggregations. Very handy in Grafana | dashboards! | 29athrowaway wrote: | SQL is like BASIC for data. | | It is a language created to read like natural language, to | facilitate its adoption. | | Not only developers use SQL, less technical users also use it, | especially in the finance industry. | | Now, do you use BASIC professional software development? While | some do, most don't. So why do we use SQL then? | | An imperative language that compiles to SQL can be attractive, | but unless the compilers come with good compiler warnings, we may | end up created bloated database code. | ajkjk wrote: | Just wanna say, I absolutely love this. | ajkjk wrote: | One piece of feedback: | | "sort sum_gross_cost # Uses the auto-generated column name." | ... seems like a huge landmine. Languages really should not | have any implicit way of constructing identifiers (among other | reasons it is not easily greppable). | | You might consider using a syntax like `sum:gross_cost` which | can function as a sort parameter and an aggregation, but is | actually recognizable as an object instead of having an | implicit transformation going on in the background. Like this: | ... filter gross_cost > 0 aggregate by:[title, | country] [ sum:gross_cost, ] sort | sum:gross_cost ... | topspin wrote: | > # Uses the auto-generated column name | | The fact that someone felt the need to add that comment hints | at a design mistake. Synthesizing symbols is weird, | unnecessary and is probably a violation of the principle of | least surprise. | | Otherwise I think PRQL has some value. Nice work. I strongly | suspect that if SQL looked more like this there would be a | lot more people willing to use the query language directly | and perhaps fewer that are compelled to bury it under | shifting layers of fragile abstractions. | politician wrote: | I prefer "from first". When I write a SQL query, 99 times out of | 100, I'll start with "select * from", then fill in the query, | then go back and select the columns. After a few basic joins, | "easy" column names have probably grown aliases or have been | subsumed entirely. For me, "select * from" is automatic. | johnthescott wrote: | using COMMON TABLE EXPRESSIONS (cte) can greatly improve | readability of complex sql queries. adding "flow" just feels like | a variation on sequential programming. | | i am against "improving" sql. instead, i thing a whole rethink of | the engineering behind relational engines needs to occur. for | example, why can't a relational database support both SQL and | other languages simultaneously, instead of being so black-boxish? | thelittlenag wrote: | I worked on a language at one of the big banks that looked very | similar. The goal was to have a better-than-SQL language on top | of Spark and I think we succeeded at that very well. | Unfortunately, politics killed the language and platform. | deepstack wrote: | Wouldn't it be better to just use something like prolog, there is | already and extension for postgres? | onphonenow wrote: | Any chance of starting with FROM perhaps to help with | autocomplete? | | From Table or FROMT TABLEA JOIN TABLE B ... | | The autocomplete would then be able to introspect much better in | the tooling side. | ithrow wrote: | Queries start with FROM, it's called a README for a reason... | Ngunyan wrote: | ridaj wrote: | Very cool | | Design goal question - is the goal to have a language that | _blindly_ compiles to SQL, or will compilation require data | schema knowledge? | | Suggestion: where possible stick to the well established SQL | keywords (prefer "group by" to "aggregate by") | maximilianroos wrote: | Initially the transpiling would be context-free of the schema. | | It would really nice to have context when _writing_ the code, | so we could do things like autocomplete (IIUC this is what | Malloy does already). | | Are there features you can think of that would be helpful if we | had the schema context during transpiling? | ridaj wrote: | I wasn't thinking it'd be useful, but more that it's good to | remain as free as possible from any assumptions about the | schema of the data. I often work with tables that have | unusually large schemas (> 100MB) and have seen some | products' performance severely degrade as a function of | schema complexity. | | (But otherwise +1 to schema awareness during authoring.) | kovek wrote: | Could there be a tool that would translate PRQL to SQL? One could | then write in ~/my_scripts/closest_points.prql and then run a | command to get the sql equivalent and use that in exiting SQL | tools that do not currently access PRQL (like Postgres). | ako wrote: | From the PRQL page: "PRQL transpiles to SQL, so it can be used | with any database that uses SQL." | oblio wrote: | Man, I really hope something like this takes off. SQL is so | entrenched but it's really showing its age and its pitiful | standardization. | oblio wrote: | To the author, if he's reading this. I wish you luck! To get this | adopted, try to make it modular so maybe it can be made a core | module of PostgreSQL, MariaDB, MySQL. If you somehow get into | those 3 I wouldn't be surprised if Oracle, SQL Server and DB2 | integrate it themselves just to keep up. | | Ah, Presto could be another popular target. | | It's a huge effort but something which could have colossal | payback. | twic wrote: | This language transpiles to SQL, so it can be implemented | entirely client-side. No need for modules for the database | engines, just wrappers round the clients. | nikkinana wrote: | jeroen79 wrote: | to be honest i find this makes it just more complex and less | readable then normal sql | LittlePeter wrote: | Same here. I find the SQL query in the README more readable. | hackeredje wrote: | Is this not Linq ? | https://stackoverflow.com/questions/tagged/linq ? | | And then dump the queries via | https://stackoverflow.com/questions/1412863/how-do-i-view-th... | or https://www.linqpad.net/ ? | aloisdg wrote: | Indeed. It looks a lot like dotnet's Linq. | ianbicking wrote: | I like it, it's readable, unlike some SQL alternatives I've seen | it doesn't make me feel like I'm dumb and don't understand what a | query even is. | | I can't decide if it would be better or worse if it stuck more | closely to SQL keywords. You use "from" and "select", but not | "where", "order by", "group by". There's some danger of it being | in an uncanny valley of SQLish, but I'm pretty sure I'd prefer | just using those terms verbatim (including the space in "order | by"... that style is less common in modern languages but it's not | really that much harder to parse). | | I'd like to see more examples of joins and composing SQL. Does | this language make it easier to make more general SQL queries? | Can I take two queries and squash them together in a reliable | way? I feel like I end up with a lot of theme and variation in my | queries, often involving optional filters. | | I might even like a notion of encapsulation that could help this | query language when it's embedded in other languages. Like if I | could say, in the language itself, that a query has certain | unbound variables (and not just ? or other placeholders). This | language seems like it would be better for generating than SQL, | and sometimes generation is just necessary (like in any | application that supports data exploration), but for most common | cases I'd hope to avoid that. Defining inputs and then making | whole filter sections or other statements conditional on those | inputs would help here. | maximilianroos wrote: | Thanks! | | I just fleshed out composing CTEs, which is a small step | towards the broader goal of making composition easier: | https://github.com/max-sixty/prql/commit/dc68fcaaceef26cc078... | | Let me know if you have a good case of the sort of composition | you find difficult in SQL (either here or in an issue). Thank | you! | cogman10 wrote: | Yup, I like a lot of things about the way this looks. In | particular, I like how friendly this looks to be for things | like auto complete (pretty annoying to need to practically type | the entire sql query only to go back and fix up the columns in | order to get autocomplete to work). | | Specific things I'd like to see. | | How do you handle column ambiguity. In the examples, they show | a join of positions to employee on employee_id == id. But what | happens when you have 2 columns with the same name that you are | joining on? (like employee_id to employee_id in some mapping | table). | | Subqueries are pretty important in what I do, so what do those | look like (perhaps covered by the "thinking about CTEs | section"). | | How about opportunities for optimization hints? In T-SQL you | can hint at which index the optimizer should prefer to a | specific query. | | Common SQL patterns would also be interesting. Like, how would | you do keyset pagination? | | Edit: Also, I'd like a discussion about null. SQL null handling | rules are terrible. I understand them, I work with them, but at | the same time, they are so different from other languages | concept of "null" that they are easy to trip over. | galkk wrote: | I'm quite opposed to the idea "from should be first". | | I want to understand what exactly the query returns, not the | implementation detail of the source of this data (that can later | be changed). | | Literally first example from page - I have no idea what is being | returned: from employees filter | country = "USA" # Each line transforms | the previous result. let gross_salary = salary + | payroll_tax # This _adds_ a column / variable. | let gross_cost = gross_salary + benefits_cost # Variables can | use other variables. filter gross_cost > 0 | aggregate by:[title, country] [ # `by` are the | columns to group by. average salary, | # These are the calcs to run on the groups. sum | salary, average gross_salary, sum | gross_salary, average gross_cost, sum | gross_cost, count, ] sort | sum_gross_cost # Uses the auto- | generated column name. filter count > 200 take | 20 | | of course, similar things are happening to SQL too, with CTEs | becoming more widespread and "real" list of the columns hidden | somewhere inside, but it's still parseable | quocanh wrote: | I agree that the columns of the results should be more obvious. | But I am a proponent of "from should be first". I have never | written a SQL query without thinking about the contents of a | table or its relations. If it was my way, I would describe | where the data I'm pulling from, then describe any | filters/joins, then describe the columns that I'm interested in | (last). | phailhaus wrote: | SELECT id, name, author | | Quick, what is this query about? What's ironic is that I think | you have it backwards: the columns are the implementation | detail, not the table. The table is the context: you can't | change that without having to change everything else. But | columns are the last step, the selection after the filters, | joins, etc. They can be changed at any time without affecting | the logic. | taeric wrote: | This is... An odd choice. I'd assume I'm not without context | looking at a query to know why I would want those columns. | | And the auto complete story is backwards. Often I know what | columns I want, but I'm not clear what table I need to get | them from. Such that, if you make a smarter suggest in the | from to only include tables that have the columns, I'd be | much happier. | sanderjd wrote: | Just throwing in another point of anecdata onto this pile: | "Often I know what columns I want, but I'm not clear what | table I need to get them from" does not make sense to me. I | don't relate at all to their being a global namespace of | columns, rather than a namespace of tables, each with its | own columns specific to its context. | taeric wrote: | I challenge this. I accept that there are ambiguities, | but I assert that you can go really fast by just telling | someone to fetch a few columns by name. | | I further assert that if your database is filled with | "Id" and "name" columns, instead of "department_name" and | similar, you are probably as likely to mess up a join as | any benefit you get from the name being short. (And | really, what advantage is there in short names nowadays?) | | That all said. I worded my take too strongly. My point | should have been that auto suggest should not be confined | in either direction. | samatman wrote: | I'm also completely unfamiliar with the PRQL syntax, outside of | right now. | | Reading the comment however, it would seem that `let` adds | columns which are implicitly returned in the order they are | defined. | | I do see benefits in this, and can imagine pitfalls. Hard to | judge without kicking the tires. | | Update: It's quite possible we saw different syntax! | | https://news.ycombinator.com/item?id=30063266 | | Without the `let` I would imagine having trouble reading it as | well, I'm not sure if that would go away with familiarity but | my instinct is that it's a useful addition. | inglor wrote: | The big advantage of "from first" like we have in Kusto KQL (a | database we use at Microsoft) is that it provides much better | autocomplete (if I write the `from` it can easily autocomplete | the projection). | | If you want an interesting example of how a query language | built for developer experience and autocompletions looks | definitely check it out!. | majkinetor wrote: | Too bad we can't use Kusto with anything except Azure. | _jal wrote: | Designing languages around autocomplete is like designing | toilets for better toilet paper dispensers. | | The language should be right for human understanding, not | automated mad-lib generation. | Xelbair wrote: | And one of use cases is writing queries which it helps | immensely. Best of both worlds would allow both orders. | Just automatically transform the query to the usual form | after it's execution. | phailhaus wrote: | Building for autocomplete _is_ building for human | understanding. If it is impossible for a computer to | determine the context of your query, why would a human do | much better? | [deleted] | _jal wrote: | They are not fully-aligned goals, and autocomplete should | not be given equal consideration on par with human | clarity. | | If you want nice autocomplete too, that's fine, but if | there is a tradeoff, human understanding is the primary | concern. | tester756 wrote: | I don't understand why do you think about it this way | | C#'s LINQ (really powerful tool similar to SQL) works the | same way | | look: | | var list = new List<int>{1,2,3} | | var extracted = list | | .............................Where(x => x > 1) | | .............................Select(x => $"my number: | {x}) | | .............................ToList(); | | or | | var extarcted = | | ........................from x in list | | ........................where x > 1 | | ........................select $"my number: {x}; | inglor wrote: | You would think that but having used both I find writing | Kusto/KQL much smoother, neater and faster and if I have to | choose between writing a query in either one I'd pick KQL. | | I understand this is just an opinion but it's an opinion | held by everyone in my org who writes both. | | Theoretical correctness loses to pragmatism a lot and I'd | read the KQL every day. Look at the examples at | https://docs.microsoft.com/en-us/azure/data- | explorer/kusto/q... - look at the examples at | https://docs.microsoft.com/en-us/azure/data- | explorer/write-q... and tell me they're not more readable | than comparable SQL? | | (I can see the result type both by hovering on the query | but also by just looking at the end of it - and in SQL most | of the SELECTed items in complex queries are from | subqueries anyway - at least in my use case) | keithnz wrote: | now if MS made a KQL -> TSQL or support it natively in | SQL Server, that would be great :) | adamrezich wrote: | yes please!! | jiggawatts wrote: | I've been frustrated by toilets where I have to contort my | body to reach the dispenser. Similarly, I've had dispensers | intrude on the space where my legs would normally be and | make it awkward to even just sit on the toilet. | | Toilets are absolutely designed to make the dispenser | placement convenient. You just don't think about it because | 95% of toilets get it right, so it just doesn't bother you | that much that it _can_ be wrong. | | In SQL, some decisions are right about 10% of the time and | are annoying and awkward the other 90%. | | That's why the order matters. Because everything else got | it right. | oldsecondhand wrote: | There's a person behind the IDE. If you help the IDE, you | help the person using the IDE. | emteycz wrote: | I'd agree if there was any way whatsoever of fixing this | issue, but there simply isn't. The editor can't even begin | to guess what you might want until you write your FROM. | taeric wrote: | Maybe in gigantic systems with more tables than makes | sense. Realistically, all of the columns available in a | database can be fit in memory with ease. | | Then, the ide could basically fill my from out for me, | based on what I'm asking for. Can even suggest what join | I will need, if I list columns from multiple tables. | [deleted] | tester756 wrote: | >Maybe in gigantic systems with more tables than makes | sense. Realistically, all of the columns available in a | database can be fit in memory with ease. | | Every table has more than one column | | So there's always more columns to remember than tables | and generally tables are pretty easy like user invoices | blabla | | I worked with systems that had like 500 tables and some | of them with 20-50 columns | | you really want good intellisense in such a environment | taeric wrote: | 500 times 50 is still not a big number. And you could do | decent statistical suggestions on the current columns in. | | Good intelligent suggestions is, of course, helpful. And | I agree that suggesting one of 500 is easier than the | other. That said, neither is hard for a computer. And | even asking friends what table I want will often be done | with starting with the actual columns I want. | tester756 wrote: | On the other hand I don't see why it couldnt work both | ways | | if you start query with | | SELECT MiddleName then you could receive auto complete | thats adds "FROM Users" and moves your cusor after | MiddleName. | | if you start query with | | FROM Users SELECT _ and know intellisense drops list of | columns | taeric wrote: | Agreed. I'm really just arguing that it doesn't have to | be from first. | lemmsjid wrote: | That's interesting because it also explains why I was going | to say I do like having from first. When trying to reason | about a query, I mentally go through the following: | | 1. What tables are being pulled from? This speaks to the | potential domain of the query. 2. What data is being selected | (I can now know what is or isn't being pulled from the | aforementioned tables...) 3. What operations, aggregations, | groupings, etc. are being performed to work on the pulle data | | Of course from vs select ordering is completely arguable, but | my thinking process seems to follow that of the auto complete | --in other words that my cognitive load of looking at the | select statement is lessened when I know from what the | columns are being selected. | | It also follows (at least to me) the mental process of | writing the query. First look at the tables, then decide what | columns, then decide what functions to apply. | taeric wrote: | I said it in a sibling, but I feel this is somewhat missed. | Auto complete that simply lists the tables is easier if from | is first. But... Auto complete that helps me know what tables | can give me my requested columns works the other direction. | KerryJones wrote: | This feels like a English-language thing. In english we tend to | put our adjectives first, it feels natural, "Where is my red, | round ball?", rather than some other languages (like German) | where you put the subject first. Equivalent of "Where is my | ball, red & round?" | | While it inherently feels unnatural I do agree with the others | here that the context is actually easier to understand once | over the initial uncomfort. | mcsoft wrote: | Both CTEs and this idea address the same problem: poor | readability of complex SQL queries. Compared to CTEs, the | author takes the idea to split the complex query into parts to | the next level. | | To your point - a solid IDE will show you what's being | processed at each line (or returned, if the cursor is on the | last line) - in an autocomplete window or a side panel. | andreygrehov wrote: | I like it. Was it inspired by CloudWatch Logs Query Language? [0] | Looks somewhat similar. | | [0] - | https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL... | goodlinks wrote: | maybe i am jaded (too many hours reading and writing sql as it | is) but the structure of SQL is very closely linked to how i | write a query: | | what data do i want to see: most important | | where doest is come from: next most important thing | | What filters and restrictions am i going to put on it: least | important | btown wrote: | For any who want this kind of pipelining way-of-writing-SQL that | has the benefit of existing in live production databases today, I | highly, highly recommend looking into Postgres' and Snowflake's | LATERAL JOIN keyword. | | https://heap.io/blog/postgresqls-powerful-new-join-type-late... | | https://stackoverflow.com/questions/28550679/what-is-the-dif... | | https://docs.snowflake.com/en/sql-reference/constructs/join-... | | The TL;DR is that they allow you to reuse annotated and | aggregated columns in an incredibly elegant way. Compared to OP's | proposal, you still do need to start the query with what columns | you want to come out at the end, and normal SQL weirdnesses still | apply - but it's far, far, easier when writing massive analytics | queries to see the flow of variables from one stage to another. | snidane wrote: | You don't need additional SQL repetitive cruft. | from population select country, rollup(city), count(*) | sort | | Can represent this repetitive SQL query: select | country, city, count(*) from population group | by country, rollup(city) order by country, city | | Information in group by is often redundant. You can tell which | columns are measures vs dimensions by examining the 'aggregate' | function - rollup or no function vs sum, count, avg. Order by can | have a default to sort by all columns instead of naming them one | by one. | rchrch wrote: | Awesome! Would love to see an implementation. I worked on | something similar over the Summer. It's just relational algebra | with pipes for composition. If you are interested, we could get | an antlr grammar going and plug it into this basic execution | engine to get a feel for the language. | | - https://github.com/RCHowell/Sift - | https://github.com/RCHowell/Sift/blob/main/src/main/kotlin/c... | maximilianroos wrote: | Yes this looks really cool, and similar! Feel free to hit me up | on Twitter https://twitter.com/max_sixty | cies wrote: | I love quality language proposals like this. I'm not so much in | data processing/bigdata, but have had to interact with SQL a lot. | | This syntax is lovely! It's more intuitively readable (and SQL is | not that bad in that regard). | | My feedback: | | 1. Lower case, underscored everything makes the terms a bit hard | to differentiate. Maybe set some classes of symbols in CamelCase, | or add a !@#$%& prefixes to them to make it more readable. | | 2. I dont like to use another language (SQL or PRSQL for that | matter) to db interaction, I like to write the queries in the | language that I'm using to develop in. There are ORMs in this | design space, but I'm a little fed up with them. In Java there's | jOOQ. Other less-OO-more-functional ORMs exist in Rust and | Haskell land. These often have a code generation step, a library | is generated that guarantees some type safety for a give schema | version. Some are more SQL-like, some provide a different API. | PRQL is much more diverted from SQL than these, and for good | reasons. Maybe several languages could easily have libs like this | building on top of JPQL? | | 3. You solution is a bit like GraphQL in some regards; where | there is a tool needed to convert the query to SQL. Tools like | this exist, like Hasura and the likes. Hasura does a lot more. To | me GraphQL has the huge advantage of serving a schema so that | clients can be generated. I can interact with GraphQL in a type | safe fashion from by generating a client in, say, Elm. The | generated client lib does not allow my to write syntax errors in | my queries and ensures all type conversions are sound. Maybe PRQL | can also be a language like GraphQL in that regard, and provide a | schema too. | | 4. JPQL. It's close to SQL. It improves to SQL, but I never found | it enough of an improvement to justify the cost. I think your | proposal is better. But still I think JPQL deserves a mention as | maybe one of the most widespread compile-to-SQL languages. | bachmeier wrote: | My thought is that joins are the tough part of the SQL learning | curve, but I don't see much in here that reduces the complexity | of joins. | omarhaneef wrote: | There was this professor of language who would say "Do you | think the question ('are carpets furniture?') tells you | something about the ambiguity of the word carpet, or do you | think it tells you something about the ambiguity in the world?" | | Similarly, I think joins are "tough" not because of the way SQL | expresses them but because the logical possibilities of merging | data from multiple tables are varied. | bob1029 wrote: | There is no such thing as a domain-agnostic SQL database that | holds up under this kind of semantic scrutiny. I don't think | that there ever _could_ be. | | If you are rolling a SQL schema for a home improvement | contractor, it is extraordinarily unlikely that their | specific business would expect any scenarios in which carpets | are sometimes known as furniture. | | Having a bounded context to operate within is what makes SQL | magical for me. When people don't understand the business or | simply the game around how you _talk_ about the business, | things start getting messy wrt joins. | omarhaneef wrote: | The carpet discussion was simply to say that you can't take | out all the complexity of a language if the domain it is | meant to describe is complex. The language has a limit to | how simple it can be. | | I was not proposing a SQL database of carpets, or | furniture, as a thought experiment. | bachmeier wrote: | Then make some cases easier and fall back to the SQL we | already have for the rest? | Supermancho wrote: | SQL has effectively failed, as a standard, despite it's | ubiquity. It's literally being aged out, which makes for | opportunities for PRQL, etc to fill pragmatic gaps. | | eg the lack of default column aliasing from joins | SELECT A.id AS A__id, A.name AS | A__name, B.id AS B__id, B.name AS | B__name FROM A LEFT JOIN B | ON A.other_id = B.other_id | | When you could have: SELECT | A.*, B.* FORMAT (TABLE__) | FROM A LEFT JOIN B ON A.other_id = | B.other_id | cogman10 wrote: | IMO, this appears not to be something that solves the SQL | learning curve but rather the usability of a query language | with tooling. | | I don't think there is much that could be done to address left, | right, inner, outer join semantics. It's just something you | have to learn if you want to do a lot of SQL (though, you are | likely only ever going to use left and inner joins). | munk-a wrote: | I like the flow direction compared to standard SQL. SQL is | supposed to read like a sentence I suppose but I have many times | looked at it and really wanted things to be in a more logical | order. | | My main suggestion would be to be a bit less terse and introduce | a bit more firm formatting. I'm not a huge fan of the term | "split" and feel like jazzing that up to "split over" or even | just reviving "group by" would improve readability. Additionally | the aliasing could use work, I'd suggest reversing the assignment | to be something closer to `use salary + payroll_tax as gross | salary`. In terms of firm formatting, unless I'm missing | something there isn't any reason to allow a filter statement | before any aliases - so you can force two fixed positions for | filter clauses which would make it always legal to reference | aliases in filters. | | On the brief topic of form vs. flexibility. SQL is a thing that, | when complex, is written by many people over the course of its | lifetime - removing the ability to make bad decisions is better | than enabling the ability to write simple things even simpler - | those silly do nothing queries like "SELECT * FROM customer WHERE | deleted='f'` are written once[1] in a moments time and never | inspected again. The complex queries are what you want to | optimize for. | | 1. If they even are - with ORMs available a lot of those dead | simple queries just end up being done through an ORM. | hn_throwaway_99 wrote: | > On the brief topic of form vs. flexibility. SQL is a thing | that, when complex, is written by many people over the course | of its lifetime - removing the ability to make bad decisions is | better than enabling the ability to write simple things even | simpler | | Hallelujah! But, to your footnote, this is a major reason why I | despise ORMs. In my mind they make writing simple code slightly | easier, but they make complicated SQL statements, especially | when you get some weird issue under load and you're trying to | debug why your DB is falling over, a _ton_ more difficult and | you spend so much time just battling your ORM. | jnsie wrote: | I like the flow direction specifically for | intellisense/autocomplete. I'm sure it would be easier to | provide hints when the table name is known immediately. | m1sta_ wrote: | I'd love for the next release of SQL to have optional | alternative ordering of clauses | maximilianroos wrote: | This is great feedback, and I agree with you re de-prioritizing | terseness. | | _And_ I agree with you on both the assignments and `split` | being a bit awkward. Kusto just uses `by`, WDYT? | tomtheelder wrote: | Not the original commenter, but just using `by` makes total | sense to me. | maximilianroos wrote: | I've made this change [1]. Thank you! | | [1] https://github.com/max- | sixty/prql/commit/dde7fcfc13daaadbdce... | Serow225 wrote: | <3 | munk-a wrote: | By actually sounds great to me to, yea. In this case it's | short but it's extremely communicative! | loic-sharma wrote: | Yes Kusto's `by` is excellent! | dragonwriter wrote: | > It's not declarative. It's functional. | | Functional, logical, and relational paradigms are the most | commonly cited examples given of declarative programming. ___________________________________________________________________ (page generated 2022-01-24 23:02 UTC)