[HN Gopher] PRQL - A proposal for a better SQL
       ___________________________________________________________________
        
       PRQL - A proposal for a better SQL
        
       Author : maximilianroos
       Score  : 324 points
       Date   : 2022-01-24 17:24 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | eatonphil wrote:
       | I'm really excited about languages that build on or are compiled
       | to SQL, in the long-term (because I think it will take a very
       | long time to build adoption).
       | 
       | The ones that particularly excite me are shorthands for SQL, even
       | though their heavy use of symbols may be a detriment. One
       | particular use case is in easily defining static authorization
       | policy-queries that are backed by database data plus and have
       | request variables injected during evaluation.
       | 
       | I am not very excited by datalog/prolog-based languages because I
       | think logic languages are too unnatural to ever go mainstream.
       | But I'd be excited to be wrong or for logic languages to become
       | more friendly.
       | 
       | Here are some others I'm watching.                 *
       | https://github.com/mrumkovskis/tresql       *
       | https://www.htsql.org/doc/overview.html       *
       | https://github.com/cytosm/cytosm
        
         | larodi wrote:
         | in a way SQL is Prolog and all reasoning for improvement of SQL
         | should start from Prolog, because is where SQL started from.
         | the expressive power of both languages is theoretically the
         | same, even though SQL is much more comprehensible. but then
         | again - certain complex task turn SQL in difficult-to-
         | comprehend series of nested declarative operations on algebraic
         | sets.
        
       | skissane wrote:
       | How does this compare to QUEL?
       | 
       | Or Tutorial D?
        
       | ggrothendieck wrote:
       | The link refers to dplyr not being able to use databases but
       | actually there is a database backend for it in package dbplyr.
       | See https://dbplyr.tidyverse.org/
        
       | divan wrote:
       | Obligatory reading https://www.scattered-
       | thoughts.net/writing/against-sql/
        
       | lxe wrote:
       | Nice. Why OCaml though? I think using a more conventional
       | language to construct queries could yield more adoption. It also
       | seems that ORMs kinda of exist to tackle a similar issue, at
       | least in part.
        
         | HellsMaddy wrote:
         | I don't see anything dealing with OCaml here, other than it
         | being listed as an inspiration.
        
       | tpoacher wrote:
       | meanwhile I'm still stuck trying to make prolog work xD
        
       | blintz wrote:
       | This is a nice idea, especially given all the work people have
       | done recently to make in-language querying nicer (Spark comes to
       | mind).
       | 
       | My only gripe is the 'auto-generated' column names for
       | aggregates. This seems like a recipe for disaster - what if there
       | is already (as there almost certainly will be) named
       | "sum_gross_cost"? The behavior also just seems rather unexpected
       | and implicit. My suggestion would be simple syntax that lets you
       | optionally give a name to a particular aggregate column:
       | ...         filter gross_cost > 0         aggregate by:[title,
       | country] [             average salary,             sum
       | gross_salary,             average gross_cost,             let
       | sum_gc = sum gross_cost,             count,         ]
       | sort sum_gc
       | 
       | While it might seem a little uglier, it seems much more
       | sustainable in the long run. If this is really too gross, I'd
       | advocate some token other than underscore that is reserved for
       | aggregation variables; perhaps `sum@gross_cost` or
       | `sum#gross_cost`.
        
         | maximilianroos wrote:
         | Definitely -- giving the option of naming them is great.
         | 
         | I'm not sure whether we should force naming? When I'm writing a
         | query often I'm fine with something auto-generated when
         | starting out.
        
       | JimmyRuska wrote:
       | Wow, between this and Malloy , there's a lot of great ideas.
       | 
       | It'd be great if ideas from sparql/datalog/datomic were also
       | picked up. It's easier to write recursive queries.
        
       | jacktheturtle wrote:
       | Do people still write SQL?
        
         | didip wrote:
         | Of course, and it is vastly better than using an ORM.
        
         | [deleted]
        
         | mrweasel wrote:
         | You kinda have to. Assuming that you're using an ORM, you still
         | need to understand how it translate to SQL, and help it do the
         | translation correctly.
         | 
         | Personally I've seen developer use the Django ORM, and create
         | application with terrible performance. Tweaking the queries,
         | you can help guide the ORM to generate better SQL, which in
         | turn will affect your performance greatly.
         | 
         | We're currently facing a problem with a custom who have an
         | application with terrible performance/scaling issues. The
         | entire thing is very database heavy, but interaction is done
         | solely via Hibernate. I have nothing against Hibernate, it's a
         | fine ORM, but you need to understand it well enough that you
         | can guide it towards better queries (Which sometimes involve
         | actually writing SQL). At some point you need to decide if your
         | time isn't better spend learning SQL directly, as that via
         | always provide you with better access to the functionality
         | provided by the database.
        
         | onlyrealcuzzo wrote:
         | It's regularly listed in the top 5 most commonly used
         | programming languages in the world [=
        
         | MrBuddyCasino wrote:
         | Do people still use ORMs?
        
         | edgyquant wrote:
         | Of course. Large companies like Pepsi have teams of analysts
         | that only write SQL. I applied for a programming job there a
         | long time ago and didn't follow up when they explained in the
         | interview that's the only language they used.
        
         | badhombres wrote:
         | Absolutely. If I use a SQL db for my applications (I'm a
         | software dev for context), I generally write raw SQL vs using
         | an ORM. I find the long term issues of an ORM to not be worth
         | investing and understanding SQL.
         | 
         | I'm also not having to learn a new library, in addition to the
         | standard DB connection libraries, ~if~ when I switch a language
         | or platform for some project.
        
         | drpotato wrote:
         | Yes, quite a lot I would imagine. I use it extensively at work
         | and similarly sql heavy software companies in the past. That
         | being said, I've also worked at places where they've avoided it
         | like the plague - largely because few people were competent at
         | it - and were moving away from relational DBs due to scale.
        
         | roveo wrote:
         | In the world of data analytics/BI are many people whose main
         | job is writing SQL.
        
         | extrapickles wrote:
         | I prefer to write SQL as most alternatives ether have runtime
         | surprises or require more roundtrips to the database. I mostly
         | work on line-of-business software, so if I was doing simple
         | CRUD apps I might have a different opinion.
        
       | slaymaker1907 wrote:
       | This seems very similar to Kusto/KQL.
        
       | cryptonector wrote:
       | The most important dialect of SQL we should get is SQL-with-no-
       | literal-values so we can force the use of query parameters, and
       | then not have SQLi.
        
       | BeefWellington wrote:
       | This is another in a series of these kinds of proposals that look
       | excellent on first glance for perhaps the 75% case but start
       | getting syntactically messy when I want to customize the
       | resultset returned.
       | 
       | On the surface, they're always neat but when you start to dig
       | into how you'd implement something in an RDBMS, it begins to fall
       | apart.
       | 
       | Let's look at the example syntax:                   from
       | employees         filter country = "USA"
       | # Each line transforms the previous result.         gross_salary
       | = salary + payroll_tax            # This _adds_ a column /
       | variable.         gross_cost   = gross_salary + healthcare_cost
       | # Variable can use other variables.         filter gross_cost > 0
       | aggregate split:[title, country] [             # Split are the
       | columns to group by.             average salary,
       | # These are the calcs to run on the groups.             sum
       | salary,             average gross_salary,             sum
       | gross_salary,             average gross_cost,             sum
       | gross_cost,             count,         ]         sort
       | sum_gross_cost                            # Uses the auto-
       | generated column name.         filter count > 200         take 20
       | 
       | Where in here is it clearly stated which fields are returned? In
       | the original SQL it's right up front but here it's buried into
       | the "aggregate" function, and I'm not clear that this isn't an
       | oversight.
       | 
       | Another example that speaks to the "how do I implement this" side
       | of the equation:                   from employees         filter
       | country = "USA"                         # Each line transforms
       | the previous result.         gross_salary = salary + payroll_tax
       | # This _adds_ a column / variable.         gross_cost   =
       | gross_salary + healthcare_cost  # Variable can use other
       | variables.         filter gross_cost > 0
       | 
       | Does this mean that the database must scan all records of the
       | employee table in order to return the result before moving to the
       | next step in the query? Must I index all fields? If not, how does
       | a query planner prepare for this scenario?
       | 
       | The major tradeoff you make in most ORMs is exactly this: You
       | lose out on being able to be explicit about how many queries are
       | sent to the DB (and in many cases how efficient those queries
       | are). Now this would become a language feature? What do I gain
       | for that loss?
       | 
       | I'm not saying that SQL Syntax is perfect; far from it. I'm not
       | seeing how this is an improvement.
       | 
       | I think if you want traction though, a proof of concept using an
       | existing RDBMS would go a long way into providing evidence that
       | this will work and is sufficiently thought out to deal with even
       | the basics of what existing SQL databases have to. Query planning
       | is hard, especially if you want it to be fast.
        
         | drittich wrote:
         | I don't think it's falling apart at all. Personally, I would
         | require that what columns get returned be explicit (optionally
         | with a * type syntax that you have to enable - the defaults
         | should be safe and * has its risks). For one thing, you don't
         | necessarily want to return all the columns you have aggregated.
         | E.g., you may be running the equivalent of a HAVING clause on
         | an aggregate column, so don't need the value returned.
         | 
         | "Each line transforms the previous result." - I assume this is
         | referring to the order that transpilation happens, so you can
         | read it top to bottom and understand the flow easily.
         | 
         | One thing I would like to see is how a recursive CTE might
         | look.
        
         | tomtheelder wrote:
         | > Where in here is it clearly stated which fields are returned?
         | In the original SQL it's right up front but here it's buried
         | into the "aggregate" function, and I'm not clear that this
         | isn't an oversight.
         | 
         | It's in the aggregate portion, like you said. Other example
         | queries have a select portion. Why does it matter that it's not
         | in the leading position like SQL?
         | 
         | > Does this mean that the database must scan all records of the
         | employee table in order to return the result before moving to
         | the next step in the query? Must I index all fields? If not,
         | how does a query planner prepare for this scenario?
         | 
         | No, they are just describing how the statement is supposed to
         | be interpreted by a human. I think you can basically just
         | shuffle all the filter statements to the end and keep it
         | logically equivalent.
         | 
         | This is a proposal for a "transpiles to SQL" language. So long
         | as that transpliation is predictable, you cannot run into the
         | sort of issues you are describing.
        
           | BeefWellington wrote:
           | > It's in the aggregate portion, like you said. Other example
           | queries have a select portion. Why does it matter that it's
           | not in the leading position like SQL?
           | 
           | I don't mind it not being in the leading position. The author
           | provided a _very simple_ query and in that case it 's not
           | immediately apparent what fields to expect the resultset to
           | contain when returned to the consumer.
           | 
           | This is a troubleshooting issue more than anything else. IMO
           | placing the "selected fields" into the very centre of the
           | query is distracting and obfuscates what is happening.
           | 
           | > This is a proposal for a "transpiles to SQL" language. So
           | long as that transpliation is predictable, you cannot run
           | into the sort of issues you are describing.
           | 
           | I think a good test of whether any transpiled language works
           | well is to look at whether it could work on its own as a
           | language. See: Typescript.
        
       | tfehring wrote:
       | Very cool! A couple questions/suggestions off the top of my head:
       | 
       | 1. Did you consider using a keyword like `let` for column
       | declarations, e.g. `let gross_salary = salary + payroll_tax`
       | instead of just `gross_salary = salary + payroll_tax`? It's nice
       | to be able to scan for keywords along the left side of the
       | window, even if it's a bit more verbose.
       | 
       | 2. How does it handle the pattern where you create two moderately
       | complex CTEs or subqueries (maybe aggregated to different levels
       | of granularity) and then join them to each other? I always found
       | that pattern awkward to deal with in dplyr - you have to either
       | assign one of the "subquery" results to a separate dataframe or
       | parenthesize that logic in the middle of a bigger pipeline. Maybe
       | table-returning functions would be a clean way to handle this?
        
         | maximilianroos wrote:
         | > 2. How does it handle the pattern where you create two
         | moderately complex CTEs or subqueries (maybe aggregated to
         | different levels of granularity) and then join them to each
         | other? I always found that pattern awkward to deal with in
         | dplyr - you have to either assign one of the "subquery" results
         | to a separate dataframe or parenthesize that logic in the
         | middle of a bigger pipeline. Maybe table-returning functions
         | would be a clean way to handle this?
         | 
         | I don't have an example on the Readme, but I was thinking of
         | something like (toy example):                 table
         | newest_employees = (         from employees         sort tenure
         | take 50       )              from newest_employees       join
         | salary [id]       select [name, salary]
         | 
         | Or were you thinking something more sophisticated? I'm keen to
         | get difficult examples!
         | 
         | Edit: formatting
        
           | twic wrote:
           | My gut reaction is that if we have "from first" then maybe we
           | should have to "to last":                 from employees
           | sort tenure       take 50       as newest_employees
           | from newest_employees       join salary [id]       select
           | [name, salary]
        
           | mcdonje wrote:
           | When you add in the ability to reference different tables
           | like that to the piping syntax, it starts to remind me of the
           | M query language: https://docs.microsoft.com/en-
           | us/powerquery-m/quick-tour-of-...
           | 
           | There, each variable can be referenced by downstream steps.
           | Generally, the prior step is referenced. Without table
           | variables, your language implicitly pipes the most recent
           | one. With table references, you can explicitly pipe any prior
           | one. That way, you can reference multiple prior steps for a
           | join step.
           | 
           | I haven't thought through that fully, so there may be gotchas
           | in compiling such an approach down to SQL, but you can
           | already do something similar in SQL CTEs anyway, so it should
           | probably work.
        
         | maximilianroos wrote:
         | Thanks!
         | 
         | > Did you consider using a keyword like `let` for column
         | declarations
         | 
         | Yeah, the current design for that is not nice. Good point re
         | the keyword scanning. I actually listed `let` as an option in
         | the notes section. Kusto uses `extend`; dplyr uses `mutate`;
         | pandas uses `assign`.
         | 
         | I opened an issue here: https://github.com/max-
         | sixty/prql/issues/2
        
           | maximilianroos wrote:
           | I've added the `let` keyword given a few people commented on
           | this.
        
       | maximilianroos wrote:
       | I wrote this over the holidays, because I find SQL wonderfully
       | elegant in its function, but really frustrating in its form.
       | 
       | Let me know any feedback -- as you can see it's still at the
       | proposal stage. If it gains some traction I'll write an
       | implementation.
        
         | roberto wrote:
         | This looks great! Clear docs and rationale, and the syntax is
         | well thought. I'm definitely following this.
        
         | xi wrote:
         | Maybe you'd like to check FunSQL.jl, my library for
         | compositional construction of SQL queries. It also follows
         | algebraic approach and covers many analytical features of SQL
         | including aggregates/window functions, recursive queries and
         | correlated subqueries/lateral joins. One thing where it differs
         | from dlpyr and similar packages is how it separates aggregation
         | from grouping (by modeling GROUP BY with a _universal_
         | aggregate function).
        
           | maximilianroos wrote:
           | This is awesome! I'll add a link to it on PRQL.
           | 
           | I guess the biggest difference between FunSQL (and similarly
           | dbplyr) and PRQL is that the former needs a Julia (or R)
           | runtime to run.
           | 
           | I really respect the library and keen to see how it develops.
        
             | clarkevans wrote:
             | Writing an alternative syntax is straight forward. Perhaps
             | prototype PRQL using xi's excellent FunSQL backend? This
             | way it's working out of the gate. Once syntax+semantics are
             | pinned, writing another backend in the language of your
             | choice would then be easier. Getting the backend correct is
             | non-trivial work, and xi has done this already. Besides, we
             | need a sandbox syntax anyway, so it might be fun to
             | collaborate.
        
       | glogla wrote:
       | This is cool! I love it.
       | 
       | Few notes:
       | 
       | 1) SQL also allows you to define windowing reusably, like this
       | select             sum(blah) over window_abc,
       | avg(blah) over window_abc         from table_xyz         window
       | window_abc as (partition by x order by y)
       | 
       | so that second example could be written somewhat less
       | repetitively but it wouldn't change the whole point.
       | 
       | 2) Sadly my main pain point with SQL for ETL is not possible to
       | solve with a transpiler - SQL has exactly one target so doing
       | things like "I want these records to go to a table A and those
       | records go to table B" is not possible with one query.
       | 
       | 3) It would be cool to see how this does typically annoying and
       | repetitive cases from analytics / data warehousing world. I'm
       | thinking like SCD1/2 implementation. But I don't even know if
       | mutation is there yet.
       | 
       | 4) I would recommend investing in one canonical formatter, like
       | Go has. So that there isn't infinite number of ways the same
       | query could be formatted for people to argue over preference.
       | 
       | EDIT:
       | 
       | 5) Since this seems to be focused on analytics (by the choice of
       | queries and Snowflake in examples), I want to highlight that
       | someone suggested to use TPC-H (or TPC-DS) queries as a
       | benchmark. It does sound like a good idea.
        
       | adamrezich wrote:
       | I like this a lot, and I eagerly anticipate an early version of a
       | transpiler to play with!
        
       | hyperpallium2 wrote:
       | \tangent What are today's data transformation needs (differ from
       | Codd's inspiration?), how does relational algebra serve them, and
       | design an "SQL" around that. There's got to be a 10x leapfrog in
       | benefit for _some_ niche in there, and that 's the gateway to
       | adoption.
        
         | tristanz wrote:
         | To get 10x I think you need to wed it to solving broader
         | workflow challenges, like dbt does today.
        
       | nassimsoftware wrote:
       | It would be definitely interesting to have a TypeScript of some
       | sort but for SQL. So a more practical and prettier syntaxe like
       | what I'm seeing here that compiles to SQL queries.
        
         | eatonphil wrote:
         | TypeScript is more verbose than JavaScript. While I love to use
         | TypeScript I don't think I'd categorize it as prettier than
         | JavaScript. And practical... well if you mean it is more
         | maintainable then yes but if you mean faster to write then no.
         | 
         | I don't want a more verbose SQL I want a less verbose SQL!
        
           | nassimsoftware wrote:
           | I meant more the aspect that with TypeScript people would
           | prefer to write in it and then compile to JavaScript because
           | there is a benefit. With a PRQL with an SQL compilation
           | target we would reap the benefit of a more practical and
           | better syntax. In both cases they bring benefits but not in
           | the same way.
        
         | dirslashls wrote:
         | Go to https://sqlframes.com/demo and in the code editor enter
         | the following and execute (this example is taken from the first
         | example on PRQL github page). It generates SQL, but it also
         | computes and displays the results within the browser (though
         | the data set below gives no results).
         | 
         | const employees = SQL.values([{ title: 'Developer', country:
         | 'USA', salary: 120, payroll_tax: 20, healthcare_cost: 6 }]);
         | employees.schemaName = 'employees'; const { groupBy, where: {
         | gt, eq, and }, agg: { count, sum, avg } } = SQL; return employe
         | es.pdf(SQL.script('[salary]+[payroll_tax]').as('gross_salary'),
         | SQL.script('[gross_salary]+[healthcare_cost]').as('gross_cost')
         | ) .fdf(and(gt('gross_cost',0),eq('country','USA')))
         | .gdf(groupBy('title','country')
         | ,avg('salary').as('average_salary')
         | ,sum('salary').as('sum_salary')
         | ,avg('gross_salary').as('average_gross_salary')
         | ,sum('gross_salary').as('sum_gross_salary')
         | ,avg('gross_cost').as('average_gross_cost')
         | ,sum('gross_cost').as('sum_gross_cost') ,count().as('count'))
         | .having(gt('count',200)) .orderBy('sum_gross_cost');
        
       | gibsonf1 wrote:
       | SPARQL. Representing human information in relational tables goes
       | against how people actually think and use information. We humans
       | think in tremendous numbers of nested hierarchies, and recursive
       | hierarchy traversal is a nightmare in relational databases. A
       | graph is the structure for data that works best, is most
       | efficient, and actually reflects how things are connected in our
       | brains.
        
         | mindcrime wrote:
         | I'm a big fan of SPARQL, but the one thing that would concern
         | me about trying to use it outside of the SemWeb context is
         | simply that it assumes data is stored in <S,P,O> triples.
         | Legacy databases by and large are not, so you need an adapter
         | to bridge the representations. And while I know some exist, I
         | haven't really used them and am not sure about the performance
         | impact.
        
           | mst wrote:
           | You can get quite far mapping the triple concept to (PK,
           | column, value) or (PK, FK, related-row) and transpiling from
           | there.
           | 
           | (I played around with this some years back, not to the point
           | where anything came out of it worthy of publishing, but
           | enough to be pleasantly surprised how far 'quite far' turned
           | out to be in practice)
        
       | fuzzieozzie wrote:
       | The time of SQL alternatives
       | https://news.ycombinator.com/item?id=30053860
        
         | maximilianroos wrote:
         | Yup, I posted that yesterday too. I think Malloy is really
         | interesting -- compile to SQL but give more integrations to the
         | DB, like schema-during-development. It has a proper team, led
         | by Lloyd Tabb.
        
       | 999900000999 wrote:
       | >Compatible -- PRQL transpiles to SQL, so it can be used with any
       | database that uses SQL. Where possible PRQL can unify syntax
       | across databases. PRQL should allow for a gradual onramp -- it
       | should be practical to mix SQL into a PRQL query where PRQL
       | doesn't yet have an implementation.
       | 
       | Awesome.
       | 
       | I hate SQL so much, I know for personal projects this is gold. I
       | imagine actually using it at work might draw some questions
       | though
        
       | hyperpallium2 wrote:
       | nice name
        
       | peoplefromibiza wrote:
       | This is the kind of things that a well crafted DSL can solve, if
       | the language you use supports macros.
       | 
       | PRQL looks very similar to Ecto, the Elixir Query DSL
       | 
       | https://hexdocs.pm/ecto/Ecto.Query.html
        
       | magicalhippo wrote:
       | Now this is actually nice, unlike the other suggestion posted
       | today[1].
       | 
       | Maybe I'm just too used to non-standard extensions of our
       | database but the SQL example could, at least for our db, be
       | rewritten as                   SELECT TOP 20             title,
       | country,             AVG(salary) AS average_salary,
       | SUM(salary) AS sum_salary,             AVG(gross_salary) AS
       | average_gross_salary,             SUM(gross_salary) AS
       | sum_gross_salary,             AVG(gross_cost) AS
       | average_gross_cost,             SUM(gross_cost) AS
       | sum_gross_cost,             COUNT(*) as count         FROM (
       | SELECT                 title,                 country,
       | salary,                 (salary + payroll_tax) AS gross_salary,
       | (salary + payroll_tax + healthcare_cost) AS gross_cost
       | FROM employees             WHERE country = 'USA'         ) emp
       | WHERE gross_cost > 0         GROUP BY title, country
       | ORDER BY sum_gross_cost         HAVING count > 200
       | 
       | This cuts down the repetition a lot, and can also help the
       | optimizer in certain cases. Could do another nesting to get rid
       | of the HAVING if needed.
       | 
       | Still, think the PRQL looks very nice, especially with a "let"
       | keyword as mentioned in another thread here.
       | 
       | [1]: https://news.ycombinator.com/item?id=30053860
        
         | ako wrote:
         | With a CTE it would read a bit more like prql:
         | with usa_employees as (         SELECT                 title,
         | country,                 salary,                 (salary +
         | payroll_tax)                   AS gross_salary,
         | (salary + payroll_tax + healthcare_cost) AS gross_cost
         | FROM  employees         WHERE country = 'USA'         AND
         | (salary + payroll_tax + healthcare_cost) > 0       )
         | select  title,             country,             AVG(salary)
         | AS average_salary,             SUM(salary)         AS
         | sum_salary,             AVG(gross_salary)   AS
         | average_gross_salary,             SUM(gross_salary)   AS
         | sum_gross_salary,             AVG(gross_cost)     AS
         | average_gross_cost,             SUM(gross_cost)     AS
         | sum_gross_cost,             COUNT(*) as emp_count       from
         | usa_employees       group by  title, country       having
         | count(*) > 200       order by  sum_gross_cost       limit 3
         | 
         | Readability is pretty similar to prql. It would really help in
         | SQL if you could refer to column aliases so you don't have to
         | repeat the expression.
        
           | magicalhippo wrote:
           | > With a CTE
           | 
           | The DB we use supports those, I just learned about them too
           | late so keep forgetting they exist :(
           | 
           | > It would really help in SQL if you could refer to column
           | aliases so you don't have to repeat the expression.
           | 
           | The DB we use supports that, so in your CTE you could write
           | AND   gross_cost > 0
           | 
           | We do that all the time, which will be a pain now that we're
           | migrating to a different DB server which doesn't.
        
           | jsyolo wrote:
           | what expressions are being repeated here?
        
             | oblio wrote:
             | > (salary + payroll_tax) AS gross_salary,
             | 
             | > (salary + payroll_tax + healthcare_cost) AS gross_cost
             | 
             | > AND (salary + payroll_tax + healthcare_cost) > 0
             | 
             | And his is a simple example.
        
           | mmsimanga wrote:
           | Sybase IQ allows you to use the column alias anywhere else in
           | the query.
        
           | gmfawcett wrote:
           | Not all database systems can optimize queries well over CTE
           | boundaries. I believe this is still true for PostgreSQL (no
           | longer true, see below -- it was true a few years ago). So
           | there's a potential performance hit for (the otherwise
           | excellent advice of) writing with CTE's.
        
             | Rovanion wrote:
             | IRC tells me this has been fixed now.
        
               | gmfawcett wrote:
               | Awesome news! thank you for sharing this. I found this
               | post which confirms IRC and suggests it was an
               | improvement in PG 12:
               | 
               | https://paquier.xyz/postgresql-2/postgres-12-with-
               | materializ...
               | 
               | Today is a great day to have been wrong on the Internet.
               | :)
        
           | correct-me-plz wrote:
           | Snowflake lets you refer to column aliases, and it's great!
           | 
           | There's the slight issue of shadowing of table column names,
           | which they resolve by preferring columns to aliases if both
           | are named the same. So sometimes my aliases end up prefixed
           | with underscores, but that's not a big deal.
        
           | dvasdekis wrote:
           | Column aliases would have saved me hundreds of hours over the
           | course of my career. Sorely missing from standard SQL, and
           | would make the need for PRQL less acute.
        
       | iblaine wrote:
       | The syntax seems similar to Apache Pig. Both are declarative and
       | primarily built to be a procedural form of SQL.
        
       | pmontra wrote:
       | First, kudos because it takes courage to take on SQL in this way.
       | 
       | Second, this kind of reversed SQL (filter-first, select-last) is
       | much easier to reason about than the original and keep in mind
       | that I prefer to code complex queries in SQL than to build or
       | translate them in the ORM of the project I'm working on.
       | 
       | Maybe a transpiler is an inevitable first step but I think that
       | any SQL replacement should be itself the target of ORMs and run
       | directly in the database CLI tools (psql / mysql ...) or IDEs
       | (pgAdmin, MySQLAdmin, ...). What's the long term plan of the
       | project?
        
         | dragonwriter wrote:
         | > Second, this kind of reversed SQL (filter-first, select-last)
         | is much easier to reason about than the original
         | 
         | Given that SQL clauses tend to be unambiguously terminated by
         | the start of the next clause or the end of the statement, it
         | surprises me that no engine has gone to accepting otherwise
         | standard(-ish, as much as real DB vendor dialects are) SQL but
         | without a mandated order of clauses.
         | 
         | And then combine that with dev tools that allow easy
         | rearrangement of clauses, perhaps based on configured
         | preferences so that you don't even see the original if its not
         | your preferred order, so that "Bob likes old-school SELECT FROM
         | WHERE GROUP BY and Alice likes FROM WHERE GROUP BY SELECT"
         | isn't a problem.
        
         | maximilianroos wrote:
         | Thanks!
         | 
         | I agree that integrating with the DB would allow much more from
         | a lang. But PRQL is a bet that languages which _start_ there
         | (e.g Kusto) get lost because it requires changing DB, which is
         | really hard. I worry EdgeDB may hit this issue too (but I 'm
         | really hoping it works, and they have an excellent team).
         | 
         | As I think you're suggesting -- you could imagine a language
         | starting out as a transpiler, and then over time DBs working
         | with it directly, cutting out some of the impediment mismatch.
         | 
         | Malloy [1] is another point in space -- it targets existing DBs
         | through SQL queries but can also ask for schemas etc while
         | developing.
         | 
         | [1] https://github.com/looker-open-source/malloy
        
       | _dain_ wrote:
       | Does this let you write functions that take tables as parameters
       | and return tables? It always seemed weird to me that SQL doesn't
       | let you do this.
        
       | akdor1154 wrote:
       | Looks really nice, i've been scribbling away in a little notebook
       | all the things i would do in "akdor's dream sql", and what you
       | have here hits pretty much exactly.
       | 
       | Wondering about generic use of `let` - you have let for col
       | defns, but `func` for functions and a TODO for tables/CTEs -
       | could/should `let` do the lot? (Like another commenter posted,
       | this is how MS's M language, used in PowerQuery in PowerBI and
       | Excel works). Could enable an escape from point-free for entire
       | queries if taken to extreme generality, not sure if that's a good
       | thing, maybe it could be?
       | 
       | Bikeshedding: even with some OCaml/F# experience, i find `f x y`
       | harder to read than `f(x, y)`.
        
         | maximilianroos wrote:
         | Thanks!
         | 
         | At the moment `let` is used to add a column as part of an
         | existing pipeline. [1]
         | 
         | `func` is the start of new expressions / pipelines. And I just
         | added a proposal for `table = `, which would be the same.
         | 
         | Does that make sense? Very open to more feedback...
         | 
         | [1] I just added `let` based on feedback here, it's better than
         | it was, but not perfect, as it can be confused for a new
         | pipeline given its use in other langs.
        
       | nextaccountic wrote:
       | I also saw a proposal for a better syntax for SQL, called BQL,
       | that was a strict superset but allowed for better modularity
       | 
       | http://intelligiblebabble.com/a-better-query-language-bql-la...
       | 
       | It had this github repo https://github.com/lelandrichardson/BQL
       | but never went anywhere
       | 
       | I hope PRQL has a better fate! unfortunately, by deviating from
       | SQL lexical conventions (using :, using [], etc) we lose the
       | ability to copy-paste from sql code elswhere.
       | 
       | I want a better SQL, but I also want some compatibility. Like
       | typescript is for javascript.
        
       | spullara wrote:
       | SQL could get a lot better by just adopting the ordering of
       | operations like they did with LINQ:
       | 
       | https://docs.microsoft.com/en-us/dotnet/csharp/programming-g...
        
         | tester756 wrote:
         | Funnily enough LINQ Query syntax is really uncommon and
         | everybody uses method syntax
         | 
         | var list = new List<int>{1,2,3}
         | 
         | var extracted = list
         | 
         | ....................Where(x => x > 1)
         | 
         | ....................Select(x => $"my number: {x})
         | 
         | ....................ToList();
        
           | aloisdg wrote:
           | Because the original felt odd in C#.
        
           | louthy wrote:
           | > everybody
           | 
           | Not everybody
        
       | betimsl wrote:
       | We barely learned SQL for all these years and the guy wants to
       | change it now. Thank you but no thank you.
       | 
       | JK. Cool concept and hopefully it catches.
        
       | eximius wrote:
       | > A line-break doesn't created a pipeline in a few cases: ... >
       | When the following line is a new statement, by starting with a
       | keyword such as func.
       | 
       | This feels like it could cause compatibility issues in the
       | future.
        
       | jcdreads wrote:
       | I like that everyone is trying to make something like SQL that
       | reads more naturally to them. More alternatives is good! SQL is a
       | widely accepted standard, and has strictly defined and super
       | broadly accepted semantics.
       | 
       | As someone who has written quite a few half-baked-for-general-use
       | but fit-for-purpose SQL generator utilities over the years, I'll
       | suggest that if you intend for a novel syntax to be a general SQL
       | replacement then being isomorphic to SQL would massively increase
       | usefulness and uptake:
       | 
       | 1. novel syntax to SQL; check! Now novel syntax works with all
       | the databases!
       | 
       | 2. any valid SQL to novel syntax; a bit harder, but I'd start by
       | using a SQL parser like https://github.com/pganalyze/libpg_query
       | and translating the resulting AST into the novel syntax.
       | 
       | 3. novel syntax to SQL back to novel syntax is idempotent; a nice
       | side effect is a validator/formatter for "novel syntax"
       | 
       | 4. SQL to novel syntax back to SQL is idempotent; a nice side
       | effect is a validator/formatter for SQL, which would be awesome.
       | (See also https://go.dev/blog/gofmt, which is where I learned
       | this "round trip as formatter" trick.)
       | 
       | I don't mean for this to sound negative, and I know that 2, 3,
       | and 4 are kind of hard. Thank you for building prql!
        
       | Nican wrote:
       | Also worth looking at KQL: https://docs.microsoft.com/en-
       | us/azure/data-explorer/kusto/q...
        
         | leokennis wrote:
         | When you said "KQL" I thought you meant "Kibana Query Language"
         | but I guess every letter of the alphabet followed by QL is
         | already taken twice over...
        
       | CarVac wrote:
       | I was reading the criticism and suggestions and wondering what
       | they were talking about, it seemed to read perfectly fine...
       | 
       | Then I saw that the project had already been changed in response!
       | Impressive.
        
       | erezsh wrote:
       | It seems people here are really interested in alternatives to
       | SQL. So perhaps you'd also like to have a look at
       | https://github.com/erezsh/Preql
       | 
       | (Same name, same goal, different approach, and already working)
        
       | dgudkov wrote:
       | >PRQL is intended to be a modern, simple, declarative language
       | for transforming data
       | 
       | It's not declarative. It's functional.
       | 
       | I believe that the approach that is followed by PRQL is more
       | practical than SQL. We've implemented a similar approach in our
       | visual ETL tool for non-technical people (https://easymorph.com)
       | and it works wonderfully. Other cool things you can do with this
       | approach (and can't with SQL):
       | 
       | * Modify existing columns without re-selecting the whole dataset
       | 
       | * Loops (iterations)
       | 
       | * Conditional IF/THEN/ELSE branching as a workflow statement
       | 
       | * Exceptions & error handling
        
       | beagle3 wrote:
       | shakti / K / kdb+ implements "real SQL", which is concise but
       | readable, and could give you a few ideas. Here's a copy-paste
       | from https://shakti.sh/ under document/sql.d (cannot deep link,
       | unfortunately). The most most magical aspects are automatic joins
       | - both left joins and "foreign key chase" joins. The fk-chase
       | joins, in particular, should be part of _every_ query language,
       | and can possibly be added in a backward compatible way to
       | existing SQL implementations.
       | 
       | example: TPC-H National Market Share Query 8
       | http://www.qdpma.com/tpch/TPCH100_Query_plans.html what market
       | share does supplier.nation BRAZIL have by order.year for
       | order.customer.nation.region AMERICA and part.type STEEL?
       | 
       | real: select revenue avg supplier.nation=`BRAZIL by order.year
       | from t where order.customer.nation.region=`AMERICA,
       | part.type=`STEEL
       | 
       | ansi: select o_year,sum(case when nation = 'BRAZIL' then revenue
       | else 0 end) / sum(revenue) as mkt_share from ( select
       | extract(year from o_orderdate) as o_year, revenue, n2.n_name as
       | nation from t,part,supplier,orders,customer,nation n1,nation
       | n2,region where p_partkey = l_partkey and s_suppkey = l_suppkey
       | and l_orderkey = o_orderkey and o_custkey = c_custkey and
       | c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and
       | r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and
       | o_orderdate between date '1995-01-01' and date '1996-12-31' and
       | p_type = 'STEEL') as all_nations group by o_year order by o_year;
        
         | maxwelljoslyn wrote:
         | Thanks for the tip. That automatic "foreign key chasing" looks
         | phenomenal. Byebye, much of that big tedious chunk in the
         | middle of your 2nd example... Wish I had that for more of the
         | SQL I write.
        
       | lijogdfljk wrote:
       | I wonder how this compares in practice to EdgeQL,
       | https://www.edgedb.com/showcase/edgeql
       | 
       | Offhand i thought PRQL seemed easier to reason about, but
       | something about EdgeQL seems better to me.. though i can't
       | describe it.
        
         | maximilianroos wrote:
         | I see EdgeQL as an excellent replacement for SQL in OLTP
         | settings -- it has great language integration and a unified
         | relational & typing approach. (Please correct me if this is
         | mistaken though).
         | 
         | I wrote the PRQL proposal for analytical / OLAP queries, where
         | the pipeline of transformations are more important, and
         | relations and typing are relatively less important.
        
       | mongol wrote:
       | I think it would be worthwhile to develop a shorthand of the same
       | thing, suitable for use on the command prompt. Something using
       | symbols as synonyms for keywords. Less eligble but more useful in
       | a future when shell tools understand this syntax.
        
       | nurettin wrote:
       | To be fair, lateral joins (cross/outer apply in mssql) can help
       | with name aliasing and table functions give sql some reusability.
       | I think the main pain points for sql are pivots and window
       | functions.
       | 
       | A lot of the time you just want to transpose your result, but you
       | have to choose an aggregate and handle null cases to force pivot
       | to work the way you want it.
       | 
       | And a lot of the time you want to aggregate a window but keep the
       | ids of the row so you avoid the having keyword altogether and go
       | for row_number and dense_rank to get your aggregate results.
       | 
       | If I were to write a query language, I would discard group by and
       | having and make it easier to apply transpose and window
       | functions.
        
       | bchammer wrote:
       | This looks like a turing-complete language if you add function.
       | One benefit or current SQL is that it can (and is) often
       | rewritten to be executed more efficiently. This requires a
       | simpler and declarative model - far from a programming language.
       | 
       | If you want simple 'forward declaration' take a look at the SQL
       | WITH clause.
        
       | hardwaregeek wrote:
       | I've always wondered why there aren't query languages that
       | embrace algebraic data types and pattern matching. Seems like an
       | obvious fit to me. There's many times where you'd want to model a
       | table that has either this scheme or that schema.
        
         | jlokier wrote:
         | They can work well. In the project I'm working on the database
         | uses algebraic datatype keys (i.e. tags and tag-dependent
         | columns) to make the database faster and smaller than an
         | equivalent relational schema, but the database is used via API
         | rather than via a query language.
        
         | jayd16 wrote:
         | Check out C# and LINQ. They pull it off with anonymous types.
        
       | haolez wrote:
       | This actually looks like an improvement (and I like SQL). This
       | feels closer to non-programmers, contrary to some other SQL
       | "competitors" like that query language from InfluxDB.
        
         | thanatos519 wrote:
         | It definitely scans better than 'Flux' from InfluxDB2.
         | 
         | One thing I like about Flux is the ability to split streams and
         | return multiple distinct aggregations. Very handy in Grafana
         | dashboards!
        
       | 29athrowaway wrote:
       | SQL is like BASIC for data.
       | 
       | It is a language created to read like natural language, to
       | facilitate its adoption.
       | 
       | Not only developers use SQL, less technical users also use it,
       | especially in the finance industry.
       | 
       | Now, do you use BASIC professional software development? While
       | some do, most don't. So why do we use SQL then?
       | 
       | An imperative language that compiles to SQL can be attractive,
       | but unless the compilers come with good compiler warnings, we may
       | end up created bloated database code.
        
       | ajkjk wrote:
       | Just wanna say, I absolutely love this.
        
         | ajkjk wrote:
         | One piece of feedback:
         | 
         | "sort sum_gross_cost # Uses the auto-generated column name."
         | ... seems like a huge landmine. Languages really should not
         | have any implicit way of constructing identifiers (among other
         | reasons it is not easily greppable).
         | 
         | You might consider using a syntax like `sum:gross_cost` which
         | can function as a sort parameter and an aggregation, but is
         | actually recognizable as an object instead of having an
         | implicit transformation going on in the background. Like this:
         | ...       filter gross_cost > 0       aggregate by:[title,
         | country] [         sum:gross_cost,       ]       sort
         | sum:gross_cost       ...
        
           | topspin wrote:
           | > # Uses the auto-generated column name
           | 
           | The fact that someone felt the need to add that comment hints
           | at a design mistake. Synthesizing symbols is weird,
           | unnecessary and is probably a violation of the principle of
           | least surprise.
           | 
           | Otherwise I think PRQL has some value. Nice work. I strongly
           | suspect that if SQL looked more like this there would be a
           | lot more people willing to use the query language directly
           | and perhaps fewer that are compelled to bury it under
           | shifting layers of fragile abstractions.
        
       | politician wrote:
       | I prefer "from first". When I write a SQL query, 99 times out of
       | 100, I'll start with "select * from", then fill in the query,
       | then go back and select the columns. After a few basic joins,
       | "easy" column names have probably grown aliases or have been
       | subsumed entirely. For me, "select * from" is automatic.
        
       | johnthescott wrote:
       | using COMMON TABLE EXPRESSIONS (cte) can greatly improve
       | readability of complex sql queries. adding "flow" just feels like
       | a variation on sequential programming.
       | 
       | i am against "improving" sql. instead, i thing a whole rethink of
       | the engineering behind relational engines needs to occur. for
       | example, why can't a relational database support both SQL and
       | other languages simultaneously, instead of being so black-boxish?
        
       | thelittlenag wrote:
       | I worked on a language at one of the big banks that looked very
       | similar. The goal was to have a better-than-SQL language on top
       | of Spark and I think we succeeded at that very well.
       | Unfortunately, politics killed the language and platform.
        
       | deepstack wrote:
       | Wouldn't it be better to just use something like prolog, there is
       | already and extension for postgres?
        
       | onphonenow wrote:
       | Any chance of starting with FROM perhaps to help with
       | autocomplete?
       | 
       | From Table or FROMT TABLEA JOIN TABLE B ...
       | 
       | The autocomplete would then be able to introspect much better in
       | the tooling side.
        
         | ithrow wrote:
         | Queries start with FROM, it's called a README for a reason...
        
       | Ngunyan wrote:
        
       | ridaj wrote:
       | Very cool
       | 
       | Design goal question - is the goal to have a language that
       | _blindly_ compiles to SQL, or will compilation require data
       | schema knowledge?
       | 
       | Suggestion: where possible stick to the well established SQL
       | keywords (prefer "group by" to "aggregate by")
        
         | maximilianroos wrote:
         | Initially the transpiling would be context-free of the schema.
         | 
         | It would really nice to have context when _writing_ the code,
         | so we could do things like autocomplete (IIUC this is what
         | Malloy does already).
         | 
         | Are there features you can think of that would be helpful if we
         | had the schema context during transpiling?
        
           | ridaj wrote:
           | I wasn't thinking it'd be useful, but more that it's good to
           | remain as free as possible from any assumptions about the
           | schema of the data. I often work with tables that have
           | unusually large schemas (> 100MB) and have seen some
           | products' performance severely degrade as a function of
           | schema complexity.
           | 
           | (But otherwise +1 to schema awareness during authoring.)
        
       | kovek wrote:
       | Could there be a tool that would translate PRQL to SQL? One could
       | then write in ~/my_scripts/closest_points.prql and then run a
       | command to get the sql equivalent and use that in exiting SQL
       | tools that do not currently access PRQL (like Postgres).
        
         | ako wrote:
         | From the PRQL page: "PRQL transpiles to SQL, so it can be used
         | with any database that uses SQL."
        
       | oblio wrote:
       | Man, I really hope something like this takes off. SQL is so
       | entrenched but it's really showing its age and its pitiful
       | standardization.
        
       | oblio wrote:
       | To the author, if he's reading this. I wish you luck! To get this
       | adopted, try to make it modular so maybe it can be made a core
       | module of PostgreSQL, MariaDB, MySQL. If you somehow get into
       | those 3 I wouldn't be surprised if Oracle, SQL Server and DB2
       | integrate it themselves just to keep up.
       | 
       | Ah, Presto could be another popular target.
       | 
       | It's a huge effort but something which could have colossal
       | payback.
        
         | twic wrote:
         | This language transpiles to SQL, so it can be implemented
         | entirely client-side. No need for modules for the database
         | engines, just wrappers round the clients.
        
       | nikkinana wrote:
        
       | jeroen79 wrote:
       | to be honest i find this makes it just more complex and less
       | readable then normal sql
        
         | LittlePeter wrote:
         | Same here. I find the SQL query in the README more readable.
        
       | hackeredje wrote:
       | Is this not Linq ?
       | https://stackoverflow.com/questions/tagged/linq ?
       | 
       | And then dump the queries via
       | https://stackoverflow.com/questions/1412863/how-do-i-view-th...
       | or https://www.linqpad.net/ ?
        
         | aloisdg wrote:
         | Indeed. It looks a lot like dotnet's Linq.
        
       | ianbicking wrote:
       | I like it, it's readable, unlike some SQL alternatives I've seen
       | it doesn't make me feel like I'm dumb and don't understand what a
       | query even is.
       | 
       | I can't decide if it would be better or worse if it stuck more
       | closely to SQL keywords. You use "from" and "select", but not
       | "where", "order by", "group by". There's some danger of it being
       | in an uncanny valley of SQLish, but I'm pretty sure I'd prefer
       | just using those terms verbatim (including the space in "order
       | by"... that style is less common in modern languages but it's not
       | really that much harder to parse).
       | 
       | I'd like to see more examples of joins and composing SQL. Does
       | this language make it easier to make more general SQL queries?
       | Can I take two queries and squash them together in a reliable
       | way? I feel like I end up with a lot of theme and variation in my
       | queries, often involving optional filters.
       | 
       | I might even like a notion of encapsulation that could help this
       | query language when it's embedded in other languages. Like if I
       | could say, in the language itself, that a query has certain
       | unbound variables (and not just ? or other placeholders). This
       | language seems like it would be better for generating than SQL,
       | and sometimes generation is just necessary (like in any
       | application that supports data exploration), but for most common
       | cases I'd hope to avoid that. Defining inputs and then making
       | whole filter sections or other statements conditional on those
       | inputs would help here.
        
         | maximilianroos wrote:
         | Thanks!
         | 
         | I just fleshed out composing CTEs, which is a small step
         | towards the broader goal of making composition easier:
         | https://github.com/max-sixty/prql/commit/dc68fcaaceef26cc078...
         | 
         | Let me know if you have a good case of the sort of composition
         | you find difficult in SQL (either here or in an issue). Thank
         | you!
        
         | cogman10 wrote:
         | Yup, I like a lot of things about the way this looks. In
         | particular, I like how friendly this looks to be for things
         | like auto complete (pretty annoying to need to practically type
         | the entire sql query only to go back and fix up the columns in
         | order to get autocomplete to work).
         | 
         | Specific things I'd like to see.
         | 
         | How do you handle column ambiguity. In the examples, they show
         | a join of positions to employee on employee_id == id. But what
         | happens when you have 2 columns with the same name that you are
         | joining on? (like employee_id to employee_id in some mapping
         | table).
         | 
         | Subqueries are pretty important in what I do, so what do those
         | look like (perhaps covered by the "thinking about CTEs
         | section").
         | 
         | How about opportunities for optimization hints? In T-SQL you
         | can hint at which index the optimizer should prefer to a
         | specific query.
         | 
         | Common SQL patterns would also be interesting. Like, how would
         | you do keyset pagination?
         | 
         | Edit: Also, I'd like a discussion about null. SQL null handling
         | rules are terrible. I understand them, I work with them, but at
         | the same time, they are so different from other languages
         | concept of "null" that they are easy to trip over.
        
       | galkk wrote:
       | I'm quite opposed to the idea "from should be first".
       | 
       | I want to understand what exactly the query returns, not the
       | implementation detail of the source of this data (that can later
       | be changed).
       | 
       | Literally first example from page - I have no idea what is being
       | returned:                   from employees          filter
       | country = "USA"                           # Each line transforms
       | the previous result.          let gross_salary = salary +
       | payroll_tax          # This _adds_ a column / variable.
       | let gross_cost   = gross_salary + benefits_cost  # Variables can
       | use other variables.          filter gross_cost > 0
       | aggregate by:[title, country] [                  # `by` are the
       | columns to group by.               average salary,
       | # These are the calcs to run on the groups.               sum
       | salary,               average gross_salary,               sum
       | gross_salary,               average gross_cost,               sum
       | gross_cost,               count,      ]          sort
       | sum_gross_cost                              # Uses the auto-
       | generated column name.          filter count > 200          take
       | 20
       | 
       | of course, similar things are happening to SQL too, with CTEs
       | becoming more widespread and "real" list of the columns hidden
       | somewhere inside, but it's still parseable
        
         | quocanh wrote:
         | I agree that the columns of the results should be more obvious.
         | But I am a proponent of "from should be first". I have never
         | written a SQL query without thinking about the contents of a
         | table or its relations. If it was my way, I would describe
         | where the data I'm pulling from, then describe any
         | filters/joins, then describe the columns that I'm interested in
         | (last).
        
         | phailhaus wrote:
         | SELECT id, name, author
         | 
         | Quick, what is this query about? What's ironic is that I think
         | you have it backwards: the columns are the implementation
         | detail, not the table. The table is the context: you can't
         | change that without having to change everything else. But
         | columns are the last step, the selection after the filters,
         | joins, etc. They can be changed at any time without affecting
         | the logic.
        
           | taeric wrote:
           | This is... An odd choice. I'd assume I'm not without context
           | looking at a query to know why I would want those columns.
           | 
           | And the auto complete story is backwards. Often I know what
           | columns I want, but I'm not clear what table I need to get
           | them from. Such that, if you make a smarter suggest in the
           | from to only include tables that have the columns, I'd be
           | much happier.
        
             | sanderjd wrote:
             | Just throwing in another point of anecdata onto this pile:
             | "Often I know what columns I want, but I'm not clear what
             | table I need to get them from" does not make sense to me. I
             | don't relate at all to their being a global namespace of
             | columns, rather than a namespace of tables, each with its
             | own columns specific to its context.
        
               | taeric wrote:
               | I challenge this. I accept that there are ambiguities,
               | but I assert that you can go really fast by just telling
               | someone to fetch a few columns by name.
               | 
               | I further assert that if your database is filled with
               | "Id" and "name" columns, instead of "department_name" and
               | similar, you are probably as likely to mess up a join as
               | any benefit you get from the name being short. (And
               | really, what advantage is there in short names nowadays?)
               | 
               | That all said. I worded my take too strongly. My point
               | should have been that auto suggest should not be confined
               | in either direction.
        
         | samatman wrote:
         | I'm also completely unfamiliar with the PRQL syntax, outside of
         | right now.
         | 
         | Reading the comment however, it would seem that `let` adds
         | columns which are implicitly returned in the order they are
         | defined.
         | 
         | I do see benefits in this, and can imagine pitfalls. Hard to
         | judge without kicking the tires.
         | 
         | Update: It's quite possible we saw different syntax!
         | 
         | https://news.ycombinator.com/item?id=30063266
         | 
         | Without the `let` I would imagine having trouble reading it as
         | well, I'm not sure if that would go away with familiarity but
         | my instinct is that it's a useful addition.
        
         | inglor wrote:
         | The big advantage of "from first" like we have in Kusto KQL (a
         | database we use at Microsoft) is that it provides much better
         | autocomplete (if I write the `from` it can easily autocomplete
         | the projection).
         | 
         | If you want an interesting example of how a query language
         | built for developer experience and autocompletions looks
         | definitely check it out!.
        
           | majkinetor wrote:
           | Too bad we can't use Kusto with anything except Azure.
        
           | _jal wrote:
           | Designing languages around autocomplete is like designing
           | toilets for better toilet paper dispensers.
           | 
           | The language should be right for human understanding, not
           | automated mad-lib generation.
        
             | Xelbair wrote:
             | And one of use cases is writing queries which it helps
             | immensely. Best of both worlds would allow both orders.
             | Just automatically transform the query to the usual form
             | after it's execution.
        
             | phailhaus wrote:
             | Building for autocomplete _is_ building for human
             | understanding. If it is impossible for a computer to
             | determine the context of your query, why would a human do
             | much better?
        
               | [deleted]
        
               | _jal wrote:
               | They are not fully-aligned goals, and autocomplete should
               | not be given equal consideration on par with human
               | clarity.
               | 
               | If you want nice autocomplete too, that's fine, but if
               | there is a tradeoff, human understanding is the primary
               | concern.
        
               | tester756 wrote:
               | I don't understand why do you think about it this way
               | 
               | C#'s LINQ (really powerful tool similar to SQL) works the
               | same way
               | 
               | look:
               | 
               | var list = new List<int>{1,2,3}
               | 
               | var extracted = list
               | 
               | .............................Where(x => x > 1)
               | 
               | .............................Select(x => $"my number:
               | {x})
               | 
               | .............................ToList();
               | 
               | or
               | 
               | var extarcted =
               | 
               | ........................from x in list
               | 
               | ........................where x > 1
               | 
               | ........................select $"my number: {x};
        
             | inglor wrote:
             | You would think that but having used both I find writing
             | Kusto/KQL much smoother, neater and faster and if I have to
             | choose between writing a query in either one I'd pick KQL.
             | 
             | I understand this is just an opinion but it's an opinion
             | held by everyone in my org who writes both.
             | 
             | Theoretical correctness loses to pragmatism a lot and I'd
             | read the KQL every day. Look at the examples at
             | https://docs.microsoft.com/en-us/azure/data-
             | explorer/kusto/q... - look at the examples at
             | https://docs.microsoft.com/en-us/azure/data-
             | explorer/write-q... and tell me they're not more readable
             | than comparable SQL?
             | 
             | (I can see the result type both by hovering on the query
             | but also by just looking at the end of it - and in SQL most
             | of the SELECTed items in complex queries are from
             | subqueries anyway - at least in my use case)
        
               | keithnz wrote:
               | now if MS made a KQL -> TSQL or support it natively in
               | SQL Server, that would be great :)
        
               | adamrezich wrote:
               | yes please!!
        
             | jiggawatts wrote:
             | I've been frustrated by toilets where I have to contort my
             | body to reach the dispenser. Similarly, I've had dispensers
             | intrude on the space where my legs would normally be and
             | make it awkward to even just sit on the toilet.
             | 
             | Toilets are absolutely designed to make the dispenser
             | placement convenient. You just don't think about it because
             | 95% of toilets get it right, so it just doesn't bother you
             | that much that it _can_ be wrong.
             | 
             | In SQL, some decisions are right about 10% of the time and
             | are annoying and awkward the other 90%.
             | 
             | That's why the order matters. Because everything else got
             | it right.
        
             | oldsecondhand wrote:
             | There's a person behind the IDE. If you help the IDE, you
             | help the person using the IDE.
        
             | emteycz wrote:
             | I'd agree if there was any way whatsoever of fixing this
             | issue, but there simply isn't. The editor can't even begin
             | to guess what you might want until you write your FROM.
        
               | taeric wrote:
               | Maybe in gigantic systems with more tables than makes
               | sense. Realistically, all of the columns available in a
               | database can be fit in memory with ease.
               | 
               | Then, the ide could basically fill my from out for me,
               | based on what I'm asking for. Can even suggest what join
               | I will need, if I list columns from multiple tables.
        
               | [deleted]
        
               | tester756 wrote:
               | >Maybe in gigantic systems with more tables than makes
               | sense. Realistically, all of the columns available in a
               | database can be fit in memory with ease.
               | 
               | Every table has more than one column
               | 
               | So there's always more columns to remember than tables
               | and generally tables are pretty easy like user invoices
               | blabla
               | 
               | I worked with systems that had like 500 tables and some
               | of them with 20-50 columns
               | 
               | you really want good intellisense in such a environment
        
               | taeric wrote:
               | 500 times 50 is still not a big number. And you could do
               | decent statistical suggestions on the current columns in.
               | 
               | Good intelligent suggestions is, of course, helpful. And
               | I agree that suggesting one of 500 is easier than the
               | other. That said, neither is hard for a computer. And
               | even asking friends what table I want will often be done
               | with starting with the actual columns I want.
        
               | tester756 wrote:
               | On the other hand I don't see why it couldnt work both
               | ways
               | 
               | if you start query with
               | 
               | SELECT MiddleName then you could receive auto complete
               | thats adds "FROM Users" and moves your cusor after
               | MiddleName.
               | 
               | if you start query with
               | 
               | FROM Users SELECT _ and know intellisense drops list of
               | columns
        
               | taeric wrote:
               | Agreed. I'm really just arguing that it doesn't have to
               | be from first.
        
           | lemmsjid wrote:
           | That's interesting because it also explains why I was going
           | to say I do like having from first. When trying to reason
           | about a query, I mentally go through the following:
           | 
           | 1. What tables are being pulled from? This speaks to the
           | potential domain of the query. 2. What data is being selected
           | (I can now know what is or isn't being pulled from the
           | aforementioned tables...) 3. What operations, aggregations,
           | groupings, etc. are being performed to work on the pulle data
           | 
           | Of course from vs select ordering is completely arguable, but
           | my thinking process seems to follow that of the auto complete
           | --in other words that my cognitive load of looking at the
           | select statement is lessened when I know from what the
           | columns are being selected.
           | 
           | It also follows (at least to me) the mental process of
           | writing the query. First look at the tables, then decide what
           | columns, then decide what functions to apply.
        
           | taeric wrote:
           | I said it in a sibling, but I feel this is somewhat missed.
           | Auto complete that simply lists the tables is easier if from
           | is first. But... Auto complete that helps me know what tables
           | can give me my requested columns works the other direction.
        
         | KerryJones wrote:
         | This feels like a English-language thing. In english we tend to
         | put our adjectives first, it feels natural, "Where is my red,
         | round ball?", rather than some other languages (like German)
         | where you put the subject first. Equivalent of "Where is my
         | ball, red & round?"
         | 
         | While it inherently feels unnatural I do agree with the others
         | here that the context is actually easier to understand once
         | over the initial uncomfort.
        
         | mcsoft wrote:
         | Both CTEs and this idea address the same problem: poor
         | readability of complex SQL queries. Compared to CTEs, the
         | author takes the idea to split the complex query into parts to
         | the next level.
         | 
         | To your point - a solid IDE will show you what's being
         | processed at each line (or returned, if the cursor is on the
         | last line) - in an autocomplete window or a side panel.
        
       | andreygrehov wrote:
       | I like it. Was it inspired by CloudWatch Logs Query Language? [0]
       | Looks somewhat similar.
       | 
       | [0] -
       | https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL...
        
       | goodlinks wrote:
       | maybe i am jaded (too many hours reading and writing sql as it
       | is) but the structure of SQL is very closely linked to how i
       | write a query:
       | 
       | what data do i want to see: most important
       | 
       | where doest is come from: next most important thing
       | 
       | What filters and restrictions am i going to put on it: least
       | important
        
       | btown wrote:
       | For any who want this kind of pipelining way-of-writing-SQL that
       | has the benefit of existing in live production databases today, I
       | highly, highly recommend looking into Postgres' and Snowflake's
       | LATERAL JOIN keyword.
       | 
       | https://heap.io/blog/postgresqls-powerful-new-join-type-late...
       | 
       | https://stackoverflow.com/questions/28550679/what-is-the-dif...
       | 
       | https://docs.snowflake.com/en/sql-reference/constructs/join-...
       | 
       | The TL;DR is that they allow you to reuse annotated and
       | aggregated columns in an incredibly elegant way. Compared to OP's
       | proposal, you still do need to start the query with what columns
       | you want to come out at the end, and normal SQL weirdnesses still
       | apply - but it's far, far, easier when writing massive analytics
       | queries to see the flow of variables from one stage to another.
        
       | snidane wrote:
       | You don't need additional SQL repetitive cruft.
       | from population         select country, rollup(city), count(*)
       | sort
       | 
       | Can represent this repetitive SQL query:                   select
       | country, city, count(*)           from population          group
       | by country, rollup(city)          order by country, city
       | 
       | Information in group by is often redundant. You can tell which
       | columns are measures vs dimensions by examining the 'aggregate'
       | function - rollup or no function vs sum, count, avg. Order by can
       | have a default to sort by all columns instead of naming them one
       | by one.
        
       | rchrch wrote:
       | Awesome! Would love to see an implementation. I worked on
       | something similar over the Summer. It's just relational algebra
       | with pipes for composition. If you are interested, we could get
       | an antlr grammar going and plug it into this basic execution
       | engine to get a feel for the language.
       | 
       | - https://github.com/RCHowell/Sift -
       | https://github.com/RCHowell/Sift/blob/main/src/main/kotlin/c...
        
         | maximilianroos wrote:
         | Yes this looks really cool, and similar! Feel free to hit me up
         | on Twitter https://twitter.com/max_sixty
        
       | cies wrote:
       | I love quality language proposals like this. I'm not so much in
       | data processing/bigdata, but have had to interact with SQL a lot.
       | 
       | This syntax is lovely! It's more intuitively readable (and SQL is
       | not that bad in that regard).
       | 
       | My feedback:
       | 
       | 1. Lower case, underscored everything makes the terms a bit hard
       | to differentiate. Maybe set some classes of symbols in CamelCase,
       | or add a !@#$%& prefixes to them to make it more readable.
       | 
       | 2. I dont like to use another language (SQL or PRSQL for that
       | matter) to db interaction, I like to write the queries in the
       | language that I'm using to develop in. There are ORMs in this
       | design space, but I'm a little fed up with them. In Java there's
       | jOOQ. Other less-OO-more-functional ORMs exist in Rust and
       | Haskell land. These often have a code generation step, a library
       | is generated that guarantees some type safety for a give schema
       | version. Some are more SQL-like, some provide a different API.
       | PRQL is much more diverted from SQL than these, and for good
       | reasons. Maybe several languages could easily have libs like this
       | building on top of JPQL?
       | 
       | 3. You solution is a bit like GraphQL in some regards; where
       | there is a tool needed to convert the query to SQL. Tools like
       | this exist, like Hasura and the likes. Hasura does a lot more. To
       | me GraphQL has the huge advantage of serving a schema so that
       | clients can be generated. I can interact with GraphQL in a type
       | safe fashion from by generating a client in, say, Elm. The
       | generated client lib does not allow my to write syntax errors in
       | my queries and ensures all type conversions are sound. Maybe PRQL
       | can also be a language like GraphQL in that regard, and provide a
       | schema too.
       | 
       | 4. JPQL. It's close to SQL. It improves to SQL, but I never found
       | it enough of an improvement to justify the cost. I think your
       | proposal is better. But still I think JPQL deserves a mention as
       | maybe one of the most widespread compile-to-SQL languages.
        
       | bachmeier wrote:
       | My thought is that joins are the tough part of the SQL learning
       | curve, but I don't see much in here that reduces the complexity
       | of joins.
        
         | omarhaneef wrote:
         | There was this professor of language who would say "Do you
         | think the question ('are carpets furniture?') tells you
         | something about the ambiguity of the word carpet, or do you
         | think it tells you something about the ambiguity in the world?"
         | 
         | Similarly, I think joins are "tough" not because of the way SQL
         | expresses them but because the logical possibilities of merging
         | data from multiple tables are varied.
        
           | bob1029 wrote:
           | There is no such thing as a domain-agnostic SQL database that
           | holds up under this kind of semantic scrutiny. I don't think
           | that there ever _could_ be.
           | 
           | If you are rolling a SQL schema for a home improvement
           | contractor, it is extraordinarily unlikely that their
           | specific business would expect any scenarios in which carpets
           | are sometimes known as furniture.
           | 
           | Having a bounded context to operate within is what makes SQL
           | magical for me. When people don't understand the business or
           | simply the game around how you _talk_ about the business,
           | things start getting messy wrt joins.
        
             | omarhaneef wrote:
             | The carpet discussion was simply to say that you can't take
             | out all the complexity of a language if the domain it is
             | meant to describe is complex. The language has a limit to
             | how simple it can be.
             | 
             | I was not proposing a SQL database of carpets, or
             | furniture, as a thought experiment.
        
           | bachmeier wrote:
           | Then make some cases easier and fall back to the SQL we
           | already have for the rest?
        
             | Supermancho wrote:
             | SQL has effectively failed, as a standard, despite it's
             | ubiquity. It's literally being aged out, which makes for
             | opportunities for PRQL, etc to fill pragmatic gaps.
             | 
             | eg the lack of default column aliasing from joins
             | SELECT              A.id AS A__id,              A.name AS
             | A__name,              B.id AS B__id,              B.name AS
             | B__name          FROM A          LEFT JOIN B
             | ON A.other_id = B.other_id
             | 
             | When you could have:                   SELECT
             | A.*,              B.*          FORMAT (TABLE__)
             | FROM A          LEFT JOIN B              ON A.other_id =
             | B.other_id
        
         | cogman10 wrote:
         | IMO, this appears not to be something that solves the SQL
         | learning curve but rather the usability of a query language
         | with tooling.
         | 
         | I don't think there is much that could be done to address left,
         | right, inner, outer join semantics. It's just something you
         | have to learn if you want to do a lot of SQL (though, you are
         | likely only ever going to use left and inner joins).
        
       | munk-a wrote:
       | I like the flow direction compared to standard SQL. SQL is
       | supposed to read like a sentence I suppose but I have many times
       | looked at it and really wanted things to be in a more logical
       | order.
       | 
       | My main suggestion would be to be a bit less terse and introduce
       | a bit more firm formatting. I'm not a huge fan of the term
       | "split" and feel like jazzing that up to "split over" or even
       | just reviving "group by" would improve readability. Additionally
       | the aliasing could use work, I'd suggest reversing the assignment
       | to be something closer to `use salary + payroll_tax as gross
       | salary`. In terms of firm formatting, unless I'm missing
       | something there isn't any reason to allow a filter statement
       | before any aliases - so you can force two fixed positions for
       | filter clauses which would make it always legal to reference
       | aliases in filters.
       | 
       | On the brief topic of form vs. flexibility. SQL is a thing that,
       | when complex, is written by many people over the course of its
       | lifetime - removing the ability to make bad decisions is better
       | than enabling the ability to write simple things even simpler -
       | those silly do nothing queries like "SELECT * FROM customer WHERE
       | deleted='f'` are written once[1] in a moments time and never
       | inspected again. The complex queries are what you want to
       | optimize for.
       | 
       | 1. If they even are - with ORMs available a lot of those dead
       | simple queries just end up being done through an ORM.
        
         | hn_throwaway_99 wrote:
         | > On the brief topic of form vs. flexibility. SQL is a thing
         | that, when complex, is written by many people over the course
         | of its lifetime - removing the ability to make bad decisions is
         | better than enabling the ability to write simple things even
         | simpler
         | 
         | Hallelujah! But, to your footnote, this is a major reason why I
         | despise ORMs. In my mind they make writing simple code slightly
         | easier, but they make complicated SQL statements, especially
         | when you get some weird issue under load and you're trying to
         | debug why your DB is falling over, a _ton_ more difficult and
         | you spend so much time just battling your ORM.
        
         | jnsie wrote:
         | I like the flow direction specifically for
         | intellisense/autocomplete. I'm sure it would be easier to
         | provide hints when the table name is known immediately.
        
           | m1sta_ wrote:
           | I'd love for the next release of SQL to have optional
           | alternative ordering of clauses
        
         | maximilianroos wrote:
         | This is great feedback, and I agree with you re de-prioritizing
         | terseness.
         | 
         |  _And_ I agree with you on both the assignments and `split`
         | being a bit awkward. Kusto just uses `by`, WDYT?
        
           | tomtheelder wrote:
           | Not the original commenter, but just using `by` makes total
           | sense to me.
        
             | maximilianroos wrote:
             | I've made this change [1]. Thank you!
             | 
             | [1] https://github.com/max-
             | sixty/prql/commit/dde7fcfc13daaadbdce...
        
               | Serow225 wrote:
               | <3
        
           | munk-a wrote:
           | By actually sounds great to me to, yea. In this case it's
           | short but it's extremely communicative!
        
           | loic-sharma wrote:
           | Yes Kusto's `by` is excellent!
        
       | dragonwriter wrote:
       | > It's not declarative. It's functional.
       | 
       | Functional, logical, and relational paradigms are the most
       | commonly cited examples given of declarative programming.
        
       ___________________________________________________________________
       (page generated 2022-01-24 23:02 UTC)