[HN Gopher] Show HN: PRQL 0.2 - a better SQL
       ___________________________________________________________________
        
       Show HN: PRQL 0.2 - a better SQL
        
       Hi everyone -- thanks for your interest in PRQL -- let us know any
       questions or feedback!  We're excited to be releasing 0.2[1], the
       first version of PRQL you can use in your own projects. It wouldn't
       exist without the feedback we got from HackerNews when we
       originally posted the proposal.  [1]:
       https://github.com/prql/prql/releases/tag/0.2.0
        
       Author : maximilianroos
       Score  : 239 points
       Date   : 2022-06-27 17:03 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | ewuhic wrote:
       | Previous discussion: PRQL - A proposal for a better SQL
       | https://news.ycombinator.com/item?id=30060784
        
       | nawgz wrote:
       | Is it true that this is somehow an analogue to how JS development
       | is really ultimately targeting browser-compatible-JS in the end,
       | even though we use the latest ECMAScript features & TypeScript in
       | development? I.e. is it expected someone writes PRQL and then
       | transpiles before executing against the database? Is there a REPL
       | one can use against a local Postgres or something?
        
         | maximilianroos wrote:
         | > Is there a REPL one can use against a local Postgres or
         | something?
         | 
         | Somewhat -- you can use it in Jupyter now[1]; e.g.:
         | %%prql         from p = products.csv         group categoryID (
         | aggregate [average unitPrice]         )
         | 
         | This doesn't yet have the benefits we'd get from e.g.
         | autocomplete, so there's much more to do there.
         | 
         | There's also a cool TUI in PyPrql[2]
         | 
         | [1]: https://pyprql.readthedocs.io/en/latest/magic_readme.html
         | 
         | [2]: https://pyprql.readthedocs.io/en/latest/readme.html
        
       | notimportant0 wrote:
       | I mostly work using T-SQL but I like PRQL.
       | 
       | Are you able to add in the examples the following:
       | 1) Use of delimiters for names that include space, etc. I don't
       | know if PRQL uses double quote or square brackets.          2)
       | Use of two/three/four-naming convention to refer to servers,
       | databases, tables and columns.
        
       | exabrial wrote:
       | I don't really find SQL that difficult, other than NULL !=NULL, I
       | wish <=> could be switched to the default!
        
       | bornfreddy wrote:
       | Looks awesome! I don't think it adds much to SQL when the queries
       | are simple, but when you have this looong and complex query I can
       | totally see the appeal.
        
       | gigatexal wrote:
       | I'm a raw-SQL-no-ORM snob and I really like this. I'd like to see
       | it become more mainstream.
        
       | dvirsky wrote:
       | Nice. A few years ago I designed a query language to do
       | aggregations in RediSearch, and it's quite similar in its
       | structure.
       | https://redis.io/docs/stack/search/reference/aggregations/
        
       | dmeijboom wrote:
       | I recently started implementing the Postgres protocol in Rust
       | (https://github.com/dmeijboom/postgres-conn). So I guess I'll be
       | experimenting with creating a Postgres proxy which translates
       | PRQL on-the-fly.
        
         | maximilianroos wrote:
         | That looks really exciting! Please keep us in touch with your
         | efforts and let us know if there's any way we can be helpful.
        
       | jeroen79 wrote:
       | SQL doesn't need fixing or improving if you ask me, its well
       | supported and just works.
        
       | vander_elst wrote:
       | Congrats for the milestone!! The syntax looks more intuitive than
       | SQL. Great to see viable alternatives to SQL!
        
       | ssalka wrote:
       | I just noticed the play on words "prequel" vs "sequel". Nice.
        
       | slotrans wrote:
       | Hot takes: SQL is great, actually. This thing isn't better.
        
         | breakfastduck wrote:
         | As much as I love SQL it can often be a pain and involve lots
         | of nested subqueries to do 'simple' things. I like this way
         | this abstracts it.
         | 
         | Would I use this instead of proper SQL in a data warehouse /
         | large app? Maybe not.
         | 
         | Would I use it to manually query DBs when I need some ad hoc
         | info? For sure.
        
           | notimportant0 wrote:
           | If this becomes a full-fledge DQL that can be used in a proc
           | or function in a running Postgres instance, I would use it in
           | production.
        
         | mellosouls wrote:
         | No, but it's not as foreign a paradigm or language to
         | (presumably?) its main target audience - developers.
        
       | [deleted]
        
       | emmelaich wrote:
       | Doesn't the pipelining mean that it's not declarative? At least
       | not in that part.
        
       | OJFord wrote:
       | Somehow I didn't see it coming -
       | 
       | > pronounced "Prequel".
       | 
       | - and I burst out laughing. Very good.
        
       | michelpp wrote:
       | It seems like an obvious next step for Postgres support would be
       | to make PRQL a stored Procedure Language.
       | 
       | create function foo() returns bar as language prql $$<prql code
       | here>$$;
        
       | ranjanprj wrote:
       | Great project, was looking something like this
       | 
       | but wish this was somehow encoded as JSON, so you could easily
       | build pipeline UI for complex SQL Generation.
        
       | cogman10 wrote:
       | Awesome to see the progress here. Looks like the language has
       | significantly matured since last it popped up on HN.
        
       | oarabbus_ wrote:
       | Is it correct there's no CASE WHEN and instead you have to define
       | a function using a ternary operator? CASE WHENs may be verbose
       | but when you have a dozen of them they're more readable and the
       | waterfall nature is far preferable to a giant block of ternary
       | clauses.
        
       | qolop wrote:
       | Why should I use this instead of SQL?
        
         | snthpy wrote:
         | That's a really good question! (and one we should probably
         | answer explicitly in the [FAQ](https://prql-lang.org/faq/)
         | rather than just implicitly)
         | 
         | The README states that "PRQL is a modern language for
         | transforming data -- a simple, powerful, pipelined SQL
         | replacement. Like SQL, it's readable, explicit and declarative.
         | Unlike SQL, it forms a logical pipeline of transformations, and
         | supports abstractions such as variables and functions. It can
         | be used with any database that uses SQL, since it transpiles to
         | SQL."
         | 
         | What that means to me is that PRQL more naturally maps onto how
         | I think about and work with data.
         | 
         | Say I have some dataset, `employees`, and I want to answer some
         | questions about it like, for US employees, what is the maximum
         | and minimum salary and how many employees are there:
         | from employees         filter country == "USA"
         | # Each line transforms the previous result.         aggregate [
         | # `aggregate` reduces column to a value.           max salary,
         | min salary,           count,
         | # Closing     commas are allowed :)         ]
         | 
         | Moreover, after each line you have a valid pipeline which you
         | can transform further by adding more steps/lines to your
         | pipeline. This matches more closely how people construct data
         | pipelines in R using dplyr/tidyverse and in Python using
         | Pandas.
         | 
         | If you find that it doesn't map well onto how you think about
         | data pipelines then please let us know as we're constantly
         | looking for more real world examples to help us iterate on the
         | language!
        
           | hui-zheng wrote:
           | One benefit of SQL is that the Database Engine will do the
           | hard work of optimizing the query plan.
           | 
           | Do you think the SQL complied by PRQL could be as effective
           | and optimized by database engine as the direct-written SQL?
        
             | snthpy wrote:
             | As you said, let the Database Engine do the hard work of
             | optimizing the query plan for you.
             | 
             | I currently have no reason to believe that the PRQL
             | generated SQL would be any worse than hand written SQL.
             | That said, I don't think we've currently looked at any ways
             | of passing hints to the query planner. We're always open to
             | suggestions!
             | 
             | In the worst case, you have full access to the generated
             | SQL, and for absolutely crucial queries you can hand modify
             | that SQL. At least PRQL might have saved you the trouble of
             | writing a cumbersome window function or something like that
             | (see for example the example of picking the top row by some
             | GROUP BY expression).
        
           | Cilvic wrote:
           | This reminds me of KUSTO I'm not sure how it compares to SQL
           | in general. But it was really fun to work with for querying
           | Azure application insigts
        
         | ithrow wrote:
         | To avoid working with SQL strings.
        
           | hui-zheng wrote:
           | SQL/jinja like dbt could also avoid working with SQL strings.
           | what would be the better advantage?
        
             | snthpy wrote:
             | There is already an integration for dbt:
             | https://github.com/prql/dbt-prql
             | 
             | For example                   {% prql %}         from
             | source = {{ source('salesforce', 'in_process') }}
             | derive expected_sales = probability * value         join {{
             | ref('team', 'team_sales') }} [name]         group name (
             | aggregate (sum expected_sales)         )         {% endprql
             | %}
             | 
             | would appear to dbt as                   SELECT
             | name,           SUM(source.probability * source.value) AS
             | expected_sales         FROM           {{
             | source('salesforce', 'in_process') }} AS source
             | JOIN {{ ref('team', 'team_sales') }} USING(name)
             | GROUP BY           name
             | 
             | dbt is definitely a use case we are very aware of and I am
             | personally very keen on (since I use that in my $dayjob).
             | With some of the ideas in
             | https://github.com/prql/prql/issues/381 , I think PRQL
             | could really shine in this area!
             | 
             | With your contribution we can get there faster!
        
         | tstack wrote:
         | I work on a TUI logfile viewer that uses SQLite as a backend
         | for doing analysis on the log messages (https://lnav.org).
         | However, writing SQL interactively is painful since you can't
         | really provide good auto-complete or preview, which is
         | something I try to provide for most other operations.
         | 
         | The PRQL pipeline syntax would make for a much better
         | experience for lnav since you're able to progressively refine a
         | query without having to jump around. (You've probably noticed
         | that many log services, like Sumologic, already provide a
         | pipeline-style syntax instead of something SQL-like.) The nice
         | thing is that you can simply keep typing to get the results you
         | want and get a preview at each stage. For example, entering
         | "from" and then pressing <TAB> would make it clear to the
         | program that table-names should be suggested. The program could
         | then show the first few lines of the table. Typing "from
         | syslog_log | filter " and then pressing <TAB> would make it
         | clear that columns from the syslog_log table should be
         | suggested (along with some other expression stuff). And, then,
         | the preview of the filtered output could be shown.
         | 
         | In the current implementation, pressing <TAB> just suggests
         | every possible thing in the universe, whether it's appropriate
         | or not. This leaves the poor with not much help after they've
         | typed "SELECT". I find myself having to lookup docs/source to
         | figure out column names or whatever and I wrote the darn thing.
         | Ultimately, I think the analysis functionality just doesn't get
         | used because interactively writing SQL is so user-hostile. So,
         | I'm looking forward to seeing this succeed so that I can
         | integrate it and still be able to use SQLite in the backend.
        
       | davidw wrote:
       | A good example might be a groupwise maximum. Those always tend to
       | be a bit of a PITA in SQL if you're not writing them regularly.
       | Be interesting to see what it transpiles to, as well.
        
         | aerzen wrote:
         | If you only want maximum of one column, the PRQL is quite
         | simple:                   from my_table         group column_a
         | (           aggregate (max column_b)         )
         | 
         | If you want the row with the maximum value it gets interesting:
         | from my_table         group column_a (           sort
         | [-column_b]           take 1         )
         | 
         | You can read more about group here: https://prql-
         | lang.org/book/transforms/group.html
        
           | beaugunderson wrote:
           | Opened an issue as I couldn't get this to work against
           | sqlite:
           | 
           | https://github.com/prql/prql/issues/695
        
             | maximilianroos wrote:
             | Thanks a lot for testing and opening an issue! This is now
             | fixed and released [1]. Let us know if you still face any
             | problems.
             | 
             | (We need better tests against real DBs, which is very much
             | on our roadmap)
             | 
             | [1]: https://github.com/prql/prql/pull/698
        
               | beaugunderson wrote:
               | Thank you for the amazingly quick fix!
        
       | rackjack wrote:
       | Obligatory dismissive comment:
       | 
       | > 0.2
       | 
       | No it ain't (in production).
       | 
       | Anyway, this looks great. I LOVE the fact that you've provided a
       | book too. Consider me a fan!
        
         | maximilianroos wrote:
         | We're definitely not ready for production! Sorry if that was
         | implied.
         | 
         | But we are ready for people to start using it in their
         | development work. Lmk if there's a better way of describing
         | that.
        
       | pgt wrote:
       | For those interested in database query languages, it is worth
       | knowing about Datalog, the query language behind Datomic, XTDB
       | and Datahike: http://www.learndatalogtoday.org/
       | 
       | E.g. a parameterised aggregate query that retrieves the name and
       | average rating of a film starring cast members whose names match
       | the input names:                   [:find ?name (avg ?rating)
       | :in $ [?name ...] [[?title ?rating]]          :where          [?p
       | :person/name ?name]          [?m :movie/cast ?p]          [?m
       | :movie/title ?title]]
       | 
       | To reveal the answer, click on tab labelled "3" and then "I give
       | up!": http://www.learndatalogtoday.org/chapter/7
        
       | paol wrote:
       | Here's one suggestion: SQL tediously requires specifying the
       | equality condition on joins, when 90% of the time you just want
       | to join on the fk defined between the tables.
       | from a       join b
       | 
       | should implicitly join on the FK if no condition is given.
       | 
       | It would require knowledge of the schema. I don't know if this is
       | possible in PRQL, or if the transpilation to SQL has to be
       | stateless.
        
         | CuriousSkeptic wrote:
         | SQL has that actually
         | 
         | select * from a natural join b
         | 
         | (not based on fk constraints though, it will join on all
         | attributes with the same name in the relations)
        
         | smallnamespace wrote:
         | If you're willing to sacrifice economics elsewhere, repeating
         | the table's name in the id column is one workaround:
         | from a       join b on b.a_id = a.a_id
         | 
         | You can even use NATURAL JOIN if you can guarantee that the
         | only fkey/pkey names will overlap between tables.
         | 
         | An unreasonable way to achieve that is to put the table name in
         | every column. A more palatable way is to write some clever
         | functions in your schema to scan the information table look for
         | column name clashes (you essentially write a tiny "linter"
         | inside your schema).
        
           | hyperman1 wrote:
           | If you have identical field names, you can do in sql:
           | Select * from a join b using (a_id)
           | 
           | Don't do this in Oracle though, pain follows when you try to
           | touch an a_id column.
        
             | snthpy wrote:
             | You can also do this in PRQL:                   from a
             | join b [a_id]
             | 
             | is the equivalent query.
        
         | aerzen wrote:
         | Hello another contributor here!
         | 
         | Compilation does have to be stateless (for performance
         | reasons), but we are planning to add some kind of schema
         | definitions which could also specify foreign keys.
         | 
         | So joins without conditions would be possible, we'll look into
         | it!
         | 
         | What do you think should happen if there are multiple foreign
         | keys connecting the two tables? Should this also work for many-
         | to-many relations with an intermediate table?
        
           | JohnDeHope wrote:
           | "What do you think should happen if there are multiple
           | foreign keys connecting the two tables? Should this also work
           | for many-to-many relations with an intermediate table?"
           | 
           | If it's not ambiguous, then let me do it. If I rely on
           | ambiguity then throw an exception. In the case of multiple
           | foreign keys, throw an exception, as there's no way to know
           | which one I mean. It'd be nice if I could disambiguate the
           | situation though. Normal SQL allows the `on` clause.
           | from TableA       inner join TableB on <expression>
           | 
           | What if I could specify a foreign key constraint just as
           | easily...                 from TableA       inner join TableB
           | by ConstraintC
           | 
           | Where ConstraintC is the name of a foreign key constraint
           | between Table A and Table B. It'd be nice to specify the
           | constraint without having to specify the column name details.
           | 
           | The same goes for the many to many relationship with an
           | intermediate table. It could look something like this...
           | from TableA       inner join TableB through TableC
           | 
           | I wouldn't introduce TableC into the scope of the statement.
           | It's not in the FROM clause. It's used in the query but is
           | not available for selecting from. If you want to bring in
           | columns from it, join on it the usual way.
           | 
           | As applications grow, and initially simple lookup table
           | semantics get more nuanced, it might be nice to be able to
           | constrain the join on the lookup table like this...
           | from TableA       inner join TableB through TableC where
           | <expression>
           | 
           | That way if my TableC has some extra columns, such as
           | effective dates, or deleted flags, or that sort of thing,
           | then I can filter out some of the joins that might usually
           | happen.
        
             | kbenson wrote:
             | Unambiguous things can become ambiguous at later points. As
             | soon as you add a second relation between the tables, what
             | once was unambiguous now is, and because of something which
             | may be entirely unrelated to the specifics of the original
             | query.
             | 
             | This is where many conveniences that use implicit data run
             | into problems. A small convenience now for the possibility
             | of accidentally breaking because of mostly unrelated
             | changes later is a poor trade off for anyone that wants to
             | have stable and consistent software.
             | 
             | This is likely one of those cases where you're better off
             | with tooling to help make writing the correct unambiguous
             | code easier (or automated away) than introducing a feature
             | which leads to less stable systems in some cases.
             | 
             | Edit: Along the lines of what you note at the end, I would
             | rather see joins able to use named relations as defined in
             | the schema. Of there's a relation from table movie to table
             | actor specifically names roles in the schema, I would
             | rather be able to join movie on roles and have actors
             | joined correctly using that relation, and aliases to roles
             | which I could then use. Then you're using features that are
             | designed and stable and not implicit and subject to
             | changing how or whether they function based on semi-
             | unrelated changes.
             | 
             | That might look like: "from movie relate roles" which is
             | equivalent to "from movie join actor roles on movie.id =
             | roles.movie_id", but because actor.movie_id has a
             | constraint in the schema named roles which restricts it to
             | a movie.id already.
        
             | ximeng wrote:
             | One way to avoid constraint name collisions is to include
             | the base table and foreign table names and keys in the
             | constraint name separated by underscores, at which point
             | you don't save much by using the constraint in a join.
        
         | aidos wrote:
         | This is something you might end up regretting later.
         | 
         | It's annoying adding another foreign key later and then having
         | previously working queries fail at runtime due to an ambiguous
         | join condition.
        
           | paulhodge wrote:
           | Agree about not implicitly finding the join key. But as long
           | as we're brainstorming imaginary features, then maybe as part
           | of the schema, we could somehow declare the default join key
           | to use, for any given two tables. In most cases it's pretty
           | obvious what the best join key would be.
        
         | go_prodev wrote:
         | I agree with you that it's a pain writing join conditions with
         | many fields...
         | 
         | But I think that's a shortcoming of the client tool, rather
         | than the language.
         | 
         | If SQL tools auto completed the join conditions as best as they
         | could it would probably be a great help.
        
       | skybrian wrote:
       | I see that the JavaScript package is at [1] and it's implemented
       | by compiling the Rust code to WASM. That should eventually make
       | it pretty easy to run it.
       | 
       | It has a typescript definition file, but it looks like it's
       | autogenerated and a bit clunky. You get back a CompileResult and
       | have to call free() explicitly, it seems? That doesn't seem very
       | idiomatic for JavaScript.
       | 
       | Also, the links to the documentation and examples in the README
       | are broken.
       | 
       | [1] https://www.npmjs.com/package/prql-js
        
         | maximilianroos wrote:
         | Thanks a lot -- issue added:
         | https://github.com/prql/prql/issues/708
        
         | aerzen wrote:
         | That's true - the package is auto-generated using
         | [wasmpak](https://github.com/rustwasm/wasm-pack), that's why
         | TypeScript definitions are clunky. I did the initial prql-js
         | release and I'm actually not sure about the free() issue you
         | are talking about.
         | 
         | We are currently working on compiling it for both Node.js and
         | the browser target, and would be happy to see some advice if
         | you are familiar with WASM!
        
           | skybrian wrote:
           | I didn't actually try it out and I'm not all that familiar
           | with WASM. Here is the typescript I see (stripped of
           | boilerplate comments):
           | 
           | export function compile(s: string): CompileResult;
           | 
           | export class CompileResult { free(): void;
           | readonly error: CompileError | undefined;            readonly
           | sql: string | undefined;
           | 
           | }
           | 
           | What is the purpose of the free() method?
        
       | digisign wrote:
       | Thanks, I've frequently wanted a query language that was designed
       | after the 70s. The ideas are sound, but a modernized syntax with
       | variables to reuse subqueries would be lovely. This looks like
       | it.
       | 
       | I noticed one issue though... please don't copy the prefix of
       | f-strings! That only exists because Python boxed itself in and it
       | was literally the only ascii syntax left that could be used for
       | string interpolation. It's mildly ugly but the best that could be
       | done given those requirements. Not so here.
       | 
       | The way shells do it with single quotes producing literal strings
       | and double quotes available for interpolation has not been topped
       | imho. Triple quotes are a nice extension as well, not sure if
       | that made it in.
        
         | oarabbus_ wrote:
         | > a modernized syntax with variables to reuse subqueries would
         | be lovely.
         | 
         | CTEs provide this functionality already, don't they?
        
           | digisign wrote:
           | When I've needed them I've needed them for multiple
           | statements, once is not enough. Currently have to use plpgsql
           | for this, which is half awesome, half abomination. :-D A
           | single simple language sounds easier to learn.
        
         | aerzen wrote:
         | Interesting suggestion. We added f-strings because we already
         | had s-strings (pass trough to SQL) and r-strings (for raw
         | multi-line text).
         | 
         | And would you rather see "My {name}" or "My ${name}"? I
         | personally dislike the $ prefix for all variables and
         | interpolations...
        
           | psychoslave wrote:
           | Languages like Perl, Ruby and more offer plethora of
           | additional ways to encode interpolated strings. I especially
           | love the squiggly heredoc for multi line quotations
           | 
           | https://infinum.com/blog/multiline-strings-ruby-2-3-0-the-
           | sq...
        
           | Kinrany wrote:
           | Choose backticks as the quote style for interpolation to
           | attract Markdown fans and confuse the hell out of MySQL users
           | :D
        
           | digisign wrote:
           | The first one, the $ is redundant if braces required. Multi-
           | line could be triple quoted. SQL, not sure, maybe sql:'' ?
        
       | ComputerGuru wrote:
       | From a mathematical point-of-view are there any
       | transforms/operations (note: not end results, but actual
       | operations) that this can do that SQL can't or vice-versa?
        
         | scottlamb wrote:
         | I assume it can't do anything SQL can't, because they write "It
         | can be used with any database that uses SQL, since it
         | transpiles to SQL." Not sure about the reverse.
         | 
         | I'm used to SQL syntax, but this has definite appeal. As a
         | small example, I like that it starts with the "from" clause, so
         | autocomplete is more viable.
        
           | ComputerGuru wrote:
           | Thanks, I missed that it transpiles to SQL.
        
           | BeefWellington wrote:
           | Transpiling to SQL doesn't mean all the underlying SQL
           | features are being exposed to you.
        
             | scottlamb wrote:
             | Yes, that's what I meant by "not sure about the reverse".
        
         | aerzen wrote:
         | As said, currently PRQL transpiles to SQL, so all expressions
         | in PRQL are can be expressed in SQL. But not all SQL expression
         | can be translated back into PRQL - some intentionally and some
         | are just not yet implemented (UNION - i.e. vertical concat).
         | 
         | But we also have plans for doing things that some SQL databases
         | may not support, such as pivot (rows to columns).
        
           | coremoff wrote:
           | hopefully you'll forgive my pedantry - "union all" is
           | vertical concat - "union" without the "all" gives you the
           | distinct list
        
             | jhgb wrote:
             | One of the reasons why SQL is crap: there should be no
             | distinction between the two in relational algebra. A set of
             | {A, B, C, B, C} is the same as {A, B, C}.
        
               | ttfkam wrote:
               | Detecting duplicates has a cost you can't just hand-wave
               | away. UNION ALL tells the engine not to worry about it
               | and just output as it sees it, usually going faster.
               | Depends on your data needs.
        
         | Beltiras wrote:
         | This can be transpiled into SQL which makes it then trivial
         | that it can do everything SQL can do. SQL is Turing complete so
         | it can do anything PRQL can do.
         | 
         | EDIT: I'm sorry, I didn't realize that even if something
         | transpiles from one language to another it does not guarantee
         | that one language can generate all strings of another language.
         | But taking a look at the abstractions PRQL offers I would be
         | very surprised to find it not capable of it.
        
           | ComputerGuru wrote:
           | The second part of your statement is fine, but the first part
           | is just a complete fallacy.
           | 
           | I can transpile a pure language exposing only `if`, `while`,
           | and `for` with no standard library and no interop to C - that
           | definitely does not make it "trivial" that it can do
           | everything SQL can do.
        
             | Beltiras wrote:
             | I realized this after posting and edited the post. Thanks.
        
               | ComputerGuru wrote:
               | No problem. (I didn't downvote.)
        
       | MasterIdiot wrote:
       | I've seen similar solutions being built internally in multiple
       | companies, none with a syntax as well thought out as this.
       | Amazing work!
        
       | airtnp wrote:
       | This sounds like LINQ, or SparkSQL. Instead of a full new
       | language, it makes feel better to create libraries in languages
       | that supports embedded DSL easily.
        
       | Flimm wrote:
       | Looks fantastic.
       | 
       | There are a lot of rough edges when building a string
       | representing an SQL query in the programming language that you're
       | using. You have to be careful to avoid SQL injections, for
       | starters. Do the bindings for PRQL innovate at this level?
        
       | elforce002 wrote:
       | This is really a SQL renaissance.
        
       | roG0d wrote:
       | I've been contributing to this project on a few little things due
       | to my little knowledge level. But I felt like home with such a
       | good company of people!.
       | 
       | I not mature enough to fully appreciate the technical potential
       | of the project, but the good ambient, the kindness and the growth
       | potential is for sure worthwhile. I truly encourage everyone to
       | contribute!
        
         | cogman10 wrote:
         | In particular, I think this is looking pretty good and I'd want
         | to see even more complicated examples. For example, What do
         | window functions end up looking like? [1]
         | 
         | What about crazy operations like calculating percentile_cont?
         | [2]
         | 
         | Or just in general, how would "implementation specific" queries
         | end up looking?
         | 
         | [1] https://www.postgresql.org/docs/current/tutorial-
         | window.html
         | 
         | [2] https://docs.microsoft.com/en-
         | us/sql/t-sql/functions/percent...
        
           | maximilianroos wrote:
           | Great questions!
           | 
           | Window functions are here [1]. (We should add these to the
           | homepage too)
           | 
           | Implementation specific queries can be handled by the Dialect
           | parameter [2], though there's still lots of work to do to
           | build that out.
           | 
           | [1]: https://prql-lang.org/book/transforms/window.html
           | 
           | [2]: https://prql-
           | lang.org/book/queries/dialect_and_version.html
        
             | maximilianroos wrote:
             | Window example is now on the homepage, thanks for the
             | question: https://github.com/prql/prql/pull/692
        
       | jhgb wrote:
       | I wonder, why would you go for a "pipeline" of relational
       | operations, when it's strictly weaker than allowing for a tree of
       | operations? The way the examples seem to be written, a stack
       | machine would subsume the existing syntax, since you first
       | specify an operand (like "from employees") and then you specify
       | and operation (like "filter country == "USA"), where in a stack
       | machine an operation such as "from X" would put the relation X
       | onto the top of stack, whereas an operation such as "filter"
       | would then replace the top of stack with a transformed relation.
       | This could be extended by for example "join on ..." being simply
       | an operation consuming _two_ relations from the top of stack and
       | putting one result back, joining two pipelines into one.
        
       | jitl wrote:
       | I'm surprised that none of the examples on Github or the website
       | deals with join. I eventually found some in the "book" here:
       | https://prql-lang.org/book/transforms/join.html
       | from employees         join side:left positions [id==employee_id]
       | 
       | turns into                   SELECT           employees.*,
       | positions.*         FROM           employees           LEFT JOIN
       | positions ON id = employee_id
       | 
       | I would love to see joins worked into the main learning examples.
       | Without join, the examples lack a bit of the "relation" part; we
       | could just as easily be compiling a DSL to a chain of
       | `array.filter`, `array.reduce`, `array.map` calls. Joins are what
       | makes relational modeling interesting!
       | 
       | I would love to see Datalog/SPARQL-style implicit joins to make
       | graph traversals like "which users have edited documents I own?"
       | less verbose.
        
         | psychoslave wrote:
         | I always found that side:left/right should also be expressible
         | as rapport:antecedent/consequent as in propositional logic,
         | rather than limiting these relationships to the geometric
         | representation of Venn diagram.
         | 
         | And maybe a shorter alternative might be tie:arm/leg.
        
           | aerzen wrote:
           | I'm not sure if this is a joke, be we actually had a serious
           | an idea to replace side:left/right with nulls_left:true and
           | nulls_right:true
           | 
           | This part of the join operation should be an after thought -
           | just a flag after the central argument of the transform which
           | should be the condition you join over.
        
         | maximilianroos wrote:
         | Great point, we'll add that.
         | 
         | I don't think we do joins that much better than SQL does. We're
         | thinking whether there's potential there, maybe through
         | understanding foreign keys -- but we're being conservative
         | about introducing change without value.
        
           | igorkraw wrote:
           | I looked at the book after this and have to say, I'd heavily
           | recommend spending the next dew months _just_ improving joins
           | (and complex joins especially). Like GP says, relational
           | modelling is the interesting bit about SQL and I don 't feel
           | exaggerative in saying the only reason I use SQL are joins,
           | and so the only reason I'd introduce the complexity of your
           | project into my stack would be if it makes handling joins,
           | views and other aspects of relational modeling and slicing
           | nicer - one example could be many to many relationships, or
           | the gradient between graph/document based and normalised
           | table based modeling
        
           | maximilianroos wrote:
           | Added: https://github.com/prql/prql/pull/697
        
         | [deleted]
        
       | [deleted]
        
       | p33p wrote:
       | This looks great. I've thought about something similar to this
       | for quite a while now. Column autocomplete is key for me from a
       | quality of life perspective and to make it truly usable.
       | 
       | I'd absolutely love to see the next level of this pipeline be
       | continued where something like Observable Plot or ggplot2 like
       | functionality where you can take your pipeline data analysis and
       | directly plot it to visualize it.
        
       | 1st1 wrote:
       | I also recommend looking at EdgeQL --
       | https://www.edgedb.com/showcase/edgeql -- a new query language
       | aimed to eliminate some of the SQL quirks.
       | 
       | (I'm a co-founder)
        
         | maximilianroos wrote:
         | I'm a huge fan of EdgeDB!
         | 
         | Possibly our focus is a bit different -- I see EdgeDB as
         | primarily focused on transactional queries, whereas PRQL is
         | very focused on analytical queries. PRQL doesn't do quite as
         | much -- e.g. we don't model the relationships between entities,
         | which is less functional but more compatible.
         | 
         | Feel free to reach out on Twitter if you think there's some way
         | of us collaborating, or if you have any feedback or guidance
         | for us.
        
       | hardwaregeek wrote:
       | I've thought about building a better query language too. I'd love
       | the ability to model sum types in databases, something like:
       | enum SchoolType {            College {                degrees:
       | Vec<Degree>            },            HighSchool         }
       | 
       | It's such a common pattern and yet it's so annoying to model in a
       | normal relational database. I wouldn't be surprised if the rise
       | of NoSQL is tied to the inability of relational databases to
       | model basic patterns like this.
       | 
       | Part of me has wondered if a language is the solution. Maybe just
       | a better query builder with support for sum types is necessary.
       | But I suppose there's something useful about having a consistent
       | model based around a language, even if people aren't writing the
       | language directly.
        
         | jhgb wrote:
         | I'm not sure that relational databases are "unable" to model
         | something like this, since I recall from years ago Date
         | describing how to do something like this. Don't remember the
         | details, but you might want to look into Date's writings.
        
         | BeefWellington wrote:
         | What is annoying about implementing something like this in a
         | relational database?
        
           | AtlasBarfed wrote:
           | Auto-completion sucks in a lot of sql statements because the
           | table provides all the hints that good autocompletion would
           | need to provide good suggestions.
           | 
           | That sounds like a nitpick, but man is it useful when you
           | need it.
           | 
           | Notice how the first thing in PRQL is the table declaration.
           | 
           | The fact that UPDATE and INSERT have different syntaxes for
           | basically specifying the same mutation operation is pretty
           | dumb.
        
             | andydd wrote:
             | Have you tried Datagrip?
        
           | vore wrote:
           | It's not straightforward to do polymorphic joins: one common
           | pattern is to have child tables for each case of the union,
           | but there's no integrity constraint such that each parent
           | must only have one child, e.g.                 CREATE TABLE
           | schools (id SERIAL PRIMARY KEY);       CREATE TABLE colleges
           | (id INTEGER NOT NULL REFERENCES schools (id));       CREATE
           | TABLE high_schools (id INTEGER NOT NULL REFERENCES schools
           | (id));
           | 
           | How can you ensure that a school is either a college or
           | high_school but not both?
           | 
           | Another alternative is to make one big table with check
           | constraints but that's also hairy in its own right:
           | CREATE TYPE school_type AS ENUM ('college', 'high_school');
           | CREATE TABLE schools (         id SERIAL PRIMARY KEY,
           | type school_type,         /* college columns */,         /*
           | high school columns */,         CHECK (type = 'college' AND
           | /* college column constraints */),         CHECK (type =
           | 'high_school' AND /* high school column constraints */)
           | );
           | 
           | The other thing in the grandparent's comment that's a
           | constant pain in SQL is representing an ordered list: how do
           | you insert items into the middle of the list? Depending on
           | your database, it can also be painful to renumber the other
           | items.
        
             | roller wrote:
             | A combined approach works if want to encode the exclusive
             | constraint:                   CREATE TYPE school_type AS
             | ENUM ('college', 'high_school');         CREATE TABLE
             | schools (           id SERIAL PRIMARY KEY,           type
             | school_type,           unique (id, type)         );
             | CREATE TABLE colleges (           id INTEGER NOT NULL,
             | type school_type default 'college',           check
             | (type='college'),           foreign key (id, type)
             | references school(id, type)         );
             | 
             | Ya, the syntax is annoying and repetitive. It would be nice
             | if foreign key could be a literal to remove the extra
             | column altogether. e.g.:                   foreign key (id,
             | 'college') references school(id, type)
        
               | vore wrote:
               | Good point, I hadn't thought of that. Thanks!
        
             | go_prodev wrote:
             | > How can you ensure that a school is either a college or
             | high_school but not both?
             | 
             | Do you have any real world scenarios where you've faced
             | this problem?
             | 
             | In your example, you wouldn't model it like that. A school
             | just needs an attribute that identifies the type of school
             | (high school or college), and other attributes that would
             | be common to both.
             | 
             | I'm sure there's lots of examples but it's late and I'm
             | struggling to think of one that a good normalized data
             | model couldn't handle.
        
             | rq1 wrote:
             | You add a xor non null check on the foreign keys?
        
         | andyferris wrote:
         | I totally agree with this. Interesting point about NoSQL!
         | 
         | I'm not sure if it's just the query language though - the
         | definition language needs to make creating columns that are sum
         | types trivial. For one-to-many data this might be a slight
         | generalization of foreign key (compound of table tag + foreign
         | key for that table). This can work for one-to-one data too, but
         | can be a bit annoying having lots of tables compared to doing
         | adding a couple nullable columns (plus there's also data
         | locality differences). I suppose a wrapper language that covers
         | both DDL and DML could work.
        
         | srcreigh wrote:
         | You'd have 1 table per sum type which requires extra data. Then
         | polymorphic foreign key (aka a pair of fields school_type,
         | school_id).
         | 
         | (No foreign key constraints, but those are falling out of use
         | in some cases due to inability to online migrate mysql schemas
         | anyways.)
        
           | ivank wrote:
           | You can retain foreign key constraints by having one column
           | per type of reference. It is also possible to ensure that
           | exactly one column of several is NOT NULL, so that the
           | columns can always be mapped to an enum in application code.
           | Also, in PostgreSQL, the storage for the extra NULLs uses
           | just one bit per column in a bitmap.                   CREATE
           | TABLE dirents (             parent         bigint   NOT NULL,
           | child_dir      bigint,             child_file     bigint,
           | child_symlink  bigint,             basename       text
           | NOT NULL,                      -- Ensure exactly one type of
           | child is set             CHECK (num_nonnulls(child_dir,
           | child_file, child_symlink) = 1),
           | CONSTRAINT dirents_child_dir_fkey     FOREIGN KEY (child_dir)
           | REFERENCES dirs (id),             CONSTRAINT
           | dirents_child_file_fkey    FOREIGN KEY (child_file)
           | REFERENCES files (id),             CONSTRAINT
           | dirents_child_symlink_fkey FOREIGN KEY (child_symlink)
           | REFERENCES symlinks (id),                      PRIMARY KEY
           | (parent, basename)         );
        
       ___________________________________________________________________
       (page generated 2022-06-27 23:00 UTC)