[HN Gopher] Show HN: PRQL 0.2 - a better SQL ___________________________________________________________________ Show HN: PRQL 0.2 - a better SQL Hi everyone -- thanks for your interest in PRQL -- let us know any questions or feedback! We're excited to be releasing 0.2[1], the first version of PRQL you can use in your own projects. It wouldn't exist without the feedback we got from HackerNews when we originally posted the proposal. [1]: https://github.com/prql/prql/releases/tag/0.2.0 Author : maximilianroos Score : 239 points Date : 2022-06-27 17:03 UTC (5 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | ewuhic wrote: | Previous discussion: PRQL - A proposal for a better SQL | https://news.ycombinator.com/item?id=30060784 | nawgz wrote: | Is it true that this is somehow an analogue to how JS development | is really ultimately targeting browser-compatible-JS in the end, | even though we use the latest ECMAScript features & TypeScript in | development? I.e. is it expected someone writes PRQL and then | transpiles before executing against the database? Is there a REPL | one can use against a local Postgres or something? | maximilianroos wrote: | > Is there a REPL one can use against a local Postgres or | something? | | Somewhat -- you can use it in Jupyter now[1]; e.g.: | %%prql from p = products.csv group categoryID ( | aggregate [average unitPrice] ) | | This doesn't yet have the benefits we'd get from e.g. | autocomplete, so there's much more to do there. | | There's also a cool TUI in PyPrql[2] | | [1]: https://pyprql.readthedocs.io/en/latest/magic_readme.html | | [2]: https://pyprql.readthedocs.io/en/latest/readme.html | notimportant0 wrote: | I mostly work using T-SQL but I like PRQL. | | Are you able to add in the examples the following: | 1) Use of delimiters for names that include space, etc. I don't | know if PRQL uses double quote or square brackets. 2) | Use of two/three/four-naming convention to refer to servers, | databases, tables and columns. | exabrial wrote: | I don't really find SQL that difficult, other than NULL !=NULL, I | wish <=> could be switched to the default! | bornfreddy wrote: | Looks awesome! I don't think it adds much to SQL when the queries | are simple, but when you have this looong and complex query I can | totally see the appeal. | gigatexal wrote: | I'm a raw-SQL-no-ORM snob and I really like this. I'd like to see | it become more mainstream. | dvirsky wrote: | Nice. A few years ago I designed a query language to do | aggregations in RediSearch, and it's quite similar in its | structure. | https://redis.io/docs/stack/search/reference/aggregations/ | dmeijboom wrote: | I recently started implementing the Postgres protocol in Rust | (https://github.com/dmeijboom/postgres-conn). So I guess I'll be | experimenting with creating a Postgres proxy which translates | PRQL on-the-fly. | maximilianroos wrote: | That looks really exciting! Please keep us in touch with your | efforts and let us know if there's any way we can be helpful. | jeroen79 wrote: | SQL doesn't need fixing or improving if you ask me, its well | supported and just works. | vander_elst wrote: | Congrats for the milestone!! The syntax looks more intuitive than | SQL. Great to see viable alternatives to SQL! | ssalka wrote: | I just noticed the play on words "prequel" vs "sequel". Nice. | slotrans wrote: | Hot takes: SQL is great, actually. This thing isn't better. | breakfastduck wrote: | As much as I love SQL it can often be a pain and involve lots | of nested subqueries to do 'simple' things. I like this way | this abstracts it. | | Would I use this instead of proper SQL in a data warehouse / | large app? Maybe not. | | Would I use it to manually query DBs when I need some ad hoc | info? For sure. | notimportant0 wrote: | If this becomes a full-fledge DQL that can be used in a proc | or function in a running Postgres instance, I would use it in | production. | mellosouls wrote: | No, but it's not as foreign a paradigm or language to | (presumably?) its main target audience - developers. | [deleted] | emmelaich wrote: | Doesn't the pipelining mean that it's not declarative? At least | not in that part. | OJFord wrote: | Somehow I didn't see it coming - | | > pronounced "Prequel". | | - and I burst out laughing. Very good. | michelpp wrote: | It seems like an obvious next step for Postgres support would be | to make PRQL a stored Procedure Language. | | create function foo() returns bar as language prql $$<prql code | here>$$; | ranjanprj wrote: | Great project, was looking something like this | | but wish this was somehow encoded as JSON, so you could easily | build pipeline UI for complex SQL Generation. | cogman10 wrote: | Awesome to see the progress here. Looks like the language has | significantly matured since last it popped up on HN. | oarabbus_ wrote: | Is it correct there's no CASE WHEN and instead you have to define | a function using a ternary operator? CASE WHENs may be verbose | but when you have a dozen of them they're more readable and the | waterfall nature is far preferable to a giant block of ternary | clauses. | qolop wrote: | Why should I use this instead of SQL? | snthpy wrote: | That's a really good question! (and one we should probably | answer explicitly in the [FAQ](https://prql-lang.org/faq/) | rather than just implicitly) | | The README states that "PRQL is a modern language for | transforming data -- a simple, powerful, pipelined SQL | replacement. Like SQL, it's readable, explicit and declarative. | Unlike SQL, it forms a logical pipeline of transformations, and | supports abstractions such as variables and functions. It can | be used with any database that uses SQL, since it transpiles to | SQL." | | What that means to me is that PRQL more naturally maps onto how | I think about and work with data. | | Say I have some dataset, `employees`, and I want to answer some | questions about it like, for US employees, what is the maximum | and minimum salary and how many employees are there: | from employees filter country == "USA" | # Each line transforms the previous result. aggregate [ | # `aggregate` reduces column to a value. max salary, | min salary, count, | # Closing commas are allowed :) ] | | Moreover, after each line you have a valid pipeline which you | can transform further by adding more steps/lines to your | pipeline. This matches more closely how people construct data | pipelines in R using dplyr/tidyverse and in Python using | Pandas. | | If you find that it doesn't map well onto how you think about | data pipelines then please let us know as we're constantly | looking for more real world examples to help us iterate on the | language! | hui-zheng wrote: | One benefit of SQL is that the Database Engine will do the | hard work of optimizing the query plan. | | Do you think the SQL complied by PRQL could be as effective | and optimized by database engine as the direct-written SQL? | snthpy wrote: | As you said, let the Database Engine do the hard work of | optimizing the query plan for you. | | I currently have no reason to believe that the PRQL | generated SQL would be any worse than hand written SQL. | That said, I don't think we've currently looked at any ways | of passing hints to the query planner. We're always open to | suggestions! | | In the worst case, you have full access to the generated | SQL, and for absolutely crucial queries you can hand modify | that SQL. At least PRQL might have saved you the trouble of | writing a cumbersome window function or something like that | (see for example the example of picking the top row by some | GROUP BY expression). | Cilvic wrote: | This reminds me of KUSTO I'm not sure how it compares to SQL | in general. But it was really fun to work with for querying | Azure application insigts | ithrow wrote: | To avoid working with SQL strings. | hui-zheng wrote: | SQL/jinja like dbt could also avoid working with SQL strings. | what would be the better advantage? | snthpy wrote: | There is already an integration for dbt: | https://github.com/prql/dbt-prql | | For example {% prql %} from | source = {{ source('salesforce', 'in_process') }} | derive expected_sales = probability * value join {{ | ref('team', 'team_sales') }} [name] group name ( | aggregate (sum expected_sales) ) {% endprql | %} | | would appear to dbt as SELECT | name, SUM(source.probability * source.value) AS | expected_sales FROM {{ | source('salesforce', 'in_process') }} AS source | JOIN {{ ref('team', 'team_sales') }} USING(name) | GROUP BY name | | dbt is definitely a use case we are very aware of and I am | personally very keen on (since I use that in my $dayjob). | With some of the ideas in | https://github.com/prql/prql/issues/381 , I think PRQL | could really shine in this area! | | With your contribution we can get there faster! | tstack wrote: | I work on a TUI logfile viewer that uses SQLite as a backend | for doing analysis on the log messages (https://lnav.org). | However, writing SQL interactively is painful since you can't | really provide good auto-complete or preview, which is | something I try to provide for most other operations. | | The PRQL pipeline syntax would make for a much better | experience for lnav since you're able to progressively refine a | query without having to jump around. (You've probably noticed | that many log services, like Sumologic, already provide a | pipeline-style syntax instead of something SQL-like.) The nice | thing is that you can simply keep typing to get the results you | want and get a preview at each stage. For example, entering | "from" and then pressing <TAB> would make it clear to the | program that table-names should be suggested. The program could | then show the first few lines of the table. Typing "from | syslog_log | filter " and then pressing <TAB> would make it | clear that columns from the syslog_log table should be | suggested (along with some other expression stuff). And, then, | the preview of the filtered output could be shown. | | In the current implementation, pressing <TAB> just suggests | every possible thing in the universe, whether it's appropriate | or not. This leaves the poor with not much help after they've | typed "SELECT". I find myself having to lookup docs/source to | figure out column names or whatever and I wrote the darn thing. | Ultimately, I think the analysis functionality just doesn't get | used because interactively writing SQL is so user-hostile. So, | I'm looking forward to seeing this succeed so that I can | integrate it and still be able to use SQLite in the backend. | davidw wrote: | A good example might be a groupwise maximum. Those always tend to | be a bit of a PITA in SQL if you're not writing them regularly. | Be interesting to see what it transpiles to, as well. | aerzen wrote: | If you only want maximum of one column, the PRQL is quite | simple: from my_table group column_a | ( aggregate (max column_b) ) | | If you want the row with the maximum value it gets interesting: | from my_table group column_a ( sort | [-column_b] take 1 ) | | You can read more about group here: https://prql- | lang.org/book/transforms/group.html | beaugunderson wrote: | Opened an issue as I couldn't get this to work against | sqlite: | | https://github.com/prql/prql/issues/695 | maximilianroos wrote: | Thanks a lot for testing and opening an issue! This is now | fixed and released [1]. Let us know if you still face any | problems. | | (We need better tests against real DBs, which is very much | on our roadmap) | | [1]: https://github.com/prql/prql/pull/698 | beaugunderson wrote: | Thank you for the amazingly quick fix! | rackjack wrote: | Obligatory dismissive comment: | | > 0.2 | | No it ain't (in production). | | Anyway, this looks great. I LOVE the fact that you've provided a | book too. Consider me a fan! | maximilianroos wrote: | We're definitely not ready for production! Sorry if that was | implied. | | But we are ready for people to start using it in their | development work. Lmk if there's a better way of describing | that. | pgt wrote: | For those interested in database query languages, it is worth | knowing about Datalog, the query language behind Datomic, XTDB | and Datahike: http://www.learndatalogtoday.org/ | | E.g. a parameterised aggregate query that retrieves the name and | average rating of a film starring cast members whose names match | the input names: [:find ?name (avg ?rating) | :in $ [?name ...] [[?title ?rating]] :where [?p | :person/name ?name] [?m :movie/cast ?p] [?m | :movie/title ?title]] | | To reveal the answer, click on tab labelled "3" and then "I give | up!": http://www.learndatalogtoday.org/chapter/7 | paol wrote: | Here's one suggestion: SQL tediously requires specifying the | equality condition on joins, when 90% of the time you just want | to join on the fk defined between the tables. | from a join b | | should implicitly join on the FK if no condition is given. | | It would require knowledge of the schema. I don't know if this is | possible in PRQL, or if the transpilation to SQL has to be | stateless. | CuriousSkeptic wrote: | SQL has that actually | | select * from a natural join b | | (not based on fk constraints though, it will join on all | attributes with the same name in the relations) | smallnamespace wrote: | If you're willing to sacrifice economics elsewhere, repeating | the table's name in the id column is one workaround: | from a join b on b.a_id = a.a_id | | You can even use NATURAL JOIN if you can guarantee that the | only fkey/pkey names will overlap between tables. | | An unreasonable way to achieve that is to put the table name in | every column. A more palatable way is to write some clever | functions in your schema to scan the information table look for | column name clashes (you essentially write a tiny "linter" | inside your schema). | hyperman1 wrote: | If you have identical field names, you can do in sql: | Select * from a join b using (a_id) | | Don't do this in Oracle though, pain follows when you try to | touch an a_id column. | snthpy wrote: | You can also do this in PRQL: from a | join b [a_id] | | is the equivalent query. | aerzen wrote: | Hello another contributor here! | | Compilation does have to be stateless (for performance | reasons), but we are planning to add some kind of schema | definitions which could also specify foreign keys. | | So joins without conditions would be possible, we'll look into | it! | | What do you think should happen if there are multiple foreign | keys connecting the two tables? Should this also work for many- | to-many relations with an intermediate table? | JohnDeHope wrote: | "What do you think should happen if there are multiple | foreign keys connecting the two tables? Should this also work | for many-to-many relations with an intermediate table?" | | If it's not ambiguous, then let me do it. If I rely on | ambiguity then throw an exception. In the case of multiple | foreign keys, throw an exception, as there's no way to know | which one I mean. It'd be nice if I could disambiguate the | situation though. Normal SQL allows the `on` clause. | from TableA inner join TableB on <expression> | | What if I could specify a foreign key constraint just as | easily... from TableA inner join TableB | by ConstraintC | | Where ConstraintC is the name of a foreign key constraint | between Table A and Table B. It'd be nice to specify the | constraint without having to specify the column name details. | | The same goes for the many to many relationship with an | intermediate table. It could look something like this... | from TableA inner join TableB through TableC | | I wouldn't introduce TableC into the scope of the statement. | It's not in the FROM clause. It's used in the query but is | not available for selecting from. If you want to bring in | columns from it, join on it the usual way. | | As applications grow, and initially simple lookup table | semantics get more nuanced, it might be nice to be able to | constrain the join on the lookup table like this... | from TableA inner join TableB through TableC where | <expression> | | That way if my TableC has some extra columns, such as | effective dates, or deleted flags, or that sort of thing, | then I can filter out some of the joins that might usually | happen. | kbenson wrote: | Unambiguous things can become ambiguous at later points. As | soon as you add a second relation between the tables, what | once was unambiguous now is, and because of something which | may be entirely unrelated to the specifics of the original | query. | | This is where many conveniences that use implicit data run | into problems. A small convenience now for the possibility | of accidentally breaking because of mostly unrelated | changes later is a poor trade off for anyone that wants to | have stable and consistent software. | | This is likely one of those cases where you're better off | with tooling to help make writing the correct unambiguous | code easier (or automated away) than introducing a feature | which leads to less stable systems in some cases. | | Edit: Along the lines of what you note at the end, I would | rather see joins able to use named relations as defined in | the schema. Of there's a relation from table movie to table | actor specifically names roles in the schema, I would | rather be able to join movie on roles and have actors | joined correctly using that relation, and aliases to roles | which I could then use. Then you're using features that are | designed and stable and not implicit and subject to | changing how or whether they function based on semi- | unrelated changes. | | That might look like: "from movie relate roles" which is | equivalent to "from movie join actor roles on movie.id = | roles.movie_id", but because actor.movie_id has a | constraint in the schema named roles which restricts it to | a movie.id already. | ximeng wrote: | One way to avoid constraint name collisions is to include | the base table and foreign table names and keys in the | constraint name separated by underscores, at which point | you don't save much by using the constraint in a join. | aidos wrote: | This is something you might end up regretting later. | | It's annoying adding another foreign key later and then having | previously working queries fail at runtime due to an ambiguous | join condition. | paulhodge wrote: | Agree about not implicitly finding the join key. But as long | as we're brainstorming imaginary features, then maybe as part | of the schema, we could somehow declare the default join key | to use, for any given two tables. In most cases it's pretty | obvious what the best join key would be. | go_prodev wrote: | I agree with you that it's a pain writing join conditions with | many fields... | | But I think that's a shortcoming of the client tool, rather | than the language. | | If SQL tools auto completed the join conditions as best as they | could it would probably be a great help. | skybrian wrote: | I see that the JavaScript package is at [1] and it's implemented | by compiling the Rust code to WASM. That should eventually make | it pretty easy to run it. | | It has a typescript definition file, but it looks like it's | autogenerated and a bit clunky. You get back a CompileResult and | have to call free() explicitly, it seems? That doesn't seem very | idiomatic for JavaScript. | | Also, the links to the documentation and examples in the README | are broken. | | [1] https://www.npmjs.com/package/prql-js | maximilianroos wrote: | Thanks a lot -- issue added: | https://github.com/prql/prql/issues/708 | aerzen wrote: | That's true - the package is auto-generated using | [wasmpak](https://github.com/rustwasm/wasm-pack), that's why | TypeScript definitions are clunky. I did the initial prql-js | release and I'm actually not sure about the free() issue you | are talking about. | | We are currently working on compiling it for both Node.js and | the browser target, and would be happy to see some advice if | you are familiar with WASM! | skybrian wrote: | I didn't actually try it out and I'm not all that familiar | with WASM. Here is the typescript I see (stripped of | boilerplate comments): | | export function compile(s: string): CompileResult; | | export class CompileResult { free(): void; | readonly error: CompileError | undefined; readonly | sql: string | undefined; | | } | | What is the purpose of the free() method? | digisign wrote: | Thanks, I've frequently wanted a query language that was designed | after the 70s. The ideas are sound, but a modernized syntax with | variables to reuse subqueries would be lovely. This looks like | it. | | I noticed one issue though... please don't copy the prefix of | f-strings! That only exists because Python boxed itself in and it | was literally the only ascii syntax left that could be used for | string interpolation. It's mildly ugly but the best that could be | done given those requirements. Not so here. | | The way shells do it with single quotes producing literal strings | and double quotes available for interpolation has not been topped | imho. Triple quotes are a nice extension as well, not sure if | that made it in. | oarabbus_ wrote: | > a modernized syntax with variables to reuse subqueries would | be lovely. | | CTEs provide this functionality already, don't they? | digisign wrote: | When I've needed them I've needed them for multiple | statements, once is not enough. Currently have to use plpgsql | for this, which is half awesome, half abomination. :-D A | single simple language sounds easier to learn. | aerzen wrote: | Interesting suggestion. We added f-strings because we already | had s-strings (pass trough to SQL) and r-strings (for raw | multi-line text). | | And would you rather see "My {name}" or "My ${name}"? I | personally dislike the $ prefix for all variables and | interpolations... | psychoslave wrote: | Languages like Perl, Ruby and more offer plethora of | additional ways to encode interpolated strings. I especially | love the squiggly heredoc for multi line quotations | | https://infinum.com/blog/multiline-strings-ruby-2-3-0-the- | sq... | Kinrany wrote: | Choose backticks as the quote style for interpolation to | attract Markdown fans and confuse the hell out of MySQL users | :D | digisign wrote: | The first one, the $ is redundant if braces required. Multi- | line could be triple quoted. SQL, not sure, maybe sql:'' ? | ComputerGuru wrote: | From a mathematical point-of-view are there any | transforms/operations (note: not end results, but actual | operations) that this can do that SQL can't or vice-versa? | scottlamb wrote: | I assume it can't do anything SQL can't, because they write "It | can be used with any database that uses SQL, since it | transpiles to SQL." Not sure about the reverse. | | I'm used to SQL syntax, but this has definite appeal. As a | small example, I like that it starts with the "from" clause, so | autocomplete is more viable. | ComputerGuru wrote: | Thanks, I missed that it transpiles to SQL. | BeefWellington wrote: | Transpiling to SQL doesn't mean all the underlying SQL | features are being exposed to you. | scottlamb wrote: | Yes, that's what I meant by "not sure about the reverse". | aerzen wrote: | As said, currently PRQL transpiles to SQL, so all expressions | in PRQL are can be expressed in SQL. But not all SQL expression | can be translated back into PRQL - some intentionally and some | are just not yet implemented (UNION - i.e. vertical concat). | | But we also have plans for doing things that some SQL databases | may not support, such as pivot (rows to columns). | coremoff wrote: | hopefully you'll forgive my pedantry - "union all" is | vertical concat - "union" without the "all" gives you the | distinct list | jhgb wrote: | One of the reasons why SQL is crap: there should be no | distinction between the two in relational algebra. A set of | {A, B, C, B, C} is the same as {A, B, C}. | ttfkam wrote: | Detecting duplicates has a cost you can't just hand-wave | away. UNION ALL tells the engine not to worry about it | and just output as it sees it, usually going faster. | Depends on your data needs. | Beltiras wrote: | This can be transpiled into SQL which makes it then trivial | that it can do everything SQL can do. SQL is Turing complete so | it can do anything PRQL can do. | | EDIT: I'm sorry, I didn't realize that even if something | transpiles from one language to another it does not guarantee | that one language can generate all strings of another language. | But taking a look at the abstractions PRQL offers I would be | very surprised to find it not capable of it. | ComputerGuru wrote: | The second part of your statement is fine, but the first part | is just a complete fallacy. | | I can transpile a pure language exposing only `if`, `while`, | and `for` with no standard library and no interop to C - that | definitely does not make it "trivial" that it can do | everything SQL can do. | Beltiras wrote: | I realized this after posting and edited the post. Thanks. | ComputerGuru wrote: | No problem. (I didn't downvote.) | MasterIdiot wrote: | I've seen similar solutions being built internally in multiple | companies, none with a syntax as well thought out as this. | Amazing work! | airtnp wrote: | This sounds like LINQ, or SparkSQL. Instead of a full new | language, it makes feel better to create libraries in languages | that supports embedded DSL easily. | Flimm wrote: | Looks fantastic. | | There are a lot of rough edges when building a string | representing an SQL query in the programming language that you're | using. You have to be careful to avoid SQL injections, for | starters. Do the bindings for PRQL innovate at this level? | elforce002 wrote: | This is really a SQL renaissance. | roG0d wrote: | I've been contributing to this project on a few little things due | to my little knowledge level. But I felt like home with such a | good company of people!. | | I not mature enough to fully appreciate the technical potential | of the project, but the good ambient, the kindness and the growth | potential is for sure worthwhile. I truly encourage everyone to | contribute! | cogman10 wrote: | In particular, I think this is looking pretty good and I'd want | to see even more complicated examples. For example, What do | window functions end up looking like? [1] | | What about crazy operations like calculating percentile_cont? | [2] | | Or just in general, how would "implementation specific" queries | end up looking? | | [1] https://www.postgresql.org/docs/current/tutorial- | window.html | | [2] https://docs.microsoft.com/en- | us/sql/t-sql/functions/percent... | maximilianroos wrote: | Great questions! | | Window functions are here [1]. (We should add these to the | homepage too) | | Implementation specific queries can be handled by the Dialect | parameter [2], though there's still lots of work to do to | build that out. | | [1]: https://prql-lang.org/book/transforms/window.html | | [2]: https://prql- | lang.org/book/queries/dialect_and_version.html | maximilianroos wrote: | Window example is now on the homepage, thanks for the | question: https://github.com/prql/prql/pull/692 | jhgb wrote: | I wonder, why would you go for a "pipeline" of relational | operations, when it's strictly weaker than allowing for a tree of | operations? The way the examples seem to be written, a stack | machine would subsume the existing syntax, since you first | specify an operand (like "from employees") and then you specify | and operation (like "filter country == "USA"), where in a stack | machine an operation such as "from X" would put the relation X | onto the top of stack, whereas an operation such as "filter" | would then replace the top of stack with a transformed relation. | This could be extended by for example "join on ..." being simply | an operation consuming _two_ relations from the top of stack and | putting one result back, joining two pipelines into one. | jitl wrote: | I'm surprised that none of the examples on Github or the website | deals with join. I eventually found some in the "book" here: | https://prql-lang.org/book/transforms/join.html | from employees join side:left positions [id==employee_id] | | turns into SELECT employees.*, | positions.* FROM employees LEFT JOIN | positions ON id = employee_id | | I would love to see joins worked into the main learning examples. | Without join, the examples lack a bit of the "relation" part; we | could just as easily be compiling a DSL to a chain of | `array.filter`, `array.reduce`, `array.map` calls. Joins are what | makes relational modeling interesting! | | I would love to see Datalog/SPARQL-style implicit joins to make | graph traversals like "which users have edited documents I own?" | less verbose. | psychoslave wrote: | I always found that side:left/right should also be expressible | as rapport:antecedent/consequent as in propositional logic, | rather than limiting these relationships to the geometric | representation of Venn diagram. | | And maybe a shorter alternative might be tie:arm/leg. | aerzen wrote: | I'm not sure if this is a joke, be we actually had a serious | an idea to replace side:left/right with nulls_left:true and | nulls_right:true | | This part of the join operation should be an after thought - | just a flag after the central argument of the transform which | should be the condition you join over. | maximilianroos wrote: | Great point, we'll add that. | | I don't think we do joins that much better than SQL does. We're | thinking whether there's potential there, maybe through | understanding foreign keys -- but we're being conservative | about introducing change without value. | igorkraw wrote: | I looked at the book after this and have to say, I'd heavily | recommend spending the next dew months _just_ improving joins | (and complex joins especially). Like GP says, relational | modelling is the interesting bit about SQL and I don 't feel | exaggerative in saying the only reason I use SQL are joins, | and so the only reason I'd introduce the complexity of your | project into my stack would be if it makes handling joins, | views and other aspects of relational modeling and slicing | nicer - one example could be many to many relationships, or | the gradient between graph/document based and normalised | table based modeling | maximilianroos wrote: | Added: https://github.com/prql/prql/pull/697 | [deleted] | [deleted] | p33p wrote: | This looks great. I've thought about something similar to this | for quite a while now. Column autocomplete is key for me from a | quality of life perspective and to make it truly usable. | | I'd absolutely love to see the next level of this pipeline be | continued where something like Observable Plot or ggplot2 like | functionality where you can take your pipeline data analysis and | directly plot it to visualize it. | 1st1 wrote: | I also recommend looking at EdgeQL -- | https://www.edgedb.com/showcase/edgeql -- a new query language | aimed to eliminate some of the SQL quirks. | | (I'm a co-founder) | maximilianroos wrote: | I'm a huge fan of EdgeDB! | | Possibly our focus is a bit different -- I see EdgeDB as | primarily focused on transactional queries, whereas PRQL is | very focused on analytical queries. PRQL doesn't do quite as | much -- e.g. we don't model the relationships between entities, | which is less functional but more compatible. | | Feel free to reach out on Twitter if you think there's some way | of us collaborating, or if you have any feedback or guidance | for us. | hardwaregeek wrote: | I've thought about building a better query language too. I'd love | the ability to model sum types in databases, something like: | enum SchoolType { College { degrees: | Vec<Degree> }, HighSchool } | | It's such a common pattern and yet it's so annoying to model in a | normal relational database. I wouldn't be surprised if the rise | of NoSQL is tied to the inability of relational databases to | model basic patterns like this. | | Part of me has wondered if a language is the solution. Maybe just | a better query builder with support for sum types is necessary. | But I suppose there's something useful about having a consistent | model based around a language, even if people aren't writing the | language directly. | jhgb wrote: | I'm not sure that relational databases are "unable" to model | something like this, since I recall from years ago Date | describing how to do something like this. Don't remember the | details, but you might want to look into Date's writings. | BeefWellington wrote: | What is annoying about implementing something like this in a | relational database? | AtlasBarfed wrote: | Auto-completion sucks in a lot of sql statements because the | table provides all the hints that good autocompletion would | need to provide good suggestions. | | That sounds like a nitpick, but man is it useful when you | need it. | | Notice how the first thing in PRQL is the table declaration. | | The fact that UPDATE and INSERT have different syntaxes for | basically specifying the same mutation operation is pretty | dumb. | andydd wrote: | Have you tried Datagrip? | vore wrote: | It's not straightforward to do polymorphic joins: one common | pattern is to have child tables for each case of the union, | but there's no integrity constraint such that each parent | must only have one child, e.g. CREATE TABLE | schools (id SERIAL PRIMARY KEY); CREATE TABLE colleges | (id INTEGER NOT NULL REFERENCES schools (id)); CREATE | TABLE high_schools (id INTEGER NOT NULL REFERENCES schools | (id)); | | How can you ensure that a school is either a college or | high_school but not both? | | Another alternative is to make one big table with check | constraints but that's also hairy in its own right: | CREATE TYPE school_type AS ENUM ('college', 'high_school'); | CREATE TABLE schools ( id SERIAL PRIMARY KEY, | type school_type, /* college columns */, /* | high school columns */, CHECK (type = 'college' AND | /* college column constraints */), CHECK (type = | 'high_school' AND /* high school column constraints */) | ); | | The other thing in the grandparent's comment that's a | constant pain in SQL is representing an ordered list: how do | you insert items into the middle of the list? Depending on | your database, it can also be painful to renumber the other | items. | roller wrote: | A combined approach works if want to encode the exclusive | constraint: CREATE TYPE school_type AS | ENUM ('college', 'high_school'); CREATE TABLE | schools ( id SERIAL PRIMARY KEY, type | school_type, unique (id, type) ); | CREATE TABLE colleges ( id INTEGER NOT NULL, | type school_type default 'college', check | (type='college'), foreign key (id, type) | references school(id, type) ); | | Ya, the syntax is annoying and repetitive. It would be nice | if foreign key could be a literal to remove the extra | column altogether. e.g.: foreign key (id, | 'college') references school(id, type) | vore wrote: | Good point, I hadn't thought of that. Thanks! | go_prodev wrote: | > How can you ensure that a school is either a college or | high_school but not both? | | Do you have any real world scenarios where you've faced | this problem? | | In your example, you wouldn't model it like that. A school | just needs an attribute that identifies the type of school | (high school or college), and other attributes that would | be common to both. | | I'm sure there's lots of examples but it's late and I'm | struggling to think of one that a good normalized data | model couldn't handle. | rq1 wrote: | You add a xor non null check on the foreign keys? | andyferris wrote: | I totally agree with this. Interesting point about NoSQL! | | I'm not sure if it's just the query language though - the | definition language needs to make creating columns that are sum | types trivial. For one-to-many data this might be a slight | generalization of foreign key (compound of table tag + foreign | key for that table). This can work for one-to-one data too, but | can be a bit annoying having lots of tables compared to doing | adding a couple nullable columns (plus there's also data | locality differences). I suppose a wrapper language that covers | both DDL and DML could work. | srcreigh wrote: | You'd have 1 table per sum type which requires extra data. Then | polymorphic foreign key (aka a pair of fields school_type, | school_id). | | (No foreign key constraints, but those are falling out of use | in some cases due to inability to online migrate mysql schemas | anyways.) | ivank wrote: | You can retain foreign key constraints by having one column | per type of reference. It is also possible to ensure that | exactly one column of several is NOT NULL, so that the | columns can always be mapped to an enum in application code. | Also, in PostgreSQL, the storage for the extra NULLs uses | just one bit per column in a bitmap. CREATE | TABLE dirents ( parent bigint NOT NULL, | child_dir bigint, child_file bigint, | child_symlink bigint, basename text | NOT NULL, -- Ensure exactly one type of | child is set CHECK (num_nonnulls(child_dir, | child_file, child_symlink) = 1), | CONSTRAINT dirents_child_dir_fkey FOREIGN KEY (child_dir) | REFERENCES dirs (id), CONSTRAINT | dirents_child_file_fkey FOREIGN KEY (child_file) | REFERENCES files (id), CONSTRAINT | dirents_child_symlink_fkey FOREIGN KEY (child_symlink) | REFERENCES symlinks (id), PRIMARY KEY | (parent, basename) ); ___________________________________________________________________ (page generated 2022-06-27 23:00 UTC)