[HN Gopher] SQL language proposal: JOIN FOREIGN ___________________________________________________________________ SQL language proposal: JOIN FOREIGN Author : JoelJacobson Score : 142 points Date : 2021-12-30 17:54 UTC (5 hours ago) (HTM) web link (gist.github.com) (TXT) w3m dump (gist.github.com) | jeff-davis wrote: | I'm not sure this is a good idea. It means that a constraint on | the data (a FK is a constraint) affects the semantics of the | query. In SQL, those two concepts are seperate. | unbanned wrote: | >This improved in SQL-92: | | Did it? | JoelJacobson wrote: | Yes, I think so, at least according to: | https://learnsql.com/blog/history-of-sql-standards/ | | What makes you think otherwise? | antender wrote: | I, personally, feel like this is a rather pointless addition to | an already somewhat bloated language. It it isn't saving that | much typing (we have NATURAL JOIN's already, nobody is using | them), is kind of inconsistent with other language principles (we | should be able to join anything to anything if we need to, see | SQL-89) and limits JOIN flexibility (you don't have to join with | "=" operator, JOIN's supports different types of conditions and | are equivalent to WHERE clause) | thom wrote: | Of all the myriad indignities of SQL, this isn't near the top of | my list. I also don't like making the names of objects like | foreign keys and indexes first class concerns in your queries, | that's a whole new layer of cognitive overhead. | degenerate wrote: | Agree. I'd much prefer they worked on the way MySQL determines | what indexes to use on a multi-table join, so there could be | less emphasis on the need to ANALYZE TABLE and FORCE INDEX when | the DB is seemingly being dumb, but there's not enough | information in EXPLAIN to tell you where it's actually getting | tripped up. Troubleshooting full table scans on large sets of | data is a nightmare. | Tostino wrote: | Slightly different group of people working on MySQL vs this | proposal. | torgard wrote: | I dig it! | | I prefer defining tables like this: CREATE | TABLE category ( id int GENERATED ALWAYS AS IDENTITY, | name text ); CREATE TABLE post ( | id int GENERATED ALWAYS AS IDENTITY, category_id int | REFERENCES category (id) ON DELETE CASCADE | ); | | That is, category.id rather than category.category_id. But the | USING clause doesn't work with that style, as far as I | understand. | | This would make my queries nicer. | JoelJacobson wrote: | Much cleaner, I agree, that's a big win. Will add that to the | list of benefits. | | Will also add a "Drawbacks / Remaining issues" section to the | Gist, from all the valuable comments so far in this thread, | thank you all, positive as well as negative comments, all very | helpful. | thom wrote: | I've been quite happy having fully prefixed column names for a | long time now. Makes joins easier, big views clearer, and | random exports more readable out of the box. Also in my case | it's also easier to line up unique column names with things | like Clojure specs but I accept that's a niche concern. | unbanned wrote: | Foreign keys exist for data integrity... something which is | rather opaque to query onterface... So why do you believe that | has place in a query? | JoelJacobson wrote: | Foreign keys both define relationships between tables, and also | enforce referential integrity. The discussion is about how we | could potentially mine various additional value from foreign | keys as an information resource, rather than just mostly being | about referential integrity. | hn_throwaway_99 wrote: | This is my "Thanks, I hate it" response. | | Reason being if you use the example they gave: | | SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM films f | JOIN FOREIGN f.films_did_fkey d | | You need to _implicitly_ know the table that films_did_fkey | points to, because 'd' is just a table alias. I can't think of | anywhere else in the SQL standard where you can introduce a table | alias without explicitly referencing the table. In my opinion | making code essentially unreadable unless you have other | background information is an antipattern. | JoelJacobson wrote: | I think your concern is addressed by the idea further down in | proposal; giving the foreign keys the same names as the | referenced tables. This example was provided: | | SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM films f | JOIN FOREIGN f.distributors d | hn_throwaway_99 wrote: | No, my concern is not that you _can_ also specify the table | name that explicitly states the name of both tables, it 's | that you have proposed a syntax where it is _possible_ to | leave off the explicit table name. | | If you got rid of the syntax that uses the underlying "magic" | of needing to know which table the foreign key points to, I'd | be more amenable. | JoelJacobson wrote: | There are two asymmetrical cases; one of them needs two | pieces of information, the other three. | | Having to always specify both tables would syntax wise be | redundant, but I agree it's worth considering it might be a | good thing to improve readability, which would help | especially in the case where the foreign key isn't/cannot | be given the same name as the referenced table. | mrweasel wrote: | Yeah, my concern is people in the real world. If you can | "magically" join on a foreign key and leave the table name | out, or still call the thing anything you want, some smart | ass will abuse the live crap out of it and I'll somehow be | left to deal with it. | | I like things to be explicit. Tell me what you're joining | and how you want to join it. What is the use case for this? | Other than saving some typing, and let's face it, sometimes | a little extra typing now, will save you a lot of trouble | later. The proposal claims: "The idea is to improve the SQL | language", but is it really better? | tapas73 wrote: | First, very valid concern regarding table name. | | > Other than saving some typing, | | I would like to point out the corectness aspect of the | proposal. | | Today foreign keys are enforced during inserts, updates, | deletes. (you just can't violate fk. this is so good) But | you can violate it in selects. (e.g. mistype column name, | forget to include a column) | | This proposal (or its adjustment) would allow to use fk | also during select. It's like static typing for join | conditions. | tqi wrote: | 1. Just because you can doesn't mean people will. 2. That | doesn't work if you need to have multiple keys pointing to | the same table (ie owner_id and secondary_owner_id pointing | to a users table) | mritchie712 wrote: | yeah, this saves like 10 characters of text in exchange for | massive confusion. | JoelJacobson wrote: | In this simple example yes, but there can be multiple columns | in a join. | | The syntax proposal i O(1) syntax wise compared to O(2n) for | a JOIN ON where n is the number of columns. | hn_throwaway_99 wrote: | But that "simple example" is like 95% plus of all my joins, | where I'm joining two tables on a foreign key. | JoelJacobson wrote: | Your data model must be a lot simpler than mine, congrats | ;) | unbanned wrote: | mjevans wrote: | In this example D is the target table, FK's natural | relationship could be aliased (specified multiple times under | different names). | ridaj wrote: | Another thing, this makes things harder to reverse. | | Some engines have join-order-dependent performance, so there | are instances where you would want to write | | ``` FROM orders LEFT JOIN customers USING (customer_id) ``` | | and others where you'd want to write | | ``` FROM customers RIGHT JOIN orders USING (customer_id) ``` | | Swapping join order with current syntax is relatively easy | since references in the same FROM clause are interchangeable. | But in this proposal, the reference to the joined table isn't | written out, so it would be pretty complicated to reverse join | order. | jmull wrote: | Agreed. Maybe something like this instead: | SELECT f.title, f.did, d.name, f.date_prod, f.kind | FROM films f JOIN distributors d ON FOREIGN | f.films_did_fkey | | and SELECT f.title, f.did, d.name, | f.date_prod, f.kind FROM distributors d | JOIN films f ON FOREIGN films_did_fkey | silon42 wrote: | +1 ... needed at least as an option where FK is not declared | magicalhippo wrote: | I had a similar suggestion the other day[1], except I chose | what to me sounded more English-sounding: | JOIN films f USING FOREIGN KEY | | Here the explicit foreign key, films_did_fkey in this case, | could be specified after FOREIGN KEY. This would be similar | in syntax to when you force an index for a select statement, | at least in the DB we use. | | [1]: https://news.ycombinator.com/item?id=29687565 | zeroimpl wrote: | Yeah I think something like this is the way to go. The | table name needs to be listed in a consistent manner like | all other joins, and I don't think the syntax should be | different depending on which order the tables are listed | in. | | With the above proposal, it seems the foreign key name | could be left out in the common case of there being only | one fkey between the two tables too. | magicalhippo wrote: | > With the above proposal, it seems the foreign key name | could be left out in the common case of there being only | one fkey between the two tables too. | | Yes, this was indeed my intention. The common case with | only a single matching foreign key constraint would not | require being explicit, but one could be if needed in a | natural way. JOIN films f USING FOREIGN | KEY films_did_fkey | JoelJacobson wrote: | This is the same idea as already posted in this thread | https://news.ycombinator.com/item?id=29687134, but with a more | in-depth explanation that couldn't fit in the comment field, and | with some syntax improvements, such as using "FOREIGN", which is | a reserved keyword, instead of the previously suggested "->" | notation. | | Thanks for all the valuable comments on last proposal. Excited to | hear what you think about this update. | nicoburns wrote: | I wish the SQL standard would add some basic syntax sugar, such | as support for trailing commas and allowing use of aliases in | more places. | [deleted] | btilly wrote: | Please, no. A common problem in data warehousing is that a design | with lots of foreign keys becomes slow to load. A standard | solution is to move the checks for referential integrity | elsewhere, then drop the foreign key constraint. This massively | improves load performance. | | This syntax change means that this solution can't be used because | you have no idea what random queries out there might rely on the | specific existence of a foreign key constraint for the definition | of the query. Thereby meaning that if a foreign key constraint | becomes a performance problem, we're stuck with it rather than | having a solution. | | Features have consequences. And I don't like the consequences of | making business rules that are now explicit in the query, be | instead implicit in the table design. | tapas73 wrote: | Lets have disabled FK then. | | postgre, seems to not have it, but the proposal could include | also "disabled FK" part. | | teradata, oracle, sql server already have option for FK | "disable/no check". | | https://docs.teradata.com/r/eWpPpcMoLGQcZEoyt5AjEg/df1PvVh6e... | https://docs.microsoft.com/en-us/sql/relational-databases/ta... | https://docs.oracle.com/cd/B28359_01/server.111/b28310/gener... | zzzeek wrote: | this was my thought too. but now you need an ALTER CONSTRAINT | NON ENFORCING or something like that so that the "constraints" | can be present declaratively but don't actually get used. | | which then looks a whole lot like you're just introducing | macros into SQL where you have some symbolic keywords that | expand out into pre-fabricated ON clauses. | JoelJacobson wrote: | Good point, but addressable: | | Simply decouple the relationship definition and referential | integrity check, allowing a user to drop the referential | integrity check if desired, but keeping the relationship | definition. | | I cannot see why you would not want to at least always store | the information a certain table/column(s) references some other | table/column(s) in the data model. Enforcing referential | integrity is probably good in general too, but I agree you | might need to disable it for some FKs, in some databases, like | PostgreSQL before they got FOR KEY SHARE locks. | hn_throwaway_99 wrote: | At this point you should realize your proposal is a non- | starter, and I didn't even realize btilly's objection | originally. | | I can't think of any other feature in SQL where the rules of | the query are actually dependent on something _not explicit | to the query itself_. Even USING and NATURAL are just | syntactic sugar that depends on the _structure_ of the table, | not on any underlying constraints. | | So, what you have proposed, "allowing a user to drop the | referential integrity check if desired, but keeping the | relationship definition" would be a _massive_ change to tons | of SQL tools out there as it 's a huge new feature, for some | minor syntactic sugar. Ain't gonna happen. | JoelJacobson wrote: | > Even USING and NATURAL are just syntactic sugar that | depends on the structure of the table, not on any | underlying constraints. | | Similar to how JOIN FOREIGN would depend on the _structure_ | of the data model, defined by tables, foreign keys, etc. | | > So, what you have proposed, "allowing a user to drop the | referential integrity check if desired, but keeping the | relationship definition" would be a massive change to tons | of SQL tools out there as it's a huge new feature, for some | minor syntactic sugar. Ain't gonna happen. | | Why would it be a problem from the tools perspective if the | foreign key wasn't actually enforced if the DBA insists on | temporarily disabling the enforcement of the FK? If the | tool would e.g. be used to insert a row, and the DB would | accept it, even though it would violate the FK, what do you | suggest would be the problem from the tools perspective? | | This is also not a new idea. It's already implemented in | MSSQL, see WITH NOCHECK. | [deleted] | hn_throwaway_99 wrote: | > Similar to how JOIN FOREIGN would depend on the | structure of the data model, defined by tables, _foreign | keys_ , etc. | | The point that everyone is making is there are _not_ | currently any SQL statements that depend on structural | information as defined in the foreign key relationships | when calculating the structure of the data. Furthermore, | there are already tons of tooling and processes that | depend on this fact, that your proposal would break, for | a teeny bit of less typing. | | Beating a dead horse at this point. | tqi wrote: | The person you are responding to was more direct/harsh | than was necessary, but I think it would be good to step | back for a moment and reflect on the feedback from this | community. You said "If someone can convince me this is a | bad idea, that would help me forget about all of this, so | I would greatly appreciate your thoughts, no matter how | negative or positive." I think there are enough valid | objections here to at least consider the idea that this | is not a clear improvement? | JoelJacobson wrote: | When I wrote that comment, the idea had even more flaws | than currently, at that time I suggested using "WITH" and | a new "->" operator. Thanks to new ideas coming from | other users in the PostgreSQL and Hacker News community, | those problems have been solved, and we now have less | remaining problems with the proposal. I'm really grateful | for all the help. | | Like I said in another reply, I will put together a | "Drawbacks / Remaining issues" section and update the | Gist, based on all replies. Perhaps the end result will | be Status Quo, but at least then we have documented the | reasons why this idea is a dead end. However, thanks to | all the improvements just during the last couple of days, | I feel really optimistic and motivated, so I think there | is a great chance we can solve the remaining issues | together if we try. | | To comment on the response from the direct/hash person: | | The point made by the user, "not currently any SQL | statements that depend on structural information as | defined in the foreign key relationships", is true, but I | don't see why that's an argument by itself against the | idea? | | I find the other argument, claiming there would be a | problem with tooling and processes, much more interesting | and I'm eager to fully understand it. I asked a question | in hope to do so, "what do you suggest would be the | problem from the tools perspective", but has so far not | received any reply. | seadan83 wrote: | This, the result of a query should not change depending on a | constraint. | zmmmmm wrote: | By that do you mean, should not vary in the data returned or | should not break? | | Personally I agree that changing a constraint shouldn't alter | the data returned. But I'm happy enough if it breaks in a | clear and verifiable manner. There are plenty of other | situations where adding a constraint will cause existing SQL | (if not queries) to break so its not really that much of a | change. | zmmmmm wrote: | Data warehouses have all sorts of different design approaches | based on their different requirements, wouldn't this just be | another one of them? I actually think that having a way to | alter the constraint to non-enforcing sounds like it'd be | better for your situation anyway because you can then have your | load performance but utilise the constraints with background / | delayed process that checks the integrity because the | information is still there in a standard form. | | Which sort of leads to .... I don't agree with your | characterisation of foreign key constraints as business rules. | They are genuine information about the structure of the data. | cogman10 wrote: | Absolutely agree. | | Further, whenever someone is adjusting table performance index | tweaks is almost always the first thing to tackle. | | Adding foreign keys into the query is just as bad as adding | indexes into the query (which, you can do in T-SQL, but | generally shouldn't). Indexes can be dropped, changed, or added | and you SHOULD be relying on the SQL optimizer to use the most | appropriate index. | | This feature appears to only save a bit of typing in the best | of scenarios. In the worst, an update/drop of a foreign key | will end up breaking a bunch of queries, which is insane. | Tostino wrote: | While I wish there were a way to easily refactor queries when | making these changes, this is not different than it works for | renaming/dropping a column..."In the worst, an re-name/drop | of a column will end up breaking a bunch of queries, which is | insane." | tapas73 wrote: | I will write a letter to santa asking to get this into sql. | During 10 years in data warehouse I constantly wondered why SQL | does not have something like that. | | Reducing verboseness is nice, but the main perk is the | _correctness_. | | Oh.. if I got a cent every time I found a bug in colleagues sql, | because of join accidentally multiplying/doubling rows... :-) | thanatos519 wrote: | I dunno. I preferred the SQL-89 syntax. | samtheprogram wrote: | Same here. I didn't even realize that was an option. As a | relative beginner to SQL, is there a real reason the JOIN | syntax was needed at all beyond special inner/outer joins? | Tostino wrote: | Sanity for your coworkers who may have to touch your SQL. | tester756 wrote: | While we are at it | | let's write SQL queries starting from FROM. | | `FROM users SELECT *` | | It'd allow tooling to provide IntelliSense better. | miohtama wrote: | While this might be a joke, the world has been ripe for a | replacement for SQL... for the last 20 years. | hdjrudni wrote: | I started writing a new SQL parser that just transforms the | query...this was one of the features. The other being a | syntax for group-wise max queries which are just crazy dumb | to write efficiently in MySQL. | vosper wrote: | I agree, though I'll give SQL a pass for this because it's old. | | But how the Javascript world ever thought that `import { | function } from 'library'` was better than `from 'library' | import { function }` I'll never know. Python got this right | long before anyone was even thinking about adding imports to | JS! | hdjrudni wrote: | Ya..that one is particularly sad because it was recent. Don't | know why they felt the need to botch that. | goscript wrote: | I agree but not with current JavaScript modules. i would | rather work with this: import { functionA } | from 'library'; import { functionB } from | '../utils/core/abc'; import { functionC } from './a'; | | over: from 'library' import { functionA }; | from '../utils/core/abc' import { functionB }; from | './a' import { functionC }; | jimbobmcgee wrote: | Is anyone actually likely to implement these based on some random | gist on the internet, or are you just screaming into the void? | | Because, if you have that kind of clout, I've got an INSERT/SET | syntax I'd like to put your way... | JoelJacobson wrote: | I'm a small PostgreSQL contributor since 2010 myself. I | probably can't write the whole patch myself, but if there is | enough interest, and if we can work out the details and address | the problems raised in this thread and elsewhere, I'm pretty | confident we can do it. If we ever get there, the next step | would be a reference implementation, probably in PostgreSQL, or | to discuss a proposal in the SQL committee. | jimbobmcgee wrote: | I'm often interested into what goes into changes to | committee-driven standards. | | To an outsider, proposing a change seems to require one to be | part of a shady cabal of Big-5 employees, skilled in the art | of hiding subtle, privacy-invading features into inscrutable, | plain-text RFCs. | | That or subjecting yourself to 30K+ what-abouters who deform | your suggestion into something unrecognisable. | | It's refreshing to see a straightforward, well-formatted | proposal (even if I do slightly prefer the `FROM table1 x | JOIN table2 y ON x.fk` syntax suggested in other comments). | JoelJacobson wrote: | > To an outsider, proposing a change seems to require one | to be part of a shady cabal of Big-5 employees, skilled in | the art of hiding subtle, privacy-invading features into | inscrutable, plain-text RFCs. | | I thought so too. Initially I just tried to get in contact | with someone at the Swedish Institute for Standards (SIS), | to see if it would be possible to send a proposal to | someone in the SQL committee, which I thought was nearly | impossible to become a member of. But as it turns out, SIS | explained I could actually join the Swedish working group, | and participate directly there, I just had to send in an | application and get the approval from my employer, since | there is a cost involved and you have to be a member via a | company. Turns out ISO is a very open and democratic | organization, just like Hacker News! :) | | I think this proposal could take years until it land, if it | ever does, in some form, if concerns can be addressed, but | SQL is here to stay for a while, so that doesn't scare me. | Tostino wrote: | I was following this proposal on the -hackers thread, | glad to see it getting traction here. I find it really | interesting to read that, it'll be even more interesting | to see how it plays out longer term. I'd love to see some | more progress made with the SQL language / syntax, and | having someone in the community actively engaging with | the committees seems like a great way forward. | JoelJacobson wrote: | > It's refreshing to see a straightforward, well-formatted | proposal (even if I do slightly prefer the `FROM table1 x | JOIN table2 y ON x.fk` syntax suggested in other comments). | | Thank you! /me feeling happy | tqi wrote: | Personally, I think SQLs verbose syntax is a good thing, and | increases clarity/interpretability. It reduces cognitive load to | have things explicitly listed out. Not having the table and | column names in the query itself makes it much harder to read and | understand queries without prior knowledge of the data model. | bendbro wrote: | I agree with your point about explicit columns being easier to | read, but I still sometimes prefer implicit columns. An | example: whenever you have numerous subqueries, all using the | same columns, implicit columns are easier to read since there | is less text on the screen. It's also less error prone to | change just one line rather than numerous lines. | | This feeds into my view of metaprogramming-like situations. | Whenever the code-time-view of a program differs significantly | from some runtime-state-view of the program, I think there | should be a code-time way to view and perhaps edit both the | code-time-view and some kind of runtime-state-view. A | programmer shouldn't have to waste time digging through | numerous files to evaluate what implementation slots into some | dependency injected class, or find out what structure ends up | in a python method parameter, or what a preprocessor directive | ultimately produces. I know IDEs can handle some of these | things, but I think better tools can be produced. | | More concisely, instead of approaching code as the single and | unchanging view of the program, perhaps it would help to | approach code as something more dynamic. I have no concrete | ideas as to how this would work. | rcoveson wrote: | FWIW we do have `NATURAL JOIN` already, which is a lot worse | than this proposal. | benjiweber wrote: | NATURAL JOIN is great because it's like a relational AND. How | else will you join with Table Dee and Table Dum? | https://twitter.com/benjiweber/status/1476629550608101384 | | It's just risky if you don't design your schema with | relational algebra in mind. | cpill wrote: | I'm 20 years I have never seen anyone use relational | algebra. schemas have to be self documenting albeit only in | the real world. | salawat wrote: | ...I do... All the time. I'm cleaning up a mess of a | namespace collision as we speak. I was the only one who | saw it coming. I've spent the better part of a month | getting people to believe it's an issue. | gavinray wrote: | SQL _is_ relational algebra, the core of relational | algebra is Selection, Projection, Cross-products, Unions, | and Set Differences. | | SQL is a language that implements Relational | Algebra/Relational Calculus | rcoveson wrote: | That is a really interesting case. Correct me if I'm wrong, | but I don't think this is a special case where you have to | use `NATURAL JOIN` over `JOIN ON`. The problem is just that | the Postgres SQL grammar (maybe most/all SQL grammars) | require ON or USING in a non-NATURAL join, so you have to | specify the (empty) equijoin predicate list yourself, i.e. | the identity value: `JOIN ON TRUE`. | | EDIT: I had another thought about this. | | I think people not designing with the relational algebra in | mind is the heart of the issue, specifically w.r.t. column | names. We know that namespaces are a hard problem, and a | consequence of that problem is that `NATURAL JOIN` as | specified in the relational algebra seems risky, or overly | magick-y. It makes what might be an unfortunate coincidence | (name collision) into something algebraically impactful. | | A foreign key join gets around the problem by keeping names | and namespaces out of it. It's really doing exactly what | `NATURAL JOIN` is supposed to do, but only in the subset of | cases where name collisions are meaningful, not | coincidental. | mpolichette wrote: | I'm with you on this one. I prefer it to be in your face simple | to understand over the terse-ness. | | In the gist example, I actually prefer the SQL-92 approach | where we are joining given an explicit comparison condition. | Every other implementation seems to be trying to hide details, | for what gain? Less typing? | | In order to use FOREIGN, you will need to know not just what | columns a table has, but also their configuration. Which would | also require that you have properly configured your tables. | While this shouldn't be a hard ask, it does add additional | dependency and makes use of this "tool" slightly less | "portable" between systems. | | I have unfortunately seen cases where people will only have | foreign keys un-enforced by their table config. As a dev, if | you're introduced to a new DB, you wont know immediately if you | can use this, and if things are configured wrong, you need to | make a pretty significant change to be able to use it. | | I don't see a lot of harm from adding this syntax however as | people are free to not use it and it relies on an existing | strict convention. | JoelJacobson wrote: | > I don't see a lot of harm from adding this syntax however | as people are free to not use it and it relies on an existing | strict convention. | | This is a good argument I will add to the list. | | Also interesting to read about un-enforced foreign keys. I | haven't used MSSQL myself, the DB in which I heard it's | possible, I've only been using PostgreSQL for the last 20 | years, and before that MySQL. | | I think the problems you describe is an argument against a | WITH NOCHECK feature, since it could be misused. Maybe it's | necessary in some databases still, but at least in | PostgreSQL, the FOR KEY SHARE lock solved all the issues with | concurrent updates we had at Trustly. The FOR KEY SHARE was a | _huge_ patch [1] written mainly by Alvaro Herrera. Thanks to | it, Trustly has never since had any performance problems with | foreign keys, and they have AFAIK not needed to drop any | foreign keys up until today due to locking /performance | problems. | | [1] https://www.commandprompt.com/blog/fixing_foreign_key_dea | dlo... | tapas73 wrote: | I would argue that conceptually it's not too different from | needing to know what tables/columns exist in datamodel. | | I get that it is not common now to care about FKs when writing | selects. But it could be. Tooling can be improved to help here. | (show fks, autocomplete) | | Btw. Everybody seems to concentrate on conciseness, but keep in | mind that this helps also with query correctness. | hnthrowaway0315 wrote: | What I really need is something like: | | SELECT -col1, -col14 FROM table LIMIT 50; | | Where the minus sign means I don't want these two columns. I | still don't see a way to do it easily (for Vertica and in | Datagrip). | stingraycharles wrote: | Simple, effective, and really useful in a lot of situations. I | like it! | bobbyi wrote: | If you have specific columns that you frequently want to ignore | (which, I think, is the common case of this), you could define | a view that selects all the columns except those and do your | queries against that view. | Eremotherium wrote: | Or create a function that returns a view of a table minus | named columns in situ if that's really a common case for you. | Just as an idea. | piaste wrote: | Similarly, I'd love some form of GROUP BY | every column except for <these> | | It feels silly when you are SELECTing a ton of columns, then | you add a JOIN to a many-to-one relationship which you want to | aggregate. Now you need to either make it a subquery (and hope | the optimizer doesn't screw up) or duplicate all your SELECT | _expression_ (not even the identifiers) into the GROUP BY. | Svip wrote: | You know, you can just write GROUP BY x, y, ... | SELECT t.i+1, count(*) FROM table t GROUP BY 1 | | 1 in this context means the first selected item (i.e. t.i+1). | I know this works in PostgreSQL. | edoceo wrote: | I use this. It gets messy after like 10 columns. I have a | trick in my ORM that copies the non-aggrigate columns into | the statement. Maybe your ORM has it? | baskethead wrote: | Queries like this won't break but will silently fail upstream | by missing data if column names change or are deleted. The | explicit nature of SQL ensures that it will break positively | which is a better type of failure in my opinion. | [deleted] | Groxx wrote: | whether that's desirable or not depends on the query and | what kinds of changes you make. though I could see it being | error-prone in the most common cases. | dragonwriter wrote: | What I really want is a special phrase MINIMAL GROUPING such | that GROUP BY MINIMAL GROUPING includes exactly the items | that it would be an error _not_ to include in the GROUP BY. | icambron wrote: | I wish it just had a "group by all the stuff I selected | without aggregation" shortcut | xyzzy_plugh wrote: | Isn't this just SELECT DISTINCT ...? | orthoxerox wrote: | No. What icambron and I would like is something that can | get rid of this group by: select | division_name, branch_name, dealer_id, dealer_name, | quarter, month, sum(total_paid) from divsions, | branches, dealers, transactions where ..... | --buncha joins group by division_name, | branch_name, dealer_id, dealer_name, quarter, month | | If I didn't want to group the result by division_name, | branch_name, dealer_id, dealer_name, quarter and month, | why would I put them in the select clause? | benjiweber wrote: | "group by 1,2,3,4,5,6" is at least a little more consise. | Sesse__ wrote: | What if you do SELECT a + b, SUM(x) FROM t1? Should the | implicit GROUP BY be on (a,b) or (a+b)? | coderzach wrote: | yeah, this is the right answer. Are there any examples | where this isn't what you want? I guess GROUPING SETS and | ROLLUP, but those could be special. | oweiler wrote: | Can't you use an alias in a group by? | Sesse__ wrote: | Not by the SQL standard, since projection (which creates | aliases) happens after aggregation. There are many | databases that allow it as an extension, though. | blondin wrote: | i would rather it written as SELECT * BUT or SELECT * EXCEPT or | even SELECT ALL BUT. | | SQL has always been that language that is easy to read. even | when you don't understand what the queries are doing. adding a | cryptic syntax like "-column" would make it less readable. | colinmhayes wrote: | EXCEPT is already a sql keyword | recursive wrote: | The given syntax is obviously a no-go. It's already unary | minus. | polygotdomain wrote: | What about SELECT * WITHOUT {columns_you_do_not_want} FROM... | | EXCEPT is already a keyword and has is used for set-based | operations, so I don't think it's good to overload it. | edgyquant wrote: | I like this fwiw | c06n wrote: | R's data.table has that. | | dt[1:50, -c('col1', 'col14')] | jdunstan wrote: | That's very close to SELECT *, which has it's own dangers. I | agree that it would be nice for exploration and testing, but | probably should not be used in production. | hnthrowaway0315 wrote: | The problem I found out is that SELECT * LIMIT 10 is | guaranteed to be needed so it's actually much more important | than whatever is in production (you have to explore a lot | before writing the production scripts). My idea is that | exploration should be as easy as possible. | | Things such as SELECT * EXCEPT col1, col2 are really a PIA to | write and can build up frustration level really quickly. | Certain IDEs such as Datagrip ease the process by providing | "macros" but they are not enough. | | Another thing is to generate useful boilerplates such as | SELECT col1 FROM table GROUP BY col1 ORDER BY col1 to explore | all unique values of col1. | ipaddr wrote: | What dangers lay in select *...? Too much data? | wiredfool wrote: | Minimal if you refer to columns by name. Risky if you rely | on column ordering. Can be too much data and a performance | issue if you have large columns that you're not using. | (Though, it's not much different if you're using an orm | that loads the whole object anyway), like Django or | sqlalchemy. | | I've done it for many years (using named columns/ | dictionaries as the result set) and its never been an | issue. | nvartolomei wrote: | Some have support for this. | | - BQ https://cloud.google.com/bigquery/docs/reference/standard- | sq... | | - CH https://clickhouse.com/docs/en/sql- | reference/statements/sele... | hnthrowaway0315 wrote: | Thanks, these really look neat. Unfortunately we only use | Vertica :/ | dolmen wrote: | > SELECT -col1, -col14 FROM table LIMIT 50; | | This is already valid SQL. Example: SELECT -col1, -col4 FROM | (SELECT 1 AS col1, 2 AS col4) AS tbl; | | Do you think seriously that a new meaning could ever be | attached to that syntax? | bawolff wrote: | Seems like minor syntactic sugar. I'm not opposed but really only | seems mildly useful at best. | pella wrote: | related: pgsql-hackers thread and context | | https://www.postgresql.org/message-id/flat/1aec0dd0-dc27-40e... | mi_lk wrote: | tangent question - what're some effective ways to learn how to | write SQL? | jdunstan wrote: | w3schools and the online documentation for any database are | good places to start (Postgres and SQL Server have pretty good | documentation). | | There are also easy-to-google SQL Puzzles, if you're looking | for something more advanced. | | I would also recommend learning about query plans and how to | read them, they're invaluable for query optimization. | orthoxerox wrote: | Basic SQL has lots of tutorials. Graduate from them to Markus | Winand's use-the-index-luke.com and modern-sql.com | Guest42 wrote: | Is this not something that should be written at the app level | rather than dml? | Tostino wrote: | Foreign keys in general? I am not sure I understand what you | mean... | Guest42 wrote: | Not something I've ever thought about but I think that it | could be done via macro, | | I have written tsql functions in c# and imagine that other | dialects have analogous functionality. | | It seems as though the point is to cut down on sql code, it'd | also be possible to query the foreign keys and create a data | structure that could feed a function for joins. | | I haven't thought out the specifics but think this type of | approach would be more practical than changing the sql | standard. | Tostino wrote: | At that point you are just building a feature into an "ORM" | of some sort. The entire point is to allow queries to be | written in a more concise way for this simple equality join | use case which is (total ballpark figure) ~half of the | joins in my system. | Ayesh wrote: | It looked very useful and intuitive at first, but on further | thought, I think the only time that you'd truly benefit from it | if you SELECT *. For other SELECT queries with explicit field | name list, you'd need to know the table the key constraint links | to anyway. | | SQL Views serve perfectly well for queries encouraged by the | schema itself, and i think they are more sophisticated and | practical way. | JoelJacobson wrote: | There is similar comment in the thread, suggesting both tables | should always be specified. | | Not sure what I prefer yet. The idea is the foreign key name is | usually the same as the referenced table, so should be an | infrequent problem. | zmmmmm wrote: | > The idea is the foreign key name is usually the same as the | referenced table | | I think this is where you are hitting some of your turbulence | here because lots of ORMs / schema management tools actually | generate completely cryptic fk names (sometimes based on the | hash of the columns or similar). Personally I think weighing | in legacy baggage like that too highly is a bad thing as it | creates enormous inertia. | JoelJacobson wrote: | Thanks for explaining, I have no experience of ORMs, always | written my queries manually. This was insightful. Thanks | also for the word "inertia", good one, will add to my | vocabulary. | [deleted] | Smotko wrote: | It might be just me, but I feel like remembering the foreign key | name is more difficult than remembering the columns that you need | in the ON clause. Especially since you can usually find the | column names by just seeing the data in the table (select * from | x) wheres seeing the foreign key names is much harder (show | create table x?). | | Also, if you use an ORM it will usually generate foreign key | names that are almost impossible to remember. | exabrial wrote: | I think this is an operator problem. You're using the wrong | tool for the job. | | TablePlus, SequelAce, the official MySQL client all support | cntrl-space autocompletion. I wish we used Postgres, but I | imagine the landscape is the same. The big box databases like | Oracle, DB2 undoubtedly having this tooling as well. | | That being said, here is our fk naming convention: `fk- | asset_types->asset_categories` which pretty states what's going | on and is easy to remember. | dolmen wrote: | SQL is not only written in an SQL client. SQL is also written | (and read from) embedded/mixed in an other programming | language were tooling is not always available. | | Having to know the names of foreign keys (in addition to the | column names of the 2 tables) is adding more cognitive load. | I don't think that is an improvement. | JoelJacobson wrote: | It would indeed be difficult to remember, but the proposal also | suggest changing the default naming convention for foreign | keys, to give them the same name as the referenced table. | | If using an ORM, I would guess this proposal isn't useful, | since then you wouldn't hand-write queries anyway, right? | Except when you want to override the queries generated by the | ORM? (I'm not an ORM user myself.) | to11mtm wrote: | Speaking as someone who has used ORMs in the past and | contributes to a LINQ Micro ORM... | | It might make tooling 'easier', but since backwards | compatibility has to be considered the actual value add is | questionable IMO. | | Most ORMs/MicroORMs will have tooling that sniffs out the DB | Schema including foreign keys, and if you are using those | bits (i.e. 'not hand written') most will do the right thing | today. I suppose you could include some extra syntax for | whatever DSL you're providing users.... | | IDK. Speaking as someone who is very comfortable in SQL, This | feels more like syntactic sugar than anything else. | rtpg wrote: | In a universe where foreign key index names are important we | would specify better names. | | I think stuff like "documents_by_user" as foreign key names and | explicit index usage would improve peoples awareness of how | indices get used and would generally be a positive | matteote wrote: | This reminds me of KEY JOIN in Sybase which, unlike this syntax, | does not require to specify the foreign key. I find Sybase's | syntax to be too implicit, and challenging to port to other | dialects; JOIN FOREIGN with explicit PK reference looks much | better. | | I wonder how it is expected to work with non-table references | (views, CTEs, subqueries), especially when the columns involved | in the foreign key (on either side) are not returned explicitly | by the referenced object. | JoelJacobson wrote: | > I wonder how it is expected to work with non-table references | (views, CTEs, subqueries) | | It's not, it's only for the special but common case of joining | two tables based on a foreign key. | halayli wrote: | It's a great proposition. However, I would suggest having it as a | prefix rather than postfix of the JOIN expression to reduce | implementation side effects and avoid ambiguity between keyword | 'FOREIGN' and a table called 'FOREIGN'. | | example: FOREIGN LEFT JOIN | Upitor wrote: | In my opinion this proposal seems only to consider simple cases, | but there are many not-so-simple relationsship types: | | Consider a 'sales' table which includes columns [time] and | [sold_by_employee_id], and a periodized 'employee' table which | includes columns [employee_id], [valid_from] and [valid_to] | columns. There is a perfectly valid relationsship between the two | tables, but you cant join them using only equal-statements (you | need a between-statement as well) | JoelJacobson wrote: | Nice example! The join you describe would remain as a JOIN ON. | | This is per design. Quote from the proposal: | | "The idea is to improve the SQL language, specifically the join | syntax, for the special but common case when joining on foreign | key columns." ... "If the common simple joins (when joining on | foreign key columns) would be written in a different syntax, | the remaining joins would visually stand out and we could focus | on making sure we understand them when reading a large SQL | query." | | So, the special non-equal based join condition you describe, | would become more visible, and stand out, allowing readers to | pay more attention to it. | | The hypothesis is most joins are made on foreign key columns, | so if we can improve such cases, a lot can be won. | DevKoala wrote: | But how could you accurately tell if some queries join on the | foreign key, but were written by someone without knowledge of | the new specification? | Upitor wrote: | I see. I should have read the article more carefully :-) | Upitor wrote: | Also, consider a 'sales' table with multiple references to a | 'calendar' table: [shipped_date], [order_date], [received_date] | JoelJacobson wrote: | Good example too, but this one can with benefit be written | using the JOIN FOREIGN syntax, you just need to give the | foreign keys suitable names such as e.g. "shipped_date", | "order_date", "received_date". Or, to remind you of which is | the referenced table, perhaps you want to include it in the | names, and the names would be "shipped_date_calendar", | "order_date_calendar", "received_date_calendar", but | personally I would prefer just "shipped_date" or perhaps even | just "shipped". | zoomablemind wrote: | I wonder if primary objective of this proposal is to increase | convenience and reduce the amount of typing that users need to | do, when using SQL inline? | | If indeed that's the goal, then it targets a rather specific | subset of users dealing with explorative/ad-hoc analysis on a | database. Once such analysis is done, the queries would usually | need to be formalized for robustness and to avoid ambiguities. | | Obviously, the whole train of queries would derail, should the FK | (which is just an index) be dropped for one reason or the other. | | The existing JOIN features are explicit at least on the level of | specified table structure. I believe, any constraint details in | such context will be, well, ...foreign. | | Perhaps, a simple solution to verbosity problem may be to use an | "intelligent" SQL client, which supports some form of | autocomplete and which may as well internally use as many | schema/data details as available. | | In anycase, thanks for making the proposal. I was not aware of | JOIN ... USING syntax. I often wanted some convenient way of | specifying homonymous join columns, as some schemes are | consistent in such namings. So typing JOIN on col1, col2... would | translate into equality joins between the listed tables. However, | again, there is ambiguity here... | abhchand wrote: | This feels like adding syntactic sugar in some sense. That is, a | layer of convenience to accomplish something that can already be | accomplished today. | | It also breaks the following: | | * It's better to be explicit than infer * Keep specifications | simple | slt2021 wrote: | not a good idea, constaints are one thing and joins are another | thing. plus I can join on conditions other than =. | | the problem that author is trying to solve can be easily solved | by a view: | | 1. Declare a view with all necessary JOINs once | | 2. select from view only what you need, aggregate what you want | | 3. Optimizer will throw out unnecessary stuff and optimize query | while all JOIN logic will be declared only once and will be | hidden inside the view | | plus each DBMS has its own flavor of SQL and will have its own | query optimizer nuances when dealing with joins, especially | nested via CTEs/views/lateral queries,etc. | Sesse__ wrote: | Your #3 is very optimistic. You can't just remove a join (be it | inner, left, or really any other type) just because you didn't | refer to any fields from one of the sides, and it's even harder | when aggregation is in the mix. There are cases where you can | without influencing the result, but they are special-case | optimizations and not universally supported across databases. | gigatexal wrote: | I like this a lot. It makes thing cleaner and more like explicit. | clintonb wrote: | How is this more explicit than specifying join columns today? | lolive wrote: | Coming from the graph database / semantic web area, I would | propose foreign key/ primary key relationships in the DB to be | detailed _and named_ in a schema description, and then queries | reference those relationships by name to define the needed joins. | criticaltinker wrote: | Yup totally agree - OP is an interesting proposal and has | provoked a lot of quality commentary regarding the tradeoffs of | clarity vs conciseness, implicit vs explicit, FK performance, | etc. | | But IMO you've raised _the_ important long term consideration - | do graph based schemas and query languages obviate the need to | model foreign keys explicitly? If this JOIN FOREIGN proposal is | an incremental step forward, what's the next big leap? ___________________________________________________________________ (page generated 2021-12-30 23:00 UTC)