[HN Gopher] Challenges students face when learning to work with ... ___________________________________________________________________ Challenges students face when learning to work with relational databases and SQL Author : gousiosg Score : 70 points Date : 2021-08-29 17:13 UTC (5 hours ago) (HTM) web link (www.growkudos.com) (TXT) w3m dump (www.growkudos.com) | sdevonoes wrote: | I takes days/weeks to pick up the core of SQL: | | - create tables, update the schema, insert rows, add an index | | - select, filters, joins, order by, limit, inner queries | | It takes forever to be comfortable with: | | - anything that involves summarizing, grouping, having, min, max, | windows | myspy wrote: | Something other which is hard is writing performant queries. | Using statements with subqueries/in syntax for example. | | And I always forget which join does what. | AdrianB1 wrote: | Remembering which join does what is easy: inner joins | strictly joins the tables, left takes all on left (first | table), right takes all on right (second table), outer (or | cross) join are so rarely used you don't need to memorize. | MeinBlutIstBlau wrote: | same with the joins. on paper it makes sense, in practice it | does not. if its more than a "select * from dbo.whatever | where column abc = 'thing'" i have to refer back to notes and | play with it. | [deleted] | AdrianB1 wrote: | Not really. I used to teach SQL not a long time ago and about | 1/4 of the trainees were getting up to speed fast, about half | in a reasonable time, the rest were there only because they | were sent there by their managers. | | I found that the most important success factors in learning SQL | is the analytical thinking of the trainee and the way the | trainer is explaining the concepts, in what order and what | examples are used (the best examples are the ones the trainees | meet in their regular work). | | The functions are simple, the only difficulty is to remember | the ones that are not used often enough (ex: some window | functions). Even in that case, a quick check in the | documentation is enough to get up to speed. The major | difficulty with SQL is to write efficient queries on large data | volumes, covered by the right indexes. This is very specific to | each RDBMS, especially because of the tools helping with the | work are specific (ex: SSMS, SQL Sentry Plan Explorer, | statistics parser etc). | MeinBlutIstBlau wrote: | If all you do is SQL, anybody can learn it quick. If you do | full stack, you're only gonna care about what gives you the | data you need at that time. | AdrianB1 wrote: | No offence, but a full stack's job is not to write good | SQL, it's to write enough SQL to get what is needed, then | the development DBA's job is to make it fast and efficient. | You don't need to be a great car mechanic to drive to the | office and back. | 542458 wrote: | Personally I feel that it took me a while to get really | comfortable with more complex joins. There's an problem they | used in the study that required joining a table with itself, | and honestly I would probably take a while to come up with that | answer, if at all. | weaksauce wrote: | A basic approach(probably what they are going for in a basics | study) would be something like this off the top of my head | select c.cid, c2.cid from customer as c inner | join customer as c2 on c.street = c2.street where | c.city <> c2.city | | though that has reflective duplicates say (1, 5) would also | have (5, 1) in the output. So I'm not sure if that's | "allowed" | fifilura wrote: | There are lots of cases where a join with yourself is | applicable, although they are mostly superseded by window | functions these days. | | For example normalisation (join with a groupby/sum of | yourself) or rank (join each row with all rows that have | lower value than yourself and count those rows). | | But as I mentioned above. A good start is to sketch that | out in excel. You will realize that what you need is | another column (e.g. total sum for this id). And from that | you can work yourself backwards to figure out what is the | table you need to join with to create that column. | weaksauce wrote: | I'm not sure if you are replying to the wrong person but | the question has nothing to do with a total sum of ids... | | the question was: "List all pairs of customer IDs who | live on a street with the same name but in a different | city." listed under self-join | | that said i haven't wrangled with raw sql in a spell so | the reading on window functions is interesting. | fifilura wrote: | I think one tip here is to always work with CTEs "WITH", and | not nest queries. That way you can always go back and check | "what is it I join with what", by querying the individual steps | with LIMIT 10. | | The other tip is to sketch the problem in excel/google sheets | when it gets hairy. Not the actual code (I don't have a clue | how to do that, others have), just the values in the different | steps. In the end it is only about rows and columns. | | But that said, these days a lot of it happens intuitively for | me, I pretty much know the solution before I can spell it out. | It certainly was not like that when I started. | | When you begin, "programming without for loops" feels like | programming with your right hand tied behind your back. But in | hindsight you get a lot of exercise in the immutable paradigms | of functional programming, working with comprehensions, sets, | maps folds comes very natural. | odipar wrote: | Yep, CTEs are a huge boon to structure your SQL - use them | where you can. | btilly wrote: | Meh, summarizing, grouping, etc aren't that hard. | | However WINDOW queries definitely have a learning curve. Not | the least because useful examples almost always require you to | use a nested query. | magicalhippo wrote: | I learned SQL on a need to know basis. For me, recursive | queries were the ones that needed the most time to click. | | Another one that caught me by surprise was NULL vs | unknown[1]. That bit me in a couple of queries. | | [1]: https://learnsql.com/blog/understanding-use-null-sql/ | pge wrote: | and worst of all, anything that involves vendor-specific | keywords... | ipaddr wrote: | One of the best ways to learn advanced oracle specifically is | through the ask tom q/a. The question are difficult and the | answers teach more than any course. | | https://asktom.oracle.com/pls/apex/f?p=100:1000:::::: | agumonkey wrote: | To me the most difficult part is learning sql before knowing what | can be done with a computer, both on the complexity and the | language design part. | | Before learning interpreters/compilers/prolog, I'd spend a lot of | time trying to figure out about naming/namespaces in queries, | while after doing some PLT, it all becomes very very obvious, you | can now focus on the operators and since you'd know how far can | programming go, you'd see faster how nested queries could make | sense, what aggregating functions meants etc | TrackerFF wrote: | I see they mentioned previous course knowledge - this is | something you see in many (programming) classes. | | Students that have zero prior knowledge in programming, are able | to pick up functional programming pretty easy. Students that have | studied and used paradigms like OOP, seem to have a hard time | grokking functional programming - as they see everything through | the lens of OOP (and the languages they've used). | btilly wrote: | Functional and OOP techniques do not seem to be best friends. | | http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom... | uses Java to discuss what this can look like in an extreme | case. | gnat wrote: | https://dl.acm.org/doi/pdf/10.1145/3446871.3469759 has the actual | paper. | hahamrfunnyguy wrote: | In my experience, inexperienced database developers pick up SQL | fairly quickly under the guidance of an experienced mentor. | [deleted] | Spooky23 wrote: | This. | | Usually I see people struggling to formulate questions. They | know what they want, but don't understand how to get there. | Left to their own devices, they hack up some nightmare in | Excel. | | I worked with a summer intern on creating reports and learning | SQL. She was a really smart business major who ended up with | the wrong work assignment. I was getting 5-7 questions a day | from her in June, 1-2 a week in July and by the time I got back | from vacation in August, she had basically done about 90% of a | project that was going to be hired out and was showing me some | features of the database we were using that I didn't know! | | It inspired her to switch majors and she is a fancy data | scientist somewhere! Awesome mentor experience. | patrakov wrote: | Direct link to the research paper, instead of the summary: | https://dl.acm.org/doi/pdf/10.1145/3446871.3469759 (pdf) | tracyhenry wrote: | SQL has a steep learning curve. It expose almost zero insights | into the underlying query execution. As a result, increasingly | amount of inefficient queries are being written by ML engineers, | who in general care little about query efficiency. The solution | right now seems to have a team of data engineers to optimize the | queries. | | Should we think about an alternative, at least for ML ETL | workloads? | tester756 wrote: | >SQL has a steep learning curve. | | Does it? I think SQL just sucks and its tooling sucks too | | Even SQL Management Studio which felt way better than PGAdmin | is miles behind IntelliSense that's offered by Visual Studio | for C# (when it comes to reliability) | | SQL would benefit a lot from being like C#'s LINQ (Query | syntax) e.g: | | var result = from s in stringList | where s.Contains("Tutorials") select s; | | some SQLv2 is something we need | keithnz wrote: | try DataGrip, it's got really nice intellisense and | autocomplete. I'm not sure how your example from linq is any | better than SQL select s from stringList | where s like '%Tutorials%' | tester756 wrote: | The difference is when you type "select (here)" then your | tool cannot really give you hints about column names in | table | | because you haven't specified the table yet. | | Meanwhile LINQ starts with table name. | | Ofc you can always go ahead and write queries in "different | order" | da39a3ee wrote: | https://opensource.google/projects/logica | dspillett wrote: | _> SQL has a steep learning curve._ | | Overall I don't think it is that steep, though maybe I'm | blinded by having worked with various implementations of it for | more than two decades. The key sticking point is jumping to | thinking in a set based manner to get best results. The rest of | the difficult parts are when you need to think about | implementation details because the query planners are no | perfect (index hints and such) or being aware of limitations | (like postgres before the latest major version having | optimisation fences around CTEs). | | _> It expose almost zero insights into the underlying query | execution._ | | That is pretty much by design. It is intended that you say what | you want and let the query planner worry about implementation | details. Of course how you tell it what you want involves | learning to express those intentions in SQL. It does fall apart | a bit when implementation limitations become an issue, at which | point you are forced to think about the underlying | implementation and how you might prod this more imperative code | so that it interprets and process your relational descriptions | most efficiently. | | _> As a result, increasingly amount of inefficient queries are | being written by ML engineers_ | | That isn't specific to ML. I see a lot of inefficient data | interaction from code written by other devs. This seems to be | for two reasons: | | 1. People seem to have taken to heart "make it work, make it | work correctly, only then worry about making it work fast" to | heart but tend to skip that last part and assume because all is | well with their test sets of data at hundreds or thousands of | rows (or sometimes tens and singles) that it'll scale just find | to the hundreds of thousands or more that the clients datasets | will eventually contain. | | 2. People using further abstractions without much care for how | they implement their directives (again, in an ideal world they | shouldn't have to), resulting in massively overcomplex queries | as the framework tries to be clever and helpful and preempt | what might be needed, getting _everything_ whether needed or | not (effectively `SELECT _`) meaning the query planner can 't | apply families of its internal tricks for better performance, | or getting many rows individually instead of as a set which | sometimes means a lot of extra work for each row. | | There is a definite "we'll worry about that when it happens | attitude in both cases which is dangerous. While a live system | has practically ground to a halt and the client needs their | report by EOP or someone will get it in the neck (and be sure: | they will pass that on to you!) is not a good time to be | optimising data access, or worse finding out the structure just | doesn't support efficient generation of the required data. | Another common failing is applying what would idealy be UI or | BLL concerns (timezone conversions etc) in the SQL statements | in a way that blocks index use. | | _> Should we think about an alternative, at least for ML ETL | workloads?* | | I don't work with ML so that is a little outside my day-to-day | wexpertise, but I'd wager ETL there has the same problem as | everywhere: the basics are all well known and very well | optimised for already. The rest differ so much between | applications that no one abstraction would be optimal for more | than a small portion of real world needs. | | I'd be wary of a separate team for optimising queries. I | suggest a reasonable understanding in the whole dev team with a | data expert embedded who is involved in design work and code | reviews so issues are caught early and junior devs can be | tutored as needed so by the time they are seniors they don't | need the data expert except for really gnarly problems or long- | term planning. | ttfkam wrote: | Step 1: "SQL sucks!" | | Step 2: Let's make a database engine that doesn't use SQL. | | Step 3: "This is hard!" | | Step 4: Make SQL access layer. | | Wash. Rinse. Repeat. | | See: PartiQL | | Those who ignore the lessons of SQL are doomed to reimplement | them...poorly. | ttfkam wrote: | Corollary: | | 1. "SQL doesn't scale!" | | 2. We made this database engine that's "web scale"! | | 3. "This is hard to use!" | | 4. Make SQL access layer. | | See: Spanner | | Those who blame SQL for their performance problems are doomed | to repeat them using a proprietary syntax. | odipar wrote: | My first encounter with 'SQL' was a course on relational algebra | that was taught at my university. | | It started out with defining relations as a mathematical | construct, and continued with various operators on such | relations. Then they continued explaining the various normal | forms up the fifth normal form. I was completely out of my depth, | but at least it was good and solid theory that could be learned. | | What really messed with my head is they then introduced SQL as a | 'practical' implementation of relational algebra. I'm still | having nightmares where I try to understand nested HAVING | statements that where asked at the exams. | | Hey relations don't contain duplicates! But that's OK. We should | call (modern) SQL 'BAG ALGEBRA'. | melony wrote: | Don't forget loops | DaiPlusPlus wrote: | I noticed that the article doesn't mention relational-calculus at | all, only relational-algebra. That's a huge oversight, imo - as I | feel one needs to understand both RA and RC in order to grok SQL | and other RC-like systems, like Linq in C#/.NET and List- | comprehensions in Python (or even use those _before_ RC /RA and | SQL). | | ------- | | Rather than improve how SQL is taught (which seems to be the | paper's objective), why not improve SQL so it isn't as horrible | to try to learn in the first place? | | The barriers to grokking SQL could be lowered considerably if SQL | made minor adjustments like moving the projection part of a | SELECT query to being below or syntactically after the WHERE | clause instead of being at the top, and making SQL more "natural" | to write-in without needing excessively verbose inner-derived- | table expressions when all you want is to do perform some | repetitive calculation which will be reused in later query steps. | | Also, the GROUP BY clause really needs to be renamed to | "AGGREGATE BY" or similar, because when normal people think | "group" they're probably thinking of sorting/ORDER BY or | PARTITION BY and they certainly don't imagine "don't display | these rows at all, lol". | | I just don't understand what drives the ISO SQL language design | committee - I'd have thought that the newer revisions (e.g. | SQL-2003) would have improved the language's ergonomics - on the | contrary: the language's grammar and verbosity gets worse every | release, and the team has strange priorities: apparently they | feel needing to generate in-memory XML is more important than | deferrable constraints - and I only ever see ISO SQL's XML | features being abused to make-up for a lack of decent string- | aggregation functions. | | (...I could talk for hours about everything wrong with SQL.) | de6u99er wrote: | The trick of becoming really good at SQL (applies to all areas | of IT) is having a certain ambition to produce high performance | beautiful (readable) code. | | This requires experience which can only be gained by rolling up | your sleves and working on stuff until the high ambition has | been satisfied. Sometimes when I see old code from myself, and | I can follow what I have been doing I get really proud of | myself. Many times I end up slightly improving it based on new | knowledge I have acquired since I initially wrote it. | minism wrote: | High performance and readable certainly. Not sure why | beautiful would be something to strive for though | AdrianB1 wrote: | Readable code can be easily done via good formatting, but | performance requires a combination of writing the query in | the right way and the indexes to support it. The second part | is not even visible from the query and most of the time is | not self-explanatory, but the best part is that indexing is | not even universally valid, statistics decide execution plans | and the same query with the same indexes can result in very | different performance on 2 different instances. | jimbob45 wrote: | If the general computing community can agree on anything over | the last 20 years, it's that Python 2->3 was a disaster. Even | though Python 3 made several highly necessary (and irreversibly | transformative) changes to Python 2, no one liked it because it | fundamentally changed the language to something unfamiliar. | | I'm guessing the SQL and C++ committees looked at that | transition and decided that such transformative changes really | need to be done in new languages (like the Perl -> Raku change) | rather than in a new version which risks alienating your | existing base. | btilly wrote: | The Python 2 to Python 3 migration was such a disaster that | Python 3 is now used by both more programmers, and a higher | proportion of programmers, than Python 2 ever managed. | | This is not to minimize the pain of switching. But it does | not seem to ahve limited the success of the language. | darksaints wrote: | Python 2->3 was a disaster, but I'd refrain from | extrapolation because a lot of that difficulty was very | specific to dynamic typing or python itself. There are tons | of languages that have gone through far more transformative | changes in the core semantics of the language, and have gone | a lot smoother. | DaiPlusPlus wrote: | Oh of course - I have no doubt the ISO SQL committee is so | conservative (no... they're _regressive_ ) is because of the | sheer collective industry investment in not-only SQL tooling | and SQL-compatible databases, but just energy-spent in | teaching non-CS/SE/programmer types in businesses how to | express their data-queries in SQL. It's very, very difficult | to get the kind of industry cohesiveness around any technical | standard, so the fact that SQL is so widely supported is a | miracle (though it probably has something to do with US | federal government requirements for information systems to | support it, just like how POSIX is a thing because of the fed | pushing for it). | | To be clear: I am not advocating for a brand new query- | language syntax or any kind of Python3-style overhaul, but | I'd like to see SQL start to take small steps towards | integrating the lessons learned from the past 60+ years of | language design rather than doing the complete opposite. | monkeydust wrote: | Have been using openai codex for a week and it's shockingly good | at SQL with well defined prompts. | pcblues wrote: | I have been developing software that includes SQL for twenty | years, and watched my own mental progress from misunderstanding | to understanding. I found the biggest initial problem is that I | used to imagine SQL queries as an imperative language rather than | as expressions of data. Maybe in the teaching of SQL, this should | be highlighted so absolute beginners can have that mental model | when they are formulating solutions and grappling with the | syntax. | k__ wrote: | I don't know if that's enough. | | Understanding the difference between declarative and imperative | programming is rather hard with all the abstractions we have | today. | | People always say, declarative programming is defining what you | want, not doing the steps needed to get it. But today no | imperative interface requires you to do all the steps either, | plus, most programming languages use both paradigms at the same | time. | pcblues wrote: | I guess my point was that if you are trying to achieve | results in a language paradigm that isn't the one the | language was designed for, the learning curve is _really_ | steep, and to use the declarative features of any language | still requires you to understand the paradigms' differences. | When I was at uni it wasn't until the final year that | Programming Paradigms was a course, but even a rough | introduction to them in any of the languages I studied | earlier would have helped. Something like, "This language is | used like this. It is not used like this, for example." | k__ wrote: | I'm not even sure, I totally understood the difference now, | 10 years after I studied CS. | btilly wrote: | _Understanding the difference between declarative and | imperative programming is rather hard with all the | abstractions we have today._ | | The distinction is are you telling the computer *how* to do | it, or telling it *what* steps to take. | | If, even with access to all of the code, you'd have to ask | the computer how it chose to do it to figure out what it did, | you have a declarative system. If the code reads like | instructions for a recipe, it is imperative. | | The complications come with the fact that these two paradigms | do not describe all of the possibilities. Notably object | oriented and functional designs are neither imperative or | declarative. (But may share some features with both.) | k__ wrote: | Yes, that's probably what was always my problem. | | Theory is one thing, but actual programming languages are | something different. An "impure" mix. | | So, when people told my language X is imperative and | language Y is declarative, I got confused, because they | often had parts of both. | da39a3ee wrote: | I've done backend web development with a relational DB via an ORM | for 10 years. I'm OK at that, but I'm fucking hopeless at SQL. I | know that my opinions are thus undermined, but I really wish we | could get rid of SQL and replace it with something like logica | [1] like today. | | SQL's pseudo-natural language syntax is an embarrassment and its | lack of composability is even more of an embarrassment. | | [1] https://opensource.google/projects/logica | simonw wrote: | Have you used CTEs much (aka the WITH statement)? | | I find them to be a huge step forwards in terms of adding | composability to complex queries. | odipar wrote: | Yes I concur: CTEs is closer to the spirit of relational | algebra: every step/expression should yield a table/relation. | | As data munging is about | combining/correlating/sorting/grouping data, why not have a | sound (bag) algebra to do that? Such algebra would give us | equational reasoning, proofs, etc. | | And consequently: students would be learning an algebra which | is easier to learn IMO. | da39a3ee wrote: | Thanks yes I have learned to use the WITH statement, and I | agree it gives more composability. But still, a half way | house wouldn't you say? | dehrmann wrote: | > For example, some students wrote queries containing ,[?], | instead of != or <>. | | Was this done on paper? Typing [?] takes some doing. | Wevah wrote: | Option-= on a Mac with the US layout, fwiw. | JadeNB wrote: | Man, Mac's keyboard shortcuts for special characters irritate | me so much. | | First, they're there, and it's absolutely wonderful! I use | far more semantically accurate Unicode rather than lossy | ASCII approximations than I did back in my old Windows days. | (If you don't know the special characters you can get, turn | on Keyboard Viewer and whack your keyboard, especially | modifier keys, a bit.) | | But ... I can't customise them. Even back in the days when | macOS was OS X and believed in user customisation, these | specific shortcuts were frozen and un-customizable. (Like the | folder shortcuts in Finder. Maybe it makes sense to you for | CMD-SHIFT-D to open the Downloads folder, not the Desktop. | Too bad!) | | (Boy, I hope I'm wrong and someone will come along and | explain my stupidity to me.) | lelandfe wrote: | > these specific shortcuts were frozen and un-customizable | | I think Karabiner should allow you to do this: | https://karabiner-elements.pqrs.org/ | | It's a utility that "remaps" keys - you set up key/key | combinations that fire the original key/key combinations | (it does not remove the original combination). E.g. you | could bind Cmd-Q to Caps Lock if you wanted a really fast | way to quit stuff. | JadeNB wrote: | The memory of what happened with kext's always makes me | leery of relying on anything that reaches too deeply into | macOS's guts, so I've always shied away from Karabiner, | probably unreasonably. | | Old-style OS X believed in customisation--even now, you | can set per-app keyboard shortcuts; it doesn't seem | possible to bind Cmd-Q to Caps Lock, but, after slipping | from Cmd-W to Cmd-Q and so quitting rather than closing a | tab one too many times, I do have Cmd-Q bound to Cmd- | Opt-Q for Safari only--and yet there's no Apple-blessed | way of changing _those_ shortcuts, when it clearly had | the architecture in place to allow it. That always | irritated me. | turnerc wrote: | From the study: | | > Participants wrote their notes and answers on paper, which | they showed in front of the webcam. | | Yes it seems they did | dehrmann wrote: | That's pretty flawed methodology since you'd want to know | what problems people encounter in the real world and how | quickly they solve them. | nightpool wrote: | Yeah, IMO counting this as a syntax error is a pretty low blow. | It's completely clear what the person intended, and they would | probably have no problems clarifying if the researchers asked | how they would type that query in. | [deleted] | pcblues wrote: | Easy if you have an APL keyboard :) (Hint, it's on the 8) | https://www.dyalog.com/uploads/images/Business/products/us_r... ___________________________________________________________________ (page generated 2021-08-29 23:00 UTC)