[HN Gopher] Database drivers: Naughty or nice? ___________________________________________________________________ Database drivers: Naughty or nice? Author : ctc24 Score : 79 points Date : 2022-12-12 15:51 UTC (7 hours ago) (HTM) web link (www.prequel.co) (TXT) w3m dump (www.prequel.co) | wswope wrote: | Speaking as a professional data yeeter who has been bit by weird | driver issues in the past, I'll go against the grain to say that | I found this really helpful as a checklist to pattern match | against when troubleshooting, even without calling out specific | drivers. Thanks for sharing! | akdor1154 wrote: | I will presume that at least half of these are issues with the | Snowflake connectors until told otherwise.. those guys have a | great core product, but the connectors are pretty dodgy in many | cases I've seen. | | Telling as well that DuckDB fixed your issue in a couple of | days.. had snowflake support even demonstrated technical | understanding of your request in that time? I guess it doesn't | really matter if it takes them six months or more to fix the | connector bugs anyway. | omgbear wrote: | I've had such problems with the official AWS redshift JDBC driver | -- Mostly around memory usage compared to the PG8.4 driver that | also works. But also, | | - Much slower to cancel queries | | - Still reading and buffering most of the result after canceling. | | - Weird escaping issues | | Glad to be done with that. | no_wizard wrote: | I know this isn't a specifically Amazon issue or topic, but I | am so disappointed by their SDKs. From Amplify to AppSync | they're just layers of half baked ideas and implementations. | I've hit _a lot_ of bugs and limitations on their platform. | | And they they still have no concept for native dev tooling for | most of their services. Its always "just setup separate | environments for testing, CI, development, staging, release" | and pay them more money. | | I hate AWS, so much. | | I am not shocked their DB drivers have issues. | gumboza wrote: | Completely agree with this. The only products I'd even | consider by choice from Amazon are the core IaaS products. | Everything they built in house on top of their own platform | is a buggy shit show. | | One recent problem I have been dealing with, which I can't | detail because they'll know who I am and I am under NDA, | involved a major release last year. I found two critical bugs | in the stack which they had extended from an open standard. | As we're a massive company with a big spend I managed to get | the team leads and enterprise support on a call and I found | the whole fucking thing was clearly looked after by two guys | who didn't know what they hell they were doing, had never | spent any time on the stack that they were extending and had | never even considered a real world use case. | | I have considered looking into Azure but I'm not sure that'll | be any better. | no_wizard wrote: | From my experience, having extensively used each cloud at 3 | different jobs, this is _my take_. As always, YMMV, that | said: | | - AWS is the worst of all the clouds, not just (typically) | in pricing, but definitely it has the worst DX, confusion | of options, and documentation out of the big 3. They're | flexible I guess, and have tons of services but no easy and | tight way to integrate them, and they're very configuration | heavy. Just didn't care for how they presented their happy | paths (WTF is up with VTL?!). I think Lambda's are | overrated and idk why they have cold starts in 2022 when | all their competitors don't anymore. | | - Azure has some good stuff, including local emulators you | can download for some services, and they have mock packages | for programmatically emulation some of their services. | Their biggest downfall for me was the confusion amount of | options they have and sometimes their pricing is opaque | what the final cost on a service will be. Both combined | made it harder to use than I would have liked but it was a | step up (to me) from AWS. A note of caution though, is that | Azure doesn't have the per se greatest support for things | not in their "stack". .NET of course works well, Node, | Python, and more recently Go have seen good support | (particularly Node) but it has less flexible ways to do | more customized deployments in my experience. Doesn't mean | you can't, but you pay alot more for the privilege. They | also are more aggressive in the upsell department as well. | Azure is particularly great if you're migrating older | Windows Server apps and/or heavily using things like | ActiveDirectory. If you're willing to pay they do have good | support. I recall Azure having a pretty decent cloud SDK | and they supported all the cross-cloud ones decently | (Terraform, CF etc) as well. One thing I like about Azure | is they have the least confusing CLIs IME | | - GCP is has pretty good DX. I think they're more expensive | in alot of ways, if you aren't careful, but their "main" | services are pretty cost competitive. In particular, I feel | like AppRunner is the underrated alternative to things like | AWS Lambda or other FaaS offerings. GCP also has a good | amount of emulators and Firebase is still pretty great. GCP | is very happy path oriented (more so than the other cloud | providers) in my experience, but generally they seem to | have picked good "happy paths", again IME. BigQuery is more | byzantine in cost than I'd like, same with Spanner. Their | support is pretty bad IME unless you pay for big contracts, | though. Even then it wasn't the best, but at least we could | get a human on the phone. Also, I wouldn't trust their one | click apps, there is hardly any control GCP exerts over | them to ensure quality, so unless you know the vendor is | specifically supporting them for your use case, avoid them. | I did find their permissions model to be as confusing as | AWS. Didn't like their cloud SDK as much but it was better | than AWS. | | What would I do if I was able to start over again in 2022 | though? I'd deploy to something like Supabase and/or | Cloudflare if I wanted pretty much 100% managed services. | Otherwise if I just want compute and add in my own services | I'd do a combo of Fly.io with Planetscale (or a similiar DB | provider) and using Cloudflare services (such as R2 and | workers) for files / caching or a combo of B2 + Upstash. | You could proxy / load balance your HTTP / TCP traffic via | Cloudflare as well (idk that fly.io autoproxies provisioned | instances for load balancing. I'm pretty sure it doesn't) | gumboza wrote: | Don't start me on VTL. I spent two weeks trying to delete | a VTL which was sitting unused at $5000 a month. | | On lambda, the issue I mentioned was actually because one | of their services was built on top of that and maintained | state between lambda runs whereas the open source product | they ripped off did not. That meant their entire product | was stateful when it should have been stateless. It | compromised everything. | | I couldn't possibly use Google after a GApps fiasco I had | a couple of years back. No way. | | Thanks for the rest of the info. I shall carefully | decipher it. | | For the next platform migration we do it might be back to | a dedicated DC which is where we came from. We have | stable constrained, predictable load. The TCO was | cheaper, the staff were fewer, the performance was better | and the licensing of some commercial products we leverage | was much easier to handle and cheaper. | VWWHFSfQ wrote: | without actually naming the drivers in question so that people | can learn from it this is pretty pointless. | Nzen wrote: | Presumably, this is a warning for people building a fresh | system. Even if the authors called out a, b, and c for version | 1, 2, and 3 (respectively); there is no promise that x, y, and | z don't exhibit these issues (or that other versions of abc do | or don't exhibit the behavior, either). | | This is a reminder to consider our use cases and vet our | dependencies for those use cases carefully. | janci wrote: | It is the same shitshow in JDBC land. | [deleted] | counttheforks wrote: | This article would be way more useful if it named the offending | drivers and linked to relevant github issues. As is this is just | a set of useless curiosities. | | I came in wanting to check if any of the drivers I use were | affected by any of the bugs they found. No idea! | ctc24 wrote: | Understand the sentiment. On the flip side, and as noted in the | post, we don't want it to serve as a "name and shame" | situation. We have a lot of respect for the folks maintaining | the drivers we use, they do a pretty hard job, and so we don't | want to come out swinging and taking shots at them. So we're | trying to walk that line. | | We also think the information provided in the post is valuable | as is: it's easy enough to check whether a driver you use faces | any of the issues we mention. | | With that said, here's a bit more color: | | - the native Snowflake driver for GoLang does not implement | COPY functionality (or at least it did not when we last tried | to use it). | | - the memory leaks are pretty prevalent across ODBC drivers. | It's worth watching out for that one if you're using any ODBC | driver. | | - the breaking change on connection string was Databricks' | GoLang driver. | | - the DECIMAL one is pretty prevalent too. BigQuery only allows | you to go up to DECIMAL(38,9) while most other drivers let you | go to 18 on scale, and ClickHouse supports precision/scale of | up to 76. Redshift complains loudly if you try to insert a | DECIMAL(38,17) into a DECIMAL(38,18) column, for instance. | | Hope the added color is helpful! | | edit: formatting | tacker2000 wrote: | Was wondering the same. | | Are they scared of pointing fingers or what? | bsder wrote: | Agreed. Without actual references, this is marketing blather, | not a technical article. | | The worst part is that it doesn't have to be name and shame. | Take the "epoch" discussion, for example. The fact that | "epochs" differ in implementation is something that isn't even | a bug--it's just different. That alone is likely surprising to | a lot of people and would probably be worth an article. | | Of course, the real issue is probably that "boring" databases | just work and "exciting" databases are full of bugs. If you're | a database SAAS startup, slagging the databases that everybody | considers cool and hip isn't going to be good for your exit. | aynyc wrote: | I used to work with Sybase, and its JDBC driver was pure madness. | There were times where the same query on same data would return | different results. Never figured out why. We ended use its ODBC | driver for reporting instead. | tlocke wrote: | Hello, maintainer of the pg8000 driver for PostgreSQL here. I'm | keen to know of any problems you had if you used pg8000. Thanks! | https://github.com/tlocke/pg8000 | bhargav wrote: | This is one of the few Github repos I have seen that are very | very well documented. Great job; starred. | ctc24 wrote: | Appreciate that and will do! | hermitdev wrote: | I remember dealing with DataDirect on Linux more than 10 years | ago. I don't recall the specifics, but under certain error | conditions, either the ODBC driver we were using or the driver | manager would simply do `exit(0)` with no errors logged anywhere. | That was a "fun" one to solve... | eska wrote: | I remember something about mysql not treating localhost as local, | but 127.0.0.1 would. This led to very different performance. | (Long ago, not sure about details) | | Also surprised by libraries not using more efficient protocols | although they were defined. | | Compression also doesn't seem to be a thing. | georgewfraser wrote: | The world would be a better place if database drivers were | completely abandoned as a way for clients to connect to | databases. A standard API, implemented by multiple vendors, is a | vastly preferable solution. Arrow Flight is an example of this. | | https://arrow.apache.org/blog/2019/10/13/introducing-arrow-f... | lidavidm wrote: | Even within the Arrow project, there's still room for drivers | just because not every vendor is going to implement the same | wire protocol (at least on a feasible timeline). Hence both | "ADBC" [1] and Flight SQL [2] (note: NOT a SQL dialect, it is a | wire protocol) coexist in complementary niches. | | [1]: https://arrow.apache.org/docs/format/ADBC.html [2]: | https://arrow.apache.org/docs/format/FlightSql.html | kardianos wrote: | Nice. | | This [1] appears to be the SQL layer on top of Arrow Flight | specifically about SQL. It seems a bit chatty, where two | network requests are required for each query if I read it | correctly. | | [1] https://arrow.apache.org/docs/format/FlightSql.html | lidavidm wrote: | Yup. The chattiness is to account for distributed databases, | so you can spread the result set across multiple instances. | | That said there is a proposal for base Flight RPC to help | allow embedding small results directly into the first | response, that mostly needs someone to draft a prototype and | push it through. (That doesn't help the case of a large-ish | response from a single backend, though; that may also need | some work, if we want to get rid of the second request.) | jeff-davis wrote: | I generally think that's a good idea, but be aware that the | protocols are more interesting than you might first imagine, | and that leads to a lot of the differences between drivers for | different databases. | | For instance, when setting a user's password in Postgres, you | can do the hashing on the client side, even for non-trivial | schemes like SCRAM. This means that the password itself never | needs to move over the network, and that's very desirable. | Speaking of authentication methods, that also opens up a big | topic. | | There are also important modes. For instance, the client | encoding controls how strings are transcoded when they get to | the server. That allows the client to not know/care what the | encoding of the database is. You could demand that everything | is UTF-8, and that's one philosophy, but not everyone agrees. | | In practice, I think it'll be a while before there is consensus | on all these points. And even when there is, the standard will | need to evolve to handle new auth methods, etc. | | If we invent a standard protocol, it will probably be more of a | fallback for simple cases when the language framework doesn't | offer a driver yet. Still helpful, though. | EthicalSimilar wrote: | > For instance, when setting a user's password in Postgres, | you can do the hashing on the client side, even for non- | trivial schemes like SCRAM. This means that the password | itself never needs to move over the network, and that's very | desirable. | | Off-topic, but I'm surprised more online apps don't employ | something similar. | | It would all but eliminate accidental leaks that occur from | logs being incorrectly stored / misconfigured, not to mention | worries about MITM attacks (useful for corporate networks, or | public networks). | | Given how many people share usernames, emails, and passwords | across sites I find it quite important to mitigate those | issues as much as possible. | kardianos wrote: | SQL Database drivers have so much in common, but each represent | schema and data table(s) differently on the wire. Some handle | cancellation within the protocol, others require a work around | (like another DB connection that issues a KILL command). TDS | (Microsoft Server) is actually one of the better protocols and | better documented protocols (due in large part to historical | court orders) out there. | | My takeaway is that compact schema streaming data is not a well | developed field. I think we can do better. Not only that, but | developing both such a schema, protocol, and associated tooling | is key to significantly better data-centric applications from end | to end, not just the database. | ctc24 wrote: | Agreed! It's kind of fascinating that a better standard hasn't | emerged yet, but not surprising given that each one is | developed in isolation by a different organization. | | Definitely a lot of room for improvement. Curious if anyone has | thoughts about the best route to getting such a standard / | protocol in place: it seems like a lot of stars would have to | align, but would be invaluable nonetheless. | lidavidm wrote: | As mentioned by others elsewhere in the thread, Apache Arrow | Flight SQL aims to be a standard wire protocol that multiple | databases can implement, using a columnar data | representation: | https://arrow.apache.org/docs/format/FlightSql.html | | However it's early days yet. | layer8 wrote: | Such a nonproprietary protocol would reduce vendor lock-in, | hence vendors have no incentive to develop one. ___________________________________________________________________ (page generated 2022-12-12 23:00 UTC)