[HN Gopher] Database drivers: Naughty or nice?
       ___________________________________________________________________
        
       Database drivers: Naughty or nice?
        
       Author : ctc24
       Score  : 79 points
       Date   : 2022-12-12 15:51 UTC (7 hours ago)
        
 (HTM) web link (www.prequel.co)
 (TXT) w3m dump (www.prequel.co)
        
       | wswope wrote:
       | Speaking as a professional data yeeter who has been bit by weird
       | driver issues in the past, I'll go against the grain to say that
       | I found this really helpful as a checklist to pattern match
       | against when troubleshooting, even without calling out specific
       | drivers. Thanks for sharing!
        
       | akdor1154 wrote:
       | I will presume that at least half of these are issues with the
       | Snowflake connectors until told otherwise.. those guys have a
       | great core product, but the connectors are pretty dodgy in many
       | cases I've seen.
       | 
       | Telling as well that DuckDB fixed your issue in a couple of
       | days.. had snowflake support even demonstrated technical
       | understanding of your request in that time? I guess it doesn't
       | really matter if it takes them six months or more to fix the
       | connector bugs anyway.
        
       | omgbear wrote:
       | I've had such problems with the official AWS redshift JDBC driver
       | -- Mostly around memory usage compared to the PG8.4 driver that
       | also works. But also,
       | 
       | - Much slower to cancel queries
       | 
       | - Still reading and buffering most of the result after canceling.
       | 
       | - Weird escaping issues
       | 
       | Glad to be done with that.
        
         | no_wizard wrote:
         | I know this isn't a specifically Amazon issue or topic, but I
         | am so disappointed by their SDKs. From Amplify to AppSync
         | they're just layers of half baked ideas and implementations.
         | I've hit _a lot_ of bugs and limitations on their platform.
         | 
         | And they they still have no concept for native dev tooling for
         | most of their services. Its always "just setup separate
         | environments for testing, CI, development, staging, release"
         | and pay them more money.
         | 
         | I hate AWS, so much.
         | 
         | I am not shocked their DB drivers have issues.
        
           | gumboza wrote:
           | Completely agree with this. The only products I'd even
           | consider by choice from Amazon are the core IaaS products.
           | Everything they built in house on top of their own platform
           | is a buggy shit show.
           | 
           | One recent problem I have been dealing with, which I can't
           | detail because they'll know who I am and I am under NDA,
           | involved a major release last year. I found two critical bugs
           | in the stack which they had extended from an open standard.
           | As we're a massive company with a big spend I managed to get
           | the team leads and enterprise support on a call and I found
           | the whole fucking thing was clearly looked after by two guys
           | who didn't know what they hell they were doing, had never
           | spent any time on the stack that they were extending and had
           | never even considered a real world use case.
           | 
           | I have considered looking into Azure but I'm not sure that'll
           | be any better.
        
             | no_wizard wrote:
             | From my experience, having extensively used each cloud at 3
             | different jobs, this is _my take_. As always, YMMV, that
             | said:
             | 
             | - AWS is the worst of all the clouds, not just (typically)
             | in pricing, but definitely it has the worst DX, confusion
             | of options, and documentation out of the big 3. They're
             | flexible I guess, and have tons of services but no easy and
             | tight way to integrate them, and they're very configuration
             | heavy. Just didn't care for how they presented their happy
             | paths (WTF is up with VTL?!). I think Lambda's are
             | overrated and idk why they have cold starts in 2022 when
             | all their competitors don't anymore.
             | 
             | - Azure has some good stuff, including local emulators you
             | can download for some services, and they have mock packages
             | for programmatically emulation some of their services.
             | Their biggest downfall for me was the confusion amount of
             | options they have and sometimes their pricing is opaque
             | what the final cost on a service will be. Both combined
             | made it harder to use than I would have liked but it was a
             | step up (to me) from AWS. A note of caution though, is that
             | Azure doesn't have the per se greatest support for things
             | not in their "stack". .NET of course works well, Node,
             | Python, and more recently Go have seen good support
             | (particularly Node) but it has less flexible ways to do
             | more customized deployments in my experience. Doesn't mean
             | you can't, but you pay alot more for the privilege. They
             | also are more aggressive in the upsell department as well.
             | Azure is particularly great if you're migrating older
             | Windows Server apps and/or heavily using things like
             | ActiveDirectory. If you're willing to pay they do have good
             | support. I recall Azure having a pretty decent cloud SDK
             | and they supported all the cross-cloud ones decently
             | (Terraform, CF etc) as well. One thing I like about Azure
             | is they have the least confusing CLIs IME
             | 
             | - GCP is has pretty good DX. I think they're more expensive
             | in alot of ways, if you aren't careful, but their "main"
             | services are pretty cost competitive. In particular, I feel
             | like AppRunner is the underrated alternative to things like
             | AWS Lambda or other FaaS offerings. GCP also has a good
             | amount of emulators and Firebase is still pretty great. GCP
             | is very happy path oriented (more so than the other cloud
             | providers) in my experience, but generally they seem to
             | have picked good "happy paths", again IME. BigQuery is more
             | byzantine in cost than I'd like, same with Spanner. Their
             | support is pretty bad IME unless you pay for big contracts,
             | though. Even then it wasn't the best, but at least we could
             | get a human on the phone. Also, I wouldn't trust their one
             | click apps, there is hardly any control GCP exerts over
             | them to ensure quality, so unless you know the vendor is
             | specifically supporting them for your use case, avoid them.
             | I did find their permissions model to be as confusing as
             | AWS. Didn't like their cloud SDK as much but it was better
             | than AWS.
             | 
             | What would I do if I was able to start over again in 2022
             | though? I'd deploy to something like Supabase and/or
             | Cloudflare if I wanted pretty much 100% managed services.
             | Otherwise if I just want compute and add in my own services
             | I'd do a combo of Fly.io with Planetscale (or a similiar DB
             | provider) and using Cloudflare services (such as R2 and
             | workers) for files / caching or a combo of B2 + Upstash.
             | You could proxy / load balance your HTTP / TCP traffic via
             | Cloudflare as well (idk that fly.io autoproxies provisioned
             | instances for load balancing. I'm pretty sure it doesn't)
        
               | gumboza wrote:
               | Don't start me on VTL. I spent two weeks trying to delete
               | a VTL which was sitting unused at $5000 a month.
               | 
               | On lambda, the issue I mentioned was actually because one
               | of their services was built on top of that and maintained
               | state between lambda runs whereas the open source product
               | they ripped off did not. That meant their entire product
               | was stateful when it should have been stateless. It
               | compromised everything.
               | 
               | I couldn't possibly use Google after a GApps fiasco I had
               | a couple of years back. No way.
               | 
               | Thanks for the rest of the info. I shall carefully
               | decipher it.
               | 
               | For the next platform migration we do it might be back to
               | a dedicated DC which is where we came from. We have
               | stable constrained, predictable load. The TCO was
               | cheaper, the staff were fewer, the performance was better
               | and the licensing of some commercial products we leverage
               | was much easier to handle and cheaper.
        
       | VWWHFSfQ wrote:
       | without actually naming the drivers in question so that people
       | can learn from it this is pretty pointless.
        
         | Nzen wrote:
         | Presumably, this is a warning for people building a fresh
         | system. Even if the authors called out a, b, and c for version
         | 1, 2, and 3 (respectively); there is no promise that x, y, and
         | z don't exhibit these issues (or that other versions of abc do
         | or don't exhibit the behavior, either).
         | 
         | This is a reminder to consider our use cases and vet our
         | dependencies for those use cases carefully.
        
       | janci wrote:
       | It is the same shitshow in JDBC land.
        
       | [deleted]
        
       | counttheforks wrote:
       | This article would be way more useful if it named the offending
       | drivers and linked to relevant github issues. As is this is just
       | a set of useless curiosities.
       | 
       | I came in wanting to check if any of the drivers I use were
       | affected by any of the bugs they found. No idea!
        
         | ctc24 wrote:
         | Understand the sentiment. On the flip side, and as noted in the
         | post, we don't want it to serve as a "name and shame"
         | situation. We have a lot of respect for the folks maintaining
         | the drivers we use, they do a pretty hard job, and so we don't
         | want to come out swinging and taking shots at them. So we're
         | trying to walk that line.
         | 
         | We also think the information provided in the post is valuable
         | as is: it's easy enough to check whether a driver you use faces
         | any of the issues we mention.
         | 
         | With that said, here's a bit more color:
         | 
         | - the native Snowflake driver for GoLang does not implement
         | COPY functionality (or at least it did not when we last tried
         | to use it).
         | 
         | - the memory leaks are pretty prevalent across ODBC drivers.
         | It's worth watching out for that one if you're using any ODBC
         | driver.
         | 
         | - the breaking change on connection string was Databricks'
         | GoLang driver.
         | 
         | - the DECIMAL one is pretty prevalent too. BigQuery only allows
         | you to go up to DECIMAL(38,9) while most other drivers let you
         | go to 18 on scale, and ClickHouse supports precision/scale of
         | up to 76. Redshift complains loudly if you try to insert a
         | DECIMAL(38,17) into a DECIMAL(38,18) column, for instance.
         | 
         | Hope the added color is helpful!
         | 
         | edit: formatting
        
         | tacker2000 wrote:
         | Was wondering the same.
         | 
         | Are they scared of pointing fingers or what?
        
         | bsder wrote:
         | Agreed. Without actual references, this is marketing blather,
         | not a technical article.
         | 
         | The worst part is that it doesn't have to be name and shame.
         | Take the "epoch" discussion, for example. The fact that
         | "epochs" differ in implementation is something that isn't even
         | a bug--it's just different. That alone is likely surprising to
         | a lot of people and would probably be worth an article.
         | 
         | Of course, the real issue is probably that "boring" databases
         | just work and "exciting" databases are full of bugs. If you're
         | a database SAAS startup, slagging the databases that everybody
         | considers cool and hip isn't going to be good for your exit.
        
       | aynyc wrote:
       | I used to work with Sybase, and its JDBC driver was pure madness.
       | There were times where the same query on same data would return
       | different results. Never figured out why. We ended use its ODBC
       | driver for reporting instead.
        
       | tlocke wrote:
       | Hello, maintainer of the pg8000 driver for PostgreSQL here. I'm
       | keen to know of any problems you had if you used pg8000. Thanks!
       | https://github.com/tlocke/pg8000
        
         | bhargav wrote:
         | This is one of the few Github repos I have seen that are very
         | very well documented. Great job; starred.
        
         | ctc24 wrote:
         | Appreciate that and will do!
        
       | hermitdev wrote:
       | I remember dealing with DataDirect on Linux more than 10 years
       | ago. I don't recall the specifics, but under certain error
       | conditions, either the ODBC driver we were using or the driver
       | manager would simply do `exit(0)` with no errors logged anywhere.
       | That was a "fun" one to solve...
        
       | eska wrote:
       | I remember something about mysql not treating localhost as local,
       | but 127.0.0.1 would. This led to very different performance.
       | (Long ago, not sure about details)
       | 
       | Also surprised by libraries not using more efficient protocols
       | although they were defined.
       | 
       | Compression also doesn't seem to be a thing.
        
       | georgewfraser wrote:
       | The world would be a better place if database drivers were
       | completely abandoned as a way for clients to connect to
       | databases. A standard API, implemented by multiple vendors, is a
       | vastly preferable solution. Arrow Flight is an example of this.
       | 
       | https://arrow.apache.org/blog/2019/10/13/introducing-arrow-f...
        
         | lidavidm wrote:
         | Even within the Arrow project, there's still room for drivers
         | just because not every vendor is going to implement the same
         | wire protocol (at least on a feasible timeline). Hence both
         | "ADBC" [1] and Flight SQL [2] (note: NOT a SQL dialect, it is a
         | wire protocol) coexist in complementary niches.
         | 
         | [1]: https://arrow.apache.org/docs/format/ADBC.html [2]:
         | https://arrow.apache.org/docs/format/FlightSql.html
        
         | kardianos wrote:
         | Nice.
         | 
         | This [1] appears to be the SQL layer on top of Arrow Flight
         | specifically about SQL. It seems a bit chatty, where two
         | network requests are required for each query if I read it
         | correctly.
         | 
         | [1] https://arrow.apache.org/docs/format/FlightSql.html
        
           | lidavidm wrote:
           | Yup. The chattiness is to account for distributed databases,
           | so you can spread the result set across multiple instances.
           | 
           | That said there is a proposal for base Flight RPC to help
           | allow embedding small results directly into the first
           | response, that mostly needs someone to draft a prototype and
           | push it through. (That doesn't help the case of a large-ish
           | response from a single backend, though; that may also need
           | some work, if we want to get rid of the second request.)
        
         | jeff-davis wrote:
         | I generally think that's a good idea, but be aware that the
         | protocols are more interesting than you might first imagine,
         | and that leads to a lot of the differences between drivers for
         | different databases.
         | 
         | For instance, when setting a user's password in Postgres, you
         | can do the hashing on the client side, even for non-trivial
         | schemes like SCRAM. This means that the password itself never
         | needs to move over the network, and that's very desirable.
         | Speaking of authentication methods, that also opens up a big
         | topic.
         | 
         | There are also important modes. For instance, the client
         | encoding controls how strings are transcoded when they get to
         | the server. That allows the client to not know/care what the
         | encoding of the database is. You could demand that everything
         | is UTF-8, and that's one philosophy, but not everyone agrees.
         | 
         | In practice, I think it'll be a while before there is consensus
         | on all these points. And even when there is, the standard will
         | need to evolve to handle new auth methods, etc.
         | 
         | If we invent a standard protocol, it will probably be more of a
         | fallback for simple cases when the language framework doesn't
         | offer a driver yet. Still helpful, though.
        
           | EthicalSimilar wrote:
           | > For instance, when setting a user's password in Postgres,
           | you can do the hashing on the client side, even for non-
           | trivial schemes like SCRAM. This means that the password
           | itself never needs to move over the network, and that's very
           | desirable.
           | 
           | Off-topic, but I'm surprised more online apps don't employ
           | something similar.
           | 
           | It would all but eliminate accidental leaks that occur from
           | logs being incorrectly stored / misconfigured, not to mention
           | worries about MITM attacks (useful for corporate networks, or
           | public networks).
           | 
           | Given how many people share usernames, emails, and passwords
           | across sites I find it quite important to mitigate those
           | issues as much as possible.
        
       | kardianos wrote:
       | SQL Database drivers have so much in common, but each represent
       | schema and data table(s) differently on the wire. Some handle
       | cancellation within the protocol, others require a work around
       | (like another DB connection that issues a KILL command). TDS
       | (Microsoft Server) is actually one of the better protocols and
       | better documented protocols (due in large part to historical
       | court orders) out there.
       | 
       | My takeaway is that compact schema streaming data is not a well
       | developed field. I think we can do better. Not only that, but
       | developing both such a schema, protocol, and associated tooling
       | is key to significantly better data-centric applications from end
       | to end, not just the database.
        
         | ctc24 wrote:
         | Agreed! It's kind of fascinating that a better standard hasn't
         | emerged yet, but not surprising given that each one is
         | developed in isolation by a different organization.
         | 
         | Definitely a lot of room for improvement. Curious if anyone has
         | thoughts about the best route to getting such a standard /
         | protocol in place: it seems like a lot of stars would have to
         | align, but would be invaluable nonetheless.
        
           | lidavidm wrote:
           | As mentioned by others elsewhere in the thread, Apache Arrow
           | Flight SQL aims to be a standard wire protocol that multiple
           | databases can implement, using a columnar data
           | representation:
           | https://arrow.apache.org/docs/format/FlightSql.html
           | 
           | However it's early days yet.
        
           | layer8 wrote:
           | Such a nonproprietary protocol would reduce vendor lock-in,
           | hence vendors have no incentive to develop one.
        
       ___________________________________________________________________
       (page generated 2022-12-12 23:00 UTC)