[HN Gopher] Babelfish: SQL Server-to-Postgres Translation Layer
       ___________________________________________________________________
        
       Babelfish: SQL Server-to-Postgres Translation Layer
        
       Author : ke4qqq
       Score  : 215 points
       Date   : 2020-12-01 17:17 UTC (5 hours ago)
        
 (HTM) web link (aws.amazon.com)
 (TXT) w3m dump (aws.amazon.com)
        
       | bdcravens wrote:
       | As someone who has been fighting with a SQL Server to Postgresql
       | conversion this sounds AMAZING. Too bad it won't be available
       | before my conversion is complete (and if it is, that's an even
       | sadder proposition)
        
         | jnsie wrote:
         | Out of curiosity: besides cost, are there other significant
         | reasons that drive your desire to switch?
        
           | zmmmmm wrote:
           | I guess it's linked to cost but the pure painfulness of just
           | having licensing in the way of your infrastructure management
           | is pretty annoying.
           | 
           | We literally have had this problem the last few months where
           | we had a spike in load causing wide spread performance issues
           | and the obvious answer was to give the server more cores but
           | "we're not licensed for that" ... so everybody just suffered
           | through it because temporarily giving the DB more cores for a
           | few hours was just too painful / costly from a licensing
           | point of view.
        
           | pc86 wrote:
           | Don't discount the cost, which is borderline astronomical; if
           | you're on Enterprise, you're paying tens of thousands a year.
           | For large installation, potentially six figures. And that's
           | on-prem which is the cheapest way to do it more often than
           | not.
           | 
           | A 2-core Enterprise license is nearly $14,000.
        
           | bdcravens wrote:
           | Plenty of reasons, but principally we want to bring our stack
           | in line with the most common practices (ie, Rails/Postgresql)
           | for simplicity's sake; this is crucial for our small team.
        
       | PeterZaitsev wrote:
       | This would be even greater news if it would not be vaporware
       | "Babelfish for PostgreSQL will be available on Github in 2021."
       | https://babelfish-for-postgresql.github.io/babelfish-for-pos...
        
       | orf wrote:
       | > A commonly used datatype to store monetary values is the MONEY
       | data type. In SQL Server, the MONEY data type's behavior is fixed
       | using four digits to the right of the decimal (e.g., $12.8123).
       | However, in PostgreSQL, the MONEY data type is fixed using two
       | digits to the right of the decimal.
       | 
       | > So, when the application tries to store a value of $12.8123, by
       | example, PostgreSQL will round to $12.81. This subtle difference
       | will result in a rounding error and break an application if not
       | correctly addressed. To ensure correctness in Babelfish, we need
       | to ensure such differences, small and large, are handled with
       | absolute fidelity.
       | 
       | How are they going to solve this with just a query translation
       | layer? Isn't information lost on save?
        
         | OJFord wrote:
         | Presumably by not storing SQL Server's MONEYs in pg MONEYs, but
         | CASTing to a pg MONEY if pg asks for it.
        
         | radiowave wrote:
         | My guess would be: by not using Postgres's money type.
        
         | lowcodetv wrote:
         | > "...without changing libraries, database schema, or SQL
         | statements"
         | 
         | You don't need to manually redefine your table's schema from
         | SQL Server to PostgreSQL, but Babelfish must translate the
         | CREATE/ALTER TABLE command on the fly, changing the field to
         | DECIMAL or FLOAT when it does the migration, no?
        
         | dragonwriter wrote:
         | > How are they going to solve this with just a query
         | translation layer?
         | 
         | Well, the translation layer isn't just a query (DQL)
         | translation layer, its an SQL Translation layer including DDL,
         | DML, etc. Since both Postgres MONEY and SQL Server MONEY are
         | 8-byte, fixed-precision decimal types, with the only difference
         | being the position of the implicit decimal, a translation layer
         | can use one as the backing store for something that is
         | logically treated as the other without data loss, though it
         | will have to be aware of the difference when presenting data
         | and also when doing conversions to other datatypes, doing math
         | other than addition/subtraction, etc.
         | 
         | It would be even easier, I think, to just use, what,
         | DECIMAL(19,4) in Postgres for SQL Server MONEY, with some
         | special handling to have the right failure behavior at the edge
         | of the slightly-narrower range of the SQL Server MONEY type.
        
         | [deleted]
        
       | xupybd wrote:
       | I hope this means I can finally connect postgres to excel with
       | the same ease I can connect SQL server.
        
         | pletnes wrote:
         | What does "connect" mean in this context? Read data from SQL
         | server into an excel spreadsheet?
        
           | pc86 wrote:
           | Not the GP but yes I've seen a lot of folks go through the
           | Data tab in Excel and connect to a SQL database to display
           | data directly.
        
             | ComodoHacker wrote:
             | Excel can connect to PostgreSQL as well.
        
               | vetinari wrote:
               | Yes, it can on Windows, both via psqlODBC and Npgsql,
               | when you find out which .net runtime version your Excel
               | version uses.
               | 
               | On Mac, it is more interesting; there's only ODBC,
               | Microsoft doesn't support the same psqlODBC you can use
               | in Windows and you have to purchase one of the supported
               | commercial ODBC drivers.
        
       | temp667 wrote:
       | Why not do this for Oracle? I've not found SQL Server to be too
       | bad from the crazy Oracle stuff (light experience only - maybe
       | bigger players have it worse?).
        
         | croddin wrote:
         | I bet they are working on it.
        
         | throw93 wrote:
         | About 10 years back my company hired consultants, spent close
         | to 6 months translating SQL queries/stored procs to be Oracle.
         | The goal was to support both MSSQL Server & Oracle for the on-
         | premise product. It was quite costly undertaking. Then few
         | years later they just abandoned Oracle because of maintenance
         | costs.
         | 
         | I wonder if anyone starting out today chooses Oracle as their
         | relational database.
        
           | throwaway201103 wrote:
           | Oracle is very powerful. As a developer I found it easy to
           | work with. But the cost to run in production is eye-watering.
           | IDK how their licensing works today but in the past it was
           | free to use for evaluation or for developing a prototype.
           | Once you were "gaining business value" i.e. using it even if
           | only internally to develop a real product, you were supposed
           | to be paying for it.
        
           | lambda_obrien wrote:
           | Yes, large organizations that aren't tech oriented will use
           | Oracle every time over other solutions.
           | 
           | It's not a great solution for tech companies, but they do say
           | no one ever got fired for choosing Oracle or IBM or whatever,
           | mainly because you can always pay them to fix your shit and
           | there will always be someone supporting those products.
        
           | the-dude wrote:
           | > I wonder if anyone starting out today chooses Oracle as
           | their relational database.
           | 
           | Your boss does.
        
         | rst wrote:
         | Oracle has peculiarities that you won't see in most other
         | databases, which may make this a bit harder -- including such
         | things as an oddball set of date/time types ("date" includes
         | time to seconds), and treating zero length strings as SQL nulls
         | --                 '' is null
         | 
         | evaluates to true in Oracle. It also has features like "fast
         | refresh" on materialized views which might be tricky to emulate
         | on other db engines. Nothing's impossible, but it's certainly
         | more of a challenge.
        
           | throw93 wrote:
           | >> '' is null
           | 
           | This was a major pain for us especially in our java data-
           | access layer.
        
         | RegnisGnaw wrote:
         | As a DBA, T-SQL is much more standard then PL/SQL. Its probably
         | step 1 of the plan, with step 2 being Oracle.
        
       | linuxhiker wrote:
       | This is interesting because it will also help Sybase migrations.
       | SQL Server is the "brand name" but there are still a lot of
       | people stuck on Sybase.
        
         | lukaseder wrote:
         | The two dialects are very different today
        
         | JonathonW wrote:
         | Will it? MSSQL and Sybase diverged somewhere around 27 years
         | ago; anything that Sybase and Microsoft did differently since
         | then would likely be completely incompatible.
        
           | Svip wrote:
           | As someone who converted a 400'000+ LoC Sybase SQL codebase
           | to Microsoft SQL Server about 5-6 years ago, they did
           | diverged since around 2000, but not by a lot. We ended up
           | with a codebase that could support both Sybase and MSSQL,
           | where when differences occurred, we essentially used compiler
           | directives, which wasn't that often.
        
           | throwaway201103 wrote:
           | I would guess at least 80% of common SQL and T-SQL from
           | Sybase is still completely compatible with SQL Server.
           | 
           | As a side note, I had no idea Sybase still existed at all.
           | Looks like it's now part of SAP's portfolio.
        
       | conroy wrote:
       | Any idea what language it's written in?
        
         | BrentOzar wrote:
         | "Babelfish is written in C, which is the same programming
         | language used to develop PostgreSQL. Some parts of Babelfish
         | are developed using procedural language in PL/pgSQL. Many test
         | cases are written in PL/pgSQL and T-SQL."
        
           | conroy wrote:
           | Argh, I scanned the article multiple times and missed that
           | section. Thank you
        
       | edoceo wrote:
       | Heres ms2pg https://edoceo.com/dev/ms2pg
       | 
       | A tool I made and used over a decade ago when migrating a bunch
       | of stuff
        
       | etaioinshrdlu wrote:
       | Don't hate me for it, but I'd like this for MySQL to postgres
       | too. At least as a stepping stone.
       | 
       | Use case: some of my SQL syntax depends on MySQL but I realize I
       | made a poor life choice and would rather have transactional DDL
       | and a myriad of better features on postgres.
        
         | ziftface wrote:
         | Something like this would solve a lot of my problems
        
           | leesalminen wrote:
           | I'd considered migrating MySQL to Postgres on a 200 table
           | production app more than once. Couldn't find any good tooling
           | at the time so I just sucked it up and lived with my life
           | choices.
        
         | The_rationalist wrote:
         | Why not migrate to MariaDB? It's better than MySQL and has easy
         | and great clustering contrary to Postgres
        
         | lukaseder wrote:
         | Use https://www.jooq.org/translate/
        
         | robocat wrote:
         | However if the translation layer is designed for consistency,
         | like Babelfish, it would have to replicate many of the MySQL
         | "faults" too... e.g. it would need to have the same non-
         | transactional DDL so it acted "correctly"!
         | 
         | And it would need to replicate the specialised data types like
         | utf8mb3.
         | 
         | To use PostgreSQL features (such as JSON datatype) would
         | presumably need a separate data connection and transaction,
         | because if it isn't implemented by MySQL, then the syntax
         | wouldn't be supported by Babelfish.
         | 
         | You would get some wins, but you don't get a blend of the best
         | of both.
        
           | etaioinshrdlu wrote:
           | It would be nice if it could aim for doing the sane thing
           | over exact MySQL compatibility. But I agree it would be a
           | minefield...
        
         | lutoma wrote:
         | I would love that. There is so much mediocre but popular
         | software out there that uses hard-coded MySQL queries instead
         | of some abstraction layer.
         | 
         | Having to use MySQL/MariaDB after usually dealing with
         | PostgreSQL always feels like such a downgrade.
        
         | touisteur wrote:
         | Can't you do it with a foreign data wrapper or the equivalent
         | in mysql? Maybe keep your mysql-specific queries as views in
         | mysql and call them from pg?
         | 
         | Just spitballing, sorry, I love FDWs.
        
         | linsomniac wrote:
         | You might be able to get away with staying with MySQL or a
         | variant. Years ago I did an "interview problem" to generate a
         | report of some accounting data of a sample data set in MySQL.
         | The details are foggy now, but I ended up doing it all using
         | advanced SQL features that I thought only existed in PostgreSQL
         | (you know, or Oracle), but whatever engine I ended up using had
         | it as well. It was probably the latest Percona or Maria?
        
           | Conlectus wrote:
           | MySQL has worse guarantees in several cases, though. For
           | example, using `SERIALIZABLE` isolation means your queries
           | can deadlock when run in parallel.
           | 
           | Likewise, selecting a column not present in a GROUP BY leads
           | to random values being returned.
        
         | pletnes wrote:
         | For SQL server, AWS can save their customers money by cutting
         | license cost (to MS). MySQL is already free so I don't see how
         | they can benefit from such a project.
        
           | etaioinshrdlu wrote:
           | I agree, but there might be a creative way to do it that
           | increases customer happiness and still is a decent business.
        
       | justizin wrote:
       | i swear the fuck to god if one more piece of technology is called
       | babelfish i am going to find out who is responsible and toilet
       | paper their house.
        
         | technion wrote:
         | I currently support the following products named "Integrity":
         | 
         | - Law firm management software - Document management software -
         | A DVR appliance
         | 
         | Previously I supported HPE Integrity hardware. Certain names
         | just seem way overused.
        
         | justizin wrote:
         | imagine how hilarious douglas adams would find it to try and
         | google babelfish today lol.
        
       | beoberha wrote:
       | Sounds like they're open sourcing it to get some help on it. I
       | have to wonder if they've found it not worth the time to make it
       | fully production ready.
        
         | BrentOzar wrote:
         | > I have to wonder if they've found it not worth the time to
         | make it fully production ready.
         | 
         | It sounds like it's not compatible with all T-SQL commands and
         | data types. It would be tough to get full feature completeness:
         | Microsoft is constantly adding new stuff in each version, like
         | the ability to run Java stored procedures, R & Python in the
         | database, etc. Open sourcing it would let companies expand it
         | to get specific features coded that they need - that otherwise
         | might not get attention from the general public.
        
           | mjasay wrote:
           | Correct. The focus is on 100% correctness. As I wrote in the
           | post: "Over its 35 years in existence, SQL Server has evolved
           | to meet a wide array of use cases. When first made available
           | on GitHub, Babelfish won't be able to handle every use case,
           | but will be able to tackle the most common application
           | scenarios. Most importantly, Babelfish will meet the
           | correctness objective. That is, if Babelfish doesn't yet
           | support specific SQL Server functionality, it will return an
           | error to the application, rather than defaulting to
           | PostgreSQL behavior. Why? Because, again, developers (and the
           | enterprises for which they work) must be able to depend
           | absolutely on the correctness of SQL Server compatibility."
           | 
           | At launch Babelfish will be able to handle - with 100%
           | correctness - the main semantics you'd want. HOWEVER, as you
           | said (and as mentioned in the post), there's a large surface
           | area, and a "long tail" of functionality that needs expertise
           | from us _and_ others to cover. So to do this right, it needs
           | a community. It will be great for many use cases right from
           | the start, but to ensure it 's great/perfect for _your_ use
           | case...well, that just might take your help. Please work with
           | us on this.
        
             | mattashii wrote:
             | Would you know if Babelfish will be released for RDS for
             | PostgreSQL as well?
             | 
             | The only mentions so far are for Aurora for PostgreSQL, or
             | self-hosted PostgreSQL.
        
               | Tostino wrote:
               | Very good question, seems to be a strange decision. Maybe
               | it doesn't meet what they deem up to the standards to
               | include it in RDS due to the nature of how the backend is
               | hosted (just a guess). Could be something as simple as
               | the dependencies the extension requires are incompatible
               | with what the RDS servers run. It would be very nice if
               | the RDS team had some guidelines on what it takes to get
               | an extension accepted for RDS with a good framework
               | people can follow to ensure their extension complies with
               | the security requirements, etc. Then have a way to submit
               | an extension to be included and have it be reviewed and
               | given guidance on any changes needed to comply (for a fee
               | potentially).
        
             | Yeroc wrote:
             | Interesting. Now where's the Oracle-to-Postgres Translation
             | project? ;-)
        
               | tyldum wrote:
               | I have used ora2pg with great success.
        
               | Grazester wrote:
               | Their legal team might still be preparing for this one
               | given Oracle's reputation.
        
               | topspin wrote:
               | SCOTUS hasn't yet ruled on Google LLC v. Oracle America
               | Inc. One imagines that if Oracle prevails in some
               | demonstrative manner they may very well believe their
               | flavor of SQL is a software interface protected by
               | copywrite.
               | 
               | So will all the other 'software interface' owners of the
               | world.
        
               | pjmlp wrote:
               | Apparently everyone keeps forgetting about ISO.
        
               | topspin wrote:
               | That's because everyone knows about the delta between ISO
               | and real implementations.
        
               | pjmlp wrote:
               | Yet they still forget about pay to play with ISO
               | standards.
        
               | zozbot234 wrote:
               | IBM might have something to say about that, though.
        
               | topspin wrote:
               | Wouldn't that be ironic.
        
               | homarp wrote:
               | isn't that what enterprisedb is selling ?
        
               | linuxhiker wrote:
               | Yes but it isn't Open Source.
        
               | dragonwriter wrote:
               | Amazon acquiring EDB and rolling some of the currently-
               | proprietary bits like the Oracle compatibility layer into
               | open source Postgres while increasing resources to
               | developing them could be an interesting play.
        
           | throw_m239339 wrote:
           | At that point, thinking that SQL somehow is "portable" from a
           | DB to another is a bit of a mirage. Way too many differences
           | between systems, which does influence data modelling itself.
        
         | hobs wrote:
         | It sounds like their goals are pretty lofty -
         | 
         | >Those are the mechanics, but developers need to be certain
         | that Babelfish truly speaks SQL Server's language in a
         | dependable, predictable way. As such, the guiding principle for
         | Babelfish is correctness, with no compromises. What do I mean
         | by correctness? Namely, that applications designed to use SQL
         | Server semantics will behave the same on PostgreSQL as they
         | would on SQL Server.
         | 
         | I am assuming they just mean in regards to types and the like -
         | to get the same performance characteristics out of the code
         | would be bonkers.
        
         | dragonwriter wrote:
         | > Sounds like they're open sourcing it to get some help on it.
         | 
         | My take is that they are open sourcing it because they don't
         | just want to use it to compete with Microsoft to host SQL
         | Server workloads on a better basis than Microsoft's licensing
         | policy has let them in the past, but also to just undercut
         | Microsoft's SQL Server licensing revenue generally.
         | 
         | As a business strategy, sure, but given the way everyone I've
         | heard from AWS talks about Microsoft and SQL Server licensing,
         | I wouldn't be surprised if there was a good deal of spite
         | involved, too.
        
           | coredog64 wrote:
           | One word: JEDI.
        
         | x0x0 wrote:
         | It feels like this is AWS fighting back with Microsoft after
         | Microsoft raised fees to run SQL Server on AWS. Not only will
         | AWS build this for themselves, but they'll contribute for
         | anyone who wants it. That's a bigger blow to Microsoft than
         | just facilitating migrations from MS SQL to pg in AWS; this
         | impacts non-cloud-lifted database revenue.
        
       | joshuaellinger wrote:
       | This is great. I suspect that I have exactly the right use case
       | for this.
       | 
       | The two main issues with running SQL Server are (1) you have to
       | license all the cores on a system and (2) the standard license
       | only recognizes up to 64GB RAM. So I actually wound up buying a
       | 3GHz single-socket system for around $10K to save $20K on the SQL
       | license.
       | 
       | With this, I can move a couple of the big DBs to another system
       | that has 32 Cores with 256GB RAM and the entire DB will fit in
       | memory, put in 5GB ethernet, and gain a tremendous amount of
       | performance.
       | 
       | But, more importantly, I can migrate the workload on a case-by-
       | case basis. Human costs always dwarf my software and hardware
       | costs.
        
         | sqlserver1 wrote:
         | (2) is not accurate. SQL Server Standard Edition supports up to
         | 128 GB RAM
         | 
         | https://docs.microsoft.com/en-us/sql/sql-server/editions-and...
        
       ___________________________________________________________________
       (page generated 2020-12-01 23:01 UTC)