[HN Gopher] Launch HN: Hydra (YC W22) - Query Any Database via P...
       ___________________________________________________________________
        
       Launch HN: Hydra (YC W22) - Query Any Database via Postgres
        
       Hi HN, we're Joe and JD from Hydra (https://hydras.io/). Hydra is a
       Postgres extension that intelligently routes queries through
       Postgres to other databases. Engineers query regular Postgres, and
       Hydra extends a Postgres-compliant SQL layer to non-relational,
       columnar, and graph DBs. It currently works with Postgres and
       Snowflake, and we have a roadmap to support MongoDB, Google
        BigQuery, and ClickHouse.

        Different databases are good at
       different things. For example, Postgres is good at low-latency
       transactional workloads, but slow when running analytical queries.
       For the latter, you're better off with a columnar database like
       Snowflake. The problem is that for each new database added to a
        system, application complexity increases quickly.

        Working at
       Microsoft Azure, I saw many companies juggle database trade-offs in
       complex architectures. When organizations adopted new databases,
       engineers were forced to rewrite application code to support the
       new database or use multiple apps to offset database performance
       tradeoffs. All this is expensive busy work that frustrates
        engineers. Adopting new databases is hard and expensive.

        Hydra
       automatically picks the right DB for the right task and pushes down
       computation, meaning each query will get routed to where it can be
       executed the fastest. We've seen results return 100X faster when
        executed against the right database.

        We've chosen to integrate with
       Snowflake first so that developers can easily gain the analytical
       performance of Snowflake through a simple Postgres interface. To an
       application, Hydra looks like a single database that can handle
       both transactions and analytics. As soon as transactions are
       committed in Postgres, they are accessible for analytics in real-
       time. Combining the strengths of Postgres and Snowflake in this way
       results in what is sometimes called HTAP: Hybrid Transactional-
       Analytical Processing
       (https://en.wikipedia.org/wiki/Hybrid_transactional/analytica...),
        which is the convergence of OLTP and OLAP.

        Existing solutions are
       manual and require communicating with each datastore separately.
       The common alternative is trying to combine all of your data
       together into a data warehouse via ETL. That works well for
       analysts and data scientists, but isn't transactional and can't be
       used to power responsive applications. With Hydra engineers can
       write unified applications to cover workloads that had to be
        separate before.

        Hydra runs as a Postgres extension, which gives it the ability to
        use Postgres internals and modify execution of queries. Hydra
        intercepts queries in real-time and routes queries based on query
        type, user settings, and Postgres' cost analysis. Writes and
        operational reads go to Postgres, analytical workloads go to
        Snowflake.
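
        Here's a rough sketch of the single-interface experience (table
        and column names are just illustrative): both statements go to
        the same Postgres connection, and Hydra decides where each runs.

            -- Operational write and read: handled by Postgres.
            INSERT INTO orders (customer_id, total) VALUES (42, 19.99);
            SELECT * FROM orders WHERE customer_id = 42
            ORDER BY created_at DESC LIMIT 10;

            -- Analytical aggregation over the whole table: routed to Snowflake.
            SELECT date_trunc('month', created_at) AS month,
                   count(*) AS orders,
                   sum(total) AS revenue
            FROM orders
            GROUP BY 1
            ORDER BY 1;
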
        Recently committed transactions are moved from Postgres to
        Snowflake in near real-time using Hydra Bridge, our built-in data
        pipeline that links databases from within Postgres. The bridge is
        an important part of what we do. Without Hydra, workloads are
        typically isolated between different databases, requiring
        engineers to implement slow and costly ETL processes. Complex
        analytics are often run on older data, updated monthly or weekly.
        The Hydra bridge allows for real-time data movement, enabling
        analytics to be run on fresh data.

        We make money by
       charging for Hydra Postgres, which is a Postgres managed service,
       and Hydra Instance, which attaches Hydra to your existing Postgres
       database. Pricing is listed on the product pages:
       https://hydras.io/products/postgres and
        https://hydras.io/products/instance.

        A little about our
        backgrounds:

        Joseph Sciarrino - Former PM @ MSFT Azure Open-Source Databases
        team. Heroku (W08) and Citus Data (S11) alum.

        Jonathan Dance - Director @ Heroku (2011-2021)

        Using Hydra you can create a
       database cluster of your own design. We'd love to know what Hydra
       clusters you'd be interested in creating. For example,
       Elasticsearch + Postgres, BigQuery + SingleStore + Postgres, etc.
        Remember - You can experiment with different combinations without
       rewriting queries, since Hydra extends Postgres over these other
       databases. When you think about databases like interoperable parts
       you can get super creative!
        
       Author : coatue
       Score  : 243 points
       Date   : 2022-02-23 16:18 UTC (6 hours ago)
        
 (HTM) web link (hydras.io)
 (TXT) w3m dump (hydras.io)
        
       | tluyben2 wrote:
        | Shame it's not OSS, but I get that. About the 'no lock-in'
        | statement on the site: if we saw speedups in both dev and
        | execution performance by using this and developed everything on
        | it from that point on to make working with data easier across
        | the enterprise, how are we not locked in when you decide to do
        | something else or sell to Oracle? The latter happened to us,
        | almost exactly, and that's why non-OSS is a no-go for dev
        | infrastructure.
       | 
       | Definitely nice work though and best of luck!
        
         | wuputah wrote:
         | Hi, JD here, Hydra's CTO. It's still early days and we are
         | considering open source; for now, we wanted to leave our
         | options open, and OSS feels like a one-way door. I think you
         | make a great point here - thanks for sharing your past pain /
         | experience. Definitely food for thought.
         | 
         | Our "no lock-in" claim refers to your data, since Hydra is
         | Postgres, you're not stuck using "HydraDB" forever -- it's
         | relatively easy to migrate in or out since you can use well
         | established Postgres tools. We also are open to licensing the
         | product should you wish to self-host, on-prem, etc.
        
       | jzelinskie wrote:
       | What does it take to collaborate on a backend? We've investigated
       | building a Postgres extension for querying SpiceDB[0] and Hydra
       | seems like it could help. What kind of consistency guarantees can
       | be made?
       | 
       | [0]: https://github.com/authzed/spicedb
        
       | hangonhn wrote:
       | This looks amazing. Love the strong Snowflake integration -- very
        | forward looking. I just passed this on to our Data Science team.
        
       | tullie wrote:
       | Looking forward to the support for MongoDB and other no-sql
       | stores. Interested to hear how you're trying to approach that.
        
         | wuputah wrote:
         | Hi! JD here, Hydra's CTO. NoSQL is certainly a challenge but we
         | have a few ideas/angles on how to solve it. Certainly we plan
         | to start with simple queries and then iterate from there based
         | on what our customers need.
         | 
         | We are really excited about the prospect of bringing SQL and
         | NoSQL together!
        
           | jvalencia wrote:
           | This is definitely something I could see myself paying for
           | --- but only if I could somehow get relational performance
           | for nasty Mongo aggregate queries.
        
             | jd_mongodb wrote:
              | Aggregations by their nature are designed to work on a
              | substantial footprint of data. As a result, changing the
              | query model is unlikely to speed up the aggregation
              | itself. In fact, since most of these libraries require the
              | data to be shipped to the client (whereas aggregation
              | queries run on the server), you will likely see
              | substantially reduced performance.
        
               | wuputah wrote:
               | Hydra doesn't ship data to the client in order to then do
               | further work like aggregations -- that's the whole point
               | of Hydra -- but that also means that you won't be able to
               | "workaround" a performance issue with an underlying data
               | store. For that, we'd need to find a way to replicate the
               | data to a data store that can solve the aggregation
               | performance issue.
        
       | tyingq wrote:
       | Interesting. I'm curious about how you handle security now, and
       | what the plans are. That is, is there any integration between the
       | roles/rights my postgres session user has, and the roles/rights I
        | have on the downstream database?
        
       | skrtskrt wrote:
       | Could this also improve either developer experience or query
       | performance when working with something like Redshift, which is a
       | columnar OLAP store that already uses a Postgres dialect?
        
       | michaelmior wrote:
       | For anyone interested, Apache Calcite[0] is an open source data
       | management framework which seems to do many of the same things
       | that Hydra claims to do, but taking a different approach.
       | Operating as a Java library, Calcite contains "adapters" to many
       | different data sources from existing JDBC connectors to
       | Elasticsearch to Cassandra. All of these different data sources
        | can be joined together as desired. Calcite also has its own
        | optimizer which is able to push down relevant parts of the query
        | to the different data sources. You also get full SQL on data
        | sources which don't support it, with Calcite executing the
        | remaining bits itself.
        | 
        | Generally, all that is required to connect to multiple data
        | sources, from CSV to Elasticsearch, is writing a JSON
        | configuration file. You then get SQL access via JDBC with the
        | ability to join all those sources together.
        | 
        | Unfortunately, I would not be too surprised if query execution
        | in Calcite was found to be less performance-optimized than
        | Hydra. There is ongoing work for improvement there. That said,
       | there are users of Calcite at Google, Uber, Spotify, and others
       | who have made great use of various parts of the framework.
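        | 
        | To give a flavor (the schema and table names below are made
        | up), once adapters are configured you can join across sources
        | in plain SQL and Calcite's optimizer pushes down whatever each
        | backend supports:
        | 
        |     -- "es.logs" backed by the Elasticsearch adapter,
        |     -- "files.customers" by the CSV adapter.
        |     SELECT c.region, COUNT(*) AS error_count
        |     FROM es.logs AS l
        |     JOIN files.customers AS c ON l.customer_id = c.id
        |     WHERE l.level = 'ERROR'
        |     GROUP BY c.region;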
       | 
       | [0] https://calcite.apache.org/
        
         | sdesol wrote:
         | I've never heard of Calcite before and dug a bit deeper into
         | this project and found it to be quite active. Only taking into
         | consideration changes to Java files, this is the repo activity
          | for the last 12 months:
          | 
          |     month   | authors | commits | files | churn
          |     --------+---------+---------+-------+-------
          |     2022-02 |      21 |      62 |    84 |  4387
          |     2022-01 |      26 |     109 |   243 | 32429
          |     2021-12 |      33 |      99 |   198 | 10461
          |     2021-11 |      19 |      49 |    77 |  6960
          |     2021-10 |      26 |      64 |   371 | 13626
          |     2021-09 |      18 |      41 |    68 |  2258
          |     2021-08 |      11 |      17 |    25 |  1924
          |     2021-07 |      17 |      30 |    51 |  2704
          |     2021-06 |      14 |      31 |    28 |  1708
          |     2021-05 |       9 |      17 |    35 |  1606
          |     2021-04 |      11 |      46 |    99 |  4224
          |     2021-03 |      16 |      36 |   143 |  8471
        
           | riverdroid wrote:
           | What do you use to produce this analysis?
        
             | sdesol wrote:
             | My product (https://gitsense.com) moves most of Git's
             | history into a Postgres database and from there, you can
             | execute the following SQL statement:
              | select
              |     commit_ym AS month,
              |     count(distinct(author_email)) as authors,
              |     count(distinct(commit_id)) as commits,
              |     count(distinct(path_id)) as files,
              |     sum(total) as churn
              | from
              |     z1_commits_422 as commits,
              |     z1_changes_422 as changes,
              |     z1_code_churn_422 as churn
              | where
              |     commits.id=changes.commit_id and
              |     changes.code_churn_id=churn.id and
              |     lang='java'
              | group by commit_ym
              | order by commit_ym desc
              | limit 12
             | 
             | By having most of Git's history in SQL, I can slice, dice
             | and cross-reference code history, which is how my product
             | works.
        
         | imachine1980_ wrote:
          | What's the difference with a regular ORM? (Genuine question.)
        
           | michaelmior wrote:
           | ORMs map relational databases to objects. Calcite does not do
           | that. Calcite takes different data sources (potentially with
           | different data models) and presents them all in the
           | relational model. You interact with your data entirely in SQL
           | while ORMs typically have some DSL which uses whatever object
           | model the ORM defines. ORMs are also typically designed to
           | connect to a single data source at a time. I'm personally not
           | familiar with any ORMs that allow combining data from
           | multiple sources within the same application.
           | 
            | The primary advantages of Calcite for connecting multiple data
           | sources are 1) easily joining data sources that use
           | completely different APIs (assuming there is a Calcite
           | adapter available) and 2) supporting more complex queries
           | than the original data source supports without having to
           | write code (other than SQL) to do the processing.
        
             | gavinray wrote:
              | Don't forget having two industrial-grade academic query
              | planner/optimizer implementations that took collective
              | decades of engineering effort!
              | 
              | And the project being founded by someone who has written
              | multiple RDBMSes.
             | 
             | Calcite is a wonder.
        
         | wuputah wrote:
         | Calcite is definitely some cool tech. I can see why it would be
         | attractive for bigger teams, but it seems like a big lift for
         | smaller teams. Our goal is to make it easy for devs already
         | familiar with Postgres to be able to use add databases without
         | learning new tools or adding software... besides adding Hydra,
         | of course!
        
           | michaelmior wrote:
           | That's a fair point. Although if you're already using a JVM
           | language, it's incredibly easy to integrate. Just another
           | JDBC data source with some JARs to add to your classpath :)
        
         | gavinray wrote:
         | Calcite is actually pretty damn fast, the overhead is
         | surprisingly minimal.
         | 
          | I am using it as the backbone for my hobby project that auto-
          | generates federated GraphQL API's:
         | 
         | https://github.com/GavinRay97/GraphQLCalcite
         | 
         | The experience has been incredibly positive and the community
         | has been incredibly helpful & supportive. It's one of the
         | coolest technical projects I've ever seen and has sparked my
         | interest in query engines and relational databases.
         | 
         | I posted some JMH benchmarks of an app that parses a GraphQL
         | query, converts it to Calcite relational expressions, and then
          | executes it against an in-memory DB, and it ran on the order
          | of milliseconds:
         | 
         | https://lists.apache.org/thread/hofjx628864t0kt4kk8vo4tjfrxb...
         | 
         | Something very similar to Calcite but much lesser-known is the
         | "Teiid" project:
         | 
         | https://github.com/teiid/teiid
         | 
         | Highly recommend checking the code out. It's got a brilliant
         | query optimizer/planner tailored to cross-datasource queries, a
         | cache system, and a translator architecture that can convert a
         | generic SQL dialect to dozens of SQL/NoSQL flavors.
         | 
          | It also integrates other data sources like REST API's, S3,
          | flat files, etc. as queryable data sources.
        
       | bradly wrote:
       | This looks great! Couple questions...
       | 
        | 1.) Can you talk a bit about how this is better than the existing
       | foreign data wrappers Postgres has available?
       | 
       | 2.) Any thoughts on S3 support? More and more I see teams using
       | S3 as a data store for certain specific use cases.
        
         | wuputah wrote:
         | Hi, I wrote a response
         | [https://news.ycombinator.com/item?id=30443033] about FDWs, I
         | hope it answers your questions.
         | 
         | We will definitely think about S3 support! Would love to
         | understand those use cases more and how Hydra could help.
        
       | alexvboe wrote:
       | Congrats on the launch, this is amazing!
        
       | chrisweekly wrote:
       | Wow. This seems like such a staggeringly good idea. Congrats on
       | launch, and kudos for bringing this to life! Curious about the
       | overhead (ie, benchmarks for the simplest scenario: vanilla
       | postgres vs going through hydra for the same queries and load).
       | But unless there's a huge hit there (which seems unlikely), this
       | seems like a really exciting development.
        
         | wuputah wrote:
         | Hi, I'm JD, Hydra's CTO. There's no perceivable overhead to
         | using Hydra on queries being routed to Postgres. I think you
         | would not be able to see Hydra in the noise in a benchmark --
         | but it's a great idea to demonstrate this! I will do a blog
         | post! :) Of course, if you were to use Hydra Instance (where
         | your Postgres database is remote) then there will be some
         | network latency.
        
       | gavinray wrote:
       | > Hydra automatically picks the right DB for the right task and
       | pushes down computation, meaning each query will get routed to
       | where it can be executed the fastest. We've seen results return
        | 100X faster when executed against the right database.
       | 
       | This is really interesting. Could you talk a bit more about query
       | pushdown and planning/optimization?
       | 
       | Is this through FDW's? Would love to hear more about the
       | technical details.
       | 
       | Shameless plug -- I work at Hasura (turn DB's into GraphQL API's)
       | and this seems incredibly synergistic and useful to get access to
       | databases we don't have native drivers for at the moment.
       | 
       | Any chance of an OSS limited version?
        
         | wuputah wrote:
         | Hi! JD here, Hydra's CTO.
         | 
         | Hydra does not use FDWs except for Postgres-to-Postgres
         | communication (for now). What we found was that FDWs do not do
         | pushdown very well, even when Postgres has full information.
         | You can get FDWs to push down aggregations, but complex queries
         | with subqueries etc quickly get slow again. In short, our goal
         | is to have your queries take full advantage of the power of
          | each datastore, and we found that FDWs do not accomplish that
         | goal.
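          | 
          | For context, this is roughly the stock-Postgres FDW route
          | we're comparing against (server, schema, and table names here
          | are placeholders); setup is easy, but only a limited set of
          | operations gets pushed to the remote side:
          | 
          |     CREATE EXTENSION IF NOT EXISTS postgres_fdw;
          | 
          |     CREATE SERVER warehouse_srv
          |       FOREIGN DATA WRAPPER postgres_fdw
          |       OPTIONS (host 'warehouse.example.com', dbname 'analytics');
          | 
          |     CREATE USER MAPPING FOR CURRENT_USER
          |       SERVER warehouse_srv OPTIONS (user 'app', password 'secret');
          | 
          |     CREATE SCHEMA remote;
          |     IMPORT FOREIGN SCHEMA public FROM SERVER warehouse_srv INTO remote;
          | 
          |     -- Simple filters and aggregates can be pushed down;
          |     -- subquery-heavy analytics often end up executing locally.
          |     SELECT count(*) FROM remote.events
          |     WHERE occurred_at > now() - interval '1 day';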
         | 
         | We want to support GraphQL at some point, so same goes for us!
         | 
         | We are thinking about an OSS version, I think how we do
         | "limited" is a big part of what that means. What would you like
         | to see in an OSS version? What would you use it for?
        
           | gavinray wrote:
            | Thanks for the explanation =D
            | 
            | > What would you like to see in an OSS version? What would
            | > you use it for?
            | 
            | I think that's a difficult question to answer because it's
            | hard to do data access partially. How do you gate it so that
            | you don't give everything away for free but still give
            | people a reason to pay you?
            | 
            | Read-only access might be one way, but I'm unsure how
            | popular that would be.
            | 
            | > What would you use it for?
            | 
            | Generating GraphQL API's for other datasources by funneling
            | them through Postgres
        
             | zozbot234 wrote:
             | Parent comment is absolutely right that FDW as a general
             | query router is still under heavy development. It's very
             | likely that we'll see further improvement in forthcoming
             | Postgres releases, which will come with additional benefits
              | since FDWs are used for a lot more than just "high-level"
             | query routing in Postgres.
        
               | gavinray wrote:
               | This would be great.
               | 
               | I know that EnterpriseDB is heavily invested in FDW
               | development and core Postgres stuff, so maybe we'll see
               | some more neat stuff come out of that team that makes it
               | upstream.
        
         | nathanwallace wrote:
         | Steampipe [1] is an open source [2] project that uses Postgres
         | FDWs to query 67+ cloud services (e.g. AWS, GitHub,
          | Prometheus). The plugins [3] are written in Go, similar to
         | Terraform. We've found this approach very effective for DevOps
         | data and pushdown works well for most (simple) cases.
         | (Disclaimer: I'm a lead on the project.)
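          | 
          | As a quick illustration (treat the column list as
          | approximate, I'm typing from memory), an S3 bucket inventory
          | with the AWS plugin is just SQL:
          | 
          |     select name, region, arn
          |     from aws_s3_bucket
          |     order by name;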
         | 
          | 1 - https://steampipe.io
          | 2 - https://github.com/turbot/steampipe
          | 3 - https://hub.steampipe.io/plugins
        
           | gavinray wrote:
           | Whoa, how have I not heard of this before?
           | 
           | If it's OSS, I am definitely interested in experimenting with
           | this. Maybe I can write a blogpost or something?
        
             | nathanwallace wrote:
             | Yes, the CLI, FDW, plugins and mods are all open source.
             | Please let us know how you go - we thrive on feedback :-)
        
           | isoprophlex wrote:
           | Incredible! This is SUPER useful!
        
         | brodouevencode wrote:
         | I was thinking the same thing - FDWs As A Service
        
         | chatmasta wrote:
         | We have a similar product at Splitgraph, where we do use FDWs
         | in the routing layer (along with some PgBouncer magic). We
         | recently blogged about adding aggregation pushdown to our
         | Snowflake FDW. [0]
         | 
         | [0] https://www.splitgraph.com/blog/postgresql-fdw-
         | aggregation-p...
        
           | pookeh wrote:
           | Just a suggestion, your home page's one liner is super vague
           | and confusing. I had to scroll down to really figure out what
           | is it that Splitgraph does...
           | 
           | "Splitgraph connects numerous, unrelated data sources into a
           | single, unified SQL interface on the Postgres wire protocol."
           | 
           | Just my 2 cents.
        
           | gavinray wrote:
           | Big fan of Splitgraph and I know some other folks at Hasura
           | are too
        
       | CodeAlong wrote:
       | Any plans to offer a self-hosted version of Hydra instance?
        
         | coatue wrote:
          | Hi, Joe, CEO @ Hydra here - Yes. It's not on our website
          | currently, but we can offer Hydra self-hosted today. Ping me
          | and we can get you set up. We have a new Discord too:
          | https://discord.gg/SQrwnAxtDw
        
       | imachine1980_ wrote:
        | I like it, but how much can you truly do without the specifics
        | of, say, BigQuery? Some things are so specific that you'd end up
        | requiring BigQuery features anyway, which sounds like an ORM
        | with Postgres syntax. I like the idea.
        
       | kleebeesh wrote:
       | Looks neat, but wasn't this the promise of Presto? Presto didn't
       | seem to really work out. From what I've seen it converged to a
       | mostly analytical engine. It's still very useful, but I've never
       | seen it used (successfully) in an OLTP workload. Maybe there's
       | some difference in the intended product trajectory that I'm
       | overlooking here?
        
         | WaxProlix wrote:
         | AWS's Athena uses Presto to pretty good effect, though I guess
         | you could say those use cases are largely relegated to
         | analytical purposes.
         | 
         | Back in my consulting days, I built a distributed query system
         | based on Presto to integrate some custom/onprem data sources
         | with more distributed/cloudy ones, Hive and such, and it worked
         | well for that, too. Most of that was also ad-hoc, batch, or
         | event-driven analytics, too, but there were plans for
         | supporting production workloads.
         | 
         | I think maybe one reason people shy away from things like
         | Presto (and the above) is the uneven performance guarantees;
         | waiting for an unoptimized Hadoop or Orcfile query by accident
         | because you joined on something or another is fine for one-
         | offs, but might become costly in prod workflows.
        
           | kleebeesh wrote:
           | > I think maybe one reason people shy away from things like
           | Presto (and the above) is the uneven performance guarantees;
           | waiting for an unoptimized Hadoop or Orcfile query by
           | accident because you joined on something or another is fine
           | for one-offs, but might become costly in prod workflows.
           | 
           | Right, so my question is: how is that solved with Hydra?
           | Seems like you'd arrive at the same issue?
        
         | buremba wrote:
          | Presto is pretty successful, but its focus is to be a
          | distributed query engine, not a proxy layer for existing query
          | engines. We use Trino (formerly Presto) as our query layer and
          | do something similar to Hydra at Metriql [1] with a fairly
          | different use-case. Data people provide a semantic layer with
          | the metrics and expose them to 18+ downstream tools.
         | 
         | [1]: https://metriql.com
        
       | agacera wrote:
       | This is really nice! Congrats!
       | 
       | I once started building as a side project something similar but
       | focused on querying cloud resources (like S3 buckets, ec2s,
       | etc... discovering the biggest file from a bucket was trivial
       | with this). I abandoned the project but someone else built a
       | startup on the same concept - even the name was the same:
       | cloudquery.
       | 
        | I built it using the multicorn [1] postgres extension, and it is
        | delightful how easy it is to get something simple running.
       | 
       | [1] https://multicorn.org/
        
       | aslakhellesoy wrote:
       | Congratulations on the launch - this sounds interesting.
       | 
       | I'm currently using Postgraphile[0], which uses Postgres'
       | introspection API to discover the schema structure.
       | 
       | Would this still work with Hydra?
       | 
       | [0] https://www.graphile.org/postgraphile/
        
         | wuputah wrote:
         | Absolutely! Hydra is 100% Postgres and supports any existing
         | Postgres-compatible tools.
        
       | abledon wrote:
       | I see a lot of software named after beasts, and a lot of 'Hydra'
       | programs/companies all doing different things. Imagine if someone
       | in 300 BC thought about how we would base our future creations
       | off mythological beasts... they would've increased the CIDR range
       | on all available beast name ideas and written a whole bunch of
       | extra stories.
        
       | garysahota93 wrote:
       | This is super powerful. While I see the immediate value in this
       | for simplifying applications, I can also see this becoming a
       | powerful tool for data analysts & data engineers in speeding up
       | their "time to insight".
       | 
        | I've had (early in their career) analysts report to me who
        | struggle to write optimal queries across relational, non-
        | relational, & graph DBs (they're usually great at one & mediocre
        | on the others). This will be huge for them & our stakeholders
        | who rely on them for trustworthy insights.
        
       | gunnarmorling wrote:
       | Congrats on the launch! Two questions:
       | 
       | - How does this deal with specifics of the query languages of the
       | different data stores? I'm not an expert with Snowflake, but I
       | suppose it supports specific querying capabilities not found in
       | Postgres' SQL dialect. How are those exposed to Hydra users?
       | 
       | - I'm confused by "As soon as transactions are committed in
       | Postgres, they are accessible for analytics in real-time" vs.
       | "Recently committed transactions are moved from Postgres to
       | Snowflake in near real-time". Is data propagated to Snowflake
       | synchronously or asynchronously? I.e. is it guaranteed that data
        | can be queried from Snowflake right the next moment after a
       | transaction has been committed (as suggested by the former) or
       | not (as suggested by the latter)?
       | 
       | Disclaimer: I work on Debezium, another solution people use for
       | propagating data from different databases (including Postgres)
       | into different data sinks (including Snowflake)
        
         | wuputah wrote:
         | Hi, JD here, Hydra's CTO. Thanks for the interest and
         | questions!
         | 
         | Today, queries need to be Postgres-compatible to be
         | intelligently routed, but queries with specific query syntax or
         | functions beyond Postgres can be routed with our manual
          | router[1]. This is our first solution to this problem, and we
          | plan to iterate in response to customer pain.
         | 
         | Sorry for the confusion! Data moves asynchronously -- we're not
         | trying to implement multi-phase commits -- but we can act on
         | data very quickly once committed. Our solution here uses
         | Postgres logical replication. Using the Data Bridge is optional
         | and a customer's existing solutions are welcome as well.
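          | 
          | Conceptually (simplified, and not our exact internals),
          | logical replication means the bridge consumes a decoded
          | change stream from a publication on the tables you want
          | mirrored:
          | 
          |     -- Names here are illustrative.
          |     CREATE PUBLICATION bridge_pub FOR TABLE orders, line_items;
          |     SELECT * FROM pg_create_logical_replication_slot(
          |       'bridge_slot', 'pgoutput');
          |     -- A downstream worker reads the slot and applies the
          |     -- changes to the warehouse.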
         | 
         | [1]: https://hydras-
         | io.notion.site/Router-a91f5282f1354c54a9ba894...
        
       | sequoia wrote:
       | > Hydra automatically picks the right DB for the right task and
       | pushes down computation, meaning each query will get routed to
       | where it can be executed the fastest.
       | 
       | Does this mean the data is duplicated to all the available
       | storage backends?
        
         | wuputah wrote:
         | Hi, JD here, CTO at Hydra. In an HTAP scenario, local
         | transactional data would be replicated, but your data warehouse
         | will likely have a great amount of data that your Postgres
         | database does not. You can still connect that data to Postgres
         | with Hydra. Ultimately, it's up to you if/how you choose to
         | replicate your data -- along with guidance from our team along
         | the way.
        
       | edublancas wrote:
       | Congrats on the launch! Coming from a data science role, this
       | could've been pretty useful for my previous projects. I had to
       | rewrite all of my feature engineering queries when the company I
       | worked at moved to Snowflake.
       | 
        | One question I have is how Hydra balances writing postgres
       | scripts vs leveraging system-specific features. For example, I
       | remember going through Snowflake's documentation and found
       | interesting functions for data aggregation. Can I leverage
       | Snowflake-specific features when using Hydra?
        
         | wuputah wrote:
         | Hi, JD here, Hydra's CTO. Thanks for the great question!
         | 
         | You can use our manual router[1] to route queries that use a
         | specific syntax or functions. The way this works today is you
         | wrap your query in an SQL function. In the future, we could
          | detect use of specific features and route those queries
         | appropriately. I think there might be other ways to solve this
         | as well e.g. by having a 'stub' aggregate function in Postgres
         | for the function you want to call. We are working with
         | customers to iterate on issues like this as they occur.
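          | 
          | Roughly, the wrapping part looks like this (table and
          | function names are invented; the routing configuration itself
          | is covered in the linked doc):
          | 
          |     -- The whole body is routed as a unit, so it can lean on
          |     -- the target warehouse's syntax and functions.
          |     CREATE FUNCTION monthly_revenue()
          |     RETURNS TABLE (month date, revenue numeric)
          |     LANGUAGE sql AS $$
          |       SELECT date_trunc('month', created_at)::date, sum(total)
          |       FROM orders
          |       GROUP BY 1
          |     $$;
          | 
          |     SELECT * FROM monthly_revenue();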
         | 
         | [1]: see "Manual Routing" at https://hydras-
         | io.notion.site/Router-a91f5282f1354c54a9ba894...
        
       | teej wrote:
        | Been thinking about this sort of thing for a while; your vision
       | for how this should work is so much better than mine was.
       | 
       | One of the ideas I kicked around was "materialize-on-read" - when
       | a query comes in but the underlying data is stale, refresh the
       | views first then serve the query.
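        | 
        | Something like this toy sketch in plain Postgres (names
        | invented, and ignoring locking and concurrent refreshes):
        | 
        |     CREATE MATERIALIZED VIEW daily_revenue AS
        |       SELECT created_at::date AS day, sum(total) AS revenue
        |       FROM orders GROUP BY 1;
        | 
        |     CREATE TABLE mv_refreshed_at (
        |       view_name text PRIMARY KEY,
        |       refreshed_at timestamptz NOT NULL
        |     );
        | 
        |     CREATE FUNCTION read_daily_revenue(max_age interval)
        |     RETURNS SETOF daily_revenue
        |     LANGUAGE plpgsql AS $$
        |     BEGIN
        |       IF (SELECT coalesce(max(refreshed_at), '-infinity'::timestamptz)
        |           FROM mv_refreshed_at
        |           WHERE view_name = 'daily_revenue') < now() - max_age THEN
        |         REFRESH MATERIALIZED VIEW daily_revenue;
        |         INSERT INTO mv_refreshed_at VALUES ('daily_revenue', now())
        |         ON CONFLICT (view_name) DO UPDATE SET refreshed_at = now();
        |       END IF;
        |       RETURN QUERY SELECT * FROM daily_revenue;
        |     END $$;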
       | 
       | I'm wondering how much state you plan to put into the Hydra layer
       | or if you plan to keep it mostly a router.
        
       ___________________________________________________________________
       (page generated 2022-02-23 23:00 UTC)