[HN Gopher] Show HN: Hydra 1.0 - open-source column-oriented Pos...
       ___________________________________________________________________
        
       Show HN: Hydra 1.0 - open-source column-oriented Postgres
        
       hi hn, hydra ceo here

       hydra is an open source, column-oriented postgres. you can set up
       remarkably fast aggregates on your project in minutes to query
       billions of rows instantly.

       postgres is great, but aggregates can take minutes to hours to
       return results on large data sets. long-running analytical queries
       hog database resources and degrade performance. use hydra to run
       much faster analytics on postgres without making code changes. data
       is automatically loaded into columnar format and compressed.
       connect to hydra with your preferred postgres client (psql,
       dbeaver, etc).

       following 4 months of development on hydra v0.3.0-alpha, our team
       is proud to share our first major version release. hydra 1.0 is
       under active development, but ready for use and feedback. we're
       aiming to release 1.0 into general availability (ga) soon.

       for testing, try the hydra free tier to create a column-oriented
       postgres instance on the cloud. https://dashboard.hydra.so/signup
        
       Author : coatue
       Score  : 203 points
       Date   : 2023-08-03 16:19 UTC (6 hours ago)
        
 (HTM) web link (hydra-so.notion.site)
        
       | florianherrengt wrote:
       | Can you add the extension to an existing database?
        
         | ahmedfromtunis wrote:
         | That's also my question. Couldn't find anything for it in the
         | docs provided.
         | 
         | Also, how to migrate data from an existing database? (Is it the
         | usual pg_dump/psql combo?)
        
       | mdaniel wrote:
       | I wanted to say thank you for using actual Open Source licenses.
       | It's gotten to where I treat any recent "Launch HN" or "Show HN"
       | containing "open source" in the title as "well, I wonder which
       | crazy license this project is using"
        
         | dang wrote:
         | Can you point me to examples of Launch HNs using funky
         | licenses? Show HNs are free-form but Launch HNs are curated by
         | us, and I'd like to know what red flags to watch for.
         | 
         | (As this is off-topic for the current Show HN, it might be
         | better to email hn@ycombinator.com if you, or anyone, would be
         | willing to share that way.)
        
           | mdaniel wrote:
           | I sent an email to avoid being a distraction, but I did want
           | to follow up publicly and say that I apologize for lumping
           | Launch HN into the same bucket as Show HN. For the most part
           | the Launch ones are really Open Source and I apologize for
           | the overgeneralization :-(
        
       | qeternity wrote:
       | Ok so this is a Citus fork.
       | 
       | Where can I read about what the differences / trade-offs are? I
       | don't see anything in the docs.
        
       | ryanb_wise wrote:
       | Talked to OP last night and played around with it this morning.
       | This is something I've wanted to see added to postgres for a long
       | time, and couldn't have been done by a nicer and more
       | accommodating founder. Very excited.
        
       | zhendlin wrote:
       | awesome project - and we tested Zing Data
       | (http://www.zingdata.com) with Hydra so that really fast
       | analytical queries on postgres can scale to analytics users on
       | mobile, and so far we have seen great results.
        
       | s-mon wrote:
       | This looks wild! Been looking for a good event-based logs DB and
       | didn't want to go full clickhouse. This will do!
        
       | techwizrd wrote:
       | This looks really impressive, and I'm excited to see how it
       | performs on our data!
       | 
       | P.S., I think the name conflicts with Hydra, the configuration
       | management library: https://hydra.cc/
        
         | entuno wrote:
         | And also the password bruteforcing tool by THC.
        
           | notpushkin wrote:
           | Also with the OIDC server by Ory [1] and a certain defunct
           | Russian darknet marketplace [2].
           | 
           | [1]: https://www.ory.sh/hydra/
           | 
           | [2]: Not today, _tovarisch mayor_.
        
       | chrisjc wrote:
       | Very out of touch with Postgres, but is there a native
       | column-oriented table type option in Postgres so that you can
       | choose either row-based or columnar in the CREATE TABLE DDL?
        
         | btown wrote:
         | I don't believe Postgres has this natively, but an alternative
         | to OP is Citus, a Postgres extension which allows this kind of
         | syntax.
         | 
         | https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...
         | 
         | EDIT: per another comment, OP is a fork of Citus Columnar!
        
       | zrizavi17 wrote:
       | Such a game changer and useful alternative to legacy databases!
        
       | hgimenez wrote:
       | Always awesome to see folks moving Postgres forward. Congrats on
       | the launch!
        
       | mitchpatin wrote:
       | super impressive performance improvements!
       | 
       | do most of your customers replicate their postgres database to
       | Hydra for analytics jobs, or what's the typical setup?
        
       | say_it_as_it_is wrote:
       | Hydra Columnar is a fork from Citus Columnar c. April, 2022.
        
         | thih9 wrote:
         | Citus: https://github.com/citusdata/citus
         | 
         | BTW, Citus license is GNU Affero General Public License (github
         | lists "conditions: same license") and hydra is Apache. How is
         | that possible if the latter is a fork? There's probably
         | something about these licenses I'm not aware of and I'm
         | curious.
        
           | jerrysievert wrote:
           | hydra columnar inherits its license from citus:
           | https://github.com/hydradatabase/hydra/blob/main/columnar/LI...
           | 
           | but, hydra itself is more than just the columnar extension.
        
             | thih9 wrote:
             | Thanks for explaining. This is confusing to me as a github
             | user, i.e. if I saw a license in the project's description,
             | I wouldn't expect another license in a subdirectory.
             | 
             | GitHub now has a UI for repos with multiple licenses:
             | https://github.blog/changelog/2022-05-26-easily-discover-and... ,
             | that would have been clearer for me.
        
       | mlenhard wrote:
       | Congrats on the launch!
       | 
       | For those who have not experimented with columnar based
       | databases, I would highly recommend toying around with them.
       | 
       | The performance improvements can be substantial. Obviously there
       | are drawbacks involved with integrating a new database into your
       | infrastructure, so it is exciting to see columnar format
       | introduced to Postgres. Removes the hurdle of learning, deploying
       | and monitoring another database.
        
       | quadrature wrote:
       | How are updates handled, is it doing a merge on read ?
        
       | pajep wrote:
       | can I ask if you guys take contributors?
        
       | edublancas wrote:
       | Congrats on the 1.0 milestone!
       | 
       | A few months ago, we worked with the team to bring Hydra to
       | Jupyter, you can check out the tutorial here:
       | https://docs.hydra.so/analyze/jupyter
       | 
       | JupySQL's GitHub: https://github.com/ploomber/jupysql
        
         | coatue wrote:
         | Nice and thanks @edublancas, that was a useful tutorial and
         | it's nice to be able to query hydra with SQL via Jupyter. fan
         | of your project
        
       | therealwardo wrote:
       | how does Hydra compare to Citus? https://www.citusdata.com
        
         | jerrysievert wrote:
         | generally faster across the board: a lot of work was done to
         | expand and speed it up, plus support for updates, deletes, and
         | vacuuming.
         | 
         | https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQXRoZW5hIC...
        
           | rubiquity wrote:
           | Since benchmarks can be misleading I want to point out that
           | the differences between Hydra and the "tuned"[0] PostgreSQL
           | (which are some very basic settings) are a lot less
           | convincing, with plain old PG coming ahead on quite a few:
           | https://tinyurl.com/eju9tht2
           | 
           | I also noticed quite a bit of parity between Hydra and Citus
           | on data set size. Is Hydra a fork of Citus columnar storage?
           | 
           | 0 - https://github.com/ClickHouse/ClickBench/blob/main/postgresq...
        
             | riku_iki wrote:
             | > 0 - https://github.com/ClickHouse/ClickBench/blob/main/postgresq...
             | 
             | that postgres config is very underpowered: it has only 8
             | workers per gather while the machine has 192 vCPUs.
        
             | arp242 wrote:
             | > plain old PG coming ahead on quite a few
             | 
             | I found that is common among these types of databases (e.g.
             | Citus, Timescale, etc.) which perform well under very
             | specific conditions, and worse for many (most?) other
             | things, sometimes _significantly_ worse.
             | 
             | That said, Hydra does take up ~17.5G for that benchmark and
             | "PostgreSQL tuned" about 120G, the insert time is ~9 times
             | faster, and "cold run" is quite a bit faster too. It's only
             | "hot run" that shows a fairly small difference. I think
             | it's fair to say Hydra "wins" that benchmark.
             | 
             | > Is Hydra a fork of Citus columnar storage?
             | 
             | Yes: "Hydra Columnar is a fork from Citus Columnar c.
             | April, 2022".
        
               | riku_iki wrote:
               | > Hydra does take up ~17.5G for that benchmark and
               | "PostgreSQL tuned" about 120G
               | 
               | you can run pg on compressed filesystem
        
               | arp242 wrote:
               | I'm sure you can, but AFAIK neither uses compression in
               | that benchmark so it's a fair comparison. Even _if_
               | filesystem compression would reduce that to 17.5G
               | (doubtful), it won't be free in terms of CPU cycles,
               | and no matter what it's still ~120G to load in memory,
               | bytes to scan/update, etc.
        
               | riku_iki wrote:
               | my bet is that hydra already uses compression internally,
               | otherwise it is hard to explain where the difference
               | comes from.
               | 
               | > it won't be free in terms of CPU cycles
               | 
               | it can reduce IO traffic significantly, and it can be a
               | very positive trade-off depending on circumstances.
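One reason a columnar layout can end up much smaller on disk is that the values of one column are stored next to each other, so runs of repeated values become cheap to encode. The Python sketch below uses plain run-length encoding purely as an illustration of that effect; it is an assumption for teaching purposes and is not meant to represent whatever codec Hydra actually uses. The function names and the sample "status" column are hypothetical.

```python
from itertools import groupby


def rle_encode(column: list) -> list[tuple[object, int]]:
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs: list[tuple[object, int]]) -> list:
    """Expand (value, run_length) pairs back into the original column."""
    column: list = []
    for value, count in runs:
        column.extend([value] * count)
    return column


if __name__ == "__main__":
    # A hypothetical 1,000-row "status" column stored contiguously.
    status = ["ok"] * 900 + ["error"] * 100
    runs = rle_encode(status)
    print(runs)
    assert rle_decode(runs) == status
```

Here `rle_encode` returns `[("ok", 900), ("error", 100)]`: two pairs instead of 1,000 values. In a row-oriented heap the same status values are interleaved with the other fields of each row, so these runs never appear on disk in the first place.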
        
               | arp242 wrote:
               | I had assumed that PostgreSQL is so much larger because
               | it creates heaps of indexes (which is probably also why
               | inserts are so much slower for it), but I don't really
               | have a good way to confirm that quickly.
        
               | riku_iki wrote:
               | one can choose to not create "heaps of indexes".
        
               | arp242 wrote:
               | At which point your performance will drop like a brick
               | for these types of queries - I'm pretty sure these
               | indexes weren't added for the craic.
        
               | riku_iki wrote:
               | it depends on your query obviously.
               | 
               | In general, I did very deep benchmarking of pg,
               | clickhouse and duckdb, and I sure didn't make stupid
               | mistakes like this:
               | https://news.ycombinator.com/item?id=36990831
               | 
               | My dataset has 50B rows and 2 TB of data. I think
               | columnar DBs are very overhyped, and I chose pg because:
               | 
               | - pg performance is acceptable, maybe 2-5x slower than
               | clickhouse and duckdb on some queries, if pg is
               | configured correctly and runs on compressed storage
               | 
               | - clickhouse and duckdb start falling apart very fast
               | because they specialize in a very narrow type of query:
               | https://github.com/ClickHouse/ClickHouse/issues/47520
               | https://github.com/ClickHouse/ClickHouse/issues/47521
               | https://github.com/duckdb/duckdb/discussions/6696
        
               | benn0 wrote:
               | Do you happen to have any documentation about your
               | benchmarking? I'm also considering these options at the
               | moment (currently using pg+timescaledb) and interested in
               | what you found.
        
               | zX41ZdbW wrote:
               | ClickHouse can do large GROUP BY queries, not limited by
               | memory:
               | https://clickhouse.com/docs/en/sql-reference/statements/sele...
        
               | riku_iki wrote:
               | as explained in
               | https://github.com/ClickHouse/ClickHouse/issues/47521#issuec...
               | it can't: that parameter only applies to the
               | pre-aggregation phase, not to the aggregation itself.
               | 
               | The feature request is not implemented yet:
               | https://github.com/ClickHouse/ClickHouse/issues/40588
        
           | adr1an wrote:
           | Right when I was thinking URL shorteners were out of
           | fashion... /S
        
             | biugbkifcjk wrote:
             | It's just there to make it easier for mobile users to click
             | it.
        
               | setr wrote:
               | I don't see why the GitHub link is any harder to click
               | than the tiny url link in that post.
               | 
               | I'm pretty sure the only real purpose of URL shorteners
               | is Twitter's character limits (and software that doesn't
               | visually hide egregiously long URLs), but they continue
               | to be used outside of those places due to cargo culting.
        
       | pella wrote:
       | Congratulations!
       | 
       | Can we expect support for gist, gin, spgist, and brin indexes
       | sometime in the near future?
       | 
       | Based on the source code, it appears that they are not supported:
       | 
       | https://github.com/hydradatabase/hydra/blob/96056312e7c0f413...
       | 
       |  _"... Columnar supports `btree` and `hash` indexes (and the
       | constraints requiring them) but does not support `gist`, `gin`,
       | `spgist` and `brin` indexes. "_
        
         | nerdponx wrote:
         | Do different kinds of indexes work better for columnar storage?
         | Or is it the same principles for both?
        
           | hodgesrm wrote:
           | Different principles of indexing, at least based on my
           | experience with ClickHouse.
           | 
           | * Column-based stores have really fast scans due to
           | compression and vectorization, so you'll generally always
           | read down the column. The way to speed it up is to have "skip
           | indexes" that allow you to skip blocks, e.g., don't even
           | bother to read/decompress them.
           | 
           | * Commonly used indexes need to be very sparse, so they fit
           | in memory even when tables run to hundreds of billions of
           | rows.
           | 
           | * Finally highly compressed columns can be used as indexes to
           | filter data rapidly. ClickHouse calls this PREWHERE
           | processing.
           | 
           | Edit: clarify skip indexes
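A minimal Python sketch of the min/max "skip index" idea described above, assuming a single integer column split into fixed-size blocks. The block size, the `Block` type, and the function names are hypothetical and do not reflect ClickHouse's or Hydra's actual on-disk format. Each block records its minimum and maximum, and a range scan only reads blocks whose [min, max] interval can overlap the predicate.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One fixed-size block of a column plus its min/max summary (the skip-index entry)."""
    values: list[int]
    lo: int
    hi: int


def build_blocks(column: list[int], block_size: int) -> list[Block]:
    """Split a column into blocks and record the min and max of each block."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: list[Block], low: int, high: int) -> tuple[list[int], int]:
    """Return the values in [low, high] and the number of blocks actually read."""
    matches: list[int] = []
    blocks_read = 0
    for block in blocks:
        # The summary proves no value in this block can match: skip reading it.
        if block.hi < low or block.lo > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


if __name__ == "__main__":
    blocks = build_blocks(list(range(100)), block_size=10)
    print(scan_range(blocks, 42, 47))
```

On a sorted column such as `list(range(100))` split into blocks of 10, `scan_range(blocks, 42, 47)` returns `([42, 43, 44, 45, 46, 47], 1)`, so only one of the ten blocks is read. The summaries are tiny (two numbers per block), which is why such indexes stay sparse enough to fit in memory. The catch is that skipping only pays off when values are clustered, e.g. by insertion time.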
        
           | pella wrote:
           | We need a spatial index for spatial (columnar) data!
           | 
           | - https://www.crunchydata.com/blog/the-many-spatial-indexes-of...
           | 
           | - http://postgis.net/workshops/postgis-intro/indexing.html
           | 
           | - Spatial indexes for OSM in PostGIS (PDF):
           | https://pretalx.com/media/sotm2019/submissions/CAD93S/resour...
        
       | Iwan-Zotow wrote:
       | > to query billions of rows instantly
       | 
       | Rows? Rows?!? What's the point of having a columnar DB to query rows?
        
         | ithkuil wrote:
         | Columnar DBs often allow you to have tables consisting of
         | multiple columns where values in a column are correlated with
         | values in the other columns. All such correlated values
         | belonging to different columns are commonly called a "row"
         | despite not being stored contiguously.
         | 
         | Generally what you do is scan a column and evaluate a
         | predicate on each value you encounter during the scan (possibly
         | in parallel). For each value of that column that matches the
         | predicate, you keep track of the "position" of the value in
         | the column (a common technique is a sparse set data structure
         | such as a roaring bitmap). Then you scan through another
         | column and select the values at the saved "positions".
         | 
         | As you can see, it's not a stretch to view values from
         | different columns that belong to the same "position" as
         | belonging to the same "row" and the "position" to be the "row
         | index" or "row id"
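The scan-then-gather approach described above can be sketched in Python as follows. A plain Python int stands in for the roaring bitmap (bit i set means row position i matched); that substitution, the sample columns, and the function names are illustrative assumptions. All columns are assumed to have the same length, so position i refers to the same row in every column.

```python
from collections.abc import Callable, Sequence


def select_positions(column: Sequence, predicate: Callable[[object], bool]) -> int:
    """Scan one column; return a bitmap with bit i set where predicate(column[i]) is true."""
    bitmap = 0
    for i, value in enumerate(column):
        if predicate(value):
            bitmap |= 1 << i
    return bitmap


def gather(column: Sequence, bitmap: int) -> list:
    """Read another column, keeping only the values at the positions set in the bitmap."""
    out = []
    i = 0
    while bitmap:
        if bitmap & 1:
            out.append(column[i])
        bitmap >>= 1
        i += 1
    return out


if __name__ == "__main__":
    # Two columns of the same hypothetical table; position i is "row i" in both.
    country = ["US", "DE", "US", "FR", "US"]
    amount = [10, 20, 30, 40, 50]

    us = select_positions(country, lambda c: c == "US")
    big = select_positions(amount, lambda a: a > 15)

    print(gather(amount, us))        # amounts of the US rows
    print(gather(amount, us & big))  # AND of two predicates = bitwise AND of bitmaps
```

With this sample data, `gather(amount, us)` returns `[10, 30, 50]` and `gather(amount, us & big)` returns `[30, 50]`. The bit positions play exactly the role of the "row index" or "row id" mentioned above.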
        
       | afeiszli wrote:
       | Really cool!
        
       | burcs wrote:
       | This is really cool, just played around with it a bit but excited
       | to do a deeper dive later. Nice work guys!
        
         | coatue wrote:
         | Thanks! feel free to DM me in the hydra discord (or email)
         | anytime
        
       | joshgray wrote:
       | Congrats to the entire Hydra team on the launch! We (Artemis -
       | https://www.artemisdata.io/) are stoked to be building with you as a
       | partner to help data teams analyze data even faster!
        
       | ushakov wrote:
       | Are you funded?
        
         | coatue wrote:
         | Check out the "about us" section on the page for more details!
         | "We are excited to share that Hydra raised a $3.1M seed round
         | to drive development of columnar Postgres. We remain committed
         | to sharing our upcoming releases to open source."
        
       | wrowan33 wrote:
       | Congrats on the success!
        
       | carlod wrote:
       | Congrats guys!
        
       ___________________________________________________________________
       (page generated 2023-08-03 23:00 UTC)