[HN Gopher] Launch HN: Grai (YC S22) - Open-Source Data Observab...
       ___________________________________________________________________
        
       Launch HN: Grai (YC S22) - Open-Source Data Observability Platform
        
       Hi HN, my name is Ian. My co-founder Edward and I started Grai
       (https://grai.io), an open-source data observability platform. It
       helps prevent production data outages by evaluating changes to your
       data pipelines in CI, rather than at runtime.

       Ever experienced a production outage due to changes in upstream
       data sources? That's a problem we regularly encountered, whether
       deploying machine learning or keeping a data warehouse operational,
       and it led us to create Grai.

       Systematically testing the impact of data changes on the rest of
       your stack turns out to be quite difficult when the same data is
       copied and used across many different services and applications.
       Simple changes like renaming a column in a database can result in
       broken BI dashboards, incorrect training data for ML models, and
       data pipeline failures. For example, business users regularly deal
       with questions like "why does revenue look different in different
       dashboards?"

       These sorts of problems are commonly dealt with by passively
       monitoring application execution logs for anomalies that might
       indicate an outage. Our goal was to move that task out of runtime,
       where an outage has already occurred, and back into testing.

       At its core, Grai is a graph of the relationships between the data
       in your organization, from columns in a database to JSON fields in
       an API. This graph allows Grai to analyze the downstream impact of
       proposed changes during CI and before they go live.

       It includes a variety of pre-built integrations with common data
       tools such as PostgreSQL, Snowflake, dbt, and Fivetran, which
       automatically extract metadata and synchronize the state of your
       graph. It's built on a flexible data model backed by REST and
       GraphQL APIs and a Python client library. This way, users can
       directly build on top of Grai as they see fit. For example, because
       every object in Grai serializes to a YAML definition file, sort of
       like a CRD in Kubernetes, even if a pre-built integration doesn't
       exist, it's fairly easy to manually create or script a custom
       solution.

       We made the decision to build open-source from the beginning in
       part because we believe lineage is underutilized both
       organizationally and technologically. We hope to provide a
       foundation for the community to build cool concepts on top and have
       already had companies come to us with amazing ideas, like
       optimizing their real-time query pipelines to take advantage of
       spot price arbitrage between cloud and on-prem.

       We try not to be overly opinionated about how organizations work,
       so whether you maintain a development database or run service
       containers in GitHub Actions, it doesn't really matter. When your
       tests are triggered, we evaluate the new state of the environment
       and check for any impacts, before reporting back as a comment in
       the pull request.

       Data observability can have unexpected benefits. One of our
       customers uses us because we make onboarding new engineers easier.
       Because we render an infinitely zoomable Figma-like graph of the
       entire data stack, it's possible for them to visually explore
       end-to-end data flows and application dependencies.

       You can find a quick demo here: https://vimeo.com/824026569. We've
       also put together an example getting-started guide if you want to
       try things out yourself:
       https://docs.grai.io/examples/enhanced-dbt. Since everything is
       open source, you can always explore the code
       (https://github.com/grai-io/grai-core) and docs
       (https://docs.grai.io), where we have example deployment
       configurations for docker-compose and Kubernetes.

       We would love to hear your feedback. If there's a feature we're
       missing, we'll build it. If you have a UX or developer experience
       suggestion, we'll fix it. If it's something else, we want to hear
       about it. We can't wait to hear your feedback and thank you in
       advance!
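
The idea in the post of treating lineage as a graph and checking proposed changes against it in CI can be sketched roughly as follows. This is a minimal illustration, not Grai's implementation or API: the `LINEAGE` mapping, the asset names, and the helper functions are all hypothetical, and a column rename is modeled simply as one column disappearing and another appearing.

```python
from collections import deque

# Hypothetical lineage graph: each key is a column or field, each value lists
# the assets that read from it directly. All names are illustrative only.
LINEAGE = {
    "postgres.orders.amount": ["warehouse.fct_orders.amount"],
    "warehouse.fct_orders.amount": [
        "dashboard.revenue.total",
        "ml.churn_features.order_value",
    ],
    "postgres.orders.id": ["warehouse.fct_orders.order_id"],
    "warehouse.fct_orders.order_id": [],
}


def downstream(graph, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def removed_columns(before, after):
    """Columns present in the old schema but missing from the proposed one."""
    return sorted(set(before) - set(after))


def impact_report(graph, before, after):
    """Map each removed column to the downstream assets that depend on it."""
    return {col: downstream(graph, col) for col in removed_columns(before, after)}


def format_comment(report):
    """Render the report as plain text, e.g. for a pull request comment."""
    if not report:
        return "No downstream impact detected."
    lines = []
    for col, affected in report.items():
        lines.append(f"{col} was removed; {len(affected)} downstream asset(s) affected:")
        lines.extend(f"  - {name}" for name in affected)
    return "\n".join(lines)


# A rename shows up as one column disappearing and another appearing.
BEFORE = ["postgres.orders.id", "postgres.orders.amount"]
AFTER = ["postgres.orders.id", "postgres.orders.total_amount"]

if __name__ == "__main__":
    print(format_comment(impact_report(LINEAGE, BEFORE, AFTER)))
```

In a CI job, `BEFORE` would come from the stored state of the graph and `AFTER` from the environment built for the pull request; how those states are actually collected is outside the scope of this sketch.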
        
       Author : ersatz_username
       Score  : 68 points
       Date   : 2023-07-17 13:40 UTC (9 hours ago)
        
       | MattSWilliamson wrote:
       | We had this problem at one of my previous companies; glad to see
       | someone addressing it, and the open-source approach just makes so
       | much sense. Best of luck
        
       | pdimitar wrote:
       | Hey, I like this project and will write it down to show it to
       | superiors AtOnePoint(tm). Looks well done.
       | 
       | If you allow me a remark on the website: it requires JS from 8
       | separate domains to show content, which is fine, but I know that
       | more technical readers can be sensitive to these aspects.
       | Secondly, the browser addon DarkReader doesn't work well with the
       | website so I had to turn it off and could only browse it in light
       | mode.
       | 
       | Perhaps these could be actionable points for the future.
       | 
       | Good job and keep going!
        
         | ersatz_username wrote:
         | Shoot! I wasn't familiar with DarkReader but I just created a
         | ticket to see if we can get it fixed. We recently redid the
         | website and there's still plenty of room for improvement.
         | Thanks for pointing that out :).
        
       | James_Bowers wrote:
       | I definitely like the flexibility to be able to create a custom
       | solution with a yaml file. Nice idea. All the best!
        
       | ssddanbrown wrote:
       | The license chosen [1] (Elastic License 2.0) is one that isn't
       | considered open source by many, due to not being OSD [2]
       | compatible. Were you aware of this before marketing as open
       | source and, out of interest, does the license & usage of "open
       | source" come into conversation when going through the YC process?
       | 
       | [1] https://github.com/grai-io/grai-core/blob/master/LICENSE
       | [2] https://opensource.org/osd/
        
         | ersatz_username wrote:
          | We are pretty open to feedback on licensing and have gone back
          | and forth internally because, frankly, we'd rather use a
          | copyleft license.
         | 
          | We believe a project like this needs financial backing and a
          | dedicated team driving development along, but therein lies the
          | tension. The common monetization paths feature-lock critical
          | self-hosted capabilities like SSO behind a paywall and/or
          | monetize through a cloud-hosted option.
         | 
         | The Elastic license is an attempt to maintain feature parity
         | between the cloud and self-hosted tool while still being
         | protected from something like the big cloud providers ripping
         | the code off altogether.
         | 
         | In all seriousness though, we would love to hear suggestions if
         | you think there's a better path.
        
           | ssddanbrown wrote:
           | I personally don't have anything against the license you've
           | chosen, and I respect your right to protect your efforts
            | against usage you don't desire. I just think it's better to
            | avoid using "open source" if going down the ELv2 path, and
            | to use something like "source available" or "fair code"
            | instead, to avoid misrepresenting this as what is commonly
            | considered open source.
           | 
           | If you'd like further detail in regards to why I (and others)
           | think this matters, I've previously written my thoughts up
           | here: https://danb.me/blog/posts/why-open-source-term-is-
           | important...
        
             | jagtstronaut wrote:
             | This question is for my education alone, but since you seem
             | quite passionate I am curious.
             | 
             | I just read a super long article about licensing to
             | understand your comment as well as the article you wrote.
             | Under these "source available" licenses, I can still sell
             | the software within some kind of package correct? Like if I
             | create my own PR linter I can use Grai and still sell it? I
             | just can't host grai with some observability and sell it?
             | Or am I misunderstanding?
        
               | ssddanbrown wrote:
               | Just to be clear for my responses, I am not a legal
               | expert in any way.
               | 
               | > Under these "source available" licenses, I can still
               | sell the software within some kind of package correct?
               | Like if I create my own PR linter I can use Grai and
               | still sell it?
               | 
               | "Source available" means the source is accessible.
               | Whether you can sell the software depends on the license.
                | In the case of the Elastic License v2 as used here, I
                | believe you could resell the work, but you cannot
                | relicense it, and the original limitations will remain,
                | including the one on providing it as a hosted/managed
                | service. There are other limitations too; the one around
                | license key functionality could be a significant
                | hindrance depending on specific use and implementation.
               | 
               | > I just can't host grai with some observability and sell
               | it? Or am I misunderstanding?
               | 
               | That is kind of the most significant limitation, but
               | ultimately you are subject to the detail of all
               | limitations:
               | 
               | ~~
               | 
               | >> You may not provide the software to third parties as a
               | hosted or managed service, where the service provides
               | users with access to any substantial set of the features
               | or functionality of the software.
               | 
               | >> You may not move, change, disable, or circumvent the
               | license key functionality in the software, and you may
               | not remove or obscure any functionality in the software
               | that is protected by the license key.
               | 
               | >> You may not alter, remove, or obscure any licensing,
               | copyright, or other notices of the licensor in the
               | software. Any use of the licensor's trademarks is subject
               | to applicable law.
               | 
               | ~~
               | 
               | Note that there's nothing about selling at all. Also
               | think about how widely that first limitation could cover
               | different types of use-case. And, as touched on above,
               | that second limitation could be used in quite a
               | protective/combative way to make significant parts of the
               | software unusable in re-use.
        
             | ersatz_username wrote:
             | Totally fair and appreciate the (well written) thoughts.
        
           | jedberg wrote:
           | > We believe a project like this needs financial backing and
           | a dedicated team driving development
           | 
           | What benefits do you get from being open source other than
           | the OS stamp of approval?
           | 
           | Perhaps the solution is to just go closed source. I'm all for
           | open source, but I'm not the biggest fan of open core or
            | source available. All it does is hurt the business with
           | little benefit to me. I'd rather you make more money and
           | support me or go full altruistic and make it truly open
           | source.
        
             | ersatz_username wrote:
              | The short answer is that we didn't go open source in order
              | to get anything out of it. Of course, to each their own,
              | but I've personally gotten a ton of value from open core
              | tools in the past.
        
         | satvikpendem wrote:
         | Indeed, it's "source available" at best, not open source as it
         | limits how other parties can use the software, even if the
         | creators don't like their use.
        
           | ersatz_username wrote:
           | Just to be clear, the only limitation imposed by the license
           | is preventing someone from reselling a cloud hosted copy of
            | the tool. The code is otherwise totally free to use / fork /
            | modify / etc.
        
             | satvikpendem wrote:
             | That's great, it's not open source though so you shouldn't
             | call it open source. Call it something else.
        
       | whytai wrote:
       | How do you guys do the static analysis on the queries? I notice
       | you support dbt, BigQuery, etc., but all of our company's
       | pipelines are in Airflow. That makes the static analysis difficult
       | because we're dealing with arbitrary Python code that
       | programmatically generates queries :).
       | 
       | Any plans to support Airflow in the future? Would love to have
       | something like this for our company's 500k+ Airflow jobs.
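
To make the difficulty raised in this comment concrete, here is a small sketch of why scanning source code for table references falls short when queries are generated programmatically. It says nothing about how Grai analyzes queries: `TASK_SOURCE`, `build_query`, the config keys, and the naive `FROM` regex are all hypothetical, chosen only to show that the table name exists in the rendered SQL but not in the code that produces it.

```python
import re

# Hypothetical stand-in for a task callable: the table name comes from
# runtime configuration, so it never appears in the source text itself.
TASK_SOURCE = '''
def build_query(cfg):
    cols = ", ".join(cfg["columns"])
    return f"SELECT {cols} FROM {cfg['schema']}.{cfg['table']}"
'''

# Naive pattern for a schema.table reference following FROM.
TABLE_REF = re.compile(r"\bFROM\s+([A-Za-z_]\w*\.[A-Za-z_]\w*)", re.IGNORECASE)


def tables_in(text):
    """Return schema.table references found by the naive FROM scan."""
    return TABLE_REF.findall(text)


def render(cfg):
    """Execute the task source and return the SQL it generates for `cfg`."""
    namespace = {}
    exec(TASK_SOURCE, namespace)  # acceptable here only because the source is our own
    return namespace["build_query"](cfg)


CONFIG = {"schema": "analytics", "table": "orders", "columns": ["id", "amount"]}

if __name__ == "__main__":
    # Scanning the code finds nothing; scanning the rendered query finds the table.
    print("static scan:  ", tables_in(TASK_SOURCE))
    print("rendered scan:", tables_in(render(CONFIG)))
```

One consequence is that lineage for such pipelines generally has to be captured from rendered queries or execution metadata rather than from the Python source alone.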
        
       | pjot wrote:
       | I recently demoed an observability platform with another company,
       | and one of my biggest gripes was that we weren't able to
       | "observe" the error before it actually made it to the open.
       | 
       | And that it took 2+ weeks to train their models with the table
       | metadata - so time to value for my team was always "in two
       | weeks".
       | 
       | Glad to see y'all going against that trend!
        
       | kevinmershon wrote:
       | Your intro video started with the assumption, I think, that a
       | team already has some infra relating to this called DBT. It would
       | be nice to have a video for onboarding from scratch, assuming
       | there's no prior effort toward data observability.
        
         | boredemployee wrote:
         | Just a side note, DBT is being required everywhere these days
        
           | kevinmershon wrote:
           | I believe you w.r.t. tech-first companies. I work in a tiny
           | software dept in a small service company and we have no
           | infrastructure like this at all. It would be nice to know how
           | I can go from zero-to-Grai.
        
         | ersatz_username wrote:
          | Pre-built integrations are a big part of what makes onboarding
          | easy, but it sort of ends up in a catch-22 situation where
          | whichever integration gets highlighted is only directly
          | applicable to the people using those tools.
          | 
          | If you have a different toolset, onboarding will look exactly
          | the same; there's nothing truly DBT-specific at work here. It's
          | a good idea though! We really should put together a few other
          | combinations so more people can see their own stack
          | represented.
        
       | [deleted]
        
       | mdaniel wrote:
       | Elastic v2 if one is interested in such things:
       | https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE
        
       | BlackjackCF wrote:
       | Website looks great!
       | 
       | Just FYI, I'm getting a "failed to load search index" error in
       | your docs.
       | 
       | Also I saw GitHub Actions called out in the workflow. Do you have
       | GitLab support?
        
         | ersatz_username wrote:
         | Thanks so much! Really appreciate the kind words.
         | 
          | We haven't had anyone request GitLab yet but would love to add
         | support! Any chance you'd be willing to beta test for us? If
         | so, shoot me an email at ian@grai.io :).
         | 
         | EDIT: It looks like the index issue is related to our search
         | provider. Were you able to eventually load the page or is it
         | fully blocking you?
        
       | swordsmith8 wrote:
       | Thanks for sharing! Seems like this is a dbt-centric lineage tool
       | that surfaces failed tests in the lineage itself?
       | 
       | Unlike a data observability platform like Monte Carlo, which
       | proactively monitors data, am I correct in assuming that your
       | solution is less focused on data observability (i.e. monitoring
       | production data and conducting root cause analysis / impact
       | analysis) and more on ensuring reliable CI/CD?
        
       ___________________________________________________________________
       (page generated 2023-07-17 23:00 UTC)