hngopher.com

       Launch HN: Hightouch (YC S19) - Sync data from data warehouses to
       SaaS tools
        
       Hey HN! Kashish, Tejas, and Josh here. We're building Hightouch
       (https://www.hightouch.io/), a reverse ETL platform--that is,
       software that gets your data back out of your data warehouse and
       into the SaaS tools that people at your company are familiar with
       (like Salesforce). We enable you to Bring Your Own (BYO) database
       so that all your SaaS tools run off of the same dataset. You
       specify what data you want and where, and we take care of the rest.
        
       We were exposed to the data integration space as early engineers at
       Segment. Segment and other CDPs (Customer Data Platforms) were
       built on an older model that hits a wall once you reach a certain
       level of complexity. You don't have access to your own data, there
       isn't a great way to express business logic, and you don't have
       flexibility to transform data to your needs.
        
       Cloud-based data warehouses like Snowflake and tools like dbt
       solved part of this problem. Where Segment/CDPs require you to
       store data in their format, warehouses let you store your data in
       any format. They let you store it in your own cloud for privacy
       and security. And where Segment/CDPs only have event data,
       warehouses have all your data--things like a full replica of
       Salesforce data and a full replica of Postgres data.
        
       The problem is, all this data tends to get stuck in the warehouse
       and only get used for reports and dashboards. In our experience,
       business teams don't want another BI dashboard. They want their
       data in their primary tools--the SaaS applications where they
       spend their days--so they can use it to actually operate their
       business.
        
       Because of this mismatch, a lot of engineers are doing busywork
       writing scripts to get data from warehouses into CRMs like
       Salesforce, HubSpot, Customer.io, and so on. Such scripts are
       brittle--they need changing when users request more columns, when
       an API changes, etc. And that's only if the business people are
       lucky enough to get engineers' time in the first place. There are
       also a lot of teams downloading CSVs and manually uploading them to
       various platforms because they can't get their work in front of
       engineers who are busy with a hundred other priorities.
        
       We decided to build something that would appeal to both sides:
       business people who know what they want from their data and just
       need access to it, and data engineers who want to help but can't
       build and maintain every integration as the marketing team buys an
       endless number of SaaS tools. That's how we came up with Hightouch.
        
       Hightouch is a platform that makes it easy to take models and
       views from your warehouse and sync them into your SaaS apps, using
       only SQL to express your logic. Mapping between columns and
       application fields is done through a declarative UI. You write a
       SQL query to pull the data you need, map columns from that query
       to fields in your SaaS tool, and set how often you want data to
       sync. We handle the rest. See a demo here:
       https://www.youtube.com/watch?v=kDhHWG9hwj0.
        
       No more hard-coding database columns to a Salesforce field in
       Python or JavaScript, only to have a sales team ask for a 'quick
       change'. We handle all the annoying complexities of moving data
       around: type-casting, error handling, authentication, retries,
       debugging, observability, notifications/alerts, and changing
       APIs--freeing up your engineering time to work on problems
       specific to your business rather than syncing data into a CRM.
        
       We've also built integrations we think data engineers will love.
       We integrate directly with dbt and dbt Cloud, we offer git sync
       for version control of your models and syncs, and we have an
       Airflow operator as well as a public API. We'd love your ideas on
       what you think is missing.
        
       Hightouch doesn't store anything. We connect directly to your
       existing warehouse/database and SaaS tools. As data changes in the
       warehouse, it changes in the SaaS tool. You get full control and
       your data is always owned by you.
        
       Our customers use Hightouch to do things like: sending a feed of
       new leads and customers to Slack; syncing product usage data into
       CRMs like HubSpot and Salesforce; and syncing user cohorts into
       marketing systems, such as all users who abandoned their shopping
       cart, or users with a high "churn risk" score.
        
       We've grown from 4 people to almost 30 now, and work with amazing
       customers like CircleCI, Plaid, Retool, Ramp, Lucidchart, Nando's,
       Grafana, Kong, Autotrader, Blend, and Imperfect Foods. We're also
       hiring--we have over 15 positions open at
       https://hightouch.io/careers/, and we would love to meet you and
       have you join the team.
        
       We'd love to hear your thoughts, feedback
       and experiences on data warehouses, building integrations, ETL,
       workflow orchestration, and anything data related!
        
       Author : kashishg
       Score  : 89 points
       Date   : 2021-11-11 15:02 UTC (7 hours ago)
        
       | wlrd wrote:
       | what about the latency of data warehouses? how do you get around
       | that?
        
         | tejasmanohar wrote:
         | Good callout. Sometimes, I joke that warehouse ingestion
         | latency is the bane of my existence, but it's improving...
         | 
         | Our average customer runs Hightouch syncs roughly every hour,
         | but we can actually run syncs up to every minute! HT has a lot
         | of optimizations like only sending changes to destinations
         | instead of all data every run.
         | 
         | On the warehouse side, we're seeing a lot of improvements.
         | BigQuery has streaming insert APIs [0] implemented with a
         | parallel database on the backend that's joined at read time.
         | Combined with timestamp partitioned tables (sortable) and our
         | in-warehouse diff'ing, you can actually create a streaming
         | pipeline in Hightouch. Some companies like JetBlue are doing
         | cool stuff with lambda views on top of Snowflake [1]. Our power
         | users at Hightouch are running syncs as fast as every minute.
         | 
         | For wider context, we find 90%+ of business use cases to be
         | just fine in batch. It's amazing to see how many people are
         | still replacing... manual CSV workflows... with Hightouch :)
         | 
         | That said, there are some use cases for truly real-time
         | workflows (e.g. a post-checkout email), and for that, customers
         | either implement outside of Hightouch or lately, we've been
         | fiddling around with letting customers plug directly into
         | streams like Kafka, Kinesis, PubSub - though they lose the
         | power of SQL aggregations _for now_.
         | 
         | Streaming SQL databases like Materialize [2] will fix this
         | fundamentally, and Hightouch can connect to them. Email
         | hello@hightouch.io if you want to try any of the new stuff!
         | 
         | [0]: https://cloud.google.com/bigquery/docs/write-api
         | [1]: https://discourse.getdbt.com/t/how-to-create-near-real-time-...
         | [2]: https://materialize.com/
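
The "only sending changes" optimization mentioned in the comment above can be sketched as a diff between two query snapshots. This is a minimal illustration under stated assumptions, not Hightouch's implementation: the function name `diff_rows`, the `id` key column, and the in-memory list-of-dicts representation are all hypothetical, and the comment says the real diffing happens inside the warehouse.

```python
from typing import Any

Row = dict[str, Any]


def diff_rows(previous: list[Row], current: list[Row], key: str) -> dict[str, list[Row]]:
    """Compare two query snapshots and return only the rows that differ."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Rows whose key is new since the last run.
    added = [row for k, row in curr_by_key.items() if k not in prev_by_key]
    # Rows whose key existed before but whose values differ now.
    changed = [
        row for k, row in curr_by_key.items()
        if k in prev_by_key and prev_by_key[k] != row
    ]
    # Rows that were present last run and are gone now.
    removed = [row for k, row in prev_by_key.items() if k not in curr_by_key]
    return {"added": added, "changed": changed, "removed": removed}


# Hypothetical snapshots of the same model from two consecutive runs.
last_run = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
this_run = [{"id": 1, "plan": "pro"}, {"id": 3, "plan": "free"}]
changes = diff_rows(last_run, this_run, key="id")
```

Only the rows in `changes` would need to be sent to the destination, which is what makes frequent runs affordable against rate-limited APIs.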
        
       | winterplace wrote:
       | Are posts by YC companies handled by an API that submits at
       | around 7am PST?
        
       | aryik wrote:
       | Congrats Kash, Tejas, Josh, and your whole team! You guys are
       | killing it.
       | 
       | You've made incredible progress over the last year. Your customer
       | list is looking very strong, and it seems like you've homed in on
       | a real and pressing problem.
       | 
       | Keep up the great work, but try to take a moment to celebrate how
       | far you've come!
        
         | kashishg wrote:
         | Thank you for the kind words :) We are just getting started!
        
       | bleonard wrote:
       | Congrats on the launch! Hightouch looks great and this need is
       | real. Things seem to be going well, so I don't think I'm taking
       | too much away by mentioning that we have been working on
       | Grouparoo, an open source alternative that solves similar pain
       | points.
       | 
       | A few differences: git developer workflow focused (branches, CI,
       | PRs, etc), ability to self host, segmentation in destinations
       | (tagging people in mailchimp based on rules, for example)
       | 
       | https://www.grouparoo.com
        
         | gurubavan wrote:
         | Hightouch user here. HT actually has a lot of that - git
         | integration [0], visual segmentation [1]. Not sure about self-
         | hosting though. Open-source is cool, will check it out.
         | 
         | [0]: https://hightouch.io/docs/integrations/git-sync/
         | 
         | [1]: https://hightouch.io/docs/hightouch-audiences/overview/
        
           | bleonard wrote:
           | There are probably some nuances one level down. Things our
           | users have told us they can do in these areas that, to my
           | knowledge, Hightouch doesn't do:
           | 
           | * Combine data from different sources to define a model. We've
           | seen using Postgres as a source of truth and supplementing
           | with Snowflake data, for example.
           | 
           | * Add tags to contacts in mailchimp, zendesk or make lists of
           | them in customer.io, Pardot, etc based on segmentation. I
           | believe Hightouch Audiences is more like a filter.
           | 
           | * Full workflow with branches, PRs, test suite in a repo. I
           | saw Hightouch added git syncing to a known branch yesterday
           | and it looks cool, but it's not the full workflow yet.
           | 
           | I'm certainly trying to keep it in the friendly-competition
           | area, especially on this thread :-)
        
             | tejasmanohar wrote:
             | This probably isn't the best place for an extended
             | comparison, but since it's our launch post, I'll try to
             | close the thread with a couple corrections for factuality.
             | If anyone is interested in a deep-dive, email
             | hello@hightouch.io, and I'm happy to set one up personally.
             | And, I'm sure the team at Grouparoo would be willing to do
             | the same ("contact us" at bottom of their website).
             | 
             | > * Add tags to contacts in mailchimp, zendesk or make lists
             | > of them in customer.io, Pardot, etc based on segmentation.
             | > I believe Hightouch Audiences is more like a filter.
             | 
             | With static mappings, audiences can be synced to
             | destinations as tags :). The magic is in the abstractions,
             | not features!
             | 
             | > * Full workflow with branches, PRs, test suite in a repo. I
             | > saw Hightouch added git syncing to a known branch yesterday
             | > and it looks cool, but it's not the full workflow yet.
             | 
             | Lots more coming soon here. Our git integration is
             | bidirectional so you can totally do that stuff in git, but
             | UI support is on the way. We've found the UI is a much
             | better experience than code for _most_ Reverse ETL
             | workflows... so I see the value in this - I'll check it
             | out.
             | 
             | If I have to be honest, the biggest thing that customers
             | love about our product is that it works and accomplishes
             | their use cases. Platform features are cool, but from time
             | to time, I have to remind myself that Fivetran has proven
             | that integrations and actually working comes first, and it
             | is volume but not _just_ volume... our philosophy
             | (destinations as a product), design, and progress there is
             | quite differentiated from the space. You can read more in
             | our Series A announcement from a few months ago at
             | https://hightouch.io/blog/series-a
             | 
             | PS: I haven't tried Grouparoo in a while. I do love the
             | concepts, will give it a swing!
        
               | bleonard wrote:
               | It's hard to leave the comparisons dangling, for sure.
               | But I'll defer for now. Congrats on the launch :-)
        
           | tejasmanohar wrote:
           | Haha thanks. Love some friendly competition :). In all
           | seriousness, though we're focusing elsewhere, the OSS angle
           | is cool.
           | 
           | If you're interested in self-hosted though, just reach out at
           | hello@hightouch.io.
           | 
           | That said, IMO one of the coolest parts of our tech is our
           | "hybrid architecture". Out of the box, no data is stored in
           | Hightouch - it's all in your cloud (warehouse, s3 bucket).
           | This is how fintech (Plaid, Blend, Betterment, + some banks
           | now!) and healthcare brands like Headway use us. We've also
           | done a ton of compliance work and have certificates for SOC2
           | Type II and whatnot.
        
         | soumyadeb wrote:
         | Congrats Tejas and team on the launch. Great to see your
         | progress and broad innovation in this space
         | (Census/Hightouch/Grouparoo/us@RudderStack).
         | 
         | It's a huge market and we can all help push each other.
        
       | mrwnmonm wrote:
       | If you don't mind me asking, how much time did it take to reach
       | this maturity or quality as a product? Did you start in 2019 or
       | before that?
       | 
       | Kiss your designer for me.
        
         | kashishg wrote:
         | We started building this one in August 2020 but honestly just had
         | a lot of fun working on the design and UX! Conveying your
         | feedback to him now!
        
           | [deleted]
        
           | tomnipotent wrote:
           | One small suggestion: when connecting fields, auto-select a
           | best-guess column. A good deal of the time it will match
           | (email=>email, first_name=>firstName), and it will cut in
           | half the time to configure that part of the sync. Or an
           | option to toggle this on/off.
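
The best-guess matching suggested above (email=>email, first_name=>firstName) could look roughly like the sketch below. The helper names `normalize` and `guess_mappings` are made up for illustration, and exact matching on normalized names is only one possible heuristic; nothing here reflects how Hightouch implements the feature.

```python
import re


def normalize(name: str) -> str:
    """Lowercase a name and drop separators so first_name and firstName compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def guess_mappings(columns: list[str], fields: list[str]) -> dict[str, str]:
    """Return a best-guess column -> field mapping for names that match once normalized."""
    fields_by_norm = {normalize(f): f for f in fields}
    guesses = {}
    for column in columns:
        match = fields_by_norm.get(normalize(column))
        if match is not None:
            guesses[column] = match
    return guesses


# Hypothetical model columns and destination fields.
guesses = guess_mappings(
    ["email", "first_name", "lifetime_value"],
    ["Email", "firstName", "lastName"],
)
```

Columns without a confident match (here `lifetime_value`) are left unmapped for the user to fill in, which keeps the guess easy to override or toggle off.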
        
             | joshwget wrote:
             | This is on our roadmap! Slotted for release sometime in the
             | next month.
        
         | [deleted]
        
       | supsupsup wrote:
       | Interesting. How does Reverse ETL differ from ETL/ELT tools like
       | Fivetran?
        
         | kashishg wrote:
         | On a high level, ETL/ELT is about sending data from your SaaS
         | tools into your data warehouse (you are reading from different
         | tools). Reverse ETL is about getting data from your warehouse
         | into tools (writing into different tools). Building ELT is a
         | fundamentally different technical challenge than building
         | Reverse ETL. Aspects like types, rate limits, and destination
         | state (knowing whether data already exists in a destination)
         | are unique to Reverse ETL. Visibility becomes challenging too
         | as some destinations have unique quirks, like API contracts
         | where you write to them but you don't know if the write was
         | successful or completed until later. Writing to tools also
         | requires references between objects (foreign keys onto existing
         | data, like mapping Companies and Opportunities in Salesforce)
         | that aren't necessary in the ELT world.
         | 
         | From a product perspective, the UX is very different as well.
         | Reverse ETL requires a lot more user input (ex: mapping which
         | fields to update in a tool), whereas ELT typically mirrors data
         | using a standard schema (without much user customization
         | involved).
         | 
         | We are close partners with ELT tools like Fivetran, and you can
         | see our partnership post here:
         | https://fivetran.com/blog/fivetran-partners-with-hightouch-t...
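
Rate limits are one of the write-side concerns listed in the comment above. A common way to cope with them is to retry with exponential backoff, sketched here under assumptions: the `RateLimited` exception, the `send` callable, and the delay values are hypothetical stand-ins for whatever a real destination client exposes.

```python
import time


class RateLimited(Exception):
    """Raised by a (hypothetical) destination client when the API asks us to slow down."""


def send_with_backoff(send, payload, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(payload), retrying with exponentially growing delays on rate limits."""
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except RateLimited:
            # Give up after the final attempt; otherwise wait 1x, 2x, 4x, ... base_delay.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter is a design choice that lets the retry schedule be checked in tests without actually waiting.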
        
       | tomnipotent wrote:
       | Really, really love that it's just SQL. How is data mapped to the
       | target API from the SQL projection? Are the columns themselves
       | the actual API contract?
        
         | tejasmanohar wrote:
         | Cofounder here! Not quite. In Hightouch, you define your model
         | (SQL) and create a sync (point-and-click or JSON/YAML).
         | 
         | The syncs are declarative, not imperative. They don't map 1:1
         | to API calls by design. You tell us what you want the
         | destination to look like, and we figure out how :). Kinda like
         | how a database creates the best plan for your SQL query before
         | executing it.
         | 
         | Here's an example - https://i.imgur.com/05T5iKK.png. This sync
         | maps your users table to Salesforce "Contacts" and the mapping
         | interface also encodes the foreign key relationship between
         | Contact:Account in Salesforce. Under the hood, we do all the
         | lookups, caching, batch API calls using the bulk API,
         | automatically handle rate limits, and only send changes from
         | your database.
         | 
         | This is one of our key design differences compared to iPaaS
         | tools like Tray, Zapier, Workato, Mulesoft, etc., which tend to
         | just map actions to API calls 1:1. Data integration being
         | declarative is something I'm really passionate about
         | personally... wrote a blog with more examples at
         | https://hightouch.io/blog/the-future-of-data-integration-wha...
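
To make the declarative idea above concrete, here is a rough sketch of a sync described as data, with a small planner that resolves a foreign key (Contact to Account) and groups records into batches. Everything in it is an assumption for illustration: `SyncSpec`, `plan_batches`, the column names, and the account IDs are invented, and a real planner would also handle diffing, caching, and rate limits as the comment describes.

```python
from dataclasses import dataclass, field


@dataclass
class SyncSpec:
    """Declarative description of a sync: what the destination should look like."""
    object_name: str           # destination object, e.g. "Contact"
    key_field: str             # destination field used to match existing records
    field_map: dict[str, str]  # model column -> destination field
    lookups: dict[str, str] = field(default_factory=dict)  # model column -> reference field


def plan_batches(spec, rows, lookup_index, batch_size=2):
    """Turn model rows into batched upsert payloads, resolving references first.

    This sketch only uses field_map and lookups; object_name and key_field
    would tell a real client which object to upsert and how to match records.
    """
    records = []
    for row in rows:
        record = {dest: row[col] for col, dest in spec.field_map.items()}
        for col, ref_field in spec.lookups.items():
            # Resolve the foreign key (e.g. company domain -> Account ID).
            record[ref_field] = lookup_index.get(row[col])
        records.append(record)
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Hypothetical spec: sync a users model to Salesforce-style Contacts.
spec = SyncSpec(
    object_name="Contact",
    key_field="Email",
    field_map={"email": "Email", "name": "Name"},
    lookups={"company_domain": "AccountId"},
)
rows = [
    {"email": "a@x.com", "name": "A", "company_domain": "x.com"},
    {"email": "b@y.com", "name": "B", "company_domain": "y.com"},
    {"email": "c@x.com", "name": "C", "company_domain": "x.com"},
]
accounts = {"x.com": "001A", "y.com": "001B"}  # made-up account IDs
batches = plan_batches(spec, rows, accounts)
```

The caller never says which API calls to make; the spec states the desired shape and the planner decides how to group the writes.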
        
           | matchagaucho wrote:
           | What if the state of IsHireable changes in the system of
           | record (SFDC)? Will Hightouch overwrite OLTP data with stale
           | warehouse data?
        
             | joshwget wrote:
             | Short answer is yes, but here's why:
             | 
             | By syncing data to a particular field in Salesforce, you're
             | effectively saying that the source of truth for that field
             | is the warehouse, and not Salesforce. If you expect a human
             | to update a field, then Salesforce is the source of truth
             | for that field, and Hightouch shouldn't write to it!
             | 
             | What we typically see is that Salesforce contains data
             | that's expected to be updated and maintained in the tool,
             | and then other "read-only" fields coming from Hightouch.
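
The field-ownership rule described above can be illustrated with a partial update that only ever contains warehouse-owned fields. This is a sketch under assumptions: `build_update`, `apply_update`, and the `Usage_Score__c` field are hypothetical, and `apply_update` merely simulates how a destination treats fields missing from the payload.

```python
def build_update(warehouse_row: dict, owned_fields: dict[str, str]) -> dict:
    """Build an update payload containing only warehouse-owned fields."""
    return {dest: warehouse_row[col] for col, dest in owned_fields.items()}


def apply_update(record: dict, update: dict) -> dict:
    """Simulate a partial update: fields absent from the payload are left untouched."""
    return {**record, **update}


# IsHireable is maintained by people in the CRM, so it is not mapped.
salesforce_record = {"Email": "a@x.com", "IsHireable": True, "Usage_Score__c": 10}
warehouse_row = {"email": "a@x.com", "usage_score": 42, "is_hireable": False}
owned = {"usage_score": "Usage_Score__c"}  # warehouse column -> destination field

result = apply_update(salesforce_record, build_update(warehouse_row, owned))
```

Because `is_hireable` is not in the mapping, the stale warehouse value never reaches the record; mapping it would declare the warehouse the source of truth for that field.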
        
           | tomnipotent wrote:
           | > You tell us what you want the destination to look like
           | 
           | Implicit mapping between SQL to target is great, but how does
           | the SQL author know what SQL to write in the first place?
           | 
           | I've done no shortage of integrations like this, and there is
           | no avoiding reading the target SaaS documentation to know
           | what their schema looks like so I can shape data accordingly.
           | Without that step, I can't even start writing SQL.
        
             | tejasmanohar wrote:
             | Not implicit but - declarative! Our goal is to provide
             | enough context in our docs, app (e.g. autocomplete,
             | automatic schema discovery, etc.), and resources to guide
             | users through this and then recipes on top for common
             | workflows!
             | 
             | We do a lot of validation upfront (at both the schema &
             | data layer), and I think it's still early days there...
             | this is a big opportunity IMO. Great callout.
             | 
             | We find people start with a simple SQL model + sync and
             | then bounce back and forth and edit their queries as they
             | explore our columns.
        
       | theboat wrote:
       | I've been a Hightouch customer for nearly a year now, and I have
       | to say the team and product are both great.
       | 
       | That being said, isn't it a bit late for a Launch HN post? :P
        
         | kashishg wrote:
         | Thanks so much for your support from the very beginning! We've
         | only been in market publicly for a little over a year now
         | actually. We figured better late than never (and it still feels
         | early for us!) :)
        
         | dang wrote:
         | Well...
         | 
         |  _Launch HN: Rainforest QA (YC S12) - No-Code UI Test
         | Automation_ - https://news.ycombinator.com/item?id=28947689 -
         | Oct 2021 (88 comments)
         | 
         |  _Launch HN: RescueTime (YC W08) - Redesigned for wellness,
         | balance, remote work_ -
         | https://news.ycombinator.com/item?id=28683597 - Sept 2021 (141
         | comments)
         | 
         | The basic rule is that each YC startup gets one. We've made a
         | couple exceptions in cases of complete reinventions.
        
       | PhoenixReborn wrote:
       | I am probably missing something here, but how is Hightouch
       | differentiated from other similar tools like Census
       | (https://www.getcensus.com/)?
        
         | [deleted]
        
         | kashishg wrote:
         | Great question! We have a whole page about this here
         | (https://hightouch.io/blog/hightouch-vs-census/).
         | 
         | But the TLDR is that Hightouch has more developer focused
         | features (like a live debugger, alerting, version control with
         | Git, and more here: https://hightouch.io/data-features/), a
         | dedicated UI for business users to visually filter models
         | (called Hightouch Audiences), more transparent pricing, as well
         | as more integrations (70+) that are also deeper and customized
         | for each tool.
        
           | PhoenixReborn wrote:
           | Very cool and thanks for the response! I will definitely give
           | Hightouch a try.
        
       | deepaksshah wrote:
       | Congratulations on the launch.
       | 
       | We were looking for this very solution. Writing queries and
       | mapping them downstream is pretty handy.
       | 
       | I also noticed that you support RudderStack (yet another great
       | tool), and we can send events via their HTTP connector.
       | 
       | Looking forward to using this tool.
       | 
       | Do you plan on adding Clickhouse as source anytime soon?
        
         | killerpixler wrote:
         | Interesting tool indeed. You made a good point about
         | RudderStack and synergies there. I'm curious to see how
         | Hightouch is going to differentiate from something like
         | RudderStack, which has a boatload of reverse ETL functionality
         | of its own. I mean, event stream data and moving data from your
         | warehouse back out to tools is pretty much their mantra.
        
         | kashishg wrote:
         | Yes, Clickhouse is a top priority source for us to build next
         | to enable real-time analytics (we already support Rockset as a
         | source used by customers like Seesaw). Would love to learn more
         | about your use case for Clickhouse: feel free to reach out to
         | hello@hightouch.io!
        
       | pantulis wrote:
       | " They want their data in their primary tools--the SaaS
       | applications where they spend their days--so they can use it to
       | actually operate their business."
       | 
       | I'm obviously missing something here, but thinking in terms of
       | "operationalizing" data that comes from some kind of analytical
       | environment, was not the data in their operational SaaS tools in
       | the first place?
        
         | pinkbeanz wrote:
         | It starts there, but then once you get into complex workflows
         | that merge data across your product and CRMs it all moves to
         | the warehouse first. Typical flow is a Fivetran or Stitch into
         | the warehouse, lots of dbt models, then business models fit for
         | consumption downstream.
         | 
         | Once in the warehouse, it needs to get back into those
         | operational systems again, which is the tricky part.
         | 
         | I've done these one-off integrations from the warehouse into
         | Salesforce (creating leads, converting them, moving stages all
         | based off product usage), and into marketing tools (customer
         | segmentation built using SQL in the warehouse, then sent to
         | marketing automation tools).
         | 
         | Being able to feed tools directly off the warehouse instead of
         | writing one-off integrations is the real value.
        
         | gurubavan wrote:
         | The Salesforce data is in Salesforce, and the HubSpot data is
         | in HubSpot, and the Mixpanel data is in Mixpanel, but those
         | applications don't have each other's data (not to mention
         | missing any transformations on top). E.g. as a sales rep, you
         | can benefit from understanding product usage and marketing
         | activity for a contact in Salesforce.
        
       ___________________________________________________________________
       (page generated 2021-11-11 23:01 UTC)