[HN Gopher] Launch HN: PostHog (YC W20) - open-source product an...
       ___________________________________________________________________
        
       Launch HN: PostHog (YC W20) - open-source product analytics
        
       James, Tim and Aaron here - we are building a self-hosted, open
       source Mixpanel/Amplitude-style product. The repo is at
       https://github.com/posthog/posthog and our home page is
       https://posthog.com/.

       After four years of working together, we originally quit our jobs
       to set up a company focused on tech debt. We didn't manage to solve
       that problem, but we learned how important product analytics was
       for finding users, getting them to try the product out, and
       understanding which features we needed to focus on to have an
       impact on users.

       However, when we installed product analytics, it bothered us that
       we needed to send our users' data to 3rd parties. Exporting data
       from these tools costs $manyK a month, and it felt wrong from a
       privacy perspective. We designed PostHog to solve these problems.

       We built PostHog to automatically capture every front-end click,
       removing the need to add track('event') calls, and it has a toolbar
       to label important events after they're captured. That means you
       spend less time fixing your tracking. You can also push events
       yourself.

       You get API/SQL access to the underlying data, and it has
       analytics: funnels and event trends with segmentation based on
       event properties (like UTM tags). That means we've got the best
       parts of the 3rd-party analytics providers but are more
       privacy-friendly and developer-friendly.

       We're thinking of adding features around paths, retention and
       pushing events to other tools (e.g. Slack or your CRM). We'd love
       to hear your feature requests.

       We are platform- and language-agnostic, with a very simple setup.
       If you want Python/Ruby/Node, we give you a library. For anything
       else, there's an API. The repo has instructions for Heroku (1
       click!), Docker or deploying from source.

       We've launched this repo under the MIT license so any developer can
       use the tool. Our goal is not to charge individual developers. We
       make money by charging a license fee for things like multiple
       users, user permissions, integrations with other databases,
       providing a hosted version and support.

       Give it a spin: https://github.com/posthog/posthog. Let us know
       what you think!
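The capture-first, label-later idea above can be sketched as follows: every click is stored as a raw event, and an "action" is just a matching rule applied afterwards, so a label defined today also counts clicks captured before it existed. This is a minimal sketch under assumptions; the `ClickEvent` fields, the `Action` matcher and the selectors are hypothetical and are not PostHog's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClickEvent:
    # Hypothetical shape of an autocaptured click (not PostHog's schema).
    distinct_id: str
    url: str
    selector: str
    text: str


@dataclass(frozen=True)
class Action:
    # A label defined after capture: every non-None field must match exactly.
    name: str
    url: Optional[str] = None
    selector: Optional[str] = None
    text: Optional[str] = None

    def matches(self, event: ClickEvent) -> bool:
        for attr in ("url", "selector", "text"):
            wanted = getattr(self, attr)
            if wanted is not None and getattr(event, attr) != wanted:
                return False
        return True


def label_events(events, actions):
    """Apply every action to every stored event, including historical ones."""
    counts = {action.name: 0 for action in actions}
    for event in events:
        for action in actions:
            if action.matches(event):
                counts[action.name] += 1
    return counts


# Clicks captured before anyone decided which of them matter.
captured = [
    ClickEvent("u1", "/pricing", "button.signup", "Sign up"),
    ClickEvent("u2", "/pricing", "button.signup", "Sign up"),
    ClickEvent("u2", "/docs", "a.nav", "API"),
]

# Labels added later (e.g. from a toolbar-like UI) still count the old clicks.
actions = [
    Action("Signed up from pricing", url="/pricing", selector="button.signup"),
    Action("Opened API docs", text="API"),
]
print(label_events(captured, actions))
```

Because labels are evaluated at query time rather than at capture time, fixing or adding a label never requires redeploying tracking code, which is the trade-off the post describes.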
        
       Author : james_impliu
       Score  : 131 points
       Date   : 2020-02-20 17:27 UTC (5 hours ago)
        
       | scapecast wrote:
       | Just yesterday I discussed a blog post I have in mind, on the
       | rise of the open source analytics stack, with somebody from
       | dbt.
       | 
       | There are a bunch of great hosted tools out there, across ETL,
       | workflows, dashboards, etc. Think Fivetran, Segment, Matillion,
       | Periscope, etc. And then of course the warehouses like Snowflake,
       | Redshift, etc.
       | 
       | But I think there are three issues with that stack, roughly like
       | this (I've got to do some more thinking, would appreciate your
       | input):
       | 
       | - Privacy: you have your customer data flying around in all these
       | different tools. It's hard, if not impossible, to track your
       | compliance.
       | 
       | - Cost: all these vendors charge in some way by data volume,
       | MAUs, etc. - you get taxed multiple times for the same data
       | stream. It all adds up.
       | 
       | - Control: your data is subject to predetermined schemas,
       | proprietary formats, black boxes, etc. - the same metric doesn't
       | match across different tools, and you have less flexibility to
       | manipulate your data and to pack up and go elsewhere.
       | 
       | I think there's a valid open source alternative for every layer
       | of the stack:
       | 
       | Segment --> Rudder Labs, Snowplow
       | 
       | Matillion --> Airflow, dbt
       | 
       | Fivetran --> Stitch / Singer
       | 
       | Periscope, Looker, Tableau, etc. --> Metabase, Superset
       | 
       | Warehouses --> just yesterday I learned about materialize.io here
       | on HN
       | 
       | And then add open source products like PostHog, that add
       | additional value for very specific use cases (in this case
       | product analytics).
       | 
       | I'm not disputing the value of the hosted products. They are
       | amazing to use if you're just getting started. But there's a great
       | open source "stack" available that, long term, will likely be more
       | transparent, more flexible, and cheaper.
       | 
       | Would love your thoughts!
        
         | curo wrote:
         | For many projects I've often looked for easy solutions that
         | would handle exposing data connectors to the end user of a
         | project. Somewhat like an embedded Zapier with community
         | maintained connectors (e.g., for the end user to grab
         | Salesforce data and sync it with your account).
         | 
         | I suppose Singer might be the closest thing from your list
         | above, but still you'd have to build out a large amount of auth
         | & end user tooling to get it to work.
         | 
         | Every B2B SaaS developer these days has to build in a ton of
         | integrations. Even an affordable hosting service for this would
         | work well (embedded integration where you don't require your
         | customer to sign up or pay for a second product).
        
       | RIMR wrote:
       | Oh man, good luck with social media. The Chapo Trap House
       | community uses "post hog" as a memetic insult.
        
       | buremba wrote:
       | It's cool to see an open-source product analytics tool in YC!
       | 
       | I'm the co-founder of a company that had a similar value
       | proposition back in 2017. We got invited to the interview at the
       | YC office but couldn't convince them, for a number of reasons:
       | 
       | 1. GDPR was not huge back in 2017, so the idea of creating an
       | open-source alternative was not attractive enough.
       | 
       | 2. We were targeting companies that want to build their own
       | data pipeline in the cloud, and cloud providers such as AWS were
       | claiming that their products (specifically Kinesis & Redshift)
       | make it dead-easy to create such a pipeline. At first, we
       | thought that we were doing something complementary to the cloud
       | providers, but soon we realized that we were competing with them.
       | Our potential customers were trying to build such a data
       | pipeline in AWS thinking that it would be simple, and AWS actually
       | made it easy to start. However, data enrichment and cost
       | optimization get really tricky as your data grows, and our
       | product was optimized for these workloads. AWS doesn't really
       | need partners like us: we save customers money, but in the long
       | run those cost optimizations make AWS lose money. And since you
       | store all the data in Redshift, switching away ends up costing
       | more than simply doubling your Redshift capacity.
       | 
       | 3. We're not native speakers, so we probably couldn't express
       | ourselves well back then.
       | 
       | Time flies. We got into 500 Startups Batch 21 that year but had
       | to pivot last year since we couldn't make enough money to build a
       | sustainable business.
       | 
       | Shameless plug: right now, we provide the same feature set
       | (segmentation, funnels, retention, and SQL) for different CDPs
       | such as Segment, Snowplow, Firebase, and in-house solutions. You
       | can think of it as Amplitude or Mixpanel, but on top of your data
       | warehouse. We generate SQL queries and run them on your data
       | warehouse, just like a BI tool.
       | 
       | I would love to collaborate if you're open to partnerships, since
       | we're now complementary to each other. :) You can see what the
       | product looks like here: https://rakam.io/product
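To make "Amplitude or Mixpanel on top of your data warehouse" concrete, here is a minimal sketch of the kind of SQL such a tool might generate for an ordered funnel, run against an in-memory SQLite table. The `events(user_id, event, ts)` layout and the event names are assumptions for illustration; a real warehouse (Redshift, Snowflake, etc.) has its own dialect, and this is not Rakam's actual query generator.

```python
import sqlite3


def funnel_sql(steps):
    """Return (sql, params) counting users who reach each step in order.

    Step i keeps a user only if they did steps[i] after their earliest
    qualifying time for step i-1 (a simple ordered funnel).
    """
    ctes, params = [], []
    for i, step in enumerate(steps):
        if i == 0:
            ctes.append(
                "s0 AS (SELECT user_id, MIN(ts) AS ts FROM events "
                "WHERE event = ? GROUP BY user_id)"
            )
        else:
            ctes.append(
                f"s{i} AS (SELECT e.user_id, MIN(e.ts) AS ts FROM events e "
                f"JOIN s{i - 1} p ON e.user_id = p.user_id AND e.ts > p.ts "
                f"WHERE e.event = ? GROUP BY e.user_id)"
            )
        params.append(step)  # event names are bound, never interpolated
    counts = ", ".join(f"(SELECT COUNT(*) FROM s{i})" for i in range(len(steps)))
    return f"WITH {', '.join(ctes)} SELECT {counts}", params


def run_funnel(conn, steps):
    sql, params = funnel_sql(steps)
    return list(conn.execute(sql, params).fetchone())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, ts INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("u1", "viewed_pricing", 1), ("u1", "started_signup", 2),
    ("u1", "completed_signup", 3),
    ("u2", "viewed_pricing", 1), ("u2", "started_signup", 5),
    ("u3", "started_signup", 1), ("u3", "viewed_pricing", 2),
    ("u4", "completed_signup", 1),
])
print(run_funnel(conn, ["viewed_pricing", "started_signup", "completed_signup"]))
```

Here u3 signed up before viewing pricing, so they stop at step 1, and u4 never enters the funnel; the result is one count per step.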
        
       | salsakran wrote:
       | This is awesome and long overdue. We (I'm the Metabase founder)
       | have been working on the Business Intelligence side of things,
       | but have always struggled in our own usage with the ubiquity of
       | closed source solutions for event collection (read: Google
       | Analytics) and the relative lack of attention on this problem on
       | the open source side.
        
         | beckingz wrote:
         | Metabase is great. I did a workshop on it last night and people
         | were really impressed that it was free.
         | 
         | They also loved the X-ray feature.
        
         | shafyy wrote:
         | Awesome! We've been using Metabase for a while now, and it's
         | amazing!
        
         | elm_ wrote:
         | I'm actually using this with Metabase and the two complement
         | each other really nicely!
         | 
         | Also, thanks for making something really cool :)
        
         | orliesaurus wrote:
         | Is metabase similar to statsbot? I just googled metabase, am I
         | looking at the right one?
        
           | ignoramous wrote:
           | That is Sameer Al-Sakran, CEO of https://www.metabase.com.
           | 
           | And here is a recently show-hn'd OSS metabase alternative:
           | https://news.ycombinator.com/item?id=22347516
           | 
           | You might also like reading a recent news.yc discussion on BI
           | tools: https://news.ycombinator.com/item?id=21513566
        
       | elm_ wrote:
       | Congrats on the launch!
       | 
       | I've been using PostHog with my app for about a week now, and so
       | far the results have been good. Pretty straightforward to
       | integrate with a Swift iOS app too!
        
       | samblr wrote:
       | Congrats on launch - a quick demo video would be wonderful to go
       | with this.
        
       | stranger___ wrote:
       | Great!
        
       | dodata wrote:
       | Very cool - congrats on the launch! I like the docker deploy
       | command that you posted on your landing page. Tried that out and
       | it is super easy to get up and running.
       | 
       | Do you have a sample dataset to feed into our local environment
       | or demo environment to test out the UI? I'd love to poke around a
       | bit before deploying to Heroku and setting it up on a site.
        
         | timgl wrote:
         | Thanks! There's a hidden 'demo' page [0] that you can click
         | around that will create some events. You can also add that URL
         | when you create an action to test out the editor.
         | 
         | [0] https://127.0.0.1:8000/demo
        
       | Risse wrote:
       | You should mention in the README that the production Dockerfile
       | (and posthog/posthog:latest) are busted, they do not create any
       | database. Spent last hour debugging it :) Otherwise, looking
       | really good!
        
         | timgl wrote:
         | Apologies! We've made that clearer now.
        
       | malisper wrote:
       | I'm curious as to how you plan to scale PostHog to larger users.
       | As the person who scaled Heap, here is my honest opinion of this.
       | I think there is going to be a huge challenge ahead in scaling
       | query performance. This was perpetually a challenge at Heap and
       | was for a long time the main limitation on Heap's growth.
       | 
       | The challenge was tough enough for Heap and PostHog is going to
       | be at a huge disadvantage due to the lack of multi-tenancy. When
       | you use Heap, your data is stored across Heap's entire cluster of
       | machines. When you run a query, that query is run simultaneously
       | against every single machine in Heap's cluster. Even though your
       | data may be taking up something like .1% of the total disk space,
       | when you run a query, 100% of the disk throughput of Heap's
       | cluster will go to processing your query. It's not an
       | overstatement to say this alone results in a >50x improvement to
       | query performance.
       | 
       | I honestly think Heap wouldn't be possible without multi-tenancy.
       | It's hard enough as is to get queries that process multiple
       | terabytes of data to return in seconds when you have a fleet of
       | dozens of i3s available. I'm not sure how you would do that with
       | a fleet a tiny fraction of that size. If you're curious about
       | Heap's infrastructure, Heap's CTO, Dan Robinson, has given a
       | number of talks on how it works[0][1].
       | 
       | That's not to say that PostHog won't work for anyone. I
       | previously tried (and failed) to start a company based on
       | optimizing people's Postgres instances. One of the big takeaways
       | I had was that no matter how you use it, Postgres will work
       | completely fine as long as you have <5GB of data. I think if you
       | have a modest amount of data, something like PostHog would work
       | perfectly fine for you. Since the Postgres optimization business
       | didn't work out, I wound up pivoting to freshpaint.io which
       | eliminates the need to set up event tracking for your analytics
       | and marketing tools by automatically instrumenting your entire
       | site. Since I started working on it, things have been going a lot
       | better.
       | 
       | [0] https://www.youtube.com/watch?v=NVl9_6J1G60
       | 
       | [1] https://www.youtube.com/watch?v=iJLq3GV1Dyk
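malisper's multi-tenancy argument is essentially scatter-gather: one customer's rows are spread over every node, so a query fans out to all of them and each node scans only its slice. A toy sketch of that pattern, plus the back-of-the-envelope scan time it implies, follows. The shard count, hashing scheme and 1 GB/s per-node throughput are illustrative assumptions, not Heap's actual architecture, and real scans rarely parallelise perfectly.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(event_id: str, num_shards: int) -> int:
    # Spread one tenant's events across every shard by hashing the event id.
    digest = hashlib.sha256(event_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load(events, num_shards):
    shards = [[] for _ in range(num_shards)]
    for event in events:
        shards[shard_for(event["id"], num_shards)].append(event)
    return shards


def count_on_shard(shard, tenant, name):
    # Each node scans only its local slice of the tenant's data.
    return sum(1 for e in shard if e["tenant"] == tenant and e["name"] == name)


def fan_out_count(shards, tenant, name):
    # Scatter the query to all shards in parallel, then gather and sum.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: count_on_shard(s, tenant, name), shards)
        return sum(partials)


def scan_seconds(data_bytes, nodes, bytes_per_sec_per_node):
    # With a perfectly parallel scan, time shrinks linearly with node count.
    return data_bytes / (nodes * bytes_per_sec_per_node)


events = [{"id": f"e{i}", "tenant": "acme" if i % 2 else "other", "name": "click"}
          for i in range(1000)]
shards = load(events, num_shards=8)
print(fan_out_count(shards, "acme", "click"))  # 500
# 2 TB scanned at an assumed 1 GB/s per node: 1 node vs 40 nodes.
print(scan_seconds(2e12, 1, 1e9), scan_seconds(2e12, 40, 1e9))
```

A single-tenant, self-hosted deployment effectively has `nodes` fixed at whatever one customer provisions, which is the disadvantage the comment describes.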
        
       | veeralpatel979 wrote:
       | Hey - congrats on launching! I'm adding Posthog to my list of
       | codebases to check out.
       | 
       | Can you talk more about your tech stack?
       | 
       | As an aside - it seems like most of the analytics companies I've
       | heard of went through YC: Mixpanel, Heap, Amplitude!
        
         | timgl wrote:
         | Thanks so much! It's built on pretty simple/proven
         | technologies: Postgres, Django and React. We've already seen
         | this scale to millions of daily events on basic Heroku dynos.
         | We can swap in other databases if you end up going beyond that.
        
           | veeralpatel979 wrote:
           | Got it! And how do you identify users of your open source
           | version who you can upsell your paid version to, since
           | there's no sign up needed to start using PostHog?
           | 
           | Or is this something you don't do?
        
             | timgl wrote:
             | We thought long and hard about this. We spoke to founders
             | of other big open source projects - we think if developers at big
             | companies want to use it, they'll want to use some of the
             | enterprise features and reach out. We've already seen that
             | start happening :-)
        
               | [deleted]
        
       | dkatri wrote:
       | Congrats on the launch!
       | 
       | Will have to see where I can fit this in to a project.
        
       | ignoramous wrote:
       | Dalton mentions in another comment on this thread that PostHog
       | was originally a different idea, so I'm super interested in
       | knowing: did you really start building this on Jan 23rd [0] and
       | get the backend, frontend, integrations, docs and SDKs ready, not
       | for a soft launch but for a _Launch HN_, in 4 weeks? That is nuts.
       | Congratulations, either way.
       | 
       | I am glad someone is tackling this problem.
       | 
       | A feature request (or perhaps an architectural direction): if you
       | could put the backend behind GraphQL instead of Django+Postgres,
       | there's potential for it to go fully serverless (frontend and
       | backend) with JAMstack frameworks like RedwoodJS [1] (backed by
       | Apollo GraphQL) or using Cloudflare Workers [2].
       | 
       | Edit: Another question: is PostHog at 80% feature parity with
       | Mixpanel / Amplitude / Heap already? If not, what do the
       | timelines look like? (I'm asking since you're OSS, though it is
       | understandable if you can't reveal that just yet.) Maybe there
       | needs to be a competitor-matrix page on the website?
       | 
       | [0] https://github.com/PostHog/posthog/commits/master?after=9ae6...
       | 
       | [1] https://redwoodjs.com/
       | 
       | [2] https://blog.cloudflare.com/jamstack-at-the-edge-how-we-buil...
        
         | timgl wrote:
         | Thanks! We really did start committing code on Jan 23rd -- but
         | we had strong views on what a solution should look like as we
         | experienced these problems first hand.
         | 
         | We've already had requests from people to store events into
         | different databases, but I hadn't considered doing it with
         | GraphQL/JAMstack. That could be a really nice way of abstracting
         | the storage away from the database.
         | 
         | In terms of feature parity, our goal is basically 100% parity.
         | Anything you can do analytics wise in those tools you should be
         | able to do in PostHog. We're going to try to keep up the same
         | pace we've had for the last 4 weeks.
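GraphQL is one way to put storage behind an interface. The same decoupling can be sketched in plain Python with a small protocol that the analytics code depends on and interchangeable backends behind it, which is roughly what "store events into different databases" needs. `EventStore`, `MemoryStore` and `SQLiteStore` are hypothetical names for illustration, not PostHog classes, and this is not a GraphQL layer.

```python
import sqlite3
from typing import Protocol


class EventStore(Protocol):
    # Hypothetical interface: analytics code talks to this, never to a DB.
    def insert(self, distinct_id: str, event: str) -> None: ...
    def count(self, event: str) -> int: ...


class MemoryStore:
    def __init__(self) -> None:
        self._rows = []  # list of (distinct_id, event) tuples

    def insert(self, distinct_id: str, event: str) -> None:
        self._rows.append((distinct_id, event))

    def count(self, event: str) -> int:
        return sum(1 for _, name in self._rows if name == event)


class SQLiteStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events (distinct_id TEXT, event TEXT)"
        )

    def insert(self, distinct_id: str, event: str) -> None:
        self._conn.execute("INSERT INTO events VALUES (?, ?)", (distinct_id, event))

    def count(self, event: str) -> int:
        (n,) = self._conn.execute(
            "SELECT COUNT(*) FROM events WHERE event = ?", (event,)
        ).fetchone()
        return n


def record_signups(store: EventStore) -> int:
    # Depends only on the interface, so backends can be swapped freely.
    store.insert("u1", "signed_up")
    store.insert("u2", "signed_up")
    store.insert("u2", "logged_in")
    return store.count("signed_up")


for backend in (MemoryStore(), SQLiteStore()):
    print(type(backend).__name__, record_signups(backend))
```

`typing.Protocol` uses structural typing, so a new backend only has to provide matching methods; it does not have to inherit from anything.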
        
         | james_impliu wrote:
         | - We had a good view of what to do product-wise - the key thing
         | that is different for us is the open source model.
         | 
         | - I'd stress how important it was feeling inspired by the idea.
         | Ian from Mattermost was really helpful, as were Dalton and the
         | YC partners. Enjoying what we were working on probably tripled
         | our speed.
         | 
         | - I'm meh technically, so we focussed on making sure Tim (CTO)
         | could focus exclusively on development. We split it up
         | pretty clearly to create the right environment. I did the
         | design, product, website (Elementor/WP) and docs, Aaron
         | focussed on getting user feedback.
         | 
         | - We spent $1k on marketing to speed up user engagement early
         | on, which helped us shake out some bugs.
         | 
         | Will do a blog post if there's more interest in the journey.
        
       | samblr wrote:
       | Could you please point out where the backend code is in your
       | GitHub repo?
        
         | timgl wrote:
         | Hi! If you mean the PostHog code itself, most of it is in the
         | PostHog folder.
         | 
         | If you want to send events from your own backend to PostHog,
         | there are instructions for Ruby/Python/Node/API in the docs [0].
         | 
         | [0] https://github.com/PostHog/posthog/wiki
        
           | samblr wrote:
           | Thank you.
           | 
           | Actual link:
           | https://github.com/PostHog/posthog/tree/master/posthog
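For languages without a library, "there's an API" means sending events over HTTP from your own backend. The sketch below only builds (does not send) a JSON POST using the standard library, to show the general shape. The host, the `/capture/` path, the API key and the payload field names are all assumptions for illustration; check the PostHog docs linked above for the real endpoint and fields.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical values: verify the real endpoint and field names in the docs.
POSTHOG_HOST = "http://localhost:8000"
CAPTURE_PATH = "/capture/"
API_KEY = "your-project-api-key"


def build_capture_request(distinct_id, event, properties=None):
    """Build (but do not send) a JSON POST describing one backend event."""
    payload = {
        "api_key": API_KEY,
        "event": event,
        "distinct_id": distinct_id,
        "properties": properties or {},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return urllib.request.Request(
        POSTHOG_HOST + CAPTURE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_capture_request("user-123", "invoice_paid", {"plan": "pro"})
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=5)
```

Building the request separately from sending it keeps the payload easy to inspect and test before pointing it at a real instance.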
        
       | krmmalik wrote:
       | I would just like to say THANK YOU! I can't believe it has taken
       | this long for someone to solve this problem in this way.
        
       | eclipsetheworld wrote:
       | I love the idea of capturing all events and providing the user
       | with an option to label "useful" events. Similarly, I'd like to
       | capture API calls. In a typical modern SPA + REST API setup,
       | calls to the REST API often correspond to events. An analytics
       | integration that captures all API calls and provides tools to
       | label/transform these API calls as events would prove similarly
       | useful.
        
         | james_impliu wrote:
         | We were debating applying it at the framework or query level, but
         | were nervous about this being harder to "clean up" conceptually.
         | What do you think the equivalent of the "PostHog UX toolbar"
         | would need to look like to make that possible?
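One way to picture eclipsetheworld's suggestion at the framework level: a middleware records every API call, and a small set of rules labels some of them as events after the fact, much as the toolbar labels clicks. The sketch uses a plain WSGI middleware; the class, the `RULES` table and the route patterns are hypothetical, and a Django or Node integration would hook in differently.

```python
import re
from wsgiref.util import setup_testing_defaults

# Hypothetical labeling rules: (HTTP method, path regex) -> event name.
RULES = [
    ("POST", re.compile(r"^/api/orders/?$"), "order_created"),
    ("DELETE", re.compile(r"^/api/orders/\d+/?$"), "order_cancelled"),
]


class ApiCaptureMiddleware:
    """Record every API call, then label the ones matching a rule."""

    def __init__(self, app, sink):
        self.app = app
        self.sink = sink  # any callable that receives one dict per call

    def __call__(self, environ, start_response):
        method = environ["REQUEST_METHOD"]
        path = environ.get("PATH_INFO", "")
        seen = {}

        def capturing_start_response(status, headers, exc_info=None):
            seen["status"] = int(status.split()[0])
            return start_response(status, headers, exc_info)

        # Assumes the wrapped app calls start_response before returning.
        body = self.app(environ, capturing_start_response)
        label = next((name for m, rx, name in RULES
                      if m == method and rx.match(path)), None)
        self.sink({"method": method, "path": path,
                   "status": seen.get("status"), "event": label})
        return body


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


captured = []
app = ApiCaptureMiddleware(demo_app, captured.append)
for method, path in [("POST", "/api/orders"), ("DELETE", "/api/orders/42"),
                     ("GET", "/api/health")]:
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path}
    setup_testing_defaults(environ)
    app(environ, lambda status, headers, exc_info=None: None)
print([c["event"] for c in captured])
```

Unlabeled calls are kept with `event` set to `None`, so a rule added later can still be applied to them, which is the "clean up afterwards" property the reply is worried about.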
        
       | Coxa wrote:
       | Your sign-up page seems broken [1] on my Firefox.
       | 
       | [1] https://imgur.com/a/wYxbKj4
        
         | timgl wrote:
         | Fixed :)
        
       | chasers wrote:
       | How do you plan on monetizing?
       | 
       | edit: whoops didn't read.
        
         | neonate wrote:
         | That's up there.
        
       | shafyy wrote:
       | Amazing, going to give it a try. How are you going to monetize?
       | Like Metabase?
       | 
       | Edit: Nevermind, just read your last paragraph :-)
        
       | codegeek wrote:
       | It is encouraging to see YC accepting open source products. It is
       | generally difficult to monetize OSS, and it will be interesting to
       | see how a collaboration with YC helps a startup like this.
        
         | dabeeeenster wrote:
         | We fairly recently released Bullet Train as 100% open source
         | (https://bullet-train.io/). We're beginning to make it work,
         | and have been surprised at the amount of enterprise interest it
         | has generated.
         | 
         | Just because the code is open source doesn't mean you can't
         | make money out of it.
        
         | dalton wrote:
         | Hi, I am the partner at YC who funded PostHog, though
         | originally for a different idea.
         | 
         | I think this can be a _great_ business, we have funded startups
         | following similar models like GitLab, Mattermost, etc. Excited
         | to keep funding more :)
        
       | marcushyett wrote:
       | Very nice, I'll use this on my next project for sure.
        
       | dimensi0nal wrote:
       | maybe not the best choice of name
       | 
       | https://www.google.com/search?q=post+hog
        
         | cwkoss wrote:
         | Yeah, this name has some unfortunate connotations
        
         | montenegrohugo wrote:
         | Actually I love the name. It seems tongue in cheek, a small
         | criticism of "extreme" data-driven decision making that makes
         | unfounded assumptions, often can't see the forest for the
         | trees, or over-optimizes for the short term to the detriment of
         | the long term.
        
       | FanaHOVA wrote:
       | Congrats on launch!
       | 
       | What are some of the selling points compared to more mature open
       | source solutions like Matomo? Also, isn't the enterprise version the
       | opposite of your thesis? I.e. "it bothered us how we needed to
       | send our users' data to 3rd parties", but then provide a hosted
       | version which would do the same thing? How do you think about
       | that?
        
         | james_impliu wrote:
         | Whilst Matomo and PostHog both provide analytics, Matomo are
         | more focussed on session-tracking, rather than user/event-
         | tracking. That means they're better at things like analytics
         | for traffic sources.
         | 
         | The things you can do with PostHog that you can't easily do
         | with Matomo are things like pulling up identifiable user event
         | histories, or plotting trends in product usage over time.
         | 
         | The enterprise version is just a private repo we'd give you
         | access to that's still self hosted. We can also provide hosted
         | deployments of any version, but that's really just for people
         | that can't set it up themselves... hosting it isn't our core
         | focus.
        
       | pedalpete wrote:
       | It would be interesting to see a quick deploy to Firebase, AWS
       | (Lambda?) or other services.
       | 
       | Any idea what a moderate-size website (10k users per day, 500k
       | events) would cost to run on Heroku?
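The thread doesn't give Heroku pricing, but the numbers in the question can at least be turned into rates for sizing. A small worked calculation follows; the peak-to-average factor of 5 is an arbitrary assumption, and it also assumes one stored row per event.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def load_profile(users_per_day, events_per_day, peak_factor=5.0):
    """Turn daily totals into rates; peak_factor is an assumed peak/average ratio."""
    avg_per_sec = events_per_day / SECONDS_PER_DAY
    return {
        "events_per_user": events_per_day / users_per_day,
        "avg_events_per_sec": avg_per_sec,
        "assumed_peak_events_per_sec": avg_per_sec * peak_factor,
        "events_per_30_days": events_per_day * 30,  # one row per event
    }


profile = load_profile(users_per_day=10_000, events_per_day=500_000)
for key, value in profile.items():
    print(f"{key}: {value:,.1f}")
```

That works out to about 5.8 events/s on average, 50 events per user and 15M rows a month. Whether that stays under the ~5GB Postgres comfort zone malisper mentions depends on the per-row size, which the thread doesn't state.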
        
       | dizzydiz wrote:
       | Looks cool :)
       | 
       | Curious as to how deep you plan to go into the areas around
       | product analytics: attaching additional attributes to users to
       | group them (e.g. subscription level), getting a view into
       | attribution channels for marketing strategy, etc.
        
         | james_impliu wrote:
         | We have got the ability to do grouping in the backend at the
         | moment, but the UI isn't quite there yet - we definitely want
         | "team" level analytics as a good starting point as we've
         | already had this question several times. We know that's
         | important for B2B SaaS, a world we have come from before.
         | 
         | We don't aim to go "data science" deep with analytics, as we
         | suspect you'd rather just integrate Metabase/Tableau/etc. We
         | can see some cool ways to use it for attribution though - as
         | you can host it we don't need to charge you enormous fees if
         | your MTUs are very big... we see lots of B2C companies using
         | product analytics on the product, but not the website, and
         | struggling to track, say, UTM tags the whole way through.
         | 
         | There are two "out there" areas that we're really interested in
         | right now...
         | 
         | 1) We're thinking of focussing more on precisely what a
         | developer (not product, not marketing) needs, as we think there
         | is an underserved and enormous group here. Imagine when you're
         | building something being able to run a command in your CLI,
         | then being able to open a browser with a good understanding of
         | which pages/features are being used as you work. The point
         | being - give developers user data so they know how to build for
         | impact.
         | 
         | 2) We also want to explore integrations with other platforms to
         | push stuff to them. I can't stop refreshing our own product, so
         | I think pushing an Action to Slack, for example, would be
         | helpful and would get it into everyday workflows a bit more
         | easily. We don't want to do too much here and kind of hope the
         | community spot these kinds of things and run with them :)
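Pushing an Action to Slack can be done with Slack's incoming webhooks, which accept a JSON body with a `text` field POSTed to a per-channel webhook URL. The sketch below builds (but does not send) such a request for one matched action. The webhook URL is a placeholder, and the `action_to_slack_request` helper and event fields are hypothetical, not an existing PostHog feature.

```python
import json
import urllib.request

# Placeholder: Slack issues a unique incoming-webhook URL per channel.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def action_to_slack_request(action_name, event):
    """Turn one matched action into a Slack incoming-webhook POST (not sent)."""
    text = f"{action_name}: {event['distinct_id']} on {event['url']}"
    return urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = action_to_slack_request("Signed up", {"distinct_id": "u42", "url": "/pricing"})
print(json.loads(req.data)["text"])
# urllib.request.urlopen(req, timeout=5) would deliver it to the channel.
```

Keeping the integration as "build a request per matched action" leaves room for the community to add other targets (a CRM, email) without touching the capture path.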
         | 
         | What's your reaction to the above? I'd love to know if you had
         | a specific pain point in mind.
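The two topics in this exchange, attaching attributes to users to group them and carrying UTM tags "the whole way through", fit together: store each user's first UTM tags as a user property, then segment later product events by it. This is a minimal sketch of first-touch attribution, which is only one of several attribution models (last-touch is another). The event shape and names are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def utm_params(url):
    # Pull only the utm_* query parameters out of a landing-page URL.
    query = parse_qs(urlparse(url).query)
    return {k: query[k][0] for k in UTM_KEYS if k in query}


def first_touch(events):
    """Remember each user's first UTM tags so later events can inherit them."""
    users = {}
    for ev in sorted(events, key=lambda e: e["ts"]):
        tags = utm_params(ev.get("url", ""))
        if tags and ev["user"] not in users:
            users[ev["user"]] = tags
    return users


def conversions_by(events, prop, converting_event):
    # Segment a product event by a user-level property (here: utm_source).
    users = first_touch(events)
    counts = Counter()
    for ev in events:
        if ev["event"] == converting_event:
            counts[users.get(ev["user"], {}).get(prop, "(none)")] += 1
    return counts


events = [
    {"user": "a", "ts": 1, "event": "pageview",
     "url": "https://example.com/?utm_source=hn&utm_medium=social"},
    {"user": "b", "ts": 2, "event": "pageview",
     "url": "https://example.com/?utm_source=newsletter"},
    {"user": "a", "ts": 3, "event": "pageview",
     "url": "https://example.com/?utm_source=ads"},
    {"user": "c", "ts": 4, "event": "pageview", "url": "https://example.com/"},
    {"user": "a", "ts": 5, "event": "subscribed",
     "url": "https://app.example.com/billing"},
    {"user": "c", "ts": 6, "event": "subscribed",
     "url": "https://app.example.com/billing"},
]
print(dict(conversions_by(events, "utm_source", "subscribed")))
```

User "a" later arrived via `utm_source=ads`, but first-touch keeps `hn`; swapping the rule in `first_touch` is all it takes to try a different attribution model. The same `conversions_by` works for any user property, such as subscription level.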
        
       | rupertdev wrote:
       | Looks pretty sweet. I like the UI for adding event captures.
        
       ___________________________________________________________________
       (page generated 2020-02-20 23:00 UTC)