[HN Gopher] Launch HN: Evidence (YC S21) - Web framework for dat...
       ___________________________________________________________________
        
       Launch HN: Evidence (YC S21) - Web framework for data analysts
        
       Hi HN! We're Adam and Sean from Evidence (https://evidence.dev).
       We're building a static site generator for data analysts. It's
       like Jekyll or Hugo for SQL analysts. In Evidence, pages are
       markdown documents. When you write SQL inside that markdown, the
       SQL runs against your database (we support BigQuery, Snowflake,
       and Postgres - with more to come). You can reference the results
       of those queries using a simple templating syntax, which you can
       use to inline query results into text or to generate report
       sections from a query. Evidence also includes a component
       library that lets you do things like add charts and graphs
       (driven by your queries) by writing declarative tags like:
       <LineChart />
        
       How is it different? Most BI tools use a no-code drag-and-drop
       interface. Analysts click around to build their queries, set up
       their charts etc. and then they drag them into place onto a
       dashboard. To stick with the analogy, if Evidence is Hugo, most
       BI tools are Squarespace. BI tools are built that way because
       they assume that data analysts are non-technical. In our
       experience, that assumption is no longer correct. Data analysts
       increasingly want tools that let them adopt software engineering
       practices like version control, testing, and abstraction.
        
       When everything is under version control, you are less likely to
       ship an incorrect report. When you can write a for loop, you can
       show sections for each region, product-line etc., instead of
       asking your users to engage with a filter interface. When you
       can abstract a piece of analysis into a reusable component, you
       don't have to maintain the same content in multiple places.
       Basically, we're providing the fundamentals of programming in a
       way that analysts can easily make use of.
        
       Reporting tools have been around since COBOL, and have gone
       through many iterations as tech and markets have evolved. Our
       view is that it's time for the next major iteration. We worked
       together for five years building the data science group at a
       private equity firm in Canada. We set up 'the modern data stack'
       (Fivetran, dbt, BigQuery etc.) at many of the firm's portfolio
       companies and we were in the room during a lot of key corporate
       decisions.
        
       In our experience, the BI layer is the weakest part of the
       modern data stack. The BI layer has a poor developer experience,
       and decision makers don't really like the outputs they get. It
       turns out, these two issues are closely related. The drag and
       drop experience is so slow and low-leverage that the only way to
       get all the content on the page is to push a lot of cognitive
       load onto the end user: global filters, drill down modals, grids
       of charts without context. Like most users, business people hate
       that shit. And because the production process isn't in code, the
       outputs are hard to version control and test--so dashboards
       break, results are internally inconsistent, and so on, in just
       the way that software would suck if you didn't version control
       and test it.
        
       As early adopters of the modern data stack, we saw the value in
       treating analytics more like software development, but we were
       consistently disappointed with the workflow and the quality of
       the outputs our team could deliver using BI tools and notebook
       products. Graphics teams that we admire at newspapers like the
       New York Times don't use BI tools or Jupyter notebooks to
       present their work. They code their data products by hand, and
       the results are dramatically better than what you see in a
       typical BI deployment. That's too much of an engineering lift
       for most data teams, but with a framework designed for their
       needs and their range of expertise, we think data teams could
       build products that come much closer to those high standards.
        
       Evidence is built on Svelte and SvelteKit. This is the JS
       framework that the NYT has used to build some of their more
       recent data products, like their Covid risk maps. Sean and I
       fell in love with Svelte, and we owe a huge debt to that
       project. In this early stage, Evidence is really just a set of
       convenience features wrapped around SvelteKit to make it
       accessible to data analysts (the markdown preprocessor, db
       connections, chart library). The core framework will always be
       open source, and eventually we plan to launch a paid cloud
       version of our product, including hosting, granular access
       control, and other features that enterprises might pay for.
        
       We would love to hear your thoughts, questions, concerns, or
       ideas about what we're building - or about your experiences with
       business intelligence in general. We appreciate all feedback and
       suggestions!
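        
       To make that concrete, here's a rough sketch of what an Evidence
       page can look like (the query-fence and templating syntax below
       is illustrative rather than exact - see https://docs.evidence.dev
       for the current syntax, and the table/column names are made up):

       ````markdown
       # Monthly Revenue

       ```sql revenue_by_month
       select month, sum(revenue) as revenue
       from orders
       group by 1
       order by 1
       ```

       Revenue last month was {revenue_by_month[0].revenue}.

       <LineChart data={revenue_by_month} x=month y=revenue />
       ````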
        
       Author : amcaskill
       Score  : 116 points
       Date   : 2021-08-25 18:03 UTC (4 hours ago)
        
       | edusoftwerks wrote:
       | Are you hiring?
        
         | amcaskill wrote:
         | Not quite yet, but I'm happy to chat: adam at evidence.dev
        
       | mrosett wrote:
       | This is a great idea. My current role doesn't require any kind of
       | BI reporting but the next time I need to build a dashboard this
       | will be the first thing I try.
        
       | jart wrote:
       | Would you be interested in using https://redbean.dev to enhance
       | your product? It's really nice for building web apps that handle
       | sensitive data since you can put redbean with your CSVs or SQLite
       | database on something like an Iron Key and anyone you share it
       | with will be able to use your app and view the reports in a
       | purely offline manner on any o/s.
       | 
       | If your app is built on Node it's got an unwieldy amount of
       | dependencies which frequently have security issues and something
       | like Postgres is usually only viable as an online service you're
       | self-hosting, and those things get hacked. So redbean is really a
       | no-brainer if you want to protect data without making life
       | difficult for the people who are authorized to look at it. We're
       | also looking at integrating QuickJS soon, as an alternative to
       | Lua, so there should be a painless migration path for Node folks.
        
       | kvothe_ wrote:
       | neat
        
       | jrumbut wrote:
       | This is very cool and something that appeals to me as someone who
       | does a blend of web and data work. A constant problem I run into
       | is making good reports quickly. Like you mention, I don't have
       | time to hand code it, but no code dashboards are both slow and
       | tedious to make and the quality is terrible.
       | 
       | I have a couple questions.
       | 
       | 1. I work in research and we use a lot of strange databases and
       | query languages, how hard would it be to add support for new
       | databases (or alternative sources like CSVs or API calls) and to
       | include multiple sources in the same report?
       | 
       | 2. I had trouble telling from the docs how hard it was to drop in
       | hand coded components (say I have some D3 creation, or I have
       | some requirement that breaks the model and requires JavaScript
       | and CSS to change everything)?
        
         | amcaskill wrote:
         | Thanks for the kind words!
         | 
         | 1. Yes, if you have specific DBs, please feel free to create an
         | issue on Github. We're also working on opening up the DB
         | connector ecosystem so that people can add their own. I've also
         | opened an issue for CSVs; I think we could support them pretty
         | seamlessly.
         | 
         | 2. Evidence is actually pretty slick in this regard. This is
         | one of the benefits from starting from a web framework and
         | working backwards towards the data analyst, rather than
         | starting from Jupyter and trying to work forwards (if that
         | makes sense).
         | 
         | The markdown documents compile to svelte components, so you can
         | just add a <script> tag and/or a <style> tag right into your .md
         | file. d3 works pretty seamlessly in svelte, so you can go nuts.
         | The other neat thing is that the styles scope themselves to the
         | page, the same way styles are scoped to a component in Svelte.
         | 
         | We haven't written the docs for this portion yet, but you can
         | also add global svelte components to your project, so if you
         | wanted to write something re-usable you can write it that way,
         | and then just import it into your reports to use it as a
         | component. In either case, you could call out to other APIs if
         | you didn't want to retrieve data via a SQL query.
         | 
         | As an example, an add-on component library we'd like to build
         | is an interface to FRED data from the St Louis Fed. That way if
         | you just need a quick chart of GDP, or you want to add
         | recession shading to one of your charts, you can just drop it
         | in without having to load that data into your database.
         | <FredTimeSeries ref=gdp/> that type of thing.
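         | 
         | As a rough sketch, a .md page with a hand-coded d3 element
         | might look like the following (the variable names are made
         | up; the d3 calls are the standard d3-selection API):

```svelte
<script>
  import { onMount } from 'svelte';
  import * as d3 from 'd3';

  let el; // bound to the div below

  onMount(() => {
    // draw a small hand-coded svg once the element exists
    const svg = d3.select(el)
      .append('svg')
      .attr('width', 200)
      .attr('height', 100);
    svg.append('circle')
      .attr('cx', 50)
      .attr('cy', 50)
      .attr('r', 20);
  });
</script>

<div bind:this={el}></div>

<style>
  /* these styles stay scoped to this page */
  div { margin: 1rem 0; }
</style>
```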
        
       | Shicholas wrote:
       | this is very awesome, I'll find a way to use it soon. Nice work!
        
       | jahewson wrote:
       | As an engineer I quite like the look of this. Right now for this
       | kind of internal report I'd use a hosted Jupyter notebook, e.g.
       | Mode. Both data science and engineering folks can handle it.
       | What's the one-sentence selling point for Evidence in my use
       | case?
        
       | dg4 wrote:
       | Looks great! How are you thinking about the review workflow? I've
       | noticed that BI artifacts / dashboards rarely get a detailed
       | logical review. This living in Github seems like a step in the
       | right direction, but it'd be great to have SQL execution in the
       | review context, e.g. "I think you're accidentally filtering out
       | these rows; here they are: SELECT ..."
        
         | amcaskill wrote:
         | Thanks!
         | 
         | That's a super promising line of thinking.
         | 
         | We really like how Vercel works with pull requests --
         | generating a preview, blocking the pr if there is a failure in
         | the build process etc. and that's definitely where we'd like to
         | go with the cloud service. We hadn't thought of providing
         | executed SQL back into the review context but of course that
         | would be do-able and very useful.
         | 
         | There is a whole host of tooling that you can build around the
         | artifacts when you move them into code. One example an early
         | user suggested was scanning your entire project to find
         | repeated blocks of SQL, and surfacing them to be re-factored
         | into your data warehouse (into your dbt project for example).
         | You could imagine a github action that periodically opens a PR
         | with those suggested re-factors.
        
       | joefigura wrote:
       | Very cool! I struggled with this exact problem at a previous role
       | to generate reports from the results of a simulation codebase.
       | Ended up learning and using Django, which always felt overpowered
       | for a simple static report. Excited to see how the product
       | develops!
        
         | amcaskill wrote:
         | Thank you so much!
        
       | [deleted]
        
       | nonameiguess wrote:
       | Have you heard of knitr (https://yihui.org/knitr/)? It's the gold
       | standard as far as I'm concerned for dynamic report generation
       | that needs to run code. Since it supports running arbitrary shell
       | commands, it can already be used to query remote databases as
       | long as you have a CLI to query them with. Combined with
       | RMarkdown (https://rmarkdown.rstudio.com/), which augments
       | Markdown with support for LaTeX typesetting, it's the ultimate
       | toolset for doing this kind of thing. You can read a blog post
       | here on how to use knitr within RMarkdown:
       | https://kbroman.org/knitr_knutshell/pages/Rmarkdown.html
       | 
       | I'm not trying to be a downer, but it seems like your product is
       | just duplicating the functionality of these existing products but
       | does less since it only supports SQL and Markdown.
       | 
       | I guess you autogenerate charts, but it says you're targeting a
       | technical audience that is presumably comfortable calling
       | functions in Python and R for graphical data visualization.
       | 
       | This is nitpicky, and I'm sure you have some command line option
       | to choose another port (though your "get started" doesn't show
       | how), but mdbook also uses 3000. I'm sure they probably weren't
       | the first to default to that, either.
       | 
       | I hope this doesn't come across as downplaying your product. It
       | looks nice. I just don't see what you offer here that can't
       | already be done with existing data ecosystem tools. I was using
       | RMarkdown with knitr to generate all of my papers when I was an
       | ML grad student years ago. It felt back then like I was the only
       | person at Georgia Tech who realized these tools existed, and now
       | it still feels that way.
        
         | edusoftwerks wrote:
         | The problem with these tools is they only work sometimes, and
         | when they do, it's because you spent a whole day configuring
         | your environment.
        
         | lytefm wrote:
         | I guess you could make the same argument pointing to Jupyter
         | Notebook or D3.js - if you already know how to use a
         | programming language like R, Python or JS and how to visualise
         | data with it, you're probably not the target audience as end
         | user.
         | 
         | This looks more like it's made for an Analyst who mainly uses
         | SQL or Excel.
         | 
         | But if it makes me more productive than Jupyter Notebooks for
         | simple reports, I'll give it a try.
        
         | amcaskill wrote:
         | Absolutely no need to apologize, thanks for taking the time to
         | check out our project.
         | 
         | I have written a lot of R Markdown over the years, and I agree
         | wholeheartedly with most of what you're saying. The R ecosystem
         | is phenomenal. Anyone who is excited about our project, might
         | be 10x more excited about learning R and writing a report with
         | R markdown.
         | 
         | A big part of why we are building Evidence is that my co-
         | founder Sean and I felt like we lost a lot on the presentation
         | side when we graduated from notebooks to primarily working with
         | data warehouses, dbt & BI tools.
         | 
         | The thing is, we gained so much from that transition to 'the
         | modern data stack' that we would never go back. So we're
         | setting out to fix the presentation layer in a way that would
         | have worked for us.
         | 
         | Undoubtedly, anything that you could accomplish in Evidence is
         | going to be do-able within the R Markdown or jupyter
         | ecosystems, so I won't try to claim any truly unique features.
         | It's maybe more of a vibe: what's easy in Evidence vs. what's
         | tricky in a notebook?
         | 
         | If you're writing an ML paper, R markdown is definitely the
         | move. If you're trying to build a common, internally consistent
         | understanding across hundreds (thousands) of people about how
         | your business is doing, and what they might do about it,
         | Evidence is going to be a better fit.
         | 
         | Here's a comment from a while ago discussing the comparison with
         | Jupyter: https://news.ycombinator.com/item?id=27363349
         | 
         | It only supports SQL and Markdown:
         | 
         | That constraint is part of the point.
         | 
         | In a large organization, a fair number of people are going to
         | contribute to your reporting apparatus, and you want to keep it
         | in a state where you can re-factor useful abstractions up into
         | your data warehouse. This gets a lot harder if your reporting
         | is a swirl of R scripts and python snippets and whatever else.
         | 
         | Some order of magnitude more people know SQL and markdown than
         | R or Python. Every business I have been involved in has someone
         | there who is cranking out analysis and data pulls using SQL.
         | Very rarely would that person be comfortable working in R
         | markdown.
         | 
         | You can't in-line an ML model into your reports:
         | 
         | Again, we think this constraint is basically a good thing. If
         | you have a model that is profitable to your business, it should
         | be governed and executed in a purpose built environment and,
         | where feasible, you should be storing the relevant outputs for
         | posterity in your data warehouse.
         | 
         | We will add instructions on setting the port! :)
        
       | melenaboija wrote:
       | Very cool!
       | 
       | I would like to see an option to make a copy of the analyzed data
       | and/or a hash value for it.
        
         | amcaskill wrote:
         | There's a discussion earlier in the thread about static vs.
         | live data. I think there are some interesting things we could
         | do here.
         | 
         | There are also some other great products doing data diffing,
         | like datafold that might fit the bill here.
        
       | wizwit999 wrote:
       | I like it. I've seen the same issue and agree people over index
       | on no code. I didn't see it on your page, do you get into
       | visualization as well?
        
         | amcaskill wrote:
         | Thanks so much!
         | 
         | Yes, definitely. We include a visualization library with
         | Evidence.
         | 
         | You can write <LineChart .../> to add a line chart to your
         | document, <Hist .../> for a histogram etc.
         | 
         | You can see the documentation for the chart types we have built
         | under 'components' in our docs. Here's the histogram:
         | https://docs.evidence.dev/components/hist
         | 
         | Designing this is one of the trickiest parts of the project,
         | and is going to be one of the biggest areas of effort going
         | forward. We're trying to build something that is very
         | declarative, so that people don't have to spend a lot of time
         | configuring their charts, and something that is composable, so
         | that you can create more complex viz that include things like
         | annotations.
        
       | sails wrote:
       | > In our experience, the BI layer is the weakest part of the
       | modern data stack. The BI layer has a poor developer experience,
       | and decision makers don't really like the outputs they get
       | 
       | Totally agree, very interested in trying this out. FWIW I've
       | tried and been frustrated by Looker, Metabase, PowerBI, Superset,
       | Redash.
       | 
       | I do think that while dbt does a great job with dimensional
       | modelling, the BI layer is still required to provide some aspects
       | of metric modelling. Is this something that Evidence is looking
       | to solve? From what I've seen it looks more to be a pure frontend
       | visualisation rather than a tool for managing business metrics.
       | Looker and Metabase do some good work in this metric management
       | space, Superset and PowerBI much less so.
        
         | amcaskill wrote:
         | You have cut straight to the heart of a pretty interesting
         | problem that we are still thinking through.
         | 
         | You are exactly right, right now this is pure front-end. That's
         | intentional, and there's a lot that we like about that
         | approach, especially in light of the success of dbt.
         | 
         | We think dbt fundamentally changes what is needed from a BI
         | tool and that vendors who are maintaining really heavy built-in
         | data transformation layers will basically be wasting resources
         | over the coming years. The approach of modelling in your data
         | warehouse is just so much more sensible that we think it's
         | really a good thing to bet on.
         | 
         | That said, having some form of metric modelling in your BI tool
         | is really nice -- it helps you keep your queries dry, and makes
         | it simpler to roll out changes. If we were to build something
         | here, I think it would be very lightweight -- a config that
         | basically let you define re-usable sql snippets, and maybe some
         | constraints on them.
         | 
         | On the other hand, there are A LOT of startups building metrics
         | layers, which look great. Usually these expose an API endpoint,
         | and some sort of SQL interface. We'd be just as happy to plug into
         | one of those SQL interfaces and call it a day. I just wish one
         | of those was open source, since the metrics layer is such a
         | choke-hold on your data operation.
         | 
         | Maybe someone will build the 'dbt of metrics layers'. That
         | would be great for the ecosystem. Maybe dbt will do it
         | themselves. I think there's probably something interesting they
         | could do there by treating stored procedures as materialization
         | targets.
        
           | buremba wrote:
           | I believe we're working on a product similar to the one you
           | described. Our metrics layer, metriql, extends dbt for
           | metrics definitions and provides an open-source CLI that
           | serves your metrics to the data tools.
           | 
           | We have a REST API and use the Trino protocol, so if you
           | support Trino (formerly Presto) you already support metriql.
           | :)
           | 
           | Here is the link: https://metriql.com
        
             | amcaskill wrote:
             | Look at that! Yeah, I love it. That's exactly what I was
             | thinking of.
             | 
             | Trino is on the to-do list!
        
         | adithyasrin wrote:
         | I have worked with MODLR [1] for data modelling for a FP&A
         | solution and highly recommend it. It's a complete platform, not
         | just a visualization tool.
         | 
         | [1] - https://modlr.co
        
       | smashah wrote:
       | Wow that's cool. Now if I can create these snippets to output as
       | SVG then I can add it to my GitHub readme!
        
       | 101008 wrote:
       | How do you handle live data vs fixed data? If I am making a
       | report, I want the charts to remain static - if not, over time,
       | they may not match what is said in the report. Is there an
       | option, after saving the report or running the query, to make the
       | values static forever?
        
         | amcaskill wrote:
         | This is an excellent question, and it's one of the areas where
         | we think we can do some pretty novel things with our approach.
         | 
         | There are two main cases of this idea that we have spent time
         | thinking about.
         | 
         | 1. Truly static report.
         | 
         | Here, you would need to condition your SQL queries so that they
         | continue to return the same results over time. E.g. your
         | `where` clause restricts the results to 'on or before' the day
         | of writing. Evidence will continue to build the report on a
         | schedule, but the results will never change so long as your
         | historical data is constant. You can do this today.
         | 
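         | For example (table and column names here are illustrative), a
         | query pinned to the day the report was written:

```sql
select order_month, sum(revenue) as revenue
from orders
-- only rows on or before the writing date; results stay stable
-- as long as your historical data doesn't change
where order_date <= date '2021-08-25'
group by 1
order by 1
```
         | 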
         | In a future state, we've talked about rendering a snapshot of
         | the report and checking that into version control, so that even
         | if your underlying data is a moving target, you can hold onto
         | what the report looked like at a moment in time.
         | 
         | We're kind of mixed on that idea of snapshotting reports
         | themselves, though; it's just so much better to build your data
         | warehouse such that it is actually retaining the historical
         | data, but we recognize sometimes that's not practical. TBD on
         | that functionality.
         | 
         | 2. Recurring report with static commentary
         | 
         | Here, you have a recurring time-bounded report, and you
         | occasionally want to mix-in commentary that's only relevant for
         | specific time periods.
         | 
         | With Evidence (this part comes from SvelteKit), you can mix
         | parameterized pages and static pages on the same route. So if
         | you had 'monthly mrr growth report', you could use a
         | parameterized page to generate the report for every month into
         | history and into the future, AND, you could include versions of
         | the report with hand-written commentary for any specific months
         | where it was needed. So if someone navigates to the February
         | 2021 page, they get the standard parameterized version, but if
         | they go to January 2021, they get the handwritten January
         | version that explains that there was an acquisition which drove
         | the big pick-up in MRR.
         | 
         | This one is a bit tricky to explain, but we will build some
         | examples.
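         | 
         | Roughly, the pages directory would look like this ([month] is
         | SvelteKit's route-parameter syntax; the file names are
         | illustrative):

```
pages/
  mrr-growth/
    [month].md     # parameterized template, builds every month
    2021-01.md     # handwritten January version, takes precedence
```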
        
           | 101008 wrote:
           | Thanks for answering. The approach of adding the date to
           | the where clause may not work. I may receive/ingest more data
           | from that point in time after I made the report. Of course,
           | the report wouldn't be updated, but I'd prefer to have an
           | updated report that says "We did it with the data available
           | at this time" and not one that says one thing but shows a
           | different thing.
           | 
           | Kudos on the launching anyway!
        
             | amcaskill wrote:
             | Good point. Late arriving data is such a challenging
             | problem. Perhaps if you have the load date, you could
             | condition on that. We will think about the 'snapshot the
             | outputs' approach as well.
        
       | [deleted]
        
       | Jugurtha wrote:
       | Congratulations on the launch. I'll keep an eye on this.
       | 
       | We used to build custom, turn-key machine learning products for
       | enterprise. Recently, after playing with things like Voila,
       | Streamlit, and Superset, we made it possible for our data
       | scientists and ML people to show prototypes and applications
       | right from the platform, without worrying about creating a VM,
       | setting up the environment, scp stuff, create an application,
       | configure a server, set up authentication, send a link to the
       | client, etc.
       | 
       | I can envision doing something similar with Evidence. Given it's
       | markdown, could we imagine having a Jupyter notebook containing
       | markdown cells that somehow use Evidence? Could this be a
       | JupyterLab extension?
       | 
       | I'm asking this because we have live collaboration /
       | collaborative editing notebook on the platform, with access to
       | external data sources such as S3 as if they were filesystems, so
       | several people could collaborate on the same notebook, see
       | cursors and selections of others, etc. Why not do that for
       | Evidence work as well:
       | 
       | - I start a notebook. Add a Markdown cell. Some magic, I can do
       | whatever it is I can do to generate reports with Markdown.
       | 
       | - Share the notebook with other users. We get together and work
       | on that visualization/report.
       | 
       | Tangent: Something that kind of sucks is that some clients send
       | us a database _dump_ as a file, plus all other miscellaneous
       | data. We have to create a MySQL database from that dump. It's
       | not a big deal, but we don't like it.
        
       | sdan wrote:
       | This looks similar to Posthog (YC W20)
        
         | amcaskill wrote:
         | I think Posthog is more like an open source mixpanel -- you
         | need event tracking on your website, and standard analysis of
         | traffic, funnels, user segments etc.
         | 
         | Evidence is aimed at a longer tail of data analysis -- you have
         | more idiosyncratic data landing in a data warehouse, and you
         | need to turn it into custom reports, dashboards etc.
        
       ___________________________________________________________________
       (page generated 2021-08-25 23:00 UTC)