[HN Gopher] Launch HN: Mito (YC S20) - Edit a spreadsheet, gener...
       ___________________________________________________________________
        
       Launch HN: Mito (YC S20) - Edit a spreadsheet, generate Python
        
       Hiya HN, I'm Nate, cofounder of Mito (https://trymito.io) with my
       best friends Jake and Aaron. Mito is a spreadsheet UI that runs
       inside a Jupyter Notebook. Each time you edit the spreadsheet, it
       generates Python code for that edit. This allows analysts to write
       Python scripts using an interface they are familiar with, instead
       of waiting months for eng resources.  Mito is open core:
       http://github.com/mito-ds/monorepo. Our docs are at
       http://docs.trymito.io, and you can download it here:
       https://docs.trymito.io/getting-started/installing-mito.  Most
       people doing data analysis in Python struggle to just write basic
       Python. If you search StackOverflow for the [pandas] tag, you'll
       find pandas users wrestling with everything from "how can I make a
       pivot table?" to "how do I import from another folder?" These users
       are experts in their field -- they just aren't experts in Python.
       Tasks that take them seconds in spreadsheets can end up taking them
       days. (Here's how we put it to investors: the next 10 million
       Python programmers are transitioning from Excel and have one real
       problem: writing the damn code.) A lot of organizations are stuck
       on this dilemma: they want to move from spreadsheets to Python, but
       getting started with programming--even with a highly usable
       language like Python--is hard.  We've spent years with users trying
       to adapt their spreadsheet skills to Python. It takes weeks to
       learn the basics. Their existing skills don't transfer. Many of
       their needs are simple to do in a spreadsheet--writing a formula,
       aggregating data, graphing--but adapting them to Python requires
       long courses, emails to internal support (if any exists) waiting
       days for a reply, and countless trips to Stack Overflow. Often they
       just give up and return to Excel, but that makes them dependent on
       IT to write code for them. One of our users was quoted a full year
       for IT to implement a simple report! (Fast-forward: he ended up
       using Mito to automate it himself in less than a week.)  We went
       through this ourselves when we went to college together, studying
       engineering and business. We first learned data science with
       spreadsheets, then had to relearn it in Python. The transition was
       painful--basic Excel was much easier! Of course, not-so-basic Excel
       soon becomes not-so-easy, which is what drives the move to Python
       in the first place.  With our interest in spreadsheets, we started
       a spreadsheet-version-control company at the end of college, and
       spent a year working with Excel power users. Eventually, we
       realized that version control was secondary to the real problems
       users faced with spreadsheets: limited data size, speed limits,
       lack of advanced functionality, and a horrible replayability story.
       Essentially, enterprises are caught between a rock (their
       spreadsheet woes) and a hard place (the pain of moving analysts to
       Python). We decided to work on this instead, and started Mito.
       Mito is a spreadsheet UI built as an extension to Jupyter Notebooks
       / JupyterLab. Using a Mito spreadsheet, users can import data, add
       and delete columns, write formulas like Excel, make pivot tables,
       generate graphs, and more. See our docs (http://docs.trymito.io)
       for all our functionality.  Every tab in a Mito spreadsheet is a
       different pandas DataFrame. For each edit made, a line of pandas
       code is generated in a code cell directly below the spreadsheet
       that corresponds to this edit. For example, if I use Mito to import
       a CSV, add a column named Day of Week, and use the WEEKDAY formula
       from Excel to pull out the weekday from another column, Mito
       generates the following code:

           # Imported tesla stock.csv
           import pandas as pd
           tesla_stock = pd.read_csv(r'tesla stock.csv')

           # Added column Day of week
           tesla_stock.insert(1, 'Day of week', WEEKDAY(tesla_stock['Date']))

       In practice, the typical user
       bounces back and forth between writing Python and using the Mito
       spreadsheet, depending on the task at hand. We think this fluid
       movement between a spreadsheet and Python is really cool. The
       spreadsheet backend is just a Python extension to the IPython
       kernel you're already running for your Jupyter Notebook. Because
       Mito is just a Python package, all data processing happens locally.
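
       To show roughly what the generated WEEKDAY step amounts to without
       Mito's formula helpers, here is a minimal sketch in plain pandas.
       It is an illustration, not Mito's implementation: excel_weekday is
       a hypothetical helper, the rows are made up, and it assumes Excel's
       default numbering (Sunday = 1 through Saturday = 7), which Mito's
       WEEKDAY may or may not follow.

```python
import pandas as pd


def excel_weekday(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: Excel-style WEEKDAY (Sunday=1 ... Saturday=7)."""
    parsed = pd.to_datetime(dates)
    # pandas numbers Monday as 0 and Sunday as 6; shift to Excel's default.
    return (parsed.dt.dayofweek + 1) % 7 + 1


# Made-up rows standing in for the imported CSV.
tesla_stock = pd.DataFrame({
    "Date": ["2022-09-04", "2022-09-05", "2022-09-10"],
    "Close": [100.0, 101.5, 99.0],
})

# Same shape as the generated edit: insert the new column at position 1.
tesla_stock.insert(1, "Day of week", excel_weekday(tesla_stock["Date"]))
print(tesla_stock)
```

       The column lands at position 1, directly after Date, which mirrors
       the insert(1, ...) call in the generated code.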
       As mentioned, Mito is an open core product. 90% of the code is AGPL
       licensed. The rest is under a separate enterprise license. These
       modules are still source-visible, but require users to pay for a
       pro or enterprise offering before using them. That's basically our
       business model.  We have 3 versions (https://trymito.io/plans): (1)
       Free: basic analysis tools, as well as some basic telemetry that
       you can opt out of; (2) Pro: all of (1), with advanced
       functionality; (3) Enterprise: all of (2), with more advanced
       features, optimizations, and support.  Because spreadsheets are
       sprawling pieces of software, we're pretty obsessed with optimizing
       for long-term development. We use strong types where we can
       (TypeScript on the frontend, fairly comprehensive MyPy in Python).
       We've implemented our own component libraries for common components
       from scratch, which lets us be flexible during large refactorings.
       We implemented our own custom JavaScript grid--hyper-optimized for
       our use case--and as a result it is the fastest JS grid we tested in
       our context. We're also big fans of metaprogramming--we write an
       increasing amount of code that writes code for us--which in turn
       makes it easy to add more functionality to our spreadsheet.  We
       posted about Mito a long time ago:
       https://news.ycombinator.com/item?id=24305615. No one really liked
       it (we learned our lesson!), and it didn't do much at the time -- I
       think the app had a single button that added a column. Three months
       ago, someone (not sure who -- thank you, alefnula!) posted it
       again: https://news.ycombinator.com/item?id=31446236. It reached
       the top 3 and we got lots of comments--yay! Since then, we've
       doubled the number of features (mostly data processing), done a UI
       overhaul, dramatically expanded the Pro + Enterprise offering, made
       telemetry optional in the free version, and more.  We'd love to
       hear all about your experiences with spreadsheet analysis, the
       uncanny valley between spreadsheets and code, the travails of
       moving enterprise analytics off of spreadsheets, and whatever else
       you'd like to ask or mention. Any and all feedback is greatly
       appreciated!
        
       Author : narush
       Score  : 105 points
       Date   : 2022-09-05 12:54 UTC (10 hours ago)
        
       | rpicard wrote:
       | Congratulations on all the product development progress!
       | 
       | My wife is a spreadsheet wizard and I'm excited to get her take
       | on this too.
        
         | narush wrote:
         | Thanks! Lots of investment in meta-programming in the past 3
         | months - in our case, code that writes code that writes code :)
         | 
         | Takes from spreadsheet wizards greatly appreciated.
        
       | abanayev wrote:
       | Wouldn't actually using Excel create less friction for potential
       | users?
       | 
       | Your target audience is theoretically Excel users who want/need
       | to code instead, but I think you're alienating the power users of
       | Excel, because their power tools are unavailable in the Mito
       | spreadsheet editor.
       | 
       | For example, have you considered dumping the dataframes to
       | "smart" xlsx files with backing code that connects to a local
       | server, listens to worksheet events and tells the server
       | everything that happens so it can write python code in the source
       | notebook?
        
         | theptip wrote:
         | Excel workflows are terrible though. No version control, hard
         | to test, prone to indexing errors. And doing very sophisticated
         | things with it gets hard; lots of financial analysts/quants are
         | moving over to Python for analysis anyway.
         | 
         | If you're thinking about this in isolation, I can see why it
         | would seem a bad idea to move power-excel users to Python. But
         | take this in the context of a much wider shift where many shops
         | are already shifting to Python for other reasons, and so we
         | need a way to help transition the Excel power users over too.
         | 
         | Excel has its place for sure, but I think it's interesting to
         | consider whether another tool paradigm could gradually replace
         | it; we would need to really hone the flexibility and
         | expressivity of the UI for simple tasks. The benefit would be
         | that when your task grows you don't need to re-implement it in
         | a new Python engine.
        
           | aarondia wrote:
           | Aaron here, one of the Mito co-founders.
           | 
           | +1, beyond the most obvious reasons that companies are moving
           | away from Excel (too much data to process, not enough robust
           | automation features), there are important workflow management
           | reasons that companies are making the transition.
           | 
           | More and more, we're hearing that companies want to use
           | software engineering practices on their data analytics
           | workflows -- things like version control, easily
           | understanding what edits are applied by looking at the code,
           | and even things like CI to automatically build dashboards
           | from the most up to date data.
           | 
           | While you technically could build tooling around Excel to do
           | a lot of these things, it's much easier to use the tooling that
           | already exists in the Python ecosystem.
        
         | narush wrote:
         | We've thought a lot about this one. It's a good idea for
         | usability - agree with you there - but there are some
         | development complexities that make it hard for other reasons.
         | 
         | We spent a considerable amount of time two years ago developing
         | Excel extensions for our spreadsheet-version-control product.
         | It was... not ideal from a development perspective.
         | 
         | The benefit of being in Excel (it has all the features!) is
         | also the cost of being in Excel (you have to support all the
         | features!). This means v1 of the extension you describe will
         | either have to be non-functional on most of these power tools
         | you mention, or we'd need to spend years building in stealth
         | mode before launching something fully working (and I'm not even
         | sure we ever could get there... Excel is... literally so big).
         | 
         | Also, the actual extension points for Excel are not as fully-
         | featured as you might think! In practice, we'd likely have to
         | gate much of Excel's functionality to get an extension that
         | actually works -- there are some hard limits to what you can
         | extend, further making it really hard to actually support these
         | power tools in practice.
         | 
         | Also, for the sake of our users, we love being in a Python
         | development environment! In practice, many of our users move
         | really fluidly back and forth between writing Python and
         | editing a Mito spreadsheet. Effectively - bring a spreadsheet
         | in for what it's good at, when you want it.
         | 
         | We'll keep considering this one, though -- I have a _feeling_
         | Microsoft might make some Python moves in Excel the next few
         | years... :)
        
       | coltoneakins wrote:
       | Congrats on the launch!
       | 
       | Random, but: what program did you use to make the intro video? It
       | looks really clean.
        
         | narush wrote:
         | Thank you! It's recorded using QuickTime screen cap, and edited
         | in Final Cut Pro. I also made some assets in Figma (e.g. the
         | little spreadsheet grid background).
         | 
         | It took longer than I'd like to admit... but feel validated in
         | spending that time now that you've asked :)
        
       | b800h wrote:
       | This is quite similar in concept to a spreadsheet product from
       | 2008, called Resolver One, which ran on IronPython.
       | 
       | https://media.prleap.com/image/221/640/share_trade_screensho...
       | 
       | It was excellent, and a bit of a shame that it didn't get more
       | traction at the time.
        
       | [deleted]
        
       | [deleted]
        
       | Closi wrote:
       | Well done on making such a huge application!
       | 
       | From a user perspective, what are the benefits using this rather
       | than using PowerQuery within Excel? From a functional perspective
       | it seems to do something very similar (i.e. your demo on your
       | front site, I could just do in PowerQuery).
        
         | narush wrote:
         | It's a good question why our users prefer Mito+Python over
         | something like PowerQuery+M! One might similarly ask what's
         | wrong with Excel+VBA - although I'll note I haven't heard
         | anyone champion VBA recently... :)
         | 
         | In practice, most of our users have already started with Python by
         | the time they use Mito. For now, we're not positioning
         | ourselves as an alternative to PowerQuery, but rather a tool
         | for someone who is coming from spreadsheets, has chosen Python,
         | and is struggling to write code.
         | 
         | The next obvious question is why our users are choosing Python
         | in the first place -- what I'll say here is that like any
         | programming language, there are a huge number of reasons: some
         | of our users prefer Python because that's what their colleagues
         | work in; some choose Python because they think it's trendy/cool;
         | others choose Python because that's where the libraries they
         | want to use are; others are starting down the path of getting
         | into ML (which is primarily in Python); others want to
         | integrate with existing Python infrastructure within their
         | company. We've also seen massive enterprises with top down
         | edicts to move to Python "within the next 5 years", as well.
         | 
         | In practice, Python is the most popular general purpose
         | programming language for data science - and so we're doing our
         | best to meet our users where they are: writing Python code, in
         | Jupyter Notebooks!
        
           | Closi wrote:
           | TBH I think your target market is quite confusing.
           | 
           | It seems to be a non-technical user who is struggling to
           | write Python and wants an easy way out, but is willing to
           | install a tool via a CLI within a python virtual environment,
           | knows what a Jupyter Notebook is and possibly wants to start
           | writing machine learning code?
           | 
           | If the target market is actually the 'struggling non-
           | technical user' I suspect you will need to remove as much
           | friction as possible, although I'm not entirely sure if that
           | is your target market.
           | 
           | IMO would be good to focus on how your product actually helps
           | do analysis better than Excel + PowerQuery/M, because
           | presumably there has to be some sort of functional benefit
           | otherwise what's the point?
        
             | narush wrote:
             | I think your description is a pretty accurate description
             | of most of our users: they are struggling to write Python
             | in a Jupyter Notebook, and can install some basic packages
             | (albeit with some struggles -- see our Discord install
             | help channel). The ML code part, you're right, def more
             | rare :)
             | 
             | Python code helps these users do a variety of tasks that
             | aren't possible in other analytics tools like PowerQuery/M.
             | Many of these tasks are specific to the company/existing
             | infrastructures, as I mentioned above.
             | 
             | A super concrete example: the head of data strategy at a
             | life-sciences company made the transition to Python
             | primarily because the rest of his (2 person) team uses
             | Python. They primarily communicate about new datasets using
             | Mito generated code (e.g. here are the steps to clean this
             | data) - but he's not great at Python - so in practice he
             | uses Mito for 9/10 analyses he does to generate this code
             | he sends to his colleagues!
             | 
             | Can give a few more if you'd like -- let me know!
        
             | aarondia wrote:
             | The friction of getting started with Mito is something we
             | spend a lot of time focusing on. For example, when it comes
             | to the installation process, not only do users install Mito
             | through a CLI, but because JupyterLab 2, JupyterLab 3, and
             | Jupyter notebooks all support extensions in different ways,
             | there are different installation commands that users need
             | to run to get it working for their specific environment.
             | Initially, we just gave users instructions in our docs
             | about which commands to run for which environment. Now
             | we've built a completely new Python package, the
             | mitoinstaller package, that handles the entire installation
             | process. It downloads Jupyter if they don't have it,
             | detects which version of Jupyter they have installed, runs
             | the installation commands for their JupyterLab version and
             | Jupyter notebooks, and finally starts up the Jupyter server
             | with a tutorial notebook. In the success case, users run
             | two commands and then 2 minutes later have already imported
             | data into their first Mito spreadsheet.
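             | 
             | A rough sketch of the detection step, not the actual
             | mitoinstaller code: installed_version, detect_jupyter, and
             | major_version are hypothetical names, and the sketch only
             | assumes the standard library's importlib.metadata for
             | looking up installed package versions.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def detect_jupyter() -> Dict[str, Optional[str]]:
    """Report which Jupyter frontends are present in this environment."""
    return {name: installed_version(name) for name in ("jupyterlab", "notebook")}


def major_version(version: Optional[str]) -> Optional[int]:
    """Extract the leading major number, e.g. '3.4.5' -> 3."""
    if version is None:
        return None
    head = version.split(".")[0]
    return int(head) if head.isdigit() else None


found = detect_jupyter()
print(found, major_version(found["jupyterlab"]))
```

             | An installer could branch on the major version to pick the
             | right extension commands for that environment.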
             | 
             | That initial friction reduction is important to our target
             | users, who I would describe in two buckets:
             | 
             | 1. Target open source adopters. These users are beginner to
             | intermediate Python users that want to / need to write
             | Python for data analysis. Most of the open source users
             | that adopt Mito are already on their Python journey --
             | we're not teaching them what Python is or what a notebook
             | is in the vast majority of cases. Many of them have gone
             | through Kaggle courses, taken a couple data science classes
             | at school, or are particularly resourceful. For those
             | beginner users, and even for people like me who have
             | written pandas code for a few years, some things are just
             | much easier to do in a spreadsheet interface, like creating
             | a pivot table or graph (two of our most popular features).
             | 
             | 2. Decision makers at large enterprises responsible for
             | moving their company from Excel to Python. Much like us,
             | these decision makers think a ton about the friction of
             | getting employees started with Python. In most cases, they
             | set up JupyterHub (https://jupyter.org/hub) so users don't
             | need to go through any installation processes themselves,
             | and they control things like version controlling, turning
             | notebooks into reports, etc. They generally also
             | offer/require Python training courses, provide template
             | notebooks, and have data scientists available to help the
             | business end users when they get stuck.
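             | 
             | For a sense of the pivot table case from the first bucket,
             | this is roughly the pandas a user would otherwise write by
             | hand. It is a sketch with made-up data and column names,
             | not code generated by Mito.

```python
import pandas as pd

# Made-up sales rows, purely for illustration.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100, 150, 200, 50, 75],
})

# Rows are regions, columns are products, cells hold summed revenue.
pivot = sales.pivot_table(
    index="region",
    columns="product",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```

             | Remembering which of index, columns, values, and aggfunc
             | goes where is the part a spreadsheet UI takes off the
             | user's plate.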
        
       | bravura wrote:
       | This is cool.
       | 
       | I'm a Python engineer (and ML person). I don't use pandas often,
       | so when I do need it, I am constantly on stackoverflow.com and
       | testing single lines at a time in Jupyter.
       | 
       | I'd love a version of Mito where I could give it the original
       | mock table and the desired output table maybe as a function (not
       | using a spreadsheet UI), and it would propose pandas code for me.
        
       | cm2187 wrote:
       | That's an interesting product. What is the advantage over Power
       | Query for a non technical user?
       | 
       | There is a slightly different idea on the same theme that I'll
       | give a stab at one day: let users express their logic in excel
       | using standard excel formulas, then tell a tool what are the
       | inputs and output ranges, the tool will extract the logic between
       | them (follow the formula), and generate the equivalent code. This
       | would allow a user to express and maintain, in excel, a logic
       | that can be run by IT with no dependency to excel.
        
         | aarondia wrote:
         | For some non-technical users, Power Query is a better option.
         | If your main purpose is to work with a large data set and then
         | update a PowerBI dashboard, for example, then Power Query
         | sounds like a perfect solution.
         | 
         | But what we see is that there are a bunch of reasons that these
         | non-technical users are excited about Python specifically. Here
         | are two examples:
         | 
         | 1. One of the first adopters of Mito, let's call her Shelly, is
         | helping a team of engineers build out a Salesforce dashboard to
         | predict when customers are going to refill an order. Since the
         | engineers don't have the business context to figure out how to
         | make that prediction, it's Shelly's job to construct the (in
         | this case pretty simple) algorithm by querying the relevant
         | database, figuring out which fields are accurate (there might
         | be 5 different date fields and figuring out which one is
         | actually when the user last placed an order isn't as easy as it
         | sounds), and then making the prediction for each customer.
         | Shelly then uses the Python code that Mito generates as a
         | communication tool for the engineers. The code is an exact
         | audit log of each transformation she made to her data in order
         | to create the report.
         | 
         | 2. Many of the companies that we work with have business
         | specific metrics that they calculate, so they have an
         | engineering team build a Python package that can easily
         | calculate those metrics. Sometimes they will even provide
         | boilerplate Python code snippets to interact with those
         | packages. (In the future they'll be able to import them into
         | Mito!) It's a win for the employees, who can rely on the code
         | snippets instead of calculating the metrics manually, and it's a
         | win for the engineers who can write Python code instead of M.
         | 
         | The last thing I'll say is that companies are moving to Python
         | because of the openness and robustness of the Python
         | ecosystem. They are power users of packages like Voila and Plotly.
         | Having employees work in Python opens up a ton of doors for how
         | companies can support them.
         | 
         | Your idea about expressing logic in excel and generating the
         | equivalent code has merit too for a different user base. Let us
         | know when you build it, excited to check it out!
        
       ___________________________________________________________________
       (page generated 2022-09-05 23:00 UTC)