[HN Gopher] Launch HN: Mito (YC S20) - Edit a spreadsheet, gener...
___________________________________________________________________
Launch HN: Mito (YC S20) - Edit a spreadsheet, generate Python

Hiya HN, I'm Nate, cofounder of Mito (https://trymito.io) with my best friends Jake and Aaron. Mito is a spreadsheet UI that runs inside a Jupyter Notebook. Each time you edit the spreadsheet, it generates Python code for that edit. This allows analysts to write Python scripts using an interface they are familiar with, instead of waiting months for eng resources.

Mito is open core: http://github.com/mito-ds/monorepo. Our docs are at http://docs.trymito.io, and you can download it here: https://docs.trymito.io/getting-started/installing-mito.

Most people doing data analysis in Python struggle to just write basic Python. If you search Stack Overflow for the [pandas] tag, you'll find pandas users wrestling with everything from "how can I make a pivot table?" to "how do I import from another folder?" These users are experts in their field -- they just aren't experts in Python. Tasks that take them seconds in spreadsheets can end up taking them days. (Here's how we put it to investors: the next 10 million Python programmers are transitioning from Excel and have one real problem: writing the damn code.)

A lot of organizations are stuck on this dilemma: they want to move from spreadsheets to Python, but getting started with programming--even with a highly usable language like Python--is hard. We've spent years with users trying to adapt their spreadsheet skills to Python. It takes weeks to learn the basics. Their existing skills don't transfer. Many of their needs are simple to do in a spreadsheet--writing a formula, aggregating data, graphing--but adapting them to Python requires long courses, emails to internal support (if any exists) followed by days of waiting for a reply, and countless trips to Stack Overflow.
Often they just give up and return to Excel, but that makes them dependent on IT to write code for them. One of our users was quoted a full year for IT to implement a simple report! (Fast-forward: he ended up using Mito to automate it himself in less than a week.)

We went through this ourselves when we went to college together, studying engineering and business. We first learned data science with spreadsheets, then had to relearn it in Python. The transition was painful--basic Excel was much easier! Of course, not-so-basic Excel soon becomes not-so-easy, which is what drives the move to Python in the first place.

With our interest in spreadsheets, we started a spreadsheet-version-control company at the end of college, and spent a year working with Excel power users. Eventually, we realized that version control was secondary to the real problems users faced with spreadsheets: limited data size, speed limits, lack of advanced functionality, and a horrible replayability story. Essentially, enterprises are caught between a rock (their spreadsheet woes) and a hard place (the pain of moving analysts to Python). We decided to work on this instead, and started Mito.

Mito is a spreadsheet UI built as an extension to Jupyter Notebooks / JupyterLab. Using a Mito spreadsheet, users can import data, add and delete columns, write formulas like Excel, make pivot tables, generate graphs, and more. See our docs (http://docs.trymito.io) for all our functionality.

Every tab in a Mito spreadsheet is a different pandas DataFrame. For each edit made, a line of pandas code is generated in a code cell directly below the spreadsheet that corresponds to this edit.
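
To make the "one tab per DataFrame, one line of pandas per edit" idea concrete, here is a minimal sketch in plain pandas. The data, column names, and variable names are invented for illustration, and this is not Mito's actual generated output; it only shows roughly what an added column and a pivot-table edit correspond to.

```python
import pandas as pd

# Hypothetical sheet data; each spreadsheet tab would be one DataFrame like this.
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Product": ["A", "B", "A", "B"],
    "Revenue": [100, 150, 200, 75],
})

# Edit 1: add a column, like writing a formula in a new spreadsheet column.
sales["Revenue with tax"] = sales["Revenue"] * 1.1

# Edit 2: build a pivot table summing Revenue per Region.
# The result is a new DataFrame, which would show up as a second tab.
revenue_by_region = sales.pivot_table(
    index="Region", values="Revenue", aggfunc="sum"
).reset_index()

print(revenue_by_region)
```

Because each edit is an ordinary pandas statement, the resulting script can be rerun on new data without touching the spreadsheet again.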
For example, if I use Mito to import a CSV, add a column named Day of Week, and use the WEEKDAY formula from Excel to pull out the weekday from another column, Mito generates the following code:

  # Imported tesla stock.csv
  import pandas as pd
  tesla_stock = pd.read_csv(r'tesla stock.csv')

  # Added column Day of week
  tesla_stock.insert(1, 'Day of week', WEEKDAY(tesla_stock['Date']))

In practice, the typical user bounces back and forth between writing Python and using the Mito spreadsheet, depending on the task at hand. We think this fluid movement between a spreadsheet and Python is really cool.

The spreadsheet backend is just a Python extension to the IPython kernel you're already running for your Jupyter Notebook. Because Mito is just a Python package, all data processing happens locally.

As mentioned, Mito is an open core product. 90% of the code is AGPL licensed. The rest is under a separate enterprise license. These modules are still source-visible, but require users to pay for a pro or enterprise offering before using them. That's basically our business model. We have 3 versions (https://trymito.io/plans): (1) Free: basic analysis tools, as well as some basic telemetry that you can opt out of; (2) Pro: all of (1), with advanced functionality; (3) Enterprise: all of (2), with more advanced features, optimizations, and support.

Because spreadsheets are sprawling pieces of software, we're pretty obsessed with optimizing for long-term development. We use strong types where we can (TypeScript on the frontend, fairly comprehensive MyPy in Python). We've implemented our own component libraries for common components from scratch, which lets us be flexible during large refactorings. We implemented our own custom JavaScript grid--hyper-optimized for our use case, and as a result it is the fastest JS grid we tested in our context.
We're also big fans of metaprogramming--we write an increasing amount of code that writes code for us--which in turn makes it easy to add more functionality to our spreadsheet.

We posted about Mito a long time ago: https://news.ycombinator.com/item?id=24305615. No one really liked it (we learned our lesson!), and it didn't do much at the time -- I think the app had a single button that added a column. Three months ago, someone (not sure who -- thank you, alefnula!) posted it again: https://news.ycombinator.com/item?id=31446236. It reached the top 3 and we got lots of comments--yay! Since then, we've doubled the number of features (mostly data processing), done a UI overhaul, dramatically expanded the Pro + Enterprise offering, made telemetry optional in the free version, and more.

We'd love to hear all about your experiences with spreadsheet analysis, the uncanny valley between spreadsheets and code, the travails of moving enterprise analytics off of spreadsheets, and whatever else you'd like to ask or mention. Any and all feedback is greatly appreciated!

Author : narush
Score  : 105 points
Date   : 2022-09-05 12:54 UTC (10 hours ago)

| rpicard wrote:
| Congratulations on all the product development progress!
|
| My wife is a spreadsheet wizard and I'm excited to get her take on this too.
| narush wrote:
| Thanks! Lots of investment in meta-programming in the past 3 months - in our case, code that writes code that writes code :)
|
| Takes from spreadsheet wizards greatly appreciated.
| abanayev wrote:
| Wouldn't actually using Excel create less friction for potential users?
|
| Your target audience is theoretically Excel users who want/need to code instead, but I think you're alienating the power users of Excel, because their power tools are unavailable in the Mito spreadsheet editor.
|
| For example, have you considered dumping the dataframes to "smart" xlsx files with backing code that connects to a local server, listens to worksheet events and tells the server everything that happens so it can write Python code in the source notebook?
| theptip wrote:
| Excel workflows are terrible though. No version control, hard to test, prone to indexing errors. And doing very sophisticated things with it gets hard; lots of financial analysts/quants are moving over to Python for analysis anyway.
|
| If you're thinking about this in isolation, I can see why it would seem a bad idea to move power Excel users to Python. But take this in the context of a much wider shift where many shops are already shifting to Python for other reasons, and so we need a way to help transition the Excel power users over too.
|
| Excel has its place for sure, but I think it's interesting to consider whether another tool paradigm could gradually replace it; we would need to really hone the flexibility and expressivity of the UI for simple tasks. The benefit would be that when your task grows you don't need to re-implement it in a new Python engine.
| aarondia wrote:
| Aaron here, one of the Mito co-founders.
|
| +1, beyond the most obvious reasons that companies are moving away from Excel (too much data to process, not enough robust automation features), there are important workflow management reasons that companies are making the transition.
|
| More and more, we're hearing that companies want to use software engineering practices on their data analytics workflows -- things like version control, easily understanding what edits are applied by looking at the code, and even things like CI to automatically build dashboards from the most up to date data.
|
| While you technically could build tooling around Excel to do a lot of these things, it's much easier in the Python ecosystem, where that tooling already exists.
| narush wrote:
| We've thought a lot about this one. It's a good idea for usability - agree with you there - but there are some development complexities that make it hard for other reasons.
|
| We spent a considerable amount of time two years ago developing Excel extensions for our spreadsheet-version-control product. It was... not ideal from a development perspective.
|
| The benefit of being in Excel (it has all the features!) is also the cost of being in Excel (you have to support all the features!). This means v1 of the extension you describe will either have to be non-functional on most of these power tools you mention, or we'd need to spend years building in stealth mode before launching something fully working (and I'm not even sure we ever could get there... Excel is... literally so big).
|
| Also, the actual extension points for Excel are not as fully-featured as you might think! In practice, we'd likely have to gate much of Excel's functionality to get an extension that actually works -- there are some hard limits to what you can extend, further making it really hard to actually support these power tools in practice.
|
| Also, for the sake of our users, we love being in a Python development environment! In practice, many of our users move really fluidly back and forth between writing Python and editing a Mito spreadsheet. Effectively - bring a spreadsheet in for what it's good at, when you want it.
|
| We'll keep considering this one, though -- I have a _feeling_ Microsoft might make some Python moves in Excel the next few years... :)
| coltoneakins wrote:
| Congrats on the launch!
|
| Random, but: what program did you use to make the intro video? It looks really clean.
| narush wrote:
| Thank you! It's recorded using QuickTime screen cap, and edited in Final Cut Pro. I also made some assets in Figma (e.g. the little spreadsheet grid background).
|
| It took longer than I'd like to admit...
| but feel validated in spending that time now that you've asked :)
| b800h wrote:
| This is quite similar in concept to a spreadsheet product from 2008, called Resolver One, which ran on IronPython.
|
| https://media.prleap.com/image/221/640/share_trade_screensho...
|
| It was excellent, and a bit of a shame that it didn't get more traction at the time.
| [deleted]
| [deleted]
| Closi wrote:
| Well done on making such a huge application!
|
| From a user perspective, what are the benefits using this rather than using PowerQuery within Excel? From a functional perspective it seems to do something very similar (i.e. your demo on your front site, I could just do in PowerQuery).
| narush wrote:
| It's a good question why our users prefer Mito+Python over something like PowerQuery+M! One might similarly ask what's wrong with Excel+VBA - although I'll note I haven't heard anyone champion VBA recently... :)
|
| In practice, most of our users have started with Python by the time they use Mito. For now, we're not positioning ourselves as an alternative to PowerQuery, but rather a tool for someone who is coming from spreadsheets, has chosen Python, and is struggling to write code.
|
| The next obvious question is why our users are choosing Python in the first place -- what I'll say here is that like any programming language, there are a huge number of reasons: some of our users prefer Python because that's what their colleagues work in; some choose Python because they think it's trendy/cool; others choose Python because that's where the libraries they want to use are; others are starting down the path of getting into ML (which is primarily in Python); others want to integrate with existing Python infrastructure within their company. We've also seen massive enterprises with top-down edicts to move to Python "within the next 5 years", as well.
|
| In practice, Python is the most popular general purpose programming language for data science - and so we're doing our best to meet our users where they are: writing Python code, in Jupyter Notebooks!
| Closi wrote:
| TBH I think your target market is quite confusing.
|
| It seems to be a non-technical user who is struggling to write Python and wants an easy way out, but is willing to install a tool via a CLI within a Python virtual environment, knows what a Jupyter Notebook is and possibly wants to start writing machine learning code?
|
| If the target market is actually the 'struggling non-technical user' I suspect you will need to remove as much friction as possible, although I'm not entirely sure if that is your target market.
|
| IMO would be good to focus on how your product actually helps do analysis better than Excel + PowerQuery/M, because presumably there has to be some sort of functional benefit otherwise what's the point?
| narush wrote:
| I think your description is a pretty accurate description of most of our users: they are struggling to write Python in a Jupyter Notebook, and can install some basic packages (albeit with some struggles -- see our Discord install help channel). The ML code part, you're right, def more rare :)
|
| Python code helps these users do a variety of tasks that aren't possible in other analytics tools like PowerQuery/M. Many of these tasks are specific to the company/existing infrastructures, as I mentioned above.
|
| A super concrete example: the head of data strategy at a life-sciences company made the transition to Python primarily because the rest of his (2 person) team uses Python. They primarily communicate about new datasets using Mito generated code (e.g. here are the steps to clean this data) - but he's not great at Python - so in practice he uses Mito for 9/10 analyses he does to generate this code he sends to his colleagues!
|
| Can give a few more if you'd like -- let me know!
| aarondia wrote:
| The friction of getting started with Mito is something we spend a lot of time focusing on. For example, when it comes to the installation process, not only do users install Mito through a CLI, but because JupyterLab 2, JupyterLab 3, and Jupyter notebooks all support extensions in different ways, there are different installation commands that users need to run to get it working for their specific environment. Initially, we just gave users instructions in our docs about which commands to run for which environment. Now we've built a completely new Python package, the mitoinstaller package, that handles the entire installation process. It downloads Jupyter if they don't have it, detects which version of Jupyter they have installed, runs the installation commands for their JupyterLab version and Jupyter notebooks, and finally starts up the Jupyter server with a tutorial notebook. In the success case, users run two commands and then 2 minutes later have already imported data into their first Mito spreadsheet.
|
| That initial friction reduction is important to our target users, who I would describe in two buckets:
|
| 1. Target open source adopters. These users are beginner to intermediate Python users that want to / need to write Python for data analysis. Most of the open source users that adopt Mito are already on their Python journey -- we're not teaching them what Python is or what a notebook is in the vast majority of cases. Many of them have gone through Kaggle courses, taken a couple data science classes at school, or are particularly resourceful. For those beginner users, and even for people like me who have written pandas code for a few years, some things are just much easier to do in a spreadsheet interface, like creating a pivot table or graph (two of our most popular features).
|
| 2.
| Decision makers at large enterprises responsible for moving their company from Excel to Python. Much like us, these decision makers think a ton about the friction of getting employees started with Python. In most cases, they set up JupyterHub (https://jupyter.org/hub) so users don't need to go through any installation processes themselves, and they control things like version control, turning notebooks into reports, etc. They generally also offer/require Python training courses, provide template notebooks, and have data scientists available to help the business end users when they get stuck.
| bravura wrote:
| This is cool.
|
| I'm a Python engineer (and ML person). I don't use pandas often, so when I do need it, I am constantly on stackoverflow.com and testing single lines at a time in Jupyter.
|
| I'd love a version of Mito where I could give it the original mock table and the desired output table maybe as a function (not using a spreadsheet UI), and it would propose pandas code for me.
| cm2187 wrote:
| That's an interesting product. What is the advantage over Power Query for a non-technical user?
|
| There is a slightly different idea on the same theme that I'll give a stab at one day: let users express their logic in Excel using standard Excel formulas, then tell a tool what are the inputs and output ranges, the tool will extract the logic between them (follow the formula), and generate the equivalent code. This would allow a user to express and maintain, in Excel, a logic that can be run by IT with no dependency on Excel.
| aarondia wrote:
| For some non-technical users, Power Query is a better option. If your main purpose is to work with a large data set and then update a PowerBI dashboard, for example, then Power Query sounds like a perfect solution.
|
| But what we see is that there are a bunch of reasons that these non-technical users are excited about Python specifically.
| Here are two examples:
|
| 1. One of the first adopters of Mito, let's call her Shelly, is helping a team of engineers build out a Salesforce dashboard to predict when customers are going to refill an order. Since the engineers don't have the business context to figure out how to make that prediction, it's Shelly's job to construct the (in this case pretty simple) algorithm by querying the relevant database, figuring out which fields are accurate (there might be 5 different date fields and figuring out which one is actually when the user last placed an order isn't as easy as it sounds), and then making the prediction for each customer. Shelly then uses the Python code that Mito generates as a communication tool for the engineers. The code is an exact audit log of each transformation she made to her data in order to create the report.
|
| 2. Many of the companies that we work with have business specific metrics that they calculate, so they have an engineering team build a Python package that can easily calculate those metrics. Sometimes they will even provide boilerplate Python code snippets to interact with those packages. (In the future they'll be able to import them into Mito!). It's a win for the employees who can rely on the code snippets instead of calculating the metrics manually, and it's a win for the engineers who can write Python code instead of M.
|
| The last thing I'll say is that companies are moving to Python because of the openness and robustness of the Python ecosystem. They are power users of packages like Voila and Plotly. Having employees work in Python opens up a ton of doors for how companies can support them.
|
| Your idea about expressing logic in Excel and generating the equivalent code has merit too for a different user base. Let us know when you build it, excited to check it out!
___________________________________________________________________
(page generated 2022-09-05 23:00 UTC)