hngopher.com

       Launch HN: Ploomber (YC W22) - Quickly deploy data pipelines from
       Jupyter/VSCode
        
       Hi HN, we're Eduardo & Ido, the founders of Ploomber
       (https://ploomber.io). We're building an open-source framework
       (https://github.com/ploomber/ploomber) that helps data scientists
       quickly deploy the code they develop in interactive environments
       (Jupyter/VSCode/PyCharm), eliminating the need for time-consuming
       manual porting to production platforms.
        
       Jupyter and other interactive environments are the go-to tools for
       most data scientists. However, many production data pipeline
       platforms (e.g. Airflow, Kubernetes) drag them into non-interactive
       development paradigms. Hence, when moving to production, the data
       scientist's code needs to move from the interactive environment to
       a more traditional software environment (e.g. declaring workflows
       as Python classes). This process creates friction since the code
       needs to cross this gap every time the data scientist deploys their
       work. Data scientists often pair with software engineers to work on
       the conversion, but this is time-consuming and costly. It's also
       frustrating because it's just busy work.
        
       We encountered this problem while working in the data space.
       Eduardo was a data scientist at Fidelity for a few years. He
       deployed ML models and always found it annoying and wasteful to
       port the code from his notebooks into a production framework like
       Airflow or Kubernetes. Ido worked as a consultant at AWS and
       constantly found that data science projects would allocate about
       30% of their time to converting a notebook prototype into a
       production pipeline.
        
       Interactive environments have historically been used for
       prototyping and are considered unsuitable for production; this is
       reasonable because, in our experience, most of the code developed
       interactively exists in a single file with little to no structure
       (e.g., a gigantic notebook). However, we believe it's possible to
       bring software engineering best practices to the interactive
       development world so data scientists can produce maintainable
       projects that streamline deployment.
        
       Ploomber allows data scientists to quickly develop their code in
       modular pipelines rather than a giant single file. When developed
       this way, their code is suitable for deployment to production
       platforms; we currently support exporting to Kubernetes, AWS Batch,
       Airflow, Kubeflow, and SLURM with no code changes. Our integration
       with Jupyter/VSCode/PyCharm allows them to iteratively build these
       modular pipelines without moving away from the interactive
       environment. In addition, modularizing the work enables them to
       create more maintainable and testable projects. Our goal is ease of
       use, with minimal disturbance to the data scientist's existing
       workflows.
        
       Users can install Ploomber with pip, open Jupyter/VSCode/PyCharm,
       and start building in minutes. We've made a significant effort to
       create a simple tool so people can get started quickly and learn
       the advanced features when they need them. Ploomber is available at
       https://github.com/ploomber/ploomber under the Apache 2.0 license.
       In addition, we are working on a cloud version to help enterprises
       operationalize models. We're still working on the pricing details,
       but if you'd like us to let you know when we open the private beta,
       you can sign up here: https://ploomber.io/cloud. However, the core
       of our offering is the open-source framework, and it will remain
       free.
        
       We're thrilled to share Ploomber with you! If you're a data
       scientist who has experienced these endless cycles of porting your
       code for deployment, an ML engineer who helps data scientists
       deploy their work, or you have any feedback, please share your
       thoughts! We love chatting about this domain since exchanging ideas
       always sheds light on aspects we haven't considered before! You may
       also reach out to me at eduardo@ploomber.io.
        
       Author : edublancas
       Score  : 80 points
       Date   : 2022-02-03 17:12 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tracyhenry wrote:
       | Hey congrats on the launch! This is definitely a useful concept.
       | 
       | I haven't dug deep, but are code reviews possible? A big point of
       | the whole data-as-code movement is to enable easier review of the
       | data generation process, make abstractions and versioning. Being
       | able to generate pipelines from Jupyter notebooks sounds exciting
       | in theory, but I'd imagine code reviewing the generated pipelines
       | can be a pain.
        
         | edublancas wrote:
         | We allow users to open .py files as notebooks in Jupyter, so
         | you can get the best of both worlds: interactivity with Jupyter
         | and nice code reviews. jupytext does the heavy lifting for us
         | (it's a great package!) and we add some extra things to improve
         | the experience
         | 
         | More on the docs:
         | https://docs.ploomber.io/en/latest/user-guide/jupyter.html
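
To make the ".py files opened as notebooks" idea in the reply above concrete, here is a minimal, standard-library-only sketch of how a script with "# %%" cell markers could be split into notebook-style cells. It is an illustration under stated assumptions, not jupytext's or Ploomber's actual implementation: the function name script_to_notebook and the sample script are hypothetical, and the real tools handle many more cases (metadata, other formats, round-tripping).

```python
import json


def script_to_notebook(source):
    """Split a "# %%"-delimited script into a notebook-shaped dict.

    Hypothetical helper for illustration only; jupytext does the real work.
    """
    # Each chunk is (cell kind, list of body lines); text before the first
    # marker is treated as a code cell.
    chunks = [("code", [])]
    for line in source.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            chunks.append((kind, []))
        else:
            chunks[-1][1].append(line)

    cells = []
    for kind, lines in chunks:
        body = "\n".join(lines).strip("\n")
        if not body:
            continue  # skip empty chunks
        if kind == "markdown":
            # Markdown is stored as comments in the script; drop the "# " prefix.
            body = "\n".join(
                l[2:] if l.startswith("# ") else l.lstrip("#")
                for l in body.split("\n")
            )
            cells.append({"cell_type": "markdown", "metadata": {}, "source": body})
        else:
            cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": body,
            })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


SCRIPT = """\
# %% [markdown]
# # Clean data

# %%
import math
values = [1.0, 4.0, 9.0]

# %%
roots = [math.sqrt(v) for v in values]
"""

notebook = script_to_notebook(SCRIPT)
print(json.dumps([c["cell_type"] for c in notebook["cells"]]))
```

Because the file on disk stays a plain script, a Git diff shows ordinary line changes rather than notebook JSON, which is what makes the code reviews mentioned above practical.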
        
       | ensemblehq wrote:
       | Congrats on the launch. I'm an MLOps consultant who helps
       | enterprises with productionizing their models on cloud platforms.
       | Previously, also a startup founder who iterated in the same space
       | and can probably exchange notes.
       | 
       | The problem is definitely a time-consuming and costly one and I'm
       | intrigued to play around with Ploomber. How does Ploomber handle
       | collaboration/code sharing across data scientists?
        
         | idomi wrote:
         | We took an approach of keeping the notebook interface/IDE, and
         | behind the scenes Ploomber converts it into .py files so you
         | can collaborate with teammates through Git. The users can still
         | open those as a notebook and interact with the files regularly.
        
         | edublancas wrote:
         | We allow people to write pipeline tasks in .py files but open
         | them as notebooks in Jupyter. So they keep the same workflow
         | they're used to but under the hood, they're writing .py files -
         | so they can do code reviews (jupytext handles the .py to .ipynb
         | conversion). Also, when executing the pipeline, ploomber
         | generates an output report for each script, so teams can use
         | this to review any outputs generated by the code.
         | 
         | Finally, since the pipeline is modularized, it's easier to
         | split the work. Some people may work in data cleaning, others
         | in feature engineering, and they can all orchestrate the
         | pipeline with "ploomber build".
         | 
         | You can read more about our approach in this guest blog post we
         | published a few months ago on the Jupyter blog:
         | https://blog.jupyter.org/ploomber-maintainable-and-collabora...
         | 
         | We'd love to hear about your experience! Please send me an email
         | at eduardo@ploomber.io
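
The modular-pipeline idea in the reply above (separate tasks for data cleaning, feature engineering, and so on, orchestrated by a single build step) can be sketched in a few lines. This is a hedged illustration only: the task functions, the PIPELINE dict, and build() are hypothetical names invented here, not Ploomber's API, and the sketch assumes Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter


def load():
    # Stand-in for reading raw data.
    return [3, 1, None, 2]


def clean(raw):
    # Drop missing values.
    return [x for x in raw if x is not None]


def features(cleaned):
    # Compute simple summary features.
    return {"n": len(cleaned), "total": sum(cleaned)}


# Hypothetical pipeline spec: task name -> (function, upstream task names).
PIPELINE = {
    "load": (load, []),
    "clean": (clean, ["load"]),
    "features": (features, ["clean"]),
}


def build(pipeline):
    """Run every task after its upstream tasks and collect the results."""
    graph = {name: set(upstream) for name, (_, upstream) in pipeline.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, upstream = pipeline[name]
        # Pass upstream results positionally, in the declared order.
        results[name] = func(*(results[u] for u in upstream))
    return results


results = build(PIPELINE)
print(results["features"])
```

Since each task only depends on the outputs of its upstream tasks, different people can own different tasks and still run the whole thing with one command.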
        
       | hoerzu wrote:
       | Really helpful keeping notebooks tidy :)
        
         | edublancas wrote:
         | Yes! We want to help people keep enjoying Jupyter and produce
         | tidy pipelines!
        
       | wizwit999 wrote:
       | I think this is a good idea. Decoupling seems like an interesting
       | approach. When I worked in this space as an engineer, bridging
       | the notebook-to-production divide was annoying. I'd be
       | interested to see if this solves it.
        
         | edublancas wrote:
         | Thanks for your feedback! Do you have any stories to share? I'd
         | love to hear about your experience with the
         | notebooks-production gap
        
           | wizwit999 wrote:
           | Yeah a bunch, I worked at Amazon but I'm sure it's similar
           | everywhere else. Basically, the scientists were way more
           | familiar with notebooks, and they'd code their models there,
           | but when we needed to deploy it, we needed a proper Python
           | package that we could store in git, build, test, run in a
           | container, integrate with data engineering tools, and deploy
           | on some internal tools and AWS SageMaker later. So we'd
           | usually end up converting it to a Python package once it was
           | ready, which worked OK, but you could tell the scientists
           | were more comfortable in notebooks.
           | 
           | Funnily, there were a bunch of internal MLOps type frameworks
           | there (at least 4) that tried to let the scientists deploy to
           | production w/o engineers, but they all failed or semi-failed.
           | I've heard Netflix made it work and I follow MLflow so I'd be
           | curious what sticks here.
           | 
           | I don't work in the space anymore but it was an interesting
           | space, definitely could use more standardized tooling.
        
             | edublancas wrote:
             | That totally resonates with me! I spent 6 years working as
             | a data scientist and notebooks just make it a lot simpler
             | to explore and interact with the data, so I totally
             | understand my data science peers for sticking with
             | notebooks.
             | 
             | Having said that, the challenge now is to hit a sweet spot
             | between keeping the Jupyter interactive experience, and
             | providing some features to help data scientists develop
             | modular work. That's where most frameworks fail, so we want
             | to keep our eyes open, and get feedback from both
             | scientists and engineers to develop something that works
             | for everyone.
        
       | jiriro wrote:
       | The audio in the landing page video is hard to understand. Is
       | it just my broken speakers?
       | 
       | Also the video cannot be made fullscreen on my phone. Is this by
       | design?
        
         | idomi wrote:
         | Thanks for letting us know! The audio is OK on laptops and
         | mobiles. You're right about the video: it's a bug we need to
         | fix. Since it's within an iframe, you can't expand it into
         | full-screen mode.
        
       ___________________________________________________________________
       (page generated 2022-02-03 23:00 UTC)