[HN Gopher] Netflix's Metaflow: Reproducible machine learning pi... ___________________________________________________________________ Netflix's Metaflow: Reproducible machine learning pipelines Author : ChefboyOG Score : 158 points Date : 2020-12-21 17:20 UTC (5 hours ago) (HTM) web link (www.cortex.dev) (TXT) w3m dump (www.cortex.dev) | dustinhopkins wrote: | How does this compare against TensorFlow Extended (TFX)? | https://www.tensorflow.org/tfx | savin-goyal wrote: | Metaflow was built to assist in both developing ML models and | deploying/managing them in production. AFAIK, TFX is focused on | the deployment story of ML pipelines. | | https://docs.metaflow.org/introduction/what-is-metaflow#shou... | dustinhopkins wrote: | It's focused on building ML pipelines (similar to what Cortex | aims to be). In addition, it conveniently supports | integration with orchestrators like Airflow, Kubeflow, Beam, | etc. The book "Building Machine Learning Pipelines: | Automating Model Life Cycles with TensorFlow" (https://www.amazon.com/dp/1492053198/ref=cm_sw_em_r_mt_dp_Ig...) goes into | great detail. | | I was curious to see what advantage Metaflow offered over | TFX.
with | experiments for parameter tuning | | * serving/deploying for production | | * a permission system so researchers and developers can only | access what they are supposed to | | * software heritage, probably via Docker images or Nix packages, | combined with source code references | | * (cherry on top: some kind of integrated labeling system and UI) | | Right now you have to cobble this together from different tools | that are all pretty suboptimal. | | The big players can set up sophisticated systems, but I'm curious | to hear how other startups are currently solving this. | solumos wrote: | Same - I'm a SWE embedded in a small (but growing) ML team. We | have all of the same problems. | | It seems that the "all-in" platforms are too "rigid", and all | of the point solutions for the things you mentioned aren't | proven enough. | thecellardoor wrote: | I think that by definition this is a tradeoff. Most times when you | talk to data scientists, they want a fully automated end-to-end | solution that doesn't require them to change anything about | their current workflow, and that will support any future | modifications to their workflow as well. | | That is magical thinking. I prefer best-of-breed solutions | that integrate nicely with other best-of-breed solutions | every day. That way, if a tool doesn't suit you tomorrow, you | can relatively easily swap it out for something better. | savin-goyal wrote: | Hi! Metaflow ships with a CloudFormation template for AWS that | automates the set-up of a blob store (S3), compute environment | (Batch), metadata tracking service (RDS), orchestrator (Step | Functions), notebooks (SageMaker), and all the necessary IAM | permissions to ensure data integrity. Using Metaflow, you can | then write your workflows in Python/R and Metaflow will take | care of managing your ML dev/prod lifecycle. | | https://github.com/Netflix/metaflow-tools/tree/master/aws/cl...
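The "write your workflows in Python" part above refers to Metaflow's pattern of chaining steps explicitly, which can be pictured with a toy sketch. This uses only the standard library — the real entry point is `from metaflow import FlowSpec, step`, which additionally versions artifacts and can ship each step to AWS Batch — so the class and step names below are invented for illustration:

```python
# Toy sketch of Metaflow-style step chaining (NOT the real metaflow API).
# Each step does some work, stashes results on self, and names its
# successor via self.next(...).

class ToyFlow:
    def start(self):
        self.raw = [1.0, 2.0, 3.0, 4.0]  # pretend this was loaded from S3
        self.next(self.featurize)

    def featurize(self):
        mean = sum(self.raw) / len(self.raw)
        self.features = [x - mean for x in self.raw]  # center the data
        self.next(self.end)

    def end(self):
        self.done = True  # no self.next(): the flow is finished

    # Minimal "runtime" below; the real one persists artifacts,
    # snapshots code, and can execute each step remotely.
    def next(self, step):
        self._pending = step

    def run(self):
        self._pending = self.start
        while self._pending is not None:
            step, self._pending = self._pending, None
            step()
        return self

flow = ToyFlow().run()
print(flow.features)  # [-1.5, -0.5, 0.5, 1.5]
```

In the real library, anything a step assigns to `self` is persisted as a versioned artifact, which is what makes runs inspectable and resumable after the fact.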
| jamesblonde wrote: | You should check out Hopsworks (disclaimer: I work on it). It does | all of the above, including a Feature Store, notebooks as jobs, | Airflow for ML pipelines, model serving (TensorFlowServing, | Flask, and soon KFServing), experiments, a project-based | multi-tenancy model that supports sensitive data on a shared cluster, | and a UI. It does not have a labelling system - but you can pip | install libraries. You don't need to learn Docker - each | project has a conda environment, and we compile Docker images | for projects transparently, so jobs are tied to Docker images, | but you don't need to write a Dockerfile (this is a huge win | for data scientists). You can run Python jobs (connect to a k8s | cluster) or Spark/Flink jobs (on Hopsworks itself). | | Open-source: | | * https://github.com/logicalclocks/hopsworks | | Managed platform on AWS/Azure (with elastic compute/storage, | integration with managed K8s, LDAP/AD): | | * https://hopsworks.ai | bfirsh wrote: | I think the solution to this is a bunch of pluggable tools that | integrate well. "AI platforms" do everything, but they do each | thing not very well and force you into a particular way of | working. (There is a reason we don't use "software platforms" | any longer.) | | But unfortunately, as you say, most of the pluggable tools are | not very good and/or not mature enough. | | Here's our attempt at model storage, experiment tracking, and | software heritage: https://replicate.ai/ | | For interactive dev environments, Colab, Deepnote, and | Streamlit are all great. | | For deploying to production, Cortex, mentioned in the post, is | great. | | All are a work in progress, but I think we'll soon have a | really powerful ecosystem of tools. | jaz46 wrote: | There are lots of great platforms and tools in this space that | are trying to solve these problems. They all have their | tradeoffs, and of course the list of needs/goals above is pretty | diverse.
| | You are very likely going to be using a handful of tools that | cover the full gamut of needs. This is discussed at length in the | blog post about an MLOps Canonical Stack, and many of the tools | being suggested below are included. | | https://towardsdatascience.com/rise-of-the-canonical-stack-i... | edameme wrote: | I'd love to hear what you think about https://dagshub.com/. | We're building it with community collaboration in mind. It | doesn't cover all the bases you mention, but we do: * data and | model storage * experiment tracking * pipeline management * | access control * data, model, code, and pipeline versioning. | | We're also strictly based on Git and other open-source formats | and tools, so connecting with other tools you use, like Colab for | IDEs or Jenkins/Kubeflow for training, is super straightforward | (we have examples for some). | mmq wrote: | Hi @the_duke, | | disclaimer: I am one of the authors of an open-source solution | (https://github.com/polyaxon/polyaxon) that specializes in the | experimentation and automation phase of the data-science | lifecycle. | | Our tool provides exactly the kind of abstraction you | mentioned: | | * Training, data operations, and interactive workspaces | (https://polyaxon.com/docs/experimentation/) | | * A scalable history and comparison table | (https://polyaxon.com/docs/management/runs-dashboard/comparis...) | | * Currently, pipelines and concurrency management are in the | commercial version (https://polyaxon.com/docs/automation/), but | several companies use Polyaxon with other tools like Kubeflow | (https://medium.com/mercari-engineering/continuous-delivery-a...), or it can be used with Metaflow for the | pipelines part. | | I would really like to hear your thoughts and feedback. | vtuulos wrote: | If you are curious about how Netflix uses Metaflow to power | behind-the-scenes machine learning, take a look at this recent | blog article: https://netflixtechblog.com/supporting-content-decision-make...
| | Also I'm happy to answer any questions (I lead the Metaflow team | at Netflix). | itamarst wrote: | Hey, been meaning to reach out. | | There's a bit in the Metaflow docs that talks about choosing | resources, like RAM: "as a good measure, don't request more | resources than what your workflow actually needs. On the other | hand, never optimize resources prematurely." | | The problem is that for memory, too little means out-of-memory | crashes, so the tendency I've seen is to over-provision memory, | which ends up getting very expensive at scale. | | This choice between "my process crashes" and "I am incentivized | to make my process organizationally expensive" isn't ideal. Do | you have any ways you deal with this at Netflix, or have you | seen ways other Metaflow users deal with it? | | I have some ideas on how this could be made better (some | combination of being able to catch OOM situations | deterministically, memory profiling, and sizing RAM by input | size for repeating batch jobs), based in part on some tooling | I've been working on for memory profiling: | https://pythonspeed.com/fil, so would love to talk about it if | you're interested. | fwip wrote: | One way I deal with this problem in an alternative workflow | manager (Nextflow), is by calculating the memory requirement | for the ~95th percentile of a job, and submitting with a rule | "If this crashes from going OOM, re-submit with memory*N" (up | to some max number of retries/RAM). This lets most jobs sail | through with a relatively low amount of RAM, and the bigger | jobs end up taking a bit more time and resources. | | The better your estimator function, of course, the tighter | constraints you can use. | [deleted] | vtuulos wrote: | I'd love to hear more what you have in mind! 
Feel free to | drop by at our chat at | https://gitter.im/metaflow_org/community | | While it is true that auto-sizing resources is hard and the | easiest approach is to oversize @resources, the situation | isn't as bad as it sounds: | | 1) In Metaflow, @resources requests are specific to a | function/step, so you typically end up using resources only for a | short while. It would be expensive to keep big boxes | idling 24/7, but that's not necessary. | | 2) You can use spot instances to lower costs, sometimes | dramatically. | | 3) It is pretty easy to see the actual resource consumption | on any monitoring system, e.g. CloudWatch, so you can adjust | manually if needed. | | 4) A core value proposition of Metaflow is to make both | prototyping and production easy. While optimizing resource | consumption may be important for large-scale production | workloads, it is rarely the first concern when prototyping. | | In practice at Netflix, we start with overprovisioning and | then focus on optimizing only the workflows that mature into | serious production and would end up being too expensive if left | unoptimized. It turns out that this is a small % of all | workflows. | merkleforest wrote: | It would be great if the infra layer could provide some help with | automated resource scaling, especially for RAM. The ML | solver/tooling layer has also been making progress on this | front: for example, Dask for limited-RAM pandas, h2o.ai has | limited-RAM solvers, xgboost has an external-memory version, and | pytorch/tensorflow models are mostly trained with SGD and only | need to load data batch by batch. It's nice that Metaflow | can integrate with any Python code and thus benefit from all | of the efforts made on the solver/tooling layer. | jamesblonde wrote: | The Spark community (LinkedIn) developed Dr. Elephant to | profile jobs and provide suggestions for reducing memory/cpu | consumption.
Metaflow would need something similar: | | https://github.com/linkedin/dr-elephant | itamarst wrote: | I've seen that, yeah. I've already implemented a memory | profiler for Python batch jobs | (https://pythonspeed.com/fil), but I'm starting to think about | how to integrate it into specific pipeline frameworks. | troelsSteegin wrote: | What is Metaflow's explicit support for transfer learning | tasks? In other words, how do I know what models to use or not | use? I am surmising from the techblog post that there is a | stable set of content-intrinsic features, and that it can be | separated from perhaps more dynamic feature sets that | characterize audiences, presentation treatment, and viewing (as | conditioned on all the other stuff). But it sounds like there | is a stable set of features for prediction tasks as well, | which is to say that for a task like predicting an audience for | movie X in region Y, you'll need some set of features, and that | we have some set of trained models (and recommended analytic | components) available that match some or all of those features | for this task. Is that a "thing", or is the workflow support | simpler than that, and should that be a "thing"? | vtuulos wrote: | Good question! What you are asking is pretty much the core | question for a certain set of ML tasks at Netflix. | | Metaflow is rather unopinionated about those types of | questions, since they are subject to active research and | experimentation. Metaflow aims to make it easy to conduct the | research and experiments, but it is up to the data scientist | to choose the right modeling approach, features, etc. | | In some cases, individual teams have built a thin layer of | tooling on top of Metaflow to support specific problems they | care about. I could imagine such a layer for specific | instances of transfer learning, for instance. | | In general, we are actively thinking about whether and how | Metaflow could support feature sharing. It is a tough nut to | crack.
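The resubmit-on-OOM rule fwip describes above (request roughly the 95th-percentile memory, multiply by N on an OOM kill, cap the retries) can be sketched in a few lines. All names here are invented for illustration — this is not a Nextflow or Metaflow API:

```python
# Hedged sketch of escalating-memory retries (hypothetical helper).

def run_with_memory_escalation(task, start_mem_gb, factor=2.0, max_retries=3):
    """Run task(mem_gb); after each simulated OOM, grow the request."""
    mem_gb = start_mem_gb
    for _ in range(max_retries + 1):
        try:
            return task(mem_gb), mem_gb
        except MemoryError:  # stand-in for the scheduler's OOM kill signal
            mem_gb *= factor
    raise RuntimeError(f"still OOM after {max_retries} retries at {mem_gb:.0f} GB")

# A fake outlier task that needs 6 GB. With a 2 GB (95th-percentile-ish)
# request it gets killed twice, then succeeds at 8 GB -- most jobs sail
# through cheaply, and only the rare big ones pay for extra attempts.
def needs_6_gb(mem_gb):
    if mem_gb < 6:
        raise MemoryError(f"killed at {mem_gb} GB")
    return "ok"

result, final_mem = run_with_memory_escalation(needs_6_gb, start_mem_gb=2)
print(result, final_mem)  # ok 8.0
```

The better the percentile estimator, the tighter the initial request can be; the retry rule just bounds the damage from the tail.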
| jsinai wrote: | Hi Ville, thanks for coming on here to answer questions. I see | that Metaflow has been made compatible with R now. Are there | any plans to do the same with Julia? | vtuulos wrote: | No plans, but it should be possible technically. We'd need | help from the Julia community :) | spicyramen wrote: | Is there a comparison with TFX, Kubeflow Pipelines, and Airflow? | smurf_t wrote: | What are the main advantages compared to Airflow? We use Airflow | to orchestrate ML jobs/tasks, and I found it to be more flexible | compared to the other tools we tested. | vtuulos wrote: | Metaflow is largely complementary to a job scheduler like | Airflow. Technically you could export Metaflow workflows to | Airflow, although the specific integration doesn't exist yet. | For more details, see this blog article: | https://netflixtechblog.com/unbundling-data-science-workflow... | pknerd wrote: | I am trying to figure out Kubeflow. Surprisingly, I found this one | easier to write. Haven't run it or used it yet. | savin-goyal wrote: | Take a look at metaflow.org/sandbox if you want to test-drive | Metaflow. | manojlds wrote: | It takes me into a verification and waiting flow. Useless. | savin-goyal wrote: | Give it a few minutes :) | toxik wrote: | Edit: this is a somewhat OT rant. | | Netflix's recommender system is hands down the worst I have ever | seen. Every single thing I watch, it suggests The Queen's Gambit | and two other random Netflix productions. | | Even if I watch the first of a trilogy (LotR, for example). How | can they be so terrible at this? | | The categories in the main browsing view are also hysterically | arbitrary. It kind of looks like a topic model with | back-constructed titles for the topics. | | Finally, they replaced the ratings with "% matching". I guess so | they can recommend their subpar productions even if they get low | ratings.
| emidln wrote: | Their recommender system likely has multiple inputs that are | under specific constraints and weightings. How well their | recommendation algorithms work will probably never be known by | the general public as we are force-fed a steady diet of | whitelisted staff picks and promotional items. | typon wrote: | So what's the point of making tools like Metaflow? Justifying | salaries for their engineers I suppose? | hateful wrote: | I really think that if they gave up all the complicated | algorithms and went with a simple algorithm out of the 90s we'd | be much happier with the recommendations. | alisonkisk wrote: | Do you have such an algorithm? | | Netflix doesn't make money by showing people stuff they don't | like. | toxik wrote: | Not for a lack of trying though... | Hard_Space wrote: | I long ago ceased to believe that the Netflix recommender | system serves any other purpose than to fulfill the company's | internal obligations to push favored content, depending on what | it cost. Sadly, the same is now true for Amazon Prime, which is | an even hotter mess. | borroka wrote: | That's exactly what's happening. Source: you can imagine. | umvi wrote: | I turned on "Super Wings" for my kid to watch on Prime Video | which at first glance seemed to be a fairly decent Paw Patrol | knock off, but then as I listened to the episodes in the | background, I realized that the entire show is basically an | advertisement for Amazon Prime in disguise. Seriously look it | up, the entire premise of the show is people ordering | packages and the "Super Wings" delivering the packages to the | consumer as quickly as possible... . | alisonkisk wrote: | Seems like you got what you expected... Super Wings is the | corporate side of the Government-Corporate complex. | dessant wrote: | Netflix often promotes new content by overzealously | recommending it, the constant suggestion you see for The | Queen's Gambit is probably an ad. 
| [deleted] | dddbbb wrote: | Maybe it's different in other countries (I am in the UK), but I | feel as if there isn't enough content for a recommender system | to even be useful. I feel like after browsing through the | catalogue a few times, I have a rough idea of most things I | would ever possibly be interested in. There's just not that | much there. Either that, or the recommender system is working | too well and I never see anything beyond what Netflix wants me | to. | visarga wrote: | Not just the recommender, but the design is horrible for me. I | can't rest my mouse anywhere, it auto-starts something or | enlarges something, it's all too twitchy. When you're watching | an episode, there's no navigation link from the play screen to | the main page of the series. As if they don't want us to | navigate the site, but rather to be led along their happy path. | kyawzazaw wrote: | I think so. This way, you don't easily arrive at the | conclusion that the items you want to look for don't exist here. | ska wrote: | > recommender system is hands down the worst I have ever seen. | | In my limited experience, they are all dismal and best ignored | entirely. | | When it was easier to see something plausibly like a real user | review, that helped, I guess. | AnssiH wrote: | I see this complaint about poor recommendations very often. | | But the recommendations seem to work perfectly for me - I | wonder whether that is the case for the silent majority? | | The match % is usually spot-on for me, and I've never seen it | recommend any titles I've given a thumbs-down to. | ed25519FUUU wrote: | Do you have a counterexample of a streaming service that does | recommendations better? | DoofusOfDeath wrote: | Back when I used Netflix primarily for DVDs, the recommender | system worked pretty well for me. | | Much later, when they switched to simple thumbs up/down, the | recommender system was entirely useless to me.
(Not merely | because of the dumbed-down rating system; the recommendations | were genuinely bad.) | | For the time in between, I'm not sure whether the degradation was | gradual, sporadic, or whether there was any degradation at all. | ramraj07 wrote: | If you add a DVD subscription you can still get the original | Netflix recommendations back, including sorting by top | predicted rating (which to me is eerily amazing). I used to | keep the DVD subscription mainly for the recommender, with | the delivered Blu-rays considered an extra bonus for very | rare movies you couldn't stream even if you wanted to pay. | treis wrote: | It's pretty clear that their recommendation system and | broader UX are designed, at least in part, to obfuscate how | much content they have and how good it is. Back when it was | DVDs and they had basically everything, it was more about | finding the next best thing for you. Now it's finding the | next best thing they have and, preferably, something they | own the rights to. | Reubend wrote: | The MovieLens recommender system works pretty well. | qw3rty01 wrote: | Not a streaming service, but a lot of movie and TV databases | tend to have decent recommendations for movies and shows | similar to the one you're viewing. Although in this case, one | could argue that the processing is offloaded to the users who | provide the recommendations. | taeric wrote: | I read this as a condemnation of recommender systems in | general. | | That, or a ceding of recommendations to marketing. That feels | too cynical, but more accurate. | jointpdf wrote: | It's not a home run for me, but I think Spotify's | recommendations are quite good. They clearly use some form of | content-based recommendation (extracting features from the | music itself) blended with other methods. It seems to make an | honest attempt at serendipity (songs/artists you may like but | would otherwise be unlikely to discover). | | I still think recommender engines should always enable some | form of user tuning.
If it doesn't, then the recommender is a | tool for services to control your behavior rather than the | other way around. | octostone wrote: | Unless I'm missing this functionality somewhere (entirely | possible), this lack of user-tuning has ruined Spotify recs | for me, since I listen to entirely different playlists when | working or meditating. I don't check out recommendations to | get the latest binaural beats or nature sounds, you know? | | Although even before I started listening to Spotify while | working etc, it seemed to have run out of things to | recommend. My weekly discover playlist would be half things | I'd already liked. So...who knows. But I miss the discovery | functionality quite a bit. | jointpdf wrote: | I personally have a similar problem with the Spotify | generated playlists, but I have persnickety preferences | in electronic music so it often whiffs. But on the | spectrum of recommendation engines it is on the side of | honest effort (whereas Netflix is not). And for most | users and musical palates I think it works great. Just | yesterday my mom complimented me on my Christmas music DJ | skills, but it was just the generated continuation of her | own playlist. | | Nothing really beats the recommendations of a human | curator with exquisite taste, and just listening to new | music nonstop and plucking out the gems as you go. | nwsm wrote: | I like Spotify's recommendations and have found lots of | great artists from it. But now I feel I'm in a kind of | Spotify-created rut of listening to static groups of | artists. I have mostly stopped listening to my daily mixes | because of this. | | It's also easy to theorize about "big media" controlling | what I listen to, so I still feel the need to do my own | exploring, even when I'm getting recommended good fresh | stuff. | xiphias2 wrote: | Recommendations maybe not, but I love that in HBO I can sort | by IMDB score to find great movies/series that I haven't | watched yet. 
I just watched Boardwalk Empire as an example, | and loved it. | | Netflix doesn't show IMDB scores, so I always have to check them | independently... it sucks. | hogFeast wrote: | Spotify. Discover Weekly is very good, particularly | considering they have only one chance to recommend with 30-40 | songs, and quite a large universe to match on. | | Not a streaming service, but Google's Discover news is also | very good (probably the best recommendations I have come | across). | toxik wrote: | The basic problem is understanding WHY I liked something. If | I watch Tintin because it's a cozy throwback to my childhood, | that does not mean I would like every single Studio Ghibli | movie in my recommendations. | | Similarly, if I play Blacklist in the background as basically | noise, I don't want to see a bunch of related shows. I guess | I could give it a thumbs-down, but I only do that for actually | terrible movies. | | Also, Spotify and Apple Music seem to have okay | recommendations. | kyawzazaw wrote: | Music is different from movies and television, though. There | are beats and rhythms that are easy to identify, lyrics easy | to analyze, and artists that are roughly categorized. | toxik wrote: | I feel like that's not true; music is hard to analyze. | Movies have synopses, and are generally categorized by | their cast alone. I could figure out that LotR belongs | together from these data points, easily. Not to mention | the fact that people watch trilogies in sequence like 99% | of the time. | drstewart wrote: | >Finally, they replaced the ratings with "% matching". I guess | so they can recommend their subpar productions even if they get | low ratings. | | That's always what ratings were. People didn't understand that | (as you can see), so they changed it to make it more | transparent. | | https://www.businessinsider.com/why-netflix-replaced-its-5-s... | | >Netflix's star ratings were personalized, and had been from | the start.
That means when you saw a movie on Netflix rated 4 | stars, that didn't mean the average of all ratings was 4 stars. | Instead, it meant that Netflix thought you'd rate the movie 4 | stars, based on your habits (and other people's ratings). But | many people didn't get that. | alisonkisk wrote: | It's not just that (obviously, since it's not even possible | for me to rate a movie with more granularity than thumbs up/down). | | What happened is that the notion changed from "predict a | scalar rating" to "predict a binary satisfaction." | | As the parent poster noted, the effect of this is to push "3 | star" and "4 star" acceptable shows to the user, instead of | "5 star" great (in the user's view) shows. | | Also, in Netflix's defense, users are horribly inconsistent | in their expressed ratings (they'll rate a movie based on | their personal mood at the time, and they'll binge shows they | claim aren't 5 stars while ignoring their 5 star movies). | mikeyjk wrote: | This was a sad turning point for me. We used to have a | single streaming platform with an awesome library, a | granular review system, and user reviews. You could easily | take a quick look to read other users' thoughts on a film. | Now I have to look up Metacritic / reviews myself, and the | NF recommendation is based on whether I've said something | is 'palatable enough to watch, isn't terrible, but I'd | never watch it again' (thumbs up). I've taken to only | thumbs-upping stuff that I particularly like, to see if | that's any better. It all seems to be the same from an | uninformed end-user perspective. | | I remember they stated the reviews would still be available | in some form for export. | | It's no longer the Netflix of old, imo. | derivagral wrote: | I've wondered if the lack of this stuff is due to | business contracts or internal product goals. Not having | e.g.
IMDB makes sense, since it is owned by a competitor | (Amazon; whether that's a US antitrust thing, who knows). | nerdponx wrote: | _Also, in Netflix's defense, users are horribly | inconsistent in their expressed ratings (they'll rate a | movie based on their personal mood at the time, and they'll | binge shows they claim aren't 5 stars while ignoring their | 5 star movies)_ | | This is exactly why they switched from 5 stars to | up/down/blank. | Peteris wrote: | If you are feeling overwhelmed with yet another machine learning | pipeline automation framework, you should check out Kedro | (https://github.com/quantumblacklabs/kedro). | | Kedro has the simplest, leanest, functional-programming-inspired | pipeline definition, spits out Airflow and other formats | readily, and comes with an integrated visualisation framework | which is stunning & effective. | sysprogs wrote: | This looks similar to Google's Mediapipe [0] | | [0] https://github.com/google/mediapipe ___________________________________________________________________ (page generated 2020-12-21 23:00 UTC)