[HN Gopher] Machine learning saves us $1.7M a year on document p... ___________________________________________________________________ Machine learning saves us $1.7M a year on document previews Author : wsuen Score : 67 points Date : 2021-01-27 20:10 UTC (2 hours ago) (HTM) web link (dropbox.tech) (TXT) w3m dump (dropbox.tech) | mehrdada wrote: | The technical details are interesting, but the emphasis on "$1.7M | savings" screams misdirection of resources, considering the | salaries of SWEs/ML engineers and, more importantly, the opportunity | cost of deploying them to an optimization task. | jeffbee wrote: | It does sound like it would barely have broken even when | considering the opportunity cost of the highly compensated | developers who had to write it, which they ignore in the | article. It goes against the "rules of thumb" I learned at | Google, which suggest that an engineer would break even if they | saved [redacted but huge number] CPUs per year, and should only | choose problems that promise to save 10x that or better. | bagels wrote: | Was totally going to point this out as well. It's entirely | possible that their ML team cost nearly this much to begin | with. | laluser wrote: | We're looking at this from a very narrow lens. A few of these | wins a year in a small team will end up paying for themselves | fairly quickly. | piyh wrote: | Let's say X engineers making 200k a year work on this. This is | a 5x return on your money in 5 years if it took 8 people | working the full year to complete it. Sounds like a solid | business case to me. | dheera wrote: | You also need to subtract the value those 8 people could | create if they worked on something else. | Closi wrote: | You don't - if you do that, you are double-counting the cost | of the engineers (if they were working on something else, | you wouldn't account their salary against this | optimisation).
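| piyh's break-even arithmetic above can be checked with a quick
| sketch. The headcount and per-engineer cost are the thread's
| hypotheticals, not figures from the Dropbox article; only the
| $1.7M/year savings comes from the post, and the opportunity-cost
| question dheera and Closi debate is deliberately left out:

```python
# Back-of-the-envelope ROI for piyh's scenario (all assumptions,
# except ANNUAL_SAVINGS, are the thread's hypotheticals).

ENGINEERS = 8                # assumed team size
COST_PER_ENG_YEAR = 200_000  # assumed fully loaded cost per engineer-year
ANNUAL_SAVINGS = 1_700_000   # savings claimed in the article

def roi_multiple(years: int) -> float:
    """Cumulative savings over `years` divided by the one-time
    build cost of eight engineers working one full year."""
    build_cost = ENGINEERS * COST_PER_ENG_YEAR  # 1.6M, spent once
    return years * ANNUAL_SAVINGS / build_cost

print(f"{roi_multiple(5):.2f}x")  # 5 * 1.7M / 1.6M = 5.31x
```

| Even this simple model shows why the thread splits: the multiple
| looks healthy, but it ignores ongoing maintenance and whatever
| else those eight engineers could have built instead.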
| arthurcolle wrote: | This is unknowable | X6S1x6Okd1st wrote: | That makes sense if all engineers are fungible, and you are | never limited by engineering capacity. | wsuen wrote: | Hi mehrdada. In the article, we discuss how to evaluate | tradeoffs of ML projects. One of these tradeoffs is cost of | development and deployment vs. cost of not developing a | solution. In our particular case, the tradeoff made sense. | RcouF1uZ4gsC wrote: | > We used the "percentage-rejected" metric minus the false | negatives to ballpark the $1.7 million total annual savings. | | I think this may be too sanguine about the false negatives, in that | it ignores latency sensitivity. Generally, batch processing (like | preview generation during pre-warming) is cheaper than latency-sensitive | processing (like preview generation when the user is | waiting for it). If you don't take that into account, you can be | misled by your cost metrics. | joosters wrote: | Is the cost saving really measuring the right thing? Instead of | comparing against the cost of pre-generating and caching every | file preview, shouldn't they be comparing against the cost of | adding enough infrastructure (or just optimising their preview | code) to make on-the-fly preview generation acceptably fast? | mushufasa wrote: | FYI, there's also the off-the-shelf https://filepreviews.io for | this | appleflaxen wrote: | Cool product, but the blog hasn't seen a post since 2017. Is it | still active? | wsuen wrote: | Hi folks, author here. I am very excited to share this post about | how we use machine learning at Dropbox to optimize our document | preview generation process. Enjoy! | | I'll be online for the next hour or so; happy to answer any | questions about ML at Dropbox. | wsuen wrote: | I've gotta run; I'll take a look later if other questions come | in! | setib wrote: | I work in an innovation ML-oriented lab, and we have a hard | time identifying use cases with real added value.
| | So I wondered: who had the initiative to use ML in Riviera, the | Riviera team or the ML team? How do you collaborate between the | two teams/worlds (production team and data science team)? | wsuen wrote: | Hi setib, great question. The original idea to use heuristics | for preview cost reduction came out of a Hack Week project. | This led to an initial brainstorm meeting between the ML team | and the Previews team about what this might look like as a | full-fledged ML product. | | From the beginning, the ML team's focus was on providing | measurable impact to our Previews stakeholders. One thing | that helped us collaborate effectively was being transparent | about the process and unknowns of ML (which are different | from the constraints of non-ML software engineering). We | openly shared our process and results, including experimental | outcomes that did not work as well as planned and that we did | not roll into production. We also worked closely with | Previews to define rollout, monitoring, and maintenance | processes that would reduce ops load on their side and | provide clear escalation paths should something unexpected | happen. Consistent and clear communication helps build trust. | | On their side, the Previews team has been an amazing ML | partner, and it was a joy to work with them. | andy99 wrote: | I'm curious to know the answer to this question as well. I | have done a fair bit of work with organizations to | identify ML use cases. When we looked at it from a business-process | perspective, honestly it didn't go very well. Trying | to find company-process-specific interventions, especially in | the format of building a funnel to prioritize which to move | forward, rarely surfaces unique or game-changing ideas. We | usually ended up generating a list of things where either ML | played a minimal role, something simpler would have been | better, or you'd need AGI.
| | What I've seen work better is a product approach, where ML is | incorporated as a feature (rarely, but possibly, the | centerpiece) of a full solution for an industry that provides | a new way of doing something and the value that comes with | it. The caveat is that this is hard and takes the up-front R&D | and product-market-fit research that any product would. It | doesn't happen in a series of workshops with representatives | from the business. | | This Dropbox story is an obvious counterexample, and really | looks like the mythical "low-hanging fruit" that we always | want to identify in ideation workshops. But I'd be careful | trying to generalize a process for identifying ML use cases | from it. | Jugurtha wrote: | @andy99, @setib: we're a boutique that helps large | organizations in different sectors and industries with | machine learning. Energy, banking, telcos, retail, | transportation, etc. These organizations have different | maturity levels, and their functions expect different | deliverables. | | The organizations range on the maturity level from "We want | to use AI, can you help us?" to "We have an internal | machine learning and data science team that's overbooked, | can you help?" to "We have an internal team, but you worked | on [domain] for one of your projects and we'd like your | expertise." | | As for expectations, you can deal with an extremely | technical team that tells you: "I want something that spits | out JSON. I'll send your service this payload and I expect | this payload." So that's a tiny part. | | Sometimes, you have to build everything: data acquisition, | develop and train models, make a web application for their | domain experts with all the bells and whistles, admin, | roles, data management, etc. I wrote about some of the | problems we hit here[0]. | | The point is that finding these problems is an effort that | requires a certain skill/process and goodwill from the | clients. We worked on a variety of problems.
| | - [0]: https://news.ycombinator.com/item?id=25871632 | mlthoughts2018 wrote: | I've worked on ML across several large e-commerce firms, | and two patterns I have seen along the lines of your | comment: | | 1. Many organizations dismiss ML solutions without actually | trying them. Rather, if one hack-week-style prototype | doesn't work on the first try, it's chalked up to "over-hyped | AI" and never sees the light of day. Organizations | that succeed with ML don't do it that way. Instead, they | ensure the success criteria are stated up front and | measured throughout, so you can see _why_ it didn't work | and iterate for v2 and v3. "We spent a bunch of R&D | expense on magic bullet v0 and it didn't succeed | immediately" is a leadership and culture problem - you | probably can't succeed with ML until you fix that. | | 2. Many companies have no idea how to staff and support ML | teams, and go through various cycles of either taking | statistical researchers and bogging them down with devops, | or taking pure backend engineers and letting them do | unprofessional hackery with no clarifying product-quality | ML expert in the loop. | | You need a foundation of ML operations / infra support that | cleanly separates the responsibilities of devops away from | the responsibilities of model research, and you must invest | in clear data platform tools that facilitate getting data | to these teams. | | If an org just figures they can throw an ML team, sink or | swim, into an existing devops environment, or that they can | require an ML team to sort out its own data access, it's | setting ML up for disaster - and again you'll get a lot of | cynics rushing to say it's failing because ML is just hype, | when actually it's failing due to poor leadership, poor | team structure, and poor resourcing. | junippor wrote: | Haven't read the doc. But is it just me, or does that seem tiny, | considering how large Dropbox is?
| heipei wrote: | It does seem tiny, and my first thought was "how many dollars | did they burn to save those $1.7M", but that was one of the | first things they evaluated, and both the research phase and | operational burden of running the service seem to be relatively | small so that the investment definitely paid off. It's great | that they're talking real numbers, loved the post in general! ___________________________________________________________________ (page generated 2021-01-27 23:00 UTC)