[HN Gopher] Shopify's Data Science and Engineering Foundations (...
___________________________________________________________________
Shopify's Data Science and Engineering Foundations (2020)

Author : mooreds
Score  : 102 points
Date   : 2022-03-11 18:09 UTC (4 hours ago)

(HTM) web link (shopify.engineering)
(TXT) w3m dump (shopify.engineering)

| kevinsundar wrote:
| Having recently worked on a data team at FAANG, all this is an
| ops nightmare for the team running the platform itself if you
| want to ensure data quality for everyone querying the data. I'm
| talking when you have hundreds of data sources and hundreds of
| query use cases.
|
| Anyone have any solutions you've tried?

| atwebb wrote:
| FAANG seems to be an outlier, but it sounds a lot like the
| enterprise data mart strategy covered under a mix of stuff from
| principle #1.
|
| If you want quality, you need structure and review. Accessible
| data is helpful and needed to develop some of the mature
| processes, but for most day to day analysis/reporting, no one
| wants to create their own data model from scratch.
|
| Lots of FAANG doesn't apply to any other companies so it may
| just be a case of having a wholly unique use case. Though I'm
| surprised there isn't something already in place at this point
| (of course having very little knowledge of the case). For the
| dims/facts/marts, they tend to be business use case focused and
| not source/data which can reduce the targets down significantly
| since business use cases tend to repeat (or rhyme).

| bushbaba wrote:
| Check out Apache Iceberg. Does a great job of handling many
| readers and few writers, with data consistency and query
| consistency.
|
| It's a great approach for your data lake and data warehousing
| needs.

| faizshah wrote:
| This timetravel/rollback feature is really interesting:
| https://iceberg.apache.org/docs/latest/spark-queries/#time-t...

| xhevahir wrote:
| I've read stuff before about Shopify's use of Nix.
| Since this post doesn't mention Nix, I take it they don't use it
| in this department of the company?

| csears wrote:
| It sounds like they have data science and data engineering in one
| organization. Is that team structure something that others have
| seen work well?

| erulabs wrote:
| One of the most interesting bits of devops work I've done was
| when I was embedded with a data science team. Infrastructure
| for data science is just so different than traditional ops -
| but I feel like I was able to both help the team move more
| quickly and also prevent them from spending all of the
| company's money - so at least in that case, it worked quite
| well.
|
| I've never understood why data science teams are typically so
| far removed from "normal" engineering teams. Maybe it's the
| DevOps Kool-Aid speaking, but in my opinion, teams should be
| more horizontal than vertical!

| cromd wrote:
| I've been in orgs where it was on the same team, and on different
| teams, both as a modeler and a data engineer. So far, I
| personally prefer when they're on the same team.
|
| Pros of same-team: fewer ideas "lost in translation" between
| data scientists and data engineers, better understanding of
| which datasets/flows are top priority, can sometimes share some
| stack components and help data scientists improve their code,
| better chances of getting data scientists to contribute their
| own batch jobs (there's just more trust as opposed to dealing
| with some "engineering" team that is less connected to you)
|
| Cons of same team: data engineers may not be as in-the-loop on
| what's happening with production datasets, may not be as
| tightly integrated with a devops team, may get overly caught up
| in "business logic" as opposed to "plumbing".
| quadrature wrote:
| Data scientists are embedded in product teams and data platform
| engineers are in a platform engineer org

| thenipper wrote:
| I work with operations research teams in a blended model of
| engineering being embedded with the OR Scientists. I really
| prefer it. Code can get to prod a lot quicker and we don't have
| the "throw it over the fence to engineering" issues that can
| arise.

| mooreds wrote:
| I liked how they took some of the essences of software
| development (one set of tooling, DRY, re-use) and applied it to
| the data science arena.

| [deleted]

| JHonaker wrote:
| I started this expecting to be disappointed, but I really like
| all of the principles they're describing. I've been pushing for
| more of this attitude at my own company.
___________________________________________________________________
(page generated 2022-03-11 23:00 UTC)