[HN Gopher] Good data scientist, bad data scientist ___________________________________________________________________ Good data scientist, bad data scientist Author : ian-whitestone Score : 109 points Date : 2021-05-11 16:36 UTC (6 hours ago) (HTM) web link (ianwhitestone.work) (TXT) w3m dump (ianwhitestone.work) | vinay_ys wrote: | Good data scientist described here seems to have unrealistic | expectations at super human level of know-it-all/do-it-all. | | I think there are more well-established job architectures like | business intelligence analyst, data engineering, user experience | designers, product manager, software engineer etc - these roles | in combination serve to do a lot of what is described here as | data scientist. These roles are easier to hire, have well defined | career paths and good ways to get job satisfaction and can scale | well as the business-problem-space/orgs grows. | | I think the scientist label should be reserved for those who | actually do the scientific mathematical research - specialists | who have done deep research in specific areas. | | For applying pre-existing sciences to solve practical business | domain problems, we need lots of engineers, analysts and managers | etc who are all trained with AI-first software development | practices and just a few specialist data scientists. | gyulai wrote: | > Good data scientist described here seems to have unrealistic | expectations at super human level of know-it-all/do-it-all. | | Hmm. Know-it-all/do-it-all is a useful standard to strive for, | though, even when, in practice, one will often fall short in | one area or another. | | One of my personal frustrations is that I have invested heavily | in trying to be well-rounded and it doesn't quite pay dividends | because of how often I find myself confronted with prejudice of | the form "because he's good at X, that probably means he's bad | at everything else". 
For example, if the first impression I | leave on someone is that I'm good at math, they'll often jump | to the conclusion "because he's good at math, that probably | means he's bad at databases". If the first impression I leave | is that I know a lot about finance & economics, they'll assume | "because he knows a lot about finance & economics, that | probably means he can't do projects in a technical domain" and | so forth. | [deleted] | monkeybutton wrote: | Agreed. The second point about pipelines stuck out to me: | | > [Good DS] will often build these pipelines themselves. Bad DS | thinks it is someone else's job. | | In a small environment, sure, do the job so it gets done! But | in larger more corporate settings the 'cowboy' approach to | pipeline building is not sustainable or even feasible. Am I a | bad DS because I can't provision VMs, open firewalls, replicate | production DBs and build hooks in other teams' services to | expose data? No, its not my job. A good DS collaborates with | other teams and sysadmins to build a pipeline that is | maintainable and monitorable, and doesn't do it all themselves. | commandlinefan wrote: | > seems to have unrealistic expectations | | Well, the expectations aren't unrealistic - if you were to | grant the "good data scientist" a reasonable amount of time | rather than demand that everything be done by this afternoon, | which is what most "real data scientists" are up against. | klmadfejno wrote: | > Good DS starts simple, ships, and then iterates. Bad DS starts | with the most advanced technique they know. | | > Good DS is constantly learning & evolving their toolbox. Bad DS | stagnates and sticks with what they know. | | These are the big ones imo. But not super obvious. As a junior | data scientist I never needed to use anything but regularized | linear models and decision trees. Maybe a random forest but the | explainability usually wasn't worth it. 
| | Recent explainability tools like SHAP have changed this somewhat. | But for the most part I think it's still OK for the average data | scientist to use regularized linear models, decision trees, and | then occasionally, idk, a LightGBM or CatBoost + SHAP for | explainability. A lot of people still don't know about these, and | it's now a decent test for whether people are really trying to | stay up to date. | | But if they're not, I don't really care. | ska wrote: | You can't model your way out of poor data. | | It's a near certainty that good data + basic modelling delivers | the overwhelming majority of real value, globally. | beckingz wrote: | Turns out the right data makes logistic regression go a long | long way. | joncp wrote: | Great list. The rules apply to knowledge work in general. | tmule wrote: | I liked the article, but realize that in a decade of work in | Tech, I haven't met a good data scientist! | | I'll also add: a good data scientist knows his/her strengths, and | doesn't try to become a unicorn. | sgt101 wrote: | >Good DS thinks from first principles. Bad DS accepts everything | they have heard or seen as the ground truth, or the best way to | do something. | | Domain knowledge - and the humble attitude that can get | stakeholders to give it to you - is fundamental to understanding | data and how models will be interpreted and used. There is not | enough "listen to others" in this list (although I read the | "listen to customers" at the end). Listening... listening listen! | waserwill wrote: | This reminds me of a time when some geneticists tried to | find genes associated with a particular disease, to try to | unravel why it occurs. Complex trait, no single answer, so they | genotyped thousands of people with and without the disease, and | ran the stats. And... nothing. | | What has one common name is actually several similar diseases, | and the geneticists would have known that if they paid | attention to the clinicians. 
Listening and incorporating | knowledge is key. | | [I'm thinking of an early glaucoma GWAS, IIRC, though there are | similar cases.] | evandijk70 wrote: | I think this story is very, very common. Still, some complex | diseases (e.g. cystic fibrosis, Down syndrome) do turn out to | be simple on a genetic level, so there is some merit to this | approach. | | Moreover, there is currently no better way to understand | diseases than genotyping thousands of people with and without the | disease and 'running the stats', so it's worth a try. | _fullpint wrote: | Oh man! Domain knowledge is absolutely HUGE. I cannot even | begin to tell you how much I've had to dive into literature on | topics well outside of my domain to begin to understand how to | use my outside perspective to come up with solutions. | | Respecting stakeholders, and being able to be humble about | asking for help understanding the domain is paramount. | noodlenotes wrote: | I would say that a good data scientist can quickly estimate | where their time is best spent, either accepting what someone | else has told them as-is or investigating themselves from the | ground up. There's _always_ more to investigate so using your | time efficiently is one of the most important DS skills. Like | solving a multi-armed bandit problem. | dudeman13 wrote: | Sounds like something that is a function of your domain | knowledge, and your data science skills will have very little | to do with it. | antipaul wrote: | If there is a lot to build, like data pipelines or software apps, | as opposed to just "analyze", I think it helps to add a word for | the discipline of "engineering", e.g. software, data, backend | engineering. | | The role mismatch between data and other engineers, vs actual | (data) scientists, makes it difficult for decision makers to | figure out which one they need | | References | | https://www.oreilly.com/content/why-a-data-scientist-is-not-... 
| | https://medium.com/airbnb-engineering | analog31 wrote: | "A human being should be able to change a diaper, plan an | invasion, butcher a hog, conn a ship, design a building, write a | sonnet, balance accounts, build a wall, set a bone, comfort the | dying, take orders, give orders, cooperate, act alone, solve | equations, analyze a new problem, pitch manure, program a | computer, cook a tasty meal, fight efficiently, die gallantly. | Specialization is for insects." | | -- Robert Heinlein | sgt101 wrote: | Data scientists take data assets that were not designed to be | used for a particular task and set them to be used systematically | and with integrity for that task. It's something that comes from | having lots of data in enterprises which can be exploited to | create value, but can also be used to make very bad decisions and | confuse the hell out of everyone. Using data and using data well | are two very different things. | albertTJames wrote: | I feel this extends to other fields. It's basically describing two | of the Big Five personality traits: conscientiousness and | openness. | linspace wrote: | I think there is this false stereotype of the DS obsessed with | cool techniques and detached from the business. Most DS want | their work to have impact, actually like most people. But | successfully applying data science is hard. We have incredibly | mature tech for other problems, like for example databases, a | marvel of engineering, and in comparison DS is a kludge. The | value DS provides per $ is much lower although it is considered a | competitive advantage (DBs are a commodity) and I think this is | one of the reasons it feeds this stereotype. | t8e56vd4ih wrote: | most data scientists are just jupyter notebook and sklearn cowboys | who know a lot of the buzzwords but lack even basic statistical | understanding. | | and I've met a lot of data scientists. 
| gyulai wrote: | I agree with most of what he's saying but reading the first | sentence almost stopped me in my tracks when I got to "obsessed". | I wonder when exactly it was that "obsessed about this" and | "obsessed about that" became a _good_ thing. ...it's thrown | around way too much these days, and I for one think that being | obsessed with anything, regardless of how positive a thing it is, | always speaks to a psychology that is defective in some way or | another. | autokad wrote: | I guess you can't work for Amazon then. | | You'll never get past the Customer Obsession LP | [deleted] | lhnz wrote: | "excited by" | SuoDuanDao wrote: | An interesting description of obsession I've come across is | that it's what happens when the will is frustrated. So maybe | temporary obsession can be a good thing, if it's a sign | someone's chosen a task so difficult that they need to expend | effort to overcome a significant hurdle. | concreteblock wrote: | Doesn't it just mean that the meaning of the word has changed? | xapata wrote: | Is changing, not has changed. If it already had, no one would | remark on it. | gyulai wrote: | > Doesn't it just mean that the meaning of the word has | changed? | | ...I do feel a bit bad about mentioning it, because it's | pretty tangential to what the article is actually about. That | said: Changes in meanings of words often go hand-in-hand with | broad-based changes in the way people _think_ about | something, and it's useful to reflect on whether or not one | wants to go along with that thinking. | | There is even a bit of a cliché anyway around sciency- | engineeringy folk falling within the "obsessive" range of the | personality spectrum in the very original sense of the word | where it might be something that a psychotherapist might work | on to try and rectify. 
So when I see it in this particular | sphere being attached to a positive value judgment and even | with slightly prescriptivist overtones, then it's something | that to me really "pops" and it's been happening to me more | and more lately. | ska wrote: | "focused on" is probably better terminology. | ian-whitestone wrote: | Obsessed may have been overkill :) | ubitaco wrote: | > Good DS understands the basics of web technology | | I'm not a data scientist but a portion of my job is creating | pipelines, data analytics and such. I also only have a bare | minimum knowledge of web technology. Why is knowledge of web | technology part of being a good Data Scientist? Or is this point | oriented specifically for data scientists working in web based | companies? | | Genuinely curious. I could imagine myself working as a DS in the | future and that's why I found this article interesting. | antipaul wrote: | Why web technologies? You may have to build a web app to | display some data or results. | | But like some top comments say, data science is super broad and | it just depends on your team. | | Mature orgs and teams have a clear idea what their focus area | is, while others don't have a cogent conception of what | constitutes "data science" | jefb wrote: | I don't think there is a single correct answer here, but I'll | offer a few insights from personal experience. | | Firstly, valuable data tends to live in places accessible via | web technology. Maybe you need to fetch a bunch of XML files | from an FTP site? Having a clear understanding of all the | nuances you're about to encounter will set you up for success. | | Secondly, valuable data tends to be generated by web technology | itself. Understanding that lifecycle can inform analytical | strategy. | | Finally, some data scientists add value by informing decision | makers. 
One of the most powerful things you can do for them is | give them a mobile friendly secure web experience that puts the | data they need directly at their fingertips. While yes, | Tableau et al. are an option here, you'll be ahead of your | peers by knowing how to DIY it when it counts. | jll29 wrote: | A data scientist is someone that people wish was a unicorn but | that is neither that nor a scientist, despite the name. | | People who are _actual_ scientists usually in industry go by the | name "scientist" or "research scientist", although they use data | just as much. You can recognize them by the peer reviewed | scientific papers they publish, often preceded by filed patent | applications, as their work is novel. A real scientist wonders | why some people call themselves "data" scientists, because | science has always been about data, modeling and measurement. | | But back to our "data scientist": | | On a good day, she is generating value from the company's data to | increase customer retention. | | On a bad day, she is just doing the ETL prep work so the boss' | other assistant can make that spreadsheet that aggregates the | data that the boss' PPT slides will show. | tmule wrote: | Many (most) scientists are also not everything they're made out | to be. Medicine, for example, has had a real replication | crisis. It's important to distinguish between Science and | scientists. Finally ...if you're running regressions, it's | better to get paid 300K than 130K. | borroka wrote: | This sentiment is quite popular among those who would like to | have the same popularity that data scientists currently (well, | more a few years ago, since there are many more critical voices | now) have, but they don't. | | Data science is a generic name. There are DS like me who have | been "actual scientists" and others who until yesterday were | working on dashboards and Excel files with 100 tabs open and | pivot tables as far as the eye can see. Whatever, it is a name. 
| What about "engineers"? It is a title with no legal value, | people in the US can call themselves software engineers, but in | many other countries, they could not. And who is a writer? | Somebody making a living out of writing, somebody who has been | published even if they got zero money for it and the magazine | editor was their cousin, or else? | | People in my team do causal modeling, use reinforcement | learning for network configuration, NLP for chatboxes, computer | vision for face ID, and (again) network configuration. They are | all called data scientists. Thinking that what people who have | the title "Data Scientist" do is "generating value via | increased consumer retention" or "ETL for Excel files for the | boss" is between misinformed and laughable, but mostly | laughable. The world is much bigger than that. | | Then, I agree that "learning from data" as a specialty has been | over-hyped, and most companies do not have the maturity to take | advantage of ML prediction, causal and statistical modeling, | etc., but that's the nature of the world: one can take | advantage of it or being bitter about it. I took advantage of | the hype and I am fine, happy, and with no regrets. If tomorrow | someone would propose to use for the same job the title "Data | Monk" and it paid more, were more visible, and led to more | career opportunities, I would grab it as quickly as I would | grab 100 dollars floating in and out of the sidewalk. | didibus wrote: | What would be the difference in role between a data scientist and | a product manager in this case? | minimaxir wrote: | Data Scientists can provide PMs with data and analysis to make | better-informed product decisions. Then you can get into more | detail, such as DS building tooling/dashboards/models for | PMs/stakeholders to self-serve and save time for everyone. | | Yes, there's some overlap with a Data Analyst position, but | there's enough day-to-day work to differentiate. 
| tpoacher wrote: | I was hoping this would be a variant of Good Cop Bad Cop as a | technique applied to datascience. It's not. | beforeolives wrote: | This is a good list... for one type of data scientist - the type | that has heavy involvement in product and business decisions. | | Other data scientists are basically software developers with a | very specific domain, a third kind focus a lot more on research | and many data science jobs are some blend of all of these things. | My point is that the author mentions in the intro how data | science is very broad and then continues to focus on what's only | a subset of all data science jobs. | | With that in mind, the list is actually spot on - it's just good | to know that it isn't relevant to many data science jobs. | ian-whitestone wrote: | Agree with you that not all of these things will apply to every | DS role - particularly research heavy ones. But my hope is the | vast majority will. | mturmon wrote: | Yep, some research-oriented DS people are (rightly) obsessed | (correct word) with a particular family of techniques | (variational inference! random forests! adversarial | networks!) and work to find problems to apply that family to. | They literally do pattern-match on their techniques with | every new problem they encounter, and move on if it doesn't | fit. | | A lot of the other of your distinctions do still apply to | such people, like knowing where the data comes from, knowing | when to stop, and adjusting the message to the audience. So, | still a good list. | | Also, even the research DS people need to evolve their | techniques over time. | SilurianWenlock wrote: | Is data science for most businesses just bs? | mywittyname wrote: | No. | | But (and this is a Big But), the value of data science comes at | the end of the data journey. Businesses need to be capturing | data that is relevant and accurate before they can start | analyzing it and deriving any value. 
| | My experience with clients is that they get a ton of value out | of that first step of thinking about what information they want | to collect about their customers, then actually collecting it | (or, conversely, surfacing what they already collect in a | meaningful way). So while they come in wanting some kind of | neural network powered prediction engine or whatever, they are | often really impressed by pretty basic dashboards about their | customer behavior. | ska wrote: | Not bs. But there is both a real GIGO problem, and a problem | with under specification. It's certainly easy to propose DS | analysis that are unlikely to have much return. | | Thinking "data science is hot, we should do that" is different | than "we have all this data and don't understand what it | means". The latter is more likely to lead somewhere | interesting. | screye wrote: | This highlights one of my main complaints about the DS role. You | are expected to have strong business intuition, sufficient coding | skills to hold down a SWE role, a strong background in | stats/math, know all the ML/DS specific skills and lastly, have | technical depth in the subdomain you are looking to solve. All of | this, while being paid the exact same as someone on the SWE or PM | track. | | No one can do it all. DSs that do 70% of these are the best of | the best. | | Mature DS groups have figured out that you have to pick your | poison, and focus on archetypes rather than a 'well rounded' DS. | Here are a few DS archetypes that I've seen. | | 1. The NLP/Vision/RL domain expert: High depth, low breadth | people. Not very concerned with business intuition. Strong grasp | of math for their domain. Moderate coding abilities, but | pipelining for their field is fairly well defined. What is SQL? | | 2. The Generalist : Comes close to the 'good data scientist' | outlined here. 
Never publishes, solves DS problems, will probably | struggle to reach principal IC level in any specific product | group because they lack the prerequisite depth. Will often become | a manager down the line though and can also become an excellent | PM at some point. SQL is their lifeblood. The less business | savvy people see them as MBA-adjacent. But, they are super | important. | | 3. Mr Maths or the Statistician : Pairs excellently with #4 | | 4. The MLE who doesn't want to be an MLE - Excellent coding | skills. Sufficient ML/DS skills. Just hasn't found a way to get | their foot in the door to transition to a DS role without taking | a pay cut. | | 5. The Researcher : Hiring a researcher in the wrong team can | lead to a completely ineffective team. Also, not having a | researcher in a team that needs it can lead to everyone going | around in circles. | | Top DSs will manage to host a max of 2 archetypes in them. Trying | to get your DS to host >2 archetypes is a losing battle. This is | as good as it is going to get. Also, most teams don't need all | archetypes. | | Identify the archetypes you need. Get some coverage over them | through your hired DSs and let them continue growing along their | selected archetypes. | 6gvONxR4sf7o wrote: | > Top DSs will manage to host a max of 2 archetypes in them. | | This ignores experience. Top DSs will manage to have maybe one | archetype per some number of years on the job. You can find | unicorns, but they all have many many years experience and | you're going to have to pay for them. | whatshisface wrote: | > _All of this, while being paid the exact same as someone on | the SWE or PM track._ | | Why not pay top quality DS roles more than SWEs? | IdiocyInAction wrote: | As always, it's supply and demand. DS is often not needed as | much as SWE and there is a lot of supply for DS, due to hype | and ease of transition from people in other fields. | huac wrote: | often (usually?) 
DS are paid less than SWEs of the same | level! | | I have plenty of cynical thoughts as to what drives that | compensation gap. Maybe the simplest is just that there is | high supply of people with these baseline skills and it isn't | easy to distinguish if somebody is good or not. | alexgmcm wrote: | I think there is just more demand for SWEs. Nearly every | company will have software engineers, but not every company | has data scientists and even the ones that do will almost | certainly have more engineers than data scientists. | | After all, you can't use data science to optimise your | product or service if you don't have sufficient engineers | to build it and maintain it in the first place. | Godel_unicode wrote: | %s/not/don't companies/ | chudi wrote: | I'm a swe that moved from backend to a ds role and then as a ds | manager at my company and this is spot on. If I advertise a job | for a ds position I have to mix all these archetypes and get | used to at best have a solid 4 that wants to pivot to ds as | this is the archetype that knows that we are creating real life | data products not just using the latest model or beating some | metric. | omgwtfbbq wrote: | >All of this, while being paid the exact same as someone on the | SWE or PM track. | | Actually at FAANGs especially they are usually paid less, | sometimes substantially. | hackton wrote: | Sadly on point. Some additions to your list of skills, from my | exp.: | | - Sufficient engineering skills to hold down a Data Engineer | role | | - Excellent at explaining and presenting your results/work to | all sorts of audiences (users, other DSs, management, etc). | | - Very good at Data Viz | martingoodson wrote: | Learning all this is not really that difficult. 
No more | difficult than a biochemist training in subjects as diverse as | organic synthesis (making stuff in test tubes), Raman | spectroscopy (prediction of chemical structures using | vibrational signatures) and DNA sequencing (computational | analysis). | | It's only because data science is much newer than biochemistry | as a field that it seems beyond the grasp of an individual. | It's perfectly possible to learn (and to teach) all of the | things you've mentioned. | | And what has pay got to do with it? Since when is pay | correlated to how much you need to study (see, for example, | musicians)? | jltsiren wrote: | Data science is a role, not a field. It's similar to but | wider than the applied statistician role that is well- | established in many fields of research. | | You have a background in one field, but you are working to | solve problems in another field (e.g. biochemistry). To do | that, you must understand biochemistry well enough to be able | to contribute. You are probably far from the best biochemist | in the team, as you were hired for your methodological | skills. In order to solve the problems, you may need tools | from a number of fields, including statistics, machine | learning, software engineering, data engineering, | mathematics, and theoretical computer science. No matter | which field your original degree was in, it's insufficient in | both depth and breadth. You must keep learning new things and | rely on others with complementary skills. | | I work in bioinformatics, which is basically a more | established flavor of data science. I have worked with people | from a variety of backgrounds from electrical engineering to | genetics, and everyone has had obvious gaps in their skills. | Except maybe one or two people, but they are world-famous | experts who are unnaturally curious about everything. 
| alexgmcm wrote: | Pay has a lot to do with it because if you can switch to an | engineering role (SWE or Data Engineer) and have more focused | responsibilities and a higher salary then that's what most of | them will do. | | Although given the demands made for a DS role are often | unicorn-level I don't even think increasing pay would help. | martingoodson wrote: | The parent comment says 'while being paid the exact same as | someone on the SWE or PM track.' Not 'less than a SWE', as | you imply. | | Why should a data scientist be paid more than a SWE? | Because they have to learn several different topics? That | is not such a big deal in my opinion (I work as a DS). | | This language of 'unicorns' has been highly damaging to the | field. There is nothing magical about a job which requires | a lot of varied technical knowledge. Try looking at a | syllabus for some other scientific subject. It's fairly | normal. | alexgmcm wrote: | I work as a DS as well. I don't think there's such a | thing as "should be paid more" - the market shows us that | SWEs are more highly valued presumably because there is | more demand for those skills. | | However, this will lead to people migrating from DS to DE | and SWE roles if the compensation is relatively better. | Yet we see articles about a 'shortage' in DS when they | just aren't paying as much as a similar skill-set can get | in a different role. | travisjungroth wrote: | > the market shows us that SWE's are more highly valued | presumably because there is more demand for those skills. | | I think it's that a tech company can more | consistently make money from a SWE than any other role. | You can always roll together an app and sell it. For | every other role[0], you provide value to the | organization, which eventually makes its way to the | customers. | | This is why the software bootcamp grads have fared better | than the DS bootcamps (and ML bootcamps). 
A company can | get a lot of value from a pretty crummy SWE and is | willing to pay for it. A crummy Data Scientist, not so | much. | | [0] Sales is also similarly direct, depending on the | industry. They enjoy a similar status. | v8dev123 wrote: | According to you, a dyslexic person can't become a DS person just | not because they love data but because ... | | You are expected to have a strong background in stats/math | | But waaaait | | How come you forget about Philosophy? | | Math and Stats are based on Philosophy. You will have to learn | Philosophy to become Super DS person! | [deleted] | SilurianWenlock wrote: | I'm struggling to understand what people think is so difficult | about all this data science stuff. The maths is very basic, | even in "advanced" ml. Nor is it hard to learn backend software | engineering for the purposes of 99% of companies. | sdenton4 wrote: | It's all about epistemology. How do we know what we think we | know? How do we come to know things we didn't know before? | And how can we trust those conclusions? | | Even if the math is basic, it's really, really easy to draw | bad conclusions, look at the wrong problems, not realize that | your data is more incomplete than you might think, etc etc | etc. Guarding against these bad results - figuring out how to | actually manufacture new knowledge - is the heart of the | problem. | visarga wrote: | By the same logic what is so difficult about programming | computers - it's just a bunch of zeroes and ones, very basic | operations. | v8dev123 wrote: | I spent 15 years of my damn life to become a dev and you | don't know what it's like to be a beginner. | | If you can re-read what you wrote with a beginner's mind, you | will see how wrong you are. | amcoastal wrote: | 99% of companies? Definitely not. The skills needed to do DS | in business or healthcare are not very correlated with doing | DS for the physical sciences. 
Which is the whole point of | this comment thread, sure you can understand DL, but you also | have to have an understanding of the field to know what type | of DL to use. For example, in my role, I came with knowledge | of machine learning but had to learn complex fluid physics to | be able to know what type of DL techniques to apply or | develop. | willdearden wrote: | https://www.uptake.com/blog/good-data-scientist-bad-data-sci... | | done here too ___________________________________________________________________ (page generated 2021-05-11 23:00 UTC)