[HN Gopher] Andreessen-Horowitz craps on "AI" startups from a gr...
       ___________________________________________________________________
        
       Andreessen-Horowitz craps on "AI" startups from a great height
        
       Author : dostoevsky
       Score  : 253 points
       Date   : 2020-02-24 20:31 UTC (2 hours ago)
        
 (HTM) web link (scottlocklin.wordpress.com)
 (TXT) w3m dump (scottlocklin.wordpress.com)
        
       | bryanrasmussen wrote:
        | Generally the use of the phrase "from a great height" implies
        | the height is one of morality, intellect, or valor (each of
        | these decreasing in usage). I'm not exactly sure what the great
        | height Andreessen-Horowitz craps from is composed of - maybe
        | money?
       | 
       | I think they may just be crapping on them from a reasonable
       | vantage point.
        
         | KaiserPro wrote:
          | The height is not really about morals. It's more about the blast
         | radius of the shit.
        
           | darwingr wrote:
           | Or like "nuked from orbit"
        
       | raiyu wrote:
        | The number of places where machine learning can be used
        | effectively, from both a cost perspective and a return
        | perspective, is small. They usually involve tremendously large
        | datasets at gigantic companies, which probably have to build
        | in-house expertise because it's hard to package this up into a
        | product and resell it for various industries, datasets, etc.
       | 
       | Certainly something like autonomous driving needs machine
       | learning to function, but again, these are going to be owned by
       | large corporations, and even when a startup is successful, it's
       | really about the layered technology on-top of machine learning
       | that makes it interesting.
       | 
       | It's kind of like what Kelsey Hightower said about Kubernetes.
       | It's interesting and great, but what will really matter is what
       | service you put on top of it, so much so that whether you use
       | Kubernetes becomes irrelevant.
       | 
        | So I think companies that are focusing on a specific problem,
        | providing that value-added service, building it through machine
        | learning, can be successful, while just broadly deploying
        | machine learning as a platform in and of itself can be very
        | challenging.
       | 
       | And I think the autonomous driving space is a great example of
       | that. They are building a value added service in a particular
       | vertical, with tremendous investment, progress, and potentially
       | life changing tech down the road. But as a consumer it's really
       | the autonomous driving that is interesting, not whether they are
       | using AI/machine learning to get there.
        
         | andreilys wrote:
         | _"The number of places where machine learning can be used
         | effectively from both a cost perspective and a return
         | perspective are small."_
         | 
          | Thankfully, transfer learning and super convergence invalidate
          | this claim.
         | 
         | Using pre-trained models + specific training techniques
         | significantly reduces the amount of data you need, your
         | training time and the cost to create near state of the art
         | models.
         | 
          | Both Kaggle and Google Colab offer free GPUs.
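          | 
          | A minimal sketch of the idea (assuming a torchvision-style
          | pre-trained backbone; the 10-class head is a placeholder):
          | freeze the backbone and train only a small new head, which is
          | what cuts the data and compute requirements.
          | 
          |   import torch
          |   import torch.nn as nn
          |   from torchvision import models
          |   
          |   model = models.resnet18(pretrained=True)  # ImageNet weights
          |   for p in model.parameters():
          |       p.requires_grad = False               # freeze backbone
          |   model.fc = nn.Linear(model.fc.in_features, 10)  # new head
          |   
          |   optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
          |   # ...train only model.fc on the (small) labeled dataset...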
        
           | ska wrote:
           | >Thankfully transfer learning and super convergence
           | invalidates this claim.
           | 
           | IME it is nowhere near as universally successful as this
           | suggests.
        
         | Q6T46nT668w6i3m wrote:
         | How would you explain the rise (and success) of machine
         | learning in science? A lab that uses some learning-based method
         | will likely be limited to just one or two people (responsible
         | for data acquisition, feature engineering, evaluation, etc.)
         | and extremely finite data.
        
           | PeterisP wrote:
           | Can you elaborate on what you mean by "A lab that uses some
           | learning-based method will likely be limited to just one or
           | two people (responsible for data acquisition, feature
           | engineering, evaluation, etc.)" ? I know a bunch of labs that
           | apply machine learning to specific tasks, and the parts you
            | list _each_ can easily take up multiple people for years for
            | a single task - not counting data acquisition, because data
            | is definitely not "extremely finite": you need lots of
            | quality data, and improving the data always yields gains and
            | can easily eat up more manpower than you have budget for, no
            | matter what that budget is.
        
           | semi-extrinsic wrote:
           | How do you define success? Adoption? Because right now,
           | writing "we will use machine learning to solve X" in a grant
           | proposal is an easy way to increase chances of getting
           | funding.
        
           | Barrin92 wrote:
            | I'm not sure there is a rise. 'Science' is a huge domain.
            | If I had to guess, machine learning plays a role in < 1% of
            | its fields, and that may be overstating it.
            | 
            | Also, it's doubtful whether machine learning should even be
            | categorized as science. The goal of science is to generate
            | insight and knowledge; ML solves particular engineering
            | problems or searches problem spaces. It doesn't build
            | fundamental scientific models.
        
           | ska wrote:
            | It's not clear there has been any deep impact actually, but
            | there has been a lot of discussion (and grant proposals).
            | 
            | I've seen a lot of cross-pollination of ML and AI techniques
            | into various disciplines. A large percentage just didn't
            | work at all; most of the rest were more "kind of
            | interesting, but". Nothing earthshaking happened, although
            | the pop-sci press likes to talk about it a lot.
           | 
           | If you have more digital data than you used to, using modern
           | free frameworks and toolkits to do basic (i.e. older, boring,
           | but understood) ML stuff to understand it seems to have a
           | reasonable return. Mostly I think this is because it becomes
           | accessible to someone without much background in the area,
           | and you can do reasonable things without having to put 6
           | months of reading and implementing together before starting.
        
       | joshuaellinger wrote:
        | I just spent $50K on colo hardware. I'm taking a $10K/mo Azure
        | spend down to a $1K/mo hosting cost.
        | 
        | But the real kicker is that I get 5x the cores, 20x the RAM, 10x
        | the storage, and a couple of GPUs. I'm running last-generation
        | Infiniband (56 Gb/sec) and modern U.2 SSDs (say 500 MB/sec per
        | device).
       | 
       | I figure it is going to take me about $10K in labor to move and
       | then $1K/mo to maintain and pay for services that are bundled in
       | the cloud. And because I have all this dedicated hardware, I
       | don't have to mess around with docker/k8s/etc.
       | 
        | It's not really a big-data problem, but it shows the ROI of
        | owning your own hardware. If you need 100 servers for one day
        | per month, the cloud is amazing. But I do a bunch of resampling,
        | simple models, and interactive BI-type stuff, so colo wins
        | easily.
        
       | seibelj wrote:
        | A week ago I published an article about how AI is the biggest
        | misnomer in tech history: https://medium.com/@seibelj/the-
       | artificial-intelligence-scam...
       | 
       | I wrote it to be tongue-in-cheek in a ranting style, but
       | essentially "AI" businesses and the technology underpinning it
       | are not the silver bullet the media and marketing hype has made
       | it out to be. The linked article about a16z shows how AI is the
       | same story everywhere - enormous capital to get the data and
       | engineers to automate, but even the "good" AI still gets it wrong
        | much of the time, necessitating endless edge-case handling and
        | human intervention, until eventually it's a giant ball of
        | poorly-understood, impossible-to-maintain pipelines that doesn't
        | even provide a better result than a few humans with a
        | spreadsheet.
        
       | ativzzz wrote:
       | I agree with the author's opinion about
       | 
       | > I'll go out on a limb and assert that most of the up front data
       | pipelining and organizational changes which allow for it are
       | probably more valuable than the actual machine learning piece.
       | 
       | Especially at non-tech companies with outdated internal
       | technology. I've consulted at one of these and the biggest wins
       | from the project (I left before the whole thing finished
       | unfortunately) were overall improvements to the internal data
       | pipeline, such as standardization and consolidation of similar or
       | identical data from different business units.
        
         | noelsusman wrote:
         | I do data science at a non-tech company with outdated internal
         | technology and I've seen this over and over again. Honestly
         | though, it's worth every penny because often the only way to
         | get the resources to truly solve data pipeline issues is to get
         | an executive to buy some crap from a vendor and force everyone
         | to make it work.
        
       | fxtentacle wrote:
       | I predict a great future for startups that sell pickaxes, err,
       | tools for AI.
       | 
       | AI is like the new gold rush. And just like back then, it's not
       | the gold diggers that will get rich.
       | 
       | "Most people in AI forget that the hardest part of building a new
       | AI solution or product is not the AI or algorithms -- it's the
       | data collection and labeling."
       | 
       | https://medium.com/startup-grind/fueling-the-ai-gold-rush-7a...
       | 
       | (from 2017)
        
         | moksly wrote:
          | Is it the new gold rush, though? I work in a large organisation
         | that has a lot of data and inefficient processes, and we
         | haven't bought anything.
         | 
         | It hasn't been for a lack of trying. We've had everyone from
          | IBM and Microsoft to small local AI startups try to sell us
         | their magic, but no one has come up with anything meaningful to
         | do with our data that our analysis department isn't already
         | doing without ML/AI. I guess we could replace some of our
         | analysis department with ML/AI, but working with data is only
         | part of what they do, explaining the data and helping our
         | leadership make sound decisions is their primary function, and
         | it's kind of hard for ML/AI to do that (trust me).
         | 
         | What we have learned though, is that even though we have a
         | truck load of data, we can't actually use it unless we have
         | someone on deck who actually understands it. IBM had a run at
         | it, and they couldn't get their algorithms to understand
         | anything, not even when we tried to help them. I mean, they did
         | come up with some basic models that their machine
         | spotted/learned by itself by trawling through our data, but
          | nothing we didn't already have. Because even though we have a
          | lot of data, the quality of it is absolute shite. That's
          | anecdotal, but it's terrible because it was generated by
          | thousands of human employees over 40 years, and though I'm
          | guessing, I doubt we're unique in that respect.
         | 
         | We'll continue to do various proof of concepts and listen to
          | what suppliers have to say, but I fully expect most of it to go
          | the way blockchain did, where we never actually found a use for
          | it.
         | 
          | With a gold rush, you kind of need the nuggets of gold to sell,
          | and I'm just not seeing that with ML/AI. At least not yet.
        
       | m0zg wrote:
       | "Huge compute bills" usually come from training, or to be more
       | precise, hyperparameter search that's required before you find a
       | model that works well. You could also fail to find such a model,
       | but that's another discussion.
       | 
       | So yeah, you could spend one or two FTE salaries' (or one deep
       | learning PhD's) worth of cash on finding such models for your
       | startup if you insist on helping Jeff Bezos to wipe his tears
       | with crisp hundred dollar bills. That's if you know what you're
       | doing of course. Literally unlimited amounts could be spent if
       | you don't. Or you could do the same for a fraction of the cost by
       | stuffing a rack in your office with consumer grade 2080ti's. Just
       | don't call it a "datacenter" or NVIDIA will have a stroke. Is
       | that too much money? Not in most typical cases, I'd think. If the
       | competitive advantage of what you're doing with DL does not
       | offset the cost of 2 meatspace FTEs, you're doing it wrong.
       | 
       | That, once again, assumes that you know what you're doing, and
       | aren't doing deep learning for the sake of deep learning.
       | 
       | Also, if your startup is venture funded, AWS will give you $100K
       | in credit, hoping that you waste it by misconfiguring your
       | instances and not paying attention to their extremely opaque
        | billing (which is what most of their startup customers proceed
        | to do pretty much straight away). If you do not make these
       | mistakes, that $100K will last for some time, after which you
       | could build out the aforementioned rack full of 2080ti's on prem.
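        | 
        | To make the cost structure concrete, here's a minimal random-
        | search sketch (train_and_eval is a hypothetical stand-in for a
        | full training run); every sample is hours of GPU time, which is
        | where the bill comes from, whether on AWS or a rack of 2080ti's:
        | 
        |   import random
        |   
        |   def sample_config():
        |       return {
        |           "lr": 10 ** random.uniform(-5, -2),
        |           "batch_size": random.choice([32, 64, 128]),
        |           "dropout": random.uniform(0.0, 0.5),
        |       }
        |   
        |   best = None
        |   for _ in range(100):             # 100 full training runs
        |       cfg = sample_config()
        |       score = train_and_eval(cfg)  # hypothetical: hours of GPU
        |       if best is None or score > best[0]:
        |           best = (score, cfg)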
        
         | zitterbewegung wrote:
          | I was training ML models on AWS / Google Colab. After racking
          | up a few hundred dollars on AWS I bought a Titan RTX (I also
          | play video games, so it does that very well too).
        
         | artsyca wrote:
         | Slow clap
        
         | bob1029 wrote:
         | I find it fun how the cost of the cloud is forcing people to
         | consider what absolutely must run in the cloud (presumably for
         | stability and compliance reasons) and what can be brought back
         | on-prem.
         | 
         | We don't train ML models, but we are in a similar boat
         | regarding cloud compute costs. Building our solutions for our
         | clients is a compute-heavy task which is getting expensive in
         | the cloud. We are considering options such as building
         | commodity threadripper rigs, throwing them in various
         | developers' (home) offices, installing a VPN client on each and
         | then attaching as build agents to our AWS-hosted jenkins
         | instance. In this configuration we could drop down to a
         | t3a.micro for Jenkins and still see much faster builds. The
         | reduction in iteration time over a month would easily pay for
         | the new hardware. An obvious next step up from this is to do
         | proper colocation, but I am of a mindset that if I have to
         | start racking servers I am bringing 100% of our infrastructure
         | out of the cloud.
        
           | buckminster wrote:
           | I once had a borrowed Sun blade server in my home office. The
           | fan in it sounded like an industrial vacuum cleaner. It got
           | moved to a different room and was powered on as little as
           | possible.
           | 
           | Your plan makes sense but be mindful of the acoustics or your
           | devs may grow to hate you.
        
             | m0zg wrote:
             | BTW, the only reason why consumer vacuum cleaners are so
             | loud is because consumers associate loudness with suction
             | power.
             | 
             | "Backpack" style commercial vacuum cleaners have more
             | suction, and are barely audible in comparison.
        
               | buckminster wrote:
               | I stand corrected. It was much louder than a consumer
               | vacuum but my analogy skills are weak.
        
               | ggm wrote:
               | Your analogy skills were strong, because analogy is
               | rooted in myth, not fact. Achilles did not actually
               | _have_ an Achilles tendon because Achilles _did not
                | exist_.
               | 
                | There does not have to be an incredibly loud, functional
                | industrial vacuum cleaner for figuratively _everyone_ to
               | get your analogy, because the Herculean reality of vacuum
               | cleaners is that you cannot clean an augean stable of
               | lego on the floor, without a lot of noise. If you get my
               | analogy.
        
             | bob1029 wrote:
             | Excellent point. If we are building these rigs by hand
             | (which is a likely option considering the initial usage
             | context), the cooling solution would probably be a Noctua
             | NH-U14S or similar. I already have one of these in my
             | office attached to a 2950X and it is dead silent. You can
             | definitely hear it when every core is pegged, but it's
             | hardly noticeable over any other arbitrary workstation. The
             | sound is nowhere near as intrusive as something like a
              | blower on a GPU (or god forbid a Sun Microsystems blade).
        
           | m0zg wrote:
            | This is not a new phenomenon. As early as 2009 I worked
           | for a company (ads, but not Google) which outgrew the typical
           | "cloud" cost structure at the time, and moved everything to a
           | more traditional datacenter, and saved substantial money even
           | considering 3 more SREs they had to hire to absorb the
           | increased support needs. AWS charges what the market will
           | bear, and as such it was never designed to make sense for
            | everyone. One needs to re-evaluate on the back of a napkin
            | from time to time.
        
           | blt wrote:
           | If I worked from home and my employer asked me to install a
           | server in my home, I would tell them to go fuck themselves.
           | 
           | It's noisy, it takes up space, and presumably I'm on call to
           | fix it if it breaks.
           | 
           | You should pay them an extra 24x(PSU wattage)x(peak $/Wh in
           | area) per day for the electricity too.
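            | 
            | With assumed numbers (750 W PSU draw, peak rate of
            | $0.25/kWh), that formula works out to:
            | 
            |   watts = 750                  # assumed PSU draw at the wall
            |   price_per_wh = 0.25 / 1000   # assumed peak $/Wh
            |   print(24 * watts * price_per_wh)  # $4.50 per day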
           | 
           | I'm alarmed that someone in your company felt this idea was
           | appropriate enough to propose.
        
         | pridkett wrote:
         | There's also the issue that data scientists often want to go
         | running to hyperparameter optimization and neural architecture
         | search. In most cases improving your data pipelines and
         | ensuring the data are clean and efficient will pay off much
         | more quickly.
        
           | fxtentacle wrote:
           | But manually improving the data pipeline requires an
           | understanding of the problem, whereas doing a hyperparameter
           | optimized architecture search just needs $$$ hardware and no
           | clue on the side of the operator.
        
             | derefr wrote:
             | Or, to put that another way: if you knew what algorithm the
             | AI would be using to discriminate the signal from the noise
             | in your data, why would you need the AI? Just write that
             | algorithm.
        
               | fxtentacle wrote:
               | Exactly :)
               | 
               | In most cases, unsupervised learning is nothing more than
               | having the AI try to approximate the solution of your
               | highly non-linear loss function. So if there's any way of
               | solving that loss function directly, it will perform like
               | a well-trained AI.
        
         | fxtentacle wrote:
          | No - inference is also quite expensive. You'll have 100% usage
         | on a $10,000 GPU for 3s per customer image for a decently sized
         | optical flow network. That's 3 hours of compute time for 1
         | minute of 60fps video.
         | 
          | Now let's say your customer wants to analyze 2 hours = 120
          | minutes of video and doesn't want to wait more than those 3
          | hours. Then suddenly you need 120 servers with one $10k GPU
          | each to service this one customer within 3 hours of waiting.
         | 
         | Good luck reaching that $1,200,000 customer lifetime value to
         | get a positive ROI on your hardware investment.
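          | 
          | Spelled out, the arithmetic (all numbers are the assumptions
          | stated above):
          | 
          |   secs_per_frame = 3    # GPU-seconds per frame on a $10k GPU
          |   fps = 60
          |   video_minutes = 120
          |   frames = video_minutes * 60 * fps           # 432,000 frames
          |   gpu_hours = frames * secs_per_frame / 3600  # 360 GPU-hours
          |   deadline_hours = 3
          |   print(gpu_hours / deadline_hours)           # 120 GPUs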
         | 
         | When I talk about AI, I usually call it "beating the problem to
         | death with cheap computing power". And looking at the average
         | cleverness of AI algorithm training formulas, that seems to be
         | exactly what everyone else is doing, too.
         | 
          | And since I'm being snarky anyway, there are two subdivisions
          | of AI:
          | 
          | supervised learning => remember this
          | 
          | unsupervised learning => approximate this
          | 
          | Neither approach puts much emphasis on intelligence ;) And
          | both can usually be implemented more efficiently without AI,
          | if you know what you are doing.
        
           | m0zg wrote:
           | Some kinds of inference are expensive, yes, not going to
           | dispute that. But 99.95% of it is actually surprisingly
           | inexpensive. Hell, a lot of useful workloads can be deployed
           | on a cell phone nowadays, and that fraction will increase
           | over time, further reducing inference costs or eliminating
           | them outright (or rather moving them to the consumer).
           | 
           | For the vast majority of people the main expense is creating
           | the combination of a dataset and model that works for their
           | practical problem, with the dataset being the harder (and
           | sometimes more expensive) problem of the two.
           | 
           | The dataset is also their "moat", even though most of them
           | don't realize it, and don't put enough care into that part of
           | the pipeline.
        
             | fxtentacle wrote:
             | The algorithms that run on cell phones tend to be specially
             | optimized and quality-reduced neural networks. For example,
             | https://arxiv.org/abs/1704.04861
        
         | ignoramous wrote:
         | As someone hoping to build a world-wide footprint, say 25 to 50
         | DCs, of servers to deploy to with unmetered bandwidth, what are
         | some alternatives to the usual suspects?
         | 
          | I have come across fly.io, Vultr, Scaleway, Stackpath, Hetzner,
          | and OVH, but either they are expensive (in that they charge for
          | bandwidth and uptime) or do not have a wide enough footprint.
         | 
          | I guess colos are the way to go, but how does one work with
          | colos, allocate servers, deploy to them, and ensure security
          | and uptime and so on from a single place? Dealing with them
          | individually might slow down the process. Is there tooling
          | that deals with multiple colos, like the multi-cloud tooling
          | (min.io, k8s, Triton, etc.)?
        
           | KaiserPro wrote:
           | > As someone hoping to build a world-wide footprint
           | 
           | Does adding an extra 100ms to the response time cost you that
           | much business wise?
           | 
           | As for colos, it depends on scale. If you have 30k servers
           | world wide, it pays to have someone manage the contracts for
            | you. If not, it pays to go with the painful arseholes like
            | Vodafone, or whoever bought Cable & Wireless's stuff.
           | 
            | As for security, it gets very difficult. You need to make
            | sure that each machine is actually running _what_ you told it
            | to, and know if someone has inserted a hypervisor shim
            | between you and your bare metal.
           | 
            | None of that is off the shelf.
           | 
           | Which is why people pay the big boys, so that they can prove
           | chain of custody and have very big locks on the cages.
           | 
            | K8s gives you scheduling and a datastore. For a large
            | globally distributed system it's going to scale like treacle.
        
           | mrkurt wrote:
           | (Hi, I'm from fly.io)
           | 
           | It depends what you need in your datacenters! If you just
           | want servers, and don't care about doing something like
           | anycast, you can find a bunch of local dedicated server
           | providers in a bunch of cities and go to town. But you can't
           | get them all from one provider, really, not with any kind of
           | reasonable budget.
           | 
           | You _could_ buy colo from a place like Equinix in a bunch of
           | cities, and then either use their transit or buy from other
           | transit providers.
           | 
           | But also, unmetered bandwidth isn't a very sustainable
           | service, so I'm curious what you're after? You're usually
           | either going to have to pay for usage, or pay large monthly
           | fixed prices to get reasonable transit connections in each
           | datacenter.
           | 
           | In our case, we're constrained by Anycast. To expand past the
           | 17 usual cities you end up needing to do your own network
           | engineering which we'd rather not do yet.
        
             | ignoramous wrote:
             | (thanks mrkurt)
             | 
              | It is anycast that I'm going after. The requirement for
              | unmetered bandwidth (or bandwidth cheaper than AWS et al.)
              | is because the kind of workloads we'd deal with (TURN
              | relays, proxies, tunnels, etc.) gets expensive otherwise.
              | For another related workload, per-request pricing gets
              | expensive, again due to the nature of the workload (to the
              | tune of 100k requests per user per month).
             | 
             | So far, for the former (TURN relays etc), I've found using
             | AWS Global Accelerator and/or GCP's GLB to be the _easiest_
             | way to do anycast but the bandwidth is slightly on the
              | expensive side. Fly.io matches the pricing in terms of
              | network bandwidth (as promised on the website), so that's a
              | positive, but GCP/AWS have a wider footprint. Cloudflare's
             | Magic Transit is another potential solution, but requires
             | an enterprise plan and one needs to bring-your-own-anycast-
             | IP and origin-servers.
             | 
             | For the latter (latency-sensitive workload with ~100k+ reqs
             | / month), Cloudflare Workers (200+ locations minus China)
             | are a great fit though would get expensive once we hit a
             | certain scale. Plus, they're limited to L7 HTTP reqs, only.
             | Whilst, I believe, fly.io can do L4.
        
         | avip wrote:
          | For balance, all big cloud providers - AWS, GCP, Azure, Oracle
          | [0] - have pretty similar startup plans. Y$$MV
         | 
         | (I'm in full agreement with everything you've written + it's
         | well-phrased and funny. gj!)
         | 
         | [0] that's not a typo - there is such thing as "Oracle cloud"
        
         | alephnan wrote:
         | > Just don't call it a "datacenter" or NVIDIA will have a
         | stroke.
         | 
         | Context please :) ?
        
           | mereel wrote:
           | Just a guess but maybe it's some licensing issue?
           | https://www.nvidia.com/en-us/drivers/geforce-license/
           | 
           |  _No Datacenter Deployment. The SOFTWARE is not licensed for
           | datacenter deployment, except that blockchain processing in a
           | datacenter is permitted._
        
             | derefr wrote:
             | > except that blockchain processing in a datacenter is
             | permitted
             | 
             | Well, Nvidia, y'see, my new blockchain does AI training as
             | its Proof-of-Work step...
        
             | mam2 wrote:
              | Well, they are the ones writing the rules, so I'd side with
              | OP.
        
           | ThePadawan wrote:
           | Datacenter GPUs are mostly identical to the much cheaper
           | consumer versions. The only thing preventing you from running
           | a datacenter with consumer hardware is the licensing
           | agreement you accept.
        
             | KaiserPro wrote:
              | And the cooling, the amount of RAM, and the double-
              | precision performance.
              | 
              | The chip might be the same, but the rest of it isn't.
              | 
              | Granted, it's not worth the $3k price bump, but that's a
              | different issue.
        
               | zwaps wrote:
                | Nah, that's not really it. The reason NVIDIA doesn't
                | allow this is precisely that the additional RAM -
                | functionally the only difference - is not cost efficient.
                | People would like to (and did) use a bunch of consumer
                | 1080s, which is why NVIDIA disallowed precisely that. You
                | had to buy the equivalent pro-grade card, which costs
                | easily two or three times as much and offers a couple
                | more GB of RAM.
        
             | endorphone wrote:
             | "The only thing preventing you from running a datacenter
             | with consumer hardware is the licensing agreement you
             | accept."
             | 
             | The consumer cards don't use ECC and memory errors are a
             | common issue (GDDR6 running at the edge of its
             | capabilities). In a gaming situation that means a polygon
             | might be wrong, a visual glitch occurs, a texture isn't
             | rendered right -- things that just don't matter. For
             | scientific purposes that same glitch could be catastrophic.
             | 
             | The "datacenter" cards offer significantly higher
                | performance for some cases (tensor cores, double precision),
             | are designed for data center use, are much more scalable,
             | etc. They also come with over double the memory (which is
             | one of the primary limitations forcing scale outs).
             | 
             | Going with the consumer cards is one of those things that
             | might be Pyrrhic. If funds are super low and you want to
             | just get started, sure, but any implication that the only
             | difference is a license is simply incorrect.
        
               | luisfmh wrote:
               | Learned a new word today. Pyrrhic.
        
               | [deleted]
        
           | OkGoDoIt wrote:
           | NVIDIA forces you to buy significantly more expensive cards
           | that perform marginally better if you are using them for
            | datacenter use. They try to prevent businesses from using
            | consumer-grade gaming cards. I assume this is so cloud
           | providers don't buy up all the supply of graphics cards and
           | make it hard for gamers to get decent cards, like what
           | happened during the bitcoin craze.
        
             | nkassis wrote:
                | No, it's just pure price discrimination. They don't care
                | about gamers; they just know businesses will pay more if
                | forced to, while gamers can't.
        
               | aj7 wrote:
               | Exactly.
        
           | [deleted]
        
           | [deleted]
        
         | fizixer wrote:
         | - Or AMD could change their policy of 'never miss an
         | opportunity to miss an opportunity' and offer high-performance
         | OpenCL GPGPU offerings. Then nVidia could have all the stroke
         | they wanted.
         | 
          | - Or Tensorflow/Pytorch could've crapped on OpenCL a little
          | less by releasing a fully functional OpenCL version every time
          | they released a fully functional Cuda version, instead of
          | worshipping Cuda year in and year out.
         | 
         | - Or Google could start selling their TPUv2, if not TPUv3,
         | while they're on the verge of releasing TPUv4.
         | 
         | - Or one of the other big-tech's Facebook/Microsoft/Intel could
         | make and start selling a TPU-equivalent device.
         | 
         | - Or I could finish school and get funded to do all/most of the
         | above ;)
         | 
          | edit: On a more serious note, a cloud/on-prem hybrid is
          | absolutely the right way to go. You should have a 4x 2080 ti
          | rig available 24x7 for every ML engineer. It costs about $6k-8k
          | apiece [0]. Prototype the hell out of your models on on-prem
          | hardware. Then, when your setup is in working condition and
          | starts producing good results on small problems, you're ready
          | to do a big computation for final model training, and you send
          | it to the cloud for the final production run. (Guess what: on
          | a majority of your projects you might realize the final
          | production run could be carried out on-prem as well; you just
          | have to keep the rig running 24 hours a day for a few days or
          | up to a couple of weeks.)
         | 
         | [0]: https://l7.curtisnorthcutt.com/the-best-4-gpu-deep-
         | learning-...
        
           | m0zg wrote:
           | As someone who has actually worked on this stuff soup to
           | nuts, it's not as easy as people imagine, because you can't
           | just support some subset of available ops and call it a day.
           | If you want to make OpenCL pie from scratch, you must first
           | make the universe, and support every single stupid thing
           | (among thousands) and even mimic some of the bugs so that
           | models work "the same".
           | 
           | This is hard and time consuming, and this field is hard
           | enough as it is. What makes it even harder is that only
           | NVIDIA has decent, mature tooling. There is some work on ROCM
           | though, so AMD is not _totally_ dead in the water. I'd say
           | they're about 90% dead in the water.
        
             | derefr wrote:
             | > support every single stupid thing (among thousands) and
             | even mimic some of the bugs so that models work "the same".
             | 
             | Do you need to do the stupid things performantly, though?
             | Because that sounds like a case for skipping microcode
             | shims, and going straight to instructions that trap into a
             | software implementation. Or just running the whole compute-
             | kernel in a driver-side software emulator that then
             | compiles real sub-kernels for each stretch of non-stupid
             | GPGPU instructions, uploads those, and then calls into them
             | from the software emulation at the right moments. Like a
             | profile-guided JIT, but one that can't actually JIT
             | everything, only some things.
        
             | fizixer wrote:
             | I know Tensorflow decided to be cuda-exclusive for the
             | silly reason that the matrix library they were using
             | (eigen) only supported cuda.
             | 
             | I have never recovered from that.
        
             | simonebrunozzi wrote:
             | Are you in the Bay area? Would love to chat. Thinking of an
             | idea where your expertise could be very handy. $my-hn-
             | username at gmail.
        
       | marmaduke wrote:
       | I sometimes contribute to methodology projects in neuroscience
       | ("AI" for scientists). The most tiring part of it is explaining
       | essentially these things over and over. Very interesting to see
       | the sentiment vindicated in Startupistan.
        
       | atulkum wrote:
        | On the other hand, some startups are committing outright fraud
        | in the name of AI. I went to a self-checkout store (AIFI.io). I
        | did not touch anything, but they charged me $35.10. According to
        | the receipt I took 17 packs of snacks :) These guys are doing
        | fraud in the name of AI. They have no technology and no
        | software; they just put up some cameras and opened a store so
        | they can defraud investors. Anyone can try if interested:
        | https://www.aifi.io/loop-case-study
        
       | mtkd wrote:
       | AI on the algo side is only half the story -- it has to sit in a
       | domain specific framework to be most effective
       | 
        | I see a lot of 'bolt-on' tech emerging -- it looks mostly like
        | snake oil -- there is no obvious way to be competitive against
        | teams that baked it into the bare-metal design
       | 
       | Also most commercial use-cases I've seen need effective ML more
       | than anything else
        
       | dang wrote:
       | A thread about the original article, from a few days ago:
       | https://news.ycombinator.com/item?id=22352750
        
       | rotrux wrote:
       | This is a terrific article. Two thumbs up.
        
       | moab wrote:
       | I found it fun to read this after reading this other post that
       | made the rounds today about AI automating most programming work
       | and making program optimization irrelevant:
       | https://bartoszmilewski.com/2020/02/24/math-is-your-insuranc...
        
       | blueyes wrote:
       | The A16Z piece makes all these points quite clearly. This
       | editorial is trying to put a finer point on a sharp knife.
        
       | correlator wrote:
       | No need to look at AZ for this. If you're building "AI" I wish
       | you a speedy road to being acquired by a company that can put it
       | to use. You've become a high priced recruiting firm.
       | 
        | If you're solving a real problem and use ML in service of
        | solving that problem, then you've got a great moat... happy,
        | trusting customers.
        | 
        | It's not complicated.
        
         | motohagiography wrote:
         | Sssh! Valuations are a function of projected market size and
         | opacity of the problem. Clarity like this collapses the
         | uncertainty and destroys value. If you pour enough capital into
         | rooms full of PhD's something's gotta hit.
         | 
         | My way of saying, you're very, very right.
        
       | lazzlazzlazz wrote:
       | Is the misspelling of "Andreessen-Horowitz" and use of "A19H"
       | instead of "a16z" intentional?
        
         | scottlocklin wrote:
         | I suck at spelling. If I was one of the cool kids I'd claim to
         | be dyslexic.
        
           | yubozhao wrote:
            | Hi OP. We built an open-source library called BentoML
            | (https://github.com/bentoml/bentoml) to make model inference
            | and serving a lot easier for data scientists in various
            | serving scenarios.
            | 
            | We'd love to hear your thoughts on our library.
        
         | khazhoux wrote:
         | You mean the fact that they left out an "s" in Andreessen?
        
           | dang wrote:
           | We've squeezed another s above.
        
       | leetrout wrote:
       | That is a great write up and very accurate description of both
       | the costs and human intervention based on my experience with "AI"
       | tools.
        
       | allovernow wrote:
       | All of this might be true currently, but that's because this
        | current first-generation "AI" (which technically should just be
        | called ML) is mostly bullshit. To clarify, I don't mean anyone
        | is lying or selling snake oil - what I mean by bullshit is that
        | the vast majority of these services are cooked up by software
        | developers without any background in mathematics, selling
        | adtechy services in domains like product recommendation and
        | sentiment analysis. They are single-discipline applications
        | accessible to devs without science backgrounds that do not rely
        | on substantial expertise from other fields. That makes them
        | narrow in technical scope and easy to rip off (hence no moat,
        | lots of competition, human reliance, and a lack of actual
        | software).
       | 
       | The next generation of Machine Learning is just emerging, and
       | looks nothing like this. Funds are being raised, patents are
       | being filed, and everything is in early stage development, so you
       | probably haven't heard much yet - but these ML startups are going
        | after real problems in industry: cross-disciplinary applications
        | leveraging the power of heuristic learning to make designs and
        | decisions currently still limited to the human domain.
       | 
       | I'm talking about the kind of heuristics which currently exist
       | only as human intuition expressed most compactly as concept
       | graphs and, especially, mathematical relationships - e.g.
       | component design with stress and materials constraints, geologic
       | model building, treatment recommendation from a corpus of patient
       | data, etc. ML solutions for problems like these cannot be
       | developed without an intimate understanding of the problem
       | domain. This is a generalist's game. I predict that the most
       | successful ML engineers of the next decade will be those with
       | hard STEM backgrounds, MS and PhD level, who have transitioned to
       | ML. [Un]Fortunately for us, the current buzzwordy types of ML
       | services give the rest of us a bad name, but looking at _these_
       | upcoming applications the answers to the article tl;dr look
       | different:
       | 
       | >Deep learning costs a lot in compute, for marginal payoffs
       | 
       | The payoffs here are far greater. Designs are in the pipeline
       | which augment industry roles - accelerate design by replacing
       | finite methods with vastly quicker ML for unprecedented
       | iteration. Produce meaningful suggestions during the development
       | of 3D designs. Fetch related technical documents in real time by
       | scanning the progressive design as the engineer works, parsing
       | and probabilistically suggesting alternative paths to research
       | progression. Think Bonzi Buddy on steroids...this is a place for
       | recurring software licenses, not SaaS.
       | 
       | >Machine learning startups generally have no moat or meaningful
       | special sauce
       | 
       | For solving specific, technical problems, neural network design
       | requires a certain degree of intuition with respect to the flow
       | of information through the network, which both optimizes and
       | limits the kind of patterns that a given net can learn. Thus
       | designing NN for hard-industry applications is predicated upon an
       | intimate understanding of domain knowledge, and these highly
       | specialized neural nets become patentable secret sauces. That's
        | half of the moat - the other half comes from competition for the
        | software developers with first-hand experience in these fields,
        | or a general enough math-heavy background to capture the
        | relationships that are being distilled into nets.
       | 
       | >Machine learning startups are mostly services businesses, not
       | software businesses
       | 
       | Again only true because most current applications are NLP adtechy
       | bullshit. Imagine coding in an IDE powered by an AI (multiple
       | interacting neural nets) which guides the structure of your code
       | at a high level and flags bugs as you write. This, at a more
       | practical level, is the type of software that will eventually
       | change every technical discipline, and you can sell licenses!
       | 
       | >Machine learning will be most productive inside large
       | organizations that have data and process inefficiencies
       | 
       | This next generation goes far past simply optimizing production
       | lines or counting missed pennies or extracting a couple extra
       | percent of value from analytics data. This style of applied ML
       | operates at a deeper level of design which will change
       | everything.
        
         | scottlocklin wrote:
         | >The next generation of Machine Learning is just emerging, and
         | looks nothing like this. Funds are being raised, patents are
         | being filed, and everything is in early stage development, so
         | you probably haven't heard much yet ...
         | 
         | Citations needed. Large claims: presumably you can name one
         | example of this, and hopefully it's not a company you work at.
         | 
         | I've seen projects on literally all the things you mention:
          | materials science, medical stuff, geology/prospecting - none of
          | them worked well enough to build a standalone business around
         | them. I do know the oil companies are using DL ideas with some
         | small successes, but this only makes sense for them, as they've
         | been working on inverse problems for decades. None of them buy
         | canned software/services: it's all done in house. Probably
         | always will be, same as their other imaging efforts.
        
           | allovernow wrote:
           | >Citations needed. Large claims: presumably you can name one
           | example of this, and hopefully it's not a company you work
           | at.
           | 
           | Unfortunately this is all emerging just now and yes, I do
           | work at such a company, but I'm old enough to not be naively
            | excited by some hot fad. There's something profound just
            | starting to happen, but everyone is keeping the tech rather
            | secret because it isn't developed/differentiated enough yet
            | to keep a competitor from running off with an idea.
            | Disclosure is probably 1-3 years out, I estimate.
           | 
           | >I do know the oil companies are using DL...as their other
           | imaging efforts.
           | 
            | You're correct, and I happen to have experience in this
            | domain - except there are a handful of up-and-comers
            | courting funds from global majors like Shell and BP, and
            | seismic inversion is near the end of the list of novel
            | applications. Petroleum is ground zero for a potential
            | revolution right now, if we can come up with something before
            | the U.S. administration clamps down on fossil fuels.
           | 
           | But we're talking complex algorithms which consist of
           | multiple interacting neural networks. We are rapidly moving
           | toward rudimentary reasoning systems which represent
           | conceptual information encoded in vectors. I'm jaded enough
           | that I wouldn't say we're developing AGI, but if the
            | progressing ideas I'm familiar with and working on personally
           | pan out, they will be massive baby steps towards something
           | like AGI.
           | 
           | The space is evolving at least as rapidly as the academic
           | side, which I think is an unprecedented pace of development
           | for a novel field of study. I can't help but feel like these
           | are the first steps towards some kind of singularity. There's
           | no question that we are on to something civilization changing
           | with neural networks, what remains to be seen is whether
           | compute scaling will keep up with the needs of this next
           | generation ML. Even if research stopped today, the modern ML
           | zoo has exploded with architectures with fruitful
           | applications across domains. The future is here!
        
       | rossdavidh wrote:
        | So, way back in the last millennium, I did my Master's thesis (way
       | smaller deal than a Ph.D. thesis) on neural networks. Since then,
       | I have looked in on it every few years. I think they're cool, I
       | like using them, and writing multi-level backpropagation neural
       | networks used to be one of the first things I'd do in a new
       | language, just to get a feel for how it worked (until pytorch
       | came along and I decided for the first time that using their
       | library was easier than writing my own).
       | 
        | So, it's not like I dislike ML. But saying an investment is an
        | "AI" startup ought to be like saying it's a python startup, or
        | saying it's a postgres startup. That ought not to be something
       | you tell people as a defining characteristic of what you do, not
       | because it's a secret but rather because it's not that important
       | to your odds of success. If you used a different language and
       | database, you would probably have about the same odds of success,
       | because it depends more on how well you understand the problem
       | space, and how well you architect your software.
       | 
       | Linear models or other more traditional statistical models can
       | often perform just as well as DL or any other neural network, for
       | the same reason that when you look at a kaggle leaderboard, the
       | difference between the leaders is usually not that big after a
       | while. The limiting factor is in the data, and how well you have
       | transformed/categorized that data, and all the different methods
       | of ML that get thrown at it all end up with similar looking
       | levels of accuracy.
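        | 
        | A cheap way to test that claim on any tabular problem (a sketch
        | assuming scikit-learn; the synthetic data stands in for yours):
        | fit the boring linear baseline first, and make the fancy model
        | beat it.
        | 
        |   from sklearn.datasets import make_classification
        |   from sklearn.linear_model import LogisticRegression
        |   from sklearn.model_selection import cross_val_score
        |   
        |   X, y = make_classification(n_samples=1000, n_features=20,
        |                              random_state=0)
        |   baseline = LogisticRegression(max_iter=1000)
        |   print(cross_val_score(baseline, X, y, cv=5).mean())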
       | 
       | There used to be a saying: "If you don't know how to do it, you
       | don't know how to do it with a computer." AI boosters sometimes
       | sound as if they are suggesting that this is no longer true.
       | They're incorrect. ML is, absolutely, a technique that a good
       | programmer should know about, and may sometimes wish to use, kind
       | of like knowing how a state machine works. It makes no great deal
       | of difference to how likely a business is to succeed.
        
         | 7532yahoogmail wrote:
         | Thank you for the perspective. Now when we talk machine
         | learning are we talking:
         | 
         | L. Pachter and B. Sturmfels. Algebraic Statistics for
         | Computational Biology. Cambridge University Press 2005.
         | 
          | G. Pistone, E. Riccomango, H. P. Wynn. Algebraic Statistics.
          | CRC Press, 2001.
          | 
          | M. Drton, B. Sturmfels, S. Sullivant. Lectures on Algebraic
          | Statistics. Springer, 2009.
         | 
         | Or more like:
         | 
         | Watanabe, Sumio. Algebraic Geometry and Statistical Learning
         | Theory, Cambridge University Press 2009.
         | 
          | My understanding (I do not do AI or machine learning) is that
          | AI is distinct from these more mathematical analytic
          | perspectives.
         | 
          | Finally, might we argue that generally AI/ML is more easily
          | suited to data that's already high quality (e.g. CERN data,
          | trade data, drug trial data) as opposed to unconstrained data
          | (e.g. find the buses in these 1MM jpegs)?
        
       | aj7 wrote:
       | " Embrace services. There are huge opportunities to meet the
       | market where it stands. That may mean offering a full-stack
       | translation service rather than translation software or running a
       | taxi service rather than selling self-driving cars. Building
       | hybrid businesses is harder than pure software, but this approach
       | can provide deep insight into customer needs and yield fast-
       | growing, market-defining companies. Services can also be a great
       | tool to kickstart a company's go-to-market engine - see this post
       | for more on this - especially when selling complex and/or brand
        | new technology. The key is to pursue one strategy in a committed
        | way, rather than supporting both software and services
        | customers."
       | 
       | Exactly wrong and contradicts most of the thesis of the article -
       | that AI often fails to achieve acceptable models because of the
       | individuality, finickiness, edge cases, and human involvement
       | needed to process customer data sets.
       | 
       | The key to profitability is for AI to be a component in a
       | proprietary software package, where the VENDOR studies,
       | determines, and limits the data sets and PRESCRIBES this to the
       | customer, choosing applications many customers agree upon. Edge
       | cases and cat-guacamole situations are detected and ejected, and
        | the AI forms a smaller but critical efficiency-enhancing
        | component of a larger system.
        
       | whoisjuan wrote:
        | And many times all these AI computations go into solving mundane
        | problems like "What's the likelihood that this ad will perform
        | well?"
        | 
        | AI is so shiny that it makes people want to jump into that boat
        | as fast as they can, but a reasonable, objective analysis shows
        | that a huge number of software problems can still be solved
        | without relying on the "AI black box".
        
       | harias wrote:
       | >That's right; that's why a lone wolf like me, or a small team
       | can do as good or better a job than some firm with 100x the head
       | count and 100m in VC backing.
       | 
       | goes on to say
       | 
       | >I agree, but the hockey stick required for VC backing, and _the
       | army of Ph.D.s required to make it work_ doesn't really mix well
       | with those limited domains, which have a limited market.
       | 
       | Choose one?
       | 
        | It also assumes running your own data center is easy. Some
        | people don't want to be up 24x7 monitoring their data center, or
        | to buy hardware to accommodate the rare 10-minute peaks in
        | usage.
        
         | detaro wrote:
         | > _Some people don 't want to be up 24x7 monitoring their data
         | center or to buy hardware to accommodate the rare 10 minute
         | peaks in usage._
         | 
          | Do you need that for training workloads, and what percentage of
          | a startup's workload is training?
        
         | jjeaff wrote:
         | >rare 10 minute peaks
         | 
         | But is that really the use case here? I haven't worked in ML.
         | But I'm not seeing where you are going to need to handle a 10
         | minute spike that requires a whole datacenter.
         | 
          | A quad-GPU instance on AWS costs enough that a server with
          | similar capacity pays for itself in a few months of usage.
         | 
         | And hardware is pretty resilient these days. Especially if you
         | co-locate it in a datacenter that handles all the internet and
         | power up time for you. And when something does go wrong, they
         | offer "magic hands" service to go swap out hardware for you.
         | Colocation is surprisingly cheap. As is leasing 'managed'
         | equipment.
        
         | icheishvili wrote:
         | I don't think these are necessarily contradictory. With
          | pytorch-transformers, you can use a full-blown BERT model just
          | like the best in the world. And yet, to make this novel and
         | defensible, you would need to build on top of it and innovate
         | significantly, which would require significant capital to
         | achieve.
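          | 
          | A minimal sketch of that starting point (assuming the pytorch-
          | transformers package; the sentence is a placeholder): the pre-
          | trained BERT is the commodity part, and the task head you
          | train on top is where any novelty would live.
          | 
          |   import torch
          |   from pytorch_transformers import BertModel, BertTokenizer
          |   
          |   tok = BertTokenizer.from_pretrained("bert-base-uncased")
          |   bert = BertModel.from_pretrained("bert-base-uncased")
          |   
          |   ids = torch.tensor([tok.encode("a placeholder sentence")])
          |   with torch.no_grad():
          |       features = bert(ids)[0]  # (1, seq_len, 768) encodings
          |   # ...task-specific head / fine-tuning goes on top of this...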
        
       | brundolf wrote:
       | > Training a single AI model can cost hundreds of thousands of
       | dollars (or more) in compute resources
       | 
       | Why don't they buy their own hardware for this part? The training
       | process doesn't need to be auto-scalable or failure-resistant or
       | distributed across the world. The value proposition of cloud
       | hosting doesn't seem to make sense here. Surely at this price the
       | answer isn't just "it's more convenient"?
        
         | KaiserPro wrote:
         | because you are trading speed for cash.
         | 
         | Say you have $8M in funding, and you need to train a model to
         | do x
         | 
         | You can either:
         | 
            | a) gain access to a system that scales on demand and allows
            | instant, actionable results.
            | 
            | b) hire an infrastructure person, someone to write a K8s
            | deployment system, another person to come in and throw that
            | all away, another person to negotiate and buy the hardware,
            | and another to install it.
         | 
            | Option b can be the cheapest in the long term, but it carries
            | the most risk of failing before you've even trained a single
            | model. It also costs time, and if speed to market is your
            | thing, then you're shit out of luck.
        
           | brundolf wrote:
           | Why in the world do you need a Kubernetes deployment system
           | to run a single, manual, one-time (or a handful of times),
           | high-compute job?
        
             | PeterisP wrote:
             | Because that high-compute job needs to be distributed on
             | many, many machines, and if you're using cheap preemptible
             | instances you have to handle machines dropping off and
             | joining in while you're running that single job.
             | 
             | It's definitely not something that you can launch manually
             | - perhaps Kubernetes is not the best solution, but you
             | definitely need some automation.
        
             | dsl wrote:
             | Because when all you have is a hammer, everything looks
             | like a nail.
             | 
             | We have become so DevOps and cloud dependent that everyone
             | has forgotten how to just run big systems cheaply and
             | efficiently.
        
         | GaryNumanVevo wrote:
         | If you're in a position where you need to train a large
          | network: first, I feel bad for you; second, you'll need
          | additional machines to train in a reasonable amount of time.
         | 
         | ML distributed training is all about increasing training
         | velocity and searching for good hyperparameters
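          | 
          | A single-node sketch of the first step (assuming PyTorch and
          | more than one local GPU): DataParallel splits each batch
          | across devices, which buys training velocity without changing
          | the model.
          | 
          |   import torch
          |   import torch.nn as nn
          |   
          |   model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
          |                         nn.Linear(512, 10))
          |   if torch.cuda.device_count() > 1:
          |       model = nn.DataParallel(model)  # one process, many GPUs
          |   device = "cuda" if torch.cuda.is_available() else "cpu"
          |   model = model.to(device)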
        
       ___________________________________________________________________
       (page generated 2020-02-24 23:00 UTC)