[HN Gopher] It takes a lot of energy for machines to learn
       ___________________________________________________________________
        
       It takes a lot of energy for machines to learn
        
       Author : jocker12
       Score  : 71 points
       Date   : 2020-12-30 19:00 UTC (3 hours ago)
        
 (HTM) web link (theconversation.com)
 (TXT) w3m dump (theconversation.com)
        
       | ALittleLight wrote:
       | The article points out that the energy to train BERT is
       | comparable to a flight across America. There are hundreds of
       | thousands of flights every day - why should we be concerned about
       | the equivalent of one extra? Especially given that BERT improves
       | ~10% of English language Google search results, it seems like
       | we're getting a lot in return for relatively small energy use. On
       | top of that, Google buys 100% of their energy usage from Green
       | sources.
       | 
       | I think it's great to talk about other methods of training or
       | architectures that don't require so many parameters. The point
       | about how BERT consumes vastly more text than humans do when they
       | learn to read is interesting. But trying to phrase this like an
       | environmental issue just seems disingenuous and misleading.
        
         | Jabbles wrote:
         | Wait until the author discovers bitcoin's energy costs.
        
           | throwaway7281 wrote:
           | That's my #1 reason to be bearish on B$ - it's just a
           | complete environmental cluster-fuck. And in order to not see
           | this, you'll have to ignore a lot of facts - which in turn
           | tells me a lot about those inside the crypto-bubble, namely
           | that they do not care that much about facts.
        
             | trhway wrote:
              | At Starship's projected $100/kg, placing your $1000
              | Eth-mining GPU in space will add only 10-20% to its
              | cost when amortized over a 100-ton rig - comparable
              | to building or buying your own small power plant,
              | which is what large crypto operations have done. AI
              | and crypto are exploding and only going to do so
              | more. Granted, the planet is becoming too confining
              | for them. It seems that AI and crypto will be the
              | killer apps of space in the near future.
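              | 
              | A rough sketch of that arithmetic (the per-GPU
              | launched mass here is an assumption, not a figure
              | from the comment):
              | 
              |     cost_per_kg = 100.0   # USD, Starship target
              |     gpu_price = 1000.0    # USD
              |     mass_kg = 1.5         # assumed: card + rig share
              |     overhead = cost_per_kg * mass_kg / gpu_price
              |     print(f"~{overhead:.0%}")   # ~15%, in the 10-20% range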
        
               | mhh__ wrote:
               | How are you supposed to cool it in space?
        
               | hnuser123456 wrote:
               | Heatsinks optimized for radiative cooling, which are kept
               | in the shade by solar panels. Works much better when half
               | or more of your surroundings aren't a planet at human-
               | scale temperatures, as is the case on earth's surface.
        
               | wonnage wrote:
               | spacecraft already have trouble venting waste heat
               | without the burden of attempting to mine bitcoins
        
               | trhway wrote:
                | only because of the very constrained weight budget
                | (itself constrained by the high price of delivery
                | into space), so you can't just throw in an AC with
                | a radiator. Radiative power scales with the 4th
                | power of T, so it is pretty effective in space if
                | you run the hot end of the AC hot.
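                | 
                | A rough sketch of the radiative-cooling arithmetic
                | (emissivity, radiator temperature and heat load
                | are assumed, illustrative values):
                | 
                |     SIGMA = 5.67e-8   # Stefan-Boltzmann, W/(m^2 K^4)
                |     eps, T = 0.9, 350.0   # assumed coating and temp
                |     flux = eps * SIGMA * T**4   # ~770 W/m^2
                |     heat = 1000.0   # W, assumed 1 kW rig
                |     area = heat / flux   # ~1.3 m^2, ignoring sunlight
                |     print(round(flux), round(area, 1))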
        
               | TeMPOraL wrote:
               | It'll be orders of magnitude cheaper to just make it
               | waterproof and sink it in a lake.
        
               | jbrot wrote:
               | Could you clarify why you think placing crypto rigs in
               | space is a good idea? To run a crypto rig, you need to
               | provide a lot of power and have a mechanism to keep the
               | processors cool. Being in space seems to make both of
               | these substantially more difficult (especially cooling
               | the electronics). It seems highly implausible to me that
               | crypto will be the "killer app" of space.
        
             | dheera wrote:
             | Shouldn't that be a reason to be bullish? Because if the
             | governments step in and enact regulation against Bitcoin
             | mining, that will increase its scarcity and therefore
             | value.
             | 
             | Likewise, if you think fossil fuels will be scarce in the
             | future, you can be bullish on the _price_ of gas, but not
             | bullish on things based on gas.
        
               | aaaxyz wrote:
               | The number of bitcoins available (disregarding
               | lost/forgotten ones) is fixed at 21M, and the rate at
               | which they are created is more or less constant (by
               | design), so anti-mining regulations will not increase
               | scarcity. If anything, restricting the network hash rate
               | will make the network _less_ secure and thus less
               | valuable
        
             | hnuser123456 wrote:
             | My #1 reason is that once someone figures out how to build
             | large, reliable quantum computers and program them in a
             | practical way and quickly break sha256, bitcoin might be
             | how we find out. That still seems to be a ways away though,
             | and in the meantime, having a mostly-unbreakable digital
             | ledger seems to be valuable.
        
               | Darkstryder wrote:
               | AFAIK a practical quantum computer would not enable any
               | practical attack against SHA256.
        
             | dcolkitt wrote:
             | The counterpoint to this I'll make is that all financial
             | systems engage in costly signaling.
             | 
              | If you visit a foreign country, how do you know
              | whether to trust a specific bank? You probably look
              | at cues: does it occupy an expensive skyscraper in
              | the center of town? Do you see its ads around town?
              | Does it sponsor the local soccer team? All credible
              | signals that are hard to fake for a fly-by-night
              | scammer.
             | 
             | All Bitcoin did was formalize this process. At any given
             | time there are many chains that all purport to be the
             | canonical history. How do you decide which one is
             | authentic? By looking at hard-to-fake signals. In this case
             | the accumulated hashing power behind the chain. Looking for
             | whoever spent the most hash work is fundamentally no
             | different than checking to see which bankers are wearing
             | the most expensive suits.
             | 
             | Any system with trusted intermediaries will waste resources
             | on costly signaling. The only question is whether crypto
             | mining is more of a fundamental waste than traditional
             | signals, like high-paid bankers and prestige real estate.
        
               | wonnage wrote:
               | a building has actual value as opposed to bitcoin simply
               | contributing to the eventual heat death of the universe
        
               | throwaway7281 wrote:
                | I don't know. I see the nice HSBC building in my
                | city and all I can think of is that half of their
                | business is crooked: money laundering, trafficking,
                | drug cartels. No shiny building can change what you
                | actually do.
        
             | vmception wrote:
             | hey an educational moment!
             | 
             | the vast majority of bitcoin/crypto mining uses renewable
             | or otherwise wasted energy and this is the only economical
             | way to do it! you have been intentionally misinformed by
             | the _amount_ of energy used and not the _source_ of energy
             | used. It is completely a red herring to just read a
             | headline about how little electricity a tiny country uses
             | and that mining uses the same amount.
             | 
              | what? "wasted energy"? yeah, a lot of energy is lost
              | because it cannot be transported to commercial and
              | residential areas economically. so miners set up
              | processing at the source of the energy and use it. in
              | fact, a lot of it actually reduces pollution - the
              | polar opposite of what you believe; in those
              | circumstances it is a sustainability solution.
             | 
             | now aside from fighting me about your worldview, this is
             | also an area to remain vigilant about! nation states can
             | absolutely mine at a loss when they turn to competing for
             | control of the cryptocurrency networks for geopolitical
             | reasons. right now it is renewable.
             | 
              | but what about the hardware, the single-purpose
              | hardware and e-waste? there is a large trade in
              | "obsolete" hardware, as it is still economical to use
              | for people with the cheapest power. this is another
              | specific area to be vigilant about: making sure the
              | infrastructure is in place to keep that hardware in
              | use instead of in landfills.
        
               | throwaway7281 wrote:
               | I can see that mining is a good incentive to look for
               | excess energy in all kinds of places and make use of it.
                | That's a capitalistic motive: finding solutions to
                | derived issues when something else is at the core
                | of the problem. But anyway.
               | 
               | B$ still looks ridiculous, let me explain.
               | 
                | * world electrical energy consumption: ~25000 TWh [1]
                | * bitcoin energy consumption index: ~77 TWh [2]
                | 
                | Bitcoin today consumes the equivalent of roughly
                | 0.3% of the world's electricity consumption.
               | 
                | That would be fine if half the world used it and
                | did meaningful things with it.
               | 
               | What is it actually used for most visibly?
               | 
                | As a betting ground with galactic momentum, a
                | technology promising to be the solution to
                | everything, and just outrageous claims that only
                | the bovine are left to be excited about.
               | 
                | It's not that the algorithms and data structures
                | are not cool - they certainly are - but we can do
                | so much more with technology today than this.
               | 
                | [1] https://www.vgb.org/en/data_powergeneration.html?dfid=98054
                | [2] https://digiconomist.net/bitcoin-energy-consumption
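                | 
                | A quick check of that 0.3% figure using [1] and [2]:
                | 
                |     world_twh = 25000.0   # [1], world electricity
                |     bitcoin_twh = 77.0    # [2]
                |     print(f"{bitcoin_twh / world_twh:.2%}")   # ~0.31%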
        
         | acdc4life wrote:
          | My brain does NLP better than any system out there. I'm
          | also able to ride bicycles and do motor control better
          | than Boston Dynamics. I can also construct and prove
          | mathematics, do physics, and code all of this in Matlab
          | and C. My brain handles this wide range of tasks almost
          | seamlessly on just 15 watts of power - something Silicon
          | Valley's supercomputers can barely do 1% of.
        
           | juanbyrge wrote:
           | Brains also have the benefit of billions of years of
           | iterative development. Computers were only invented in the
            | last century. I would not be surprised if computers
            | could catch up given a few tens or hundreds more years.
        
           | omgwtfbyobbq wrote:
            | To be fair, it takes years of energy to train our brains
            | to the point where they can do all those things, and our
            | brains aren't extensible in the same way
            | hardware/software is. I guess there's also a lot more
            | variation in yields. ;)
        
           | bumby wrote:
           | To be a fair comparison wouldn't you have to include all the
           | energy used to train "your" brain through generations of
           | evolutionary training? Your latest model is like taking an
           | already trained BERT model and adding a few tweaks
        
           | oh_sigh wrote:
           | Yeah but your clone() function is very wasteful, and we can't
           | stick 1 million of you in a dark room that just looks at
           | people's google searches and figures out what they're really
           | going for.
           | 
            | And it's not the whole picture to say the brain only
            | uses 15 watts - there are all sorts of necessary support
            | systems that it couldn't run without. So it's closer to
            | 100 watts (2000 kcal/day).
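            | 
            | The 2000 kcal/day figure does work out to roughly 100
            | watts:
            | 
            |     joules_per_day = 2000 * 4184   # 1 kcal = 4184 J
            |     watts = joules_per_day / 86400   # seconds per day
            |     print(round(watts))   # ~97 W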
        
           | wongarsu wrote:
           | Comparing your pre-trained brain (that has a lot of structure
           | (=training) through evolution) with the training costs of a
           | new algorithm isn't really fair.
           | 
            | All in, you require about 100W (2000 kcal/day), maybe 3
            | times that when doing a lot of physical activity. Boston
            | Dynamics' Spot uses about 400W. I can probably outperform
            | it in some disciplines while it would beat me in others.
            | That would be a fair comparison.
        
         | spiznnx wrote:
         | BERT is much smaller than GPT-3 (500x fewer parameters), not
         | sure why the article doesn't point that out more explicitly.
         | 
         | The mentioned paper itself notes a 300,000x increase in compute
         | used for training language models over the last 6 years.
         | 
         | I think the point is, if nothing changes, the costs will be
         | very significant very soon.
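          | 
          | For reference on the 500x figure, using the parameter
          | counts reported in the respective papers (GPT-3: 175B,
          | BERT-large: 340M):
          | 
          |     gpt3_params = 175e9   # GPT-3
          |     bert_large = 340e6    # BERT-large
          |     print(round(gpt3_params / bert_large))   # ~515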
        
           | qeternity wrote:
           | We're already at soon. Another order of magnitude increase
           | (which at our pace is all but guaranteed in the near term)
           | will make large scale training a significant R&D line item
           | even for the FAANGs. It will also push efforts to be
            | commercially viable, as you can no longer do
            | proof-of-concept nets at $500m.
        
           | PragmaticPulp wrote:
           | > I think the point is, if nothing changes, the costs will be
           | very significant very soon.
           | 
           | The energy efficiency of machine learning hardware is
           | progressing at a rapid rate. It's not accurate to assume that
           | the energy costs will stay the same. Just look at how much it
           | would have taken to train something like GPT-3 five years
           | ago.
        
             | _jal wrote:
             | I think it is reasonable to assume that any advances in
             | efficiency will be subsumed by increased use, much in the
             | same way increased energy efficiency doesn't decrease
             | absolute overall use.
        
         | acchow wrote:
         | > There are hundreds of thousands of flights every day
         | 
         | 2019 was peak flight with 38.9 million flights over the year.
         | That averages to 106k flights per day.
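          | 
          | That works out as:
          | 
          |     print(round(38.9e6 / 365))   # ~106,575 flights/day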
        
         | adamsea wrote:
         | Not sure why folks are defensive? The article is mainly
         | informational.
         | 
         | TFA:
         | 
         | "What does this mean for the future of AI research? Things may
         | not be as bleak as they look. The cost of training might come
         | down as more efficient training methods are invented.
         | Similarly, while data center energy use was predicted to
         | explode in recent years, this has not happened due to
         | improvements in data center efficiency, more efficient hardware
         | and cooling."
         | 
          | Also: one flight in and of itself isn't an environmental
          | problem; thousands of flights are.
         | 
         | Training this one instance of a model isn't an environmental
         | problem; I would be curious to see some educated guesses about
         | the number of models being trained over the next ten years. Not
         | an expert but I know that use of ML is exploding - lots of new
         | use-cases and thus lots of new models.
         | 
          | So it makes sense to me to think about this stuff.
        
           | ALittleLight wrote:
           | The article suggests that the energy usage of language models
           | is a problem. I don't think energy usage is a problem. I'm
           | not sure how you interpret this as being defensive.
           | 
           | There are hundreds of thousands of flights per day. Adding an
           | additional flight, or even an additional thousand flights, to
           | substantially improve Google doesn't seem like a big cost.
           | Consider: Would your life be worse if one random trans-
           | American flight was cancelled today, or if Google searches
           | became 10% worse?
           | 
            | Another way of thinking about this same point: why is
            | the author writing about language models if they are so
            | concerned about the environment? Surely the airline
            | industry is a better subject as, again, they fly
            | hundreds of thousands of flights each day. It's hard to
            | take someone seriously when they are focusing on an
            | infinitesimal part of a huge problem.
           | 
           | It's also misleading because there is a difference between
           | energy used by an airline flight, where the energy comes from
           | burning jet fuel, and energy used in a data center. In
           | Google's case (the people training BERT) the energy used was
           | 100% renewable - Google reached that goal in 2017. Perhaps
            | OpenAI didn't use renewable energy to train GPT-3, but I
            | wager they didn't power their machines by burning jet
            | fuel either.
           | 
            | Maybe the electricity used to train language models will
            | become a meaningful issue at some point in the future. I
            | don't think that future is close at hand, though.
        
         | sgt101 wrote:
          | Also, I have downloaded and used BERT in several other
          | applications, and I think that tens of thousands of other
          | folks have done the same.
        
         | alisonkisk wrote:
         | Google buying "green" isn't a justification because it fails
         | the categorical imperative. It's impossible for everyone to buy
         | "green" energy.
        
           | toomuchtodo wrote:
           | As always, there's nuance. You can schedule workloads,
           | machine learning/training included, where the power is
           | cleanest (low carbon). Google relies on
           | electricitymap.org/Tomorrow to do this.
           | 
            | https://blog.google/inside-google/infrastructure/data-center...
           | 
           | The more folks who elect to pick where to compute based on
           | low electrical carbon intensity, the faster the grid turns
           | over to clean generation. You must vote with your fiat. I
           | encourage technologists to include this consideration in
           | their workload scheduling requirements. Renewables are almost
           | always cheaper than fossil generation as well.
        
           | curiousllama wrote:
           | The categorical imperative doesn't require everyone to be
           | practically able to do the thing. Is it not moral to feed a
           | starving man because people in China aren't able to feed him?
        
           | rictic wrote:
           | There's more than enough solar and wind, it's just a matter
           | of how much we as a society want it. The green energy
           | revolution has barely begun, and we're still climbing the
           | learning curve. More demand means a faster climb means
           | exponentially more green energy sooner.
        
       | firebaze wrote:
        | This is beyond ridiculous, even compared to the energy it
        | takes to train a human mind to the age of, let's say, 35.
        | Raise a child to the age of 35 and the energy spent will
        | surpass the mentioned training cost by at least 2 orders of
        | magnitude.
       | 
        | And no, I'm not ridiculing AI ethics researchers: the energy
        | cost of AI is negligible compared to the serious issues. The
        | real harm stems from other areas.
        
         | MAXPOOL wrote:
          | The human mind (aka brain) has consumed a little over
          | 6,000 kWh of energy by the time it's 35.
         | 
          | From the paper they cite:
          | 
          |     algorithm      kWh-PUE
          |     -----------    --------
          |     ELMo               275
          |     BERT(base)       1,507
          |     NAS            656,347
          | 
          | NAS is neural architecture search over a Transformer (213M
          | parameters), from 2019. They didn't have kWh numbers for
          | GPT-2.
        
           | sbierwagen wrote:
            | I don't know about that 6000 kWh number.
           | 
           | More importantly, a human brain isn't trained from scratch at
           | birth. It's had a few hundred million years of pretraining.
        
             | wongarsu wrote:
             | The 6000kWh number is assuming 20W over 35 years. That
             | might be right if you only count the brain, but then you
             | would have to revise the neural network numbers to only
             | count CPU and memory, which would easily halve those
             | numbers (fans take a lot of power).
             | 
              | If we want to account for power supply, temperature
              | control etc., we could use the entire calorie intake
              | of a human over 35 years, leaving us with around
              | 30,000 kWh (assuming a healthy, not-overweight human).
              | You could now argue that that's too high (the human is
              | doing a lot of other things). But as you pointed out,
              | it's a bit of a pointless comparison anyway, as humans
              | don't start from scratch.
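              | 
              | The arithmetic behind both figures (20 W brain-only
              | vs. 2000 kcal/day whole-body, each over 35 years):
              | 
              |     hours = 24 * 365.25 * 35
              |     brain_kwh = 0.020 * hours   # ~6,100 kWh
              |     body_w = 2000 * 4184 / 86400   # ~97 W
              |     body_kwh = body_w / 1000 * hours   # ~30,000 kWh
              |     print(round(brain_kwh), round(body_kwh))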
        
         | goatlover wrote:
         | But that person would be learning tons of other things as well.
         | It's not equivalent.
        
           | firebaze wrote:
           | That person would probably be one of a billion. The AI is
           | singular. This should cancel out.
        
       | pelasaco wrote:
        | It still takes less energy than raising humans and educating
        | them to do mediocre jobs.
        
         | goatlover wrote:
         | Plenty of humans are already raised, and can do other things
         | machines can't.
        
           | pelasaco wrote:
           | plenty of humans are raised but not educated. So let's
           | educate them to do the other things that machines can't.
        
         | joe_the_user wrote:
          | There are very few "typical human tasks" that machine
          | learning actually does better than a person; playing games
          | is the only one that occurs to me. For driving, image
          | description, language translation, and so forth, at the
          | scale that a human does it, the human is generally still
          | considered to do tremendously better. Of course, some ML
          | programs are convenient in being able to do things at a
          | large scale, but a majority of what ML programs do is like
          | language translation - not great, and something we
          | sometimes live with 'cause it's free.
         | 
          | Basically, one should really not discount human ability,
          | especially in tasks considered mundane. Such skills are
          | often considered "mediocre" not because they're actually
          | easy but because nearly all humans can do them, see:
         | 
         | https://en.wikipedia.org/wiki/Moravec%27s_paradox
        
           | mortehu wrote:
            | ML is usually better at speed. For example, I use a
            | fine-tuned GPT-2 for code autocomplete in vim, and even
            | though humans could do better, they can't do it as fast.
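            | 
            | A minimal sketch of a completion call with a fine-tuned
            | GPT-2 checkpoint via the Hugging Face transformers
            | library (the checkpoint path is a placeholder; the
            | actual vim integration isn't shown):
            | 
            |     from transformers import (GPT2LMHeadModel,
            |                               GPT2TokenizerFast)
            | 
            |     tok = GPT2TokenizerFast.from_pretrained("my-gpt2-code")
            |     model = GPT2LMHeadModel.from_pretrained("my-gpt2-code")
            | 
            |     prompt = "def fibonacci(n):\n    "
            |     ids = tok(prompt, return_tensors="pt").input_ids
            |     out = model.generate(ids, max_length=64,
            |                          do_sample=False)
            |     print(tok.decode(out[0], skip_special_tokens=True))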
        
             | pelasaco wrote:
              | The same was reported last year: ML could offload
              | lawyers by doing many of the tasks currently done by
              | them - not better, but faster and, for sure, using
              | less energy.
        
       | optimalsolver wrote:
       | Didn't pointing this out get that Google lady fired?
        
       | beervirus wrote:
       | > This month, Google forced out a prominent AI ethics researcher
       | after she voiced frustration with the company for making her
       | withdraw a research paper. The paper pointed out the risks of
       | language-processing artificial intelligence, the type used in
       | Google Search and other text analysis products.
       | 
       | That is... certainly one way of putting things.
        
       | ur-whale wrote:
        | Same BS that started the whole Gebru affair at Google.
        
       | HenryKissinger wrote:
       | I just want to say that the human brain doesn't need to perform
       | billions of matrix products to recognize a dog.
       | 
       | The best AI will always be a meaty human.
        
       | jefftk wrote:
       | _> One recent model called Bidirectional Encoder Representations
       | from Transformers (BERT) used 3.3 billion words from English
       | books and Wikipedia articles. Moreover, during training BERT read
       | this data set not once, but 40 times. To compare, an average
       | child learning to talk might hear 45 million words by age five,
       | 3,000 times fewer than BERT._
       | 
       | The human brain is the output of an incredible number of
       | generations of training, representing a vast consumption of
       | energy. Most of the learning that informs the brain happened
       | before this hypothetical five-year-old was even born.
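        | 
        | For what it's worth, the quoted "3,000 times fewer" ratio
        | checks out:
        | 
        |     bert_words = 3.3e9 * 40   # corpus size x 40 passes
        |     child_words = 45e6        # heard by age five
        |     print(round(bert_words / child_words))   # ~2,933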
        
         | ravi-delia wrote:
          | That is a matter of great debate. Although most people's
          | brains wind up settling on basically identical structure
          | over time, case studies of injured or disabled people
          | often display incredible adaptability (e.g. the occipital
          | lobe in a blind person). If indeed the brain learns most
          | of what it knows from experience, then its (comparatively)
          | low energy consumption would come from greater efficiency.
          | That definitely seems possible; as a basic unit of machine
          | learning the neuron is much more specialized than the
          | transistor, and far slower.
        
           | anaphor wrote:
           | My interpretation of what the parent meant is that the
           | cognitive processes which support language acquisition (which
           | may be mostly domain-general) are already optimized for this
           | task, for which there is a sensitive period during certain
           | years where you need to be exposed to certain inputs, or else
           | you never fully acquire language. See:
           | https://en.wikipedia.org/wiki/Genie_(feral_child)
        
         | MAXPOOL wrote:
         | 550-600 million years of hyperparameter tuning and neural
         | architecture search using an evolutionary algorithm is nothing
         | to laugh at.
         | 
          | But the actual learning process of a single brain uses
          | very few training examples and is very energy efficient -
          | a brain that runs on only 20 watts of power.
        
       | syntaxing wrote:
        | It's still crazy to think about how much more efficient
        | computational power is nowadays compared to even five years
        | ago. I remember changing my power supply in anticipation of
        | Nvidia's then-newest GTX series, only to learn that a 1050
        | Ti had just a 75W max draw, so the fancy new 1kW PSU was
        | completely unnecessary. Also, to put it in perspective, my
        | Mac mini has a 30W max draw. 30W!!! It's ridiculous how
        | little energy it draws.
        
         | trthomps wrote:
         | You want your mind to really be blown, your brain manages to do
         | everything it does on around 20W, even less than that mac mini.
        
       | cosmolev wrote:
        | Machine learning is essentially brute force. No surprise it
        | is not efficient.
        
       | lightgreen wrote:
       | > Among the risks is the large carbon footprint of developing
       | this kind of AI technology.
       | 
        | No, there are no risks from the carbon footprint. Energy is
        | cheap. Just build nuclear power plants.
        | 
        | Or just build solar and wind and train models half of the
        | time, if one is paranoid about nuclear power plant safety.
        
       | Der_Einzige wrote:
        | Fine-tuning BERT (which is what most people do in practice)
        | is so much more efficient. You can do it in 8 hours on a
        | 2080 Ti.
       | 
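        | A minimal sketch of that kind of fine-tuning with the
        | Hugging Face transformers Trainer (dataset, checkpoint and
        | hyperparameters are illustrative placeholders, not a
        | recommended recipe):
        | 
        |     from datasets import load_dataset
        |     from transformers import (AutoTokenizer,
        |         AutoModelForSequenceClassification, Trainer,
        |         TrainingArguments)
        | 
        |     tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        |     model = AutoModelForSequenceClassification.from_pretrained(
        |         "bert-base-uncased", num_labels=2)
        | 
        |     # a small IMDB slice, just to show the mechanics
        |     ds = load_dataset("imdb", split="train[:2000]")
        |     ds = ds.map(lambda b: tok(b["text"], truncation=True,
        |                               padding="max_length",
        |                               max_length=128),
        |                 batched=True)
        | 
        |     args = TrainingArguments(output_dir="bert-finetuned",
        |                              num_train_epochs=1,
        |                              per_device_train_batch_size=16)
        |     Trainer(model=model, args=args, train_dataset=ds).train()
        | 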
        | They need to mention that the total number of people
        | training large transformers from scratch is very, very
        | small. I'd wager that the total number of different,
        | uniquely trained (not using previous weights, which reduces
        | compute needs by massive amounts) language models in
        | existence is in the low hundreds.
       | 
        | I'd claim that these big language models, serving as the
        | underlying encoding backbone behind more specific systems,
        | actually save energy and compute compared to previous
        | methods, which needed far more data (and thus more energy
        | spent collecting it) combined with less efficient
        | representations like tf-idf, causing many classifiers to
        | run very slowly and thus burn lots of energy.
       | 
       | Also, much of the recent research in this field is about model
       | pruning, quantization, and any technique you can imagine to
       | reduce training and inference time or memory requirements.
       | 
        | All in all, big language models are a net positive for the
        | environment. The efficiency gains in any number of fields
        | from increasingly sophisticated NLP systems far, far
        | outweigh the cost of training them. Foundational research in
        | environmental conservation will be accelerated by effective
        | NLP semantic search and question-answering systems. That's a
        | single, tiny example of the potential benefits from large
        | language models.
       | 
       | Pick a better target.
        
       ___________________________________________________________________
       (page generated 2020-12-30 23:00 UTC)