[HN Gopher] It takes a lot of energy for machines to learn
___________________________________________________________________
 
It takes a lot of energy for machines to learn
 
Author : jocker12
Score  : 71 points
Date   : 2020-12-30 19:00 UTC (3 hours ago)
 
(HTM) web link (theconversation.com)
(TXT) w3m dump (theconversation.com)
 
| ALittleLight wrote:
| The article points out that the energy to train BERT is
| comparable to a flight across America. There are hundreds of
| thousands of flights every day - why should we be concerned
| about the equivalent of one extra? Especially given that BERT
| improves ~10% of English-language Google search results, it
| seems like we're getting a lot in return for relatively small
| energy use. On top of that, Google buys 100% of its energy
| usage from green sources.
|
| I think it's great to talk about other methods of training or
| architectures that don't require so many parameters. The point
| about how BERT consumes vastly more text than humans do when
| they learn to read is interesting. But trying to frame this as
| an environmental issue just seems disingenuous and misleading.
| Jabbles wrote:
| Wait until the author discovers bitcoin's energy costs.
| throwaway7281 wrote:
| That's my #1 reason to be bearish on B$ - it's just a
| complete environmental cluster-fuck. And in order to not see
| this, you have to ignore a lot of facts - which in turn
| tells me a lot about those inside the crypto-bubble, namely
| that they do not care that much about facts.
| trhway wrote:
| At Starship's $100/kg, placing your $1000 Eth-mining GPU in
| space would add only 10-20% to its cost when amortized over
| a 100-ton rig - comparable to building or buying your own
| small power plant, which large crypto operations have done.
| AI and crypto are exploding and only going more so. Granted,
| the planet is becoming too small a confine for them. It
| seems that AI and crypto will be the killer apps of space
| in the near future.
| mhh__ wrote:
| How are you supposed to cool it in space?
| hnuser123456 wrote:
| Heatsinks optimized for radiative cooling, which are kept
| in the shade by solar panels. Works much better when half
| or more of your surroundings aren't a planet at
| human-scale temperatures, as is the case on earth's
| surface.
| wonnage wrote:
| spacecraft already have trouble venting waste heat
| without the burden of attempting to mine bitcoins
| trhway wrote:
| only because of the very constrained weight budget (which
| is constrained by the high price of delivery into space),
| so you can't just throw in an AC with a radiator. The
| radiative power scales with the 4th power of T, so it is
| pretty effective in space if you run the hot end of the
| AC hot.
| TeMPOraL wrote:
| It'll be orders of magnitude cheaper to just make it
| waterproof and sink it in a lake.
| jbrot wrote:
| Could you clarify why you think placing crypto rigs in
| space is a good idea? To run a crypto rig, you need to
| provide a lot of power and have a mechanism to keep the
| processors cool. Being in space seems to make both of
| these substantially more difficult (especially cooling
| the electronics). It seems highly implausible to me that
| crypto will be the "killer app" of space.
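 
To put rough numbers on the radiator question above, here is a
minimal sketch of the Stefan-Boltzmann arithmetic in Python. The
300 W heat load, 0.9 emissivity, and radiator temperatures are
illustrative assumptions, not figures from the thread:
 
    # Radiator area needed to reject a heat load into empty space,
    # per the Stefan-Boltzmann law P = e * sigma * A * T^4.
    SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)
 
    def radiator_area(load_w, temp_k, emissivity=0.9):
        """Area (m^2) needed to reject load_w watts at temp_k."""
        return load_w / (emissivity * SIGMA * temp_k ** 4)
 
    for t in (300, 350, 400):
        print(f"{t} K: {radiator_area(300, t):.2f} m^2")
    # 300 K: 0.73 m^2; 350 K: 0.39 m^2; 400 K: 0.23 m^2 - running
    # the hot end hotter shrinks the radiator quickly, which is
    # trhway's point, though this ignores solar input and plumbing.
 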
| dheera wrote:
| Shouldn't that be a reason to be bullish? Because if
| governments step in and enact regulation against Bitcoin
| mining, that will increase its scarcity and therefore value.
|
| Likewise, if you think fossil fuels will be scarce in the
| future, you can be bullish on the _price_ of gas, but not
| bullish on things based on gas.
| aaaxyz wrote:
| The number of bitcoins available (disregarding
| lost/forgotten ones) is fixed at 21M, and the rate at
| which they are created is more or less constant (by
| design), so anti-mining regulations will not increase
| scarcity. If anything, restricting the network hash rate
| will make the network _less_ secure and thus less
| valuable.
| hnuser123456 wrote:
| My #1 reason is that once someone figures out how to build
| large, reliable quantum computers, program them in a
| practical way, and quickly break SHA-256, bitcoin might be
| how we find out. That still seems to be a ways away though,
| and in the meantime, having a mostly-unbreakable digital
| ledger seems to be valuable.
| Darkstryder wrote:
| AFAIK a practical quantum computer would not enable any
| practical attack against SHA-256.
| dcolkitt wrote:
| The counterpoint I'll make to this is that all financial
| systems engage in costly signaling.
|
| If you visit a foreign country, how do you know whether to
| trust a specific bank? You probably look at cues: does it
| occupy an expensive skyscraper in the center of town? Do
| you see its ads around town? Does it sponsor the local
| soccer team? All credible signals that are hard to fake
| for a fly-by-night scammer.
|
| All Bitcoin did was formalize this process. At any given
| time there are many chains that all purport to be the
| canonical history. How do you decide which one is
| authentic? By looking at hard-to-fake signals - in this
| case, the accumulated hashing power behind the chain.
| Looking for whoever spent the most hash work is
| fundamentally no different than checking to see which
| bankers are wearing the most expensive suits.
|
| Any system with trusted intermediaries will waste
| resources on costly signaling. The only question is
| whether crypto mining is more of a fundamental waste than
| traditional signals, like high-paid bankers and prestige
| real estate.
| wonnage wrote:
| a building has actual value, as opposed to bitcoin simply
| contributing to the eventual heat death of the universe
| throwaway7281 wrote:
| I don't know. I see the nice HSBC building in my city and
| all I can think of is that half of their business is
| crooked: money laundering, trafficking, drug cartels. No
| shiny building can change what you actually do.
| vmception wrote:
| hey, an educational moment!
|
| the vast majority of bitcoin/crypto mining uses renewable
| or otherwise wasted energy, and this is the only economical
| way to do it! you have been intentionally misinformed by
| the _amount_ of energy used rather than the _source_ of
| energy used. It is a complete red herring to read a
| headline saying that mining uses as much electricity as
| some small country.
|
| what, "wasted energy"? yeah, a lot of energy is lost
| because it cannot be transported to commercial and
| residential areas economically. so miners set up processing
| at the source of the energy and use it. in fact, a lot of
| it actually reduces pollution, creating the polar opposite
| of what you believe; in those circumstances it is a
| sustainability solution.
|
| now, aside from fighting me about your worldview, this is
| also an area to remain vigilant about!
| nation states can absolutely mine at a loss when they turn
| to competing for control of the cryptocurrency networks for
| geopolitical reasons. right now it is renewable.
|
| but what about the hardware, the single-purpose hardware
| and e-waste? there is a large trade in "obsolete" hardware,
| as it is economical to use for people with the cheapest
| power. this is another specific area to be vigilant about:
| making sure the infrastructure is in place to put that
| hardware to use and keep it active, instead of in
| landfills.
| throwaway7281 wrote:
| I can see that mining is a good incentive to look for
| excess energy in all kinds of places and make use of it.
| That's a capitalistic motive, finding solutions to derived
| issues, when something else is at the core of the problem.
| But anyway.
|
| B$ still looks ridiculous, let me explain.
|
| * world electrical energy consumption: ~25,000 TWh [1]
| * bitcoin energy consumption index: ~77 TWh [2]
|
| Bitcoin today consumes about 0.3% of the electrical energy
| generated on this planet.
|
| That would be fine if half the world used it and did
| meaningful things with it.
|
| What is it actually used for most visibly?
|
| As a betting ground with galactic momentum, a technology
| promising to be the solution to everything, and just
| outrageous claims that only the bovine are left to be
| excited about.
|
| It's not that the algorithms and data structures are not
| cool - they certainly are - but we can do so much more
| today with technology than this.
|
| [1] https://www.vgb.org/en/data_powergeneration.html?dfid=98054
| [2] https://digiconomist.net/bitcoin-energy-consumption
| acdc4life wrote:
| My brain does NLP better than any system out there. I'm also
| able to ride bicycles and do motor control better than
| Boston Dynamics. I can also construct and prove math, do
| physics, and code all of this in Matlab and C. My brain
| handles this wide range of tasks almost seamlessly with just
| 15 watts of power, while Silicon Valley's supercomputers can
| barely do 1% of it.
| juanbyrge wrote:
| Brains also have the benefit of billions of years of
| iterative development. Computers were only invented in the
| last century. I would not be surprised if computers could
| catch up given a few tens or hundreds more years.
| omgwtfbyobbq wrote:
| To be fair, it takes years of energy to train our brains to
| the point where they can do all those things, and our
| brains aren't extensible in the same way hardware/software
| is. I guess there's also a lot more variation in yields. ;)
| bumby wrote:
| To be a fair comparison, wouldn't you have to include all
| the energy used to train "your" brain through generations
| of evolutionary training? Your latest model is like taking
| an already-trained BERT model and adding a few tweaks.
| oh_sigh wrote:
| Yeah, but your clone() function is very wasteful, and we
| can't stick 1 million of you in a dark room that just looks
| at people's google searches and figures out what they're
| really going for.
|
| And it's not the whole picture to say the brain only uses
| 15 watts - there are all sorts of necessary support systems
| that it couldn't run without. So it's closer to 100 watts
| (2000 kcal/day).
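 
As a quick sanity check on that figure, a minimal Python sketch
(the 2,000 kcal/day diet is the number quoted above):
 
    # Convert a 2,000 kcal/day metabolic budget to average watts.
    KCAL_TO_J = 4184            # joules per kilocalorie
    SECONDS_PER_DAY = 86_400
    watts = 2000 * KCAL_TO_J / SECONDS_PER_DAY
    print(f"{watts:.0f} W")     # ~97 W, so "closer to 100 watts" holds
 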
| wongarsu wrote:
| Comparing your pre-trained brain (which has a lot of
| structure (=training) from evolution) with the training
| costs of a new algorithm isn't really fair.
|
| All in, you require about 100W (2000 kcal/day), maybe 3
| times that when doing a lot of physical activity. Boston
| Dynamics' Spot uses about 400W. I can probably outperform
| it in some disciplines while it would beat me in some
| others. That would be a fair comparison.
| spiznnx wrote:
| BERT is much smaller than GPT-3 (500x fewer parameters); not
| sure why the article doesn't point that out more explicitly.
|
| The mentioned paper itself notes a 300,000x increase in
| compute used for training language models over the last 6
| years.
|
| I think the point is, if nothing changes, the costs will be
| very significant very soon.
| qeternity wrote:
| We're already at soon. Another order-of-magnitude increase
| (which at our pace is all but guaranteed in the near term)
| will make large-scale training a significant R&D line item
| even for the FAANGs. It will also push efforts to be
| commercially viable, as you can no longer do
| proof-of-concept nets at $500m.
| PragmaticPulp wrote:
| > I think the point is, if nothing changes, the costs will
| be very significant very soon.
|
| The energy efficiency of machine learning hardware is
| progressing at a rapid rate. It's not accurate to assume
| that the energy costs will stay the same. Just look at how
| much it would have taken to train something like GPT-3 five
| years ago.
| _jal wrote:
| I think it is reasonable to assume that any advances in
| efficiency will be subsumed by increased use, much in the
| same way increased energy efficiency doesn't decrease
| absolute overall use.
| acchow wrote:
| > There are hundreds of thousands of flights every day
|
| 2019 was peak flight, with 38.9 million flights over the
| year. That averages to 106k flights per day.
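 
That average is easy to verify (a one-line Python check using the
figure quoted above):
 
    # 38.9 million flights in 2019, averaged over the year.
    print(f"{38_900_000 / 365:,.0f} flights/day")  # ~106,575
 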
| adamsea wrote:
| Not sure why folks are defensive? The article is mainly
| informational.
|
| TFA:
|
| "What does this mean for the future of AI research? Things
| may not be as bleak as they look. The cost of training might
| come down as more efficient training methods are invented.
| Similarly, while data center energy use was predicted to
| explode in recent years, this has not happened due to
| improvements in data center efficiency, more efficient
| hardware and cooling."
|
| Also, one flight in and of itself isn't an environmental
| problem; thousands of flights are.
|
| Training this one instance of a model isn't an environmental
| problem; I would be curious to see some educated guesses
| about the number of models being trained over the next ten
| years. Not an expert, but I know that use of ML is exploding
| - lots of new use-cases and thus lots of new models.
|
| So it makes sense to me to think about this stuff.
| ALittleLight wrote:
| The article suggests that the energy usage of language
| models is a problem. I don't think energy usage is a
| problem. I'm not sure how you interpret this as being
| defensive.
|
| There are hundreds of thousands of flights per day. Adding
| an additional flight, or even an additional thousand
| flights, to substantially improve Google doesn't seem like
| a big cost. Consider: Would your life be worse if one
| random trans-American flight was cancelled today, or if
| Google searches became 10% worse?
|
| Another way of thinking about this same point is: why is
| the author writing about language models if they are so
| concerned about the environment? Surely the airline
| industry is a better subject as, again, they fly hundreds
| of thousands of flights each day. It's hard to take someone
| seriously when they are focusing on an infinitesimal part
| of a huge problem.
|
| It's also misleading because there is a difference between
| energy used by an airline flight, where the energy comes
| from burning jet fuel, and energy used in a data center. In
| Google's case (the people training BERT) the energy used
| was 100% renewable - Google reached that goal in 2017.
| Perhaps OpenAI didn't use renewable energy to train GPT-3,
| but I wager they didn't power their machines by burning jet
| fuel either.
|
| Maybe the electricity used to train language models will
| become a meaningful issue at some point in the future. I
| don't think that future is close at hand though.
| sgt101 wrote:
| Also, I have downloaded and used BERT in several other
| applications, and I think that 10,000s of other folks have
| done the same.
| alisonkisk wrote:
| Google buying "green" isn't a justification because it
| fails the categorical imperative. It's impossible for
| everyone to buy "green" energy.
| toomuchtodo wrote:
| As always, there's nuance. You can schedule workloads,
| machine learning/training included, where the power is
| cleanest (low carbon). Google relies on
| electricitymap.org/Tomorrow to do this.
|
| https://blog.google/inside-google/infrastructure/data-center...
|
| The more folks who elect to pick where to compute based on
| low electrical carbon intensity, the faster the grid turns
| over to clean generation. You must vote with your fiat. I
| encourage technologists to include this consideration in
| their workload scheduling requirements. Renewables are
| almost always cheaper than fossil generation as well.
| curiousllama wrote:
| The categorical imperative doesn't require everyone to be
| practically able to do the thing. Is it not moral to feed a
| starving man because people in China aren't able to feed
| him?
| rictic wrote:
| There's more than enough solar and wind, it's just a matter
| of how much we as a society want it. The green energy
| revolution has barely begun, and we're still climbing the
| learning curve. More demand means a faster climb means
| exponentially more green energy sooner.
| firebaze wrote:
| This is beyond ridiculous, even if compared to the energy to
| train a human mind to the age of, let's say, 35. Raise a
| child to the age of 35 and the energy spent will surpass the
| mentioned energy cost by at least 2 orders of magnitude.
|
| And no, I don't ridicule AI ethics researchers: the energy
| cost of AI is negligible compared to the serious issues. The
| real harm stems from other areas.
| MAXPOOL wrote:
| The human mind (aka the brain) has consumed a little over
| 6,000 kWh of energy by the time it's 35.
|
| From the paper they cite:
|
|   algorithm    kWh-PUE
|   ----------   -------
|   ELMo             275
|   BERT(base)     1,507
|   NAS          656,347
|
| NAS was a Transformer (213M parameters) used for neural
| architecture search in 2019. They didn't have kWh numbers
| for GPT-2.
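 
MAXPOOL's figure is straightforward to reproduce (a minimal Python
sketch; the 20 W brain-power figure is the one quoted elsewhere in
this thread):
 
    # Energy used by a ~20 W brain running continuously for 35 years.
    HOURS_PER_YEAR = 365.25 * 24
    kwh = 20 * 35 * HOURS_PER_YEAR / 1000
    print(f"{kwh:,.0f} kWh")  # ~6,136 kWh, "a little over 6,000"
 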
| sbierwagen wrote:
| I don't know about that 6000 kWh number.
|
| More importantly, a human brain isn't trained from scratch
| at birth. It's had a few hundred million years of
| pretraining.
| wongarsu wrote:
| The 6000 kWh number assumes 20W over 35 years. That might
| be right if you only count the brain, but then you would
| have to revise the neural network numbers to only count
| CPU and memory, which would easily halve those numbers
| (fans take a lot of power).
|
| If we want to account for power supply, temperature
| control, etc., we could use the entire calorie count of a
| human over 35 years, leaving us with around 30,000 kWh
| (assuming a healthy, not-overweight human). You could now
| argue that that's too high (the human is doing a lot of
| other things). But as you pointed out, it's a bit of a
| pointless comparison anyways, as humans don't start from
| scratch.
| goatlover wrote:
| But that person would be learning tons of other things as
| well. It's not equivalent.
| firebaze wrote:
| That person would probably be one of a billion. The AI is
| singular. This should cancel out.
| pelasaco wrote:
| It still takes less energy than raising humans and educating
| them to do mediocre jobs.
| goatlover wrote:
| Plenty of humans are already raised, and can do other things
| machines can't.
| pelasaco wrote:
| plenty of humans are raised but not educated. So let's
| educate them to do the other things that machines can't.
| joe_the_user wrote:
| There are very few "typical human tasks" that machine
| learning actually does better than a person; playing games
| is all that occurs to me. But for driving, image
| description, language translation, and so forth, at the
| scale that a human does them, the human is generally still
| considered to do tremendously better. Of course, some ML
| programs are convenient in being able to do things at a
| large scale, but a majority of what ML programs do is like
| language translation - not great, and something we
| sometimes live with 'cause it's free.
|
| Basically, one should really not discount human ability,
| especially in tasks considered mundane. Such skills are
| often considered "mediocre" not because they're actually
| easy but because all humans can do them; see:
|
| https://en.wikipedia.org/wiki/Moravec%27s_paradox
| mortehu wrote:
| ML is usually better at speed. For example, I use a
| fine-tuned GPT-2 for code autocomplete in vim, and even
| though humans could do better, they can't do it as fast.
| pelasaco wrote:
| The same was reported last year. ML could offload lawyers,
| doing many of the tasks done by them. Not better, but
| faster and, for sure, spending less energy.
| optimalsolver wrote:
| Didn't pointing this out get that Google lady fired?
| beervirus wrote:
| > This month, Google forced out a prominent AI ethics
| researcher after she voiced frustration with the company
| for making her withdraw a research paper. The paper pointed
| out the risks of language-processing artificial
| intelligence, the type used in Google Search and other text
| analysis products.
|
| That is... certainly one way of putting things.
| ur-whale wrote:
| Same BS that started the whole Gebru affair at Google.
| HenryKissinger wrote:
| I just want to say that the human brain doesn't need to
| perform billions of matrix products to recognize a dog.
|
| The best AI will always be a meaty human.
| jefftk wrote:
| _> One recent model called Bidirectional Encoder
| Representations from Transformers (BERT) used 3.3 billion
| words from English books and Wikipedia articles. Moreover,
| during training BERT read this data set not once, but 40
| times. To compare, an average child learning to talk might
| hear 45 million words by age five, 3,000 times fewer than
| BERT._
|
| The human brain is the output of an incredible number of
| generations of training, representing a vast consumption of
| energy. Most of the learning that informs the brain
| happened before this hypothetical five-year-old was even
| born.
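 
The article's "3,000 times fewer" ratio checks out (a minimal
Python sketch using only the figures quoted above):
 
    # BERT: 3.3 billion words, read 40 times over; child: ~45
    # million words heard by age five.
    bert_words = 3.3e9 * 40
    child_words = 45e6
    print(f"{bert_words / child_words:,.0f}x")  # ~2,933x
 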
| ravi-delia wrote:
| That is a matter of great debate. Although most people's
| brains wind up settling on basically identical structure
| over time, case studies with injured or disabled people
| often display incredible adaptability (i.e. the occipital
| lobe in a blind person). If indeed the brain learns most of
| what it knows from experience, then its (comparatively) low
| energy consumption would come from greater efficiency. That
| definitely seems possible; as a basic unit of machine
| learning, the neuron is much more specialized than the
| transistor, and far slower.
| anaphor wrote:
| My interpretation of what the parent meant is that the
| cognitive processes which support language acquisition
| (which may be mostly domain-general) are already optimized
| for this task, for which there is a sensitive period during
| certain years where you need to be exposed to certain
| inputs, or else you never fully acquire language. See:
| https://en.wikipedia.org/wiki/Genie_(feral_child)
| MAXPOOL wrote:
| 550-600 million years of hyperparameter tuning and neural
| architecture search using an evolutionary algorithm is
| nothing to laugh at.
|
| But the actual learning process of a single brain uses very
| few training examples and is very energy efficient - a
| brain uses only 20 watts of power.
| syntaxing wrote:
| It's still crazy to think about how much more efficient
| computational power is nowadays compared to even five years
| ago. I remember I changed my power supply in anticipation of
| Nvidia's then-newest GTX series, only to learn that a 1050
| Ti had a 75W max draw, so the new fancy 1kW PSU was
| completely unnecessary. Also, to put it in perspective, my
| Mac mini has a 30W max draw. 30W!!! It's ridiculous how
| little energy it draws.
| trthomps wrote:
| If you want your mind to really be blown: your brain
| manages to do everything it does on around 20W, even less
| than that Mac mini.
| cosmolev wrote:
| Machine learning is essentially brute force. No surprise it
| is not efficient.
| lightgreen wrote:
| > Among the risks is the large carbon footprint of
| developing this kind of AI technology.
|
| No, there are no carbon footprint risks. Energy is cheap.
| Just build nuclear power plants.
|
| Or just build solar and wind and train models half of the
| time, if one is paranoid about nuclear power plant safety.
| Der_Einzige wrote:
| Fine-tuning BERT (what most people do in practice) is so
| much more efficient. You can do it in 8 hours on a 2080 Ti.
|
| They need to mention that the total number of people
| training large transformers from scratch is very, very
| small. I'd wager that the total number of different,
| uniquely trained (not reusing previous weights, which
| reduces compute needs by massive amounts) language models
| in existence is in the low hundreds.
|
| I'd claim that these mass language models, serving as the
| underlying encoding backbone behind more specific systems,
| actually save energy and compute compared to the previous
| methods (which needed far more data, and thus more energy
| spent on getting it, combined with less efficient
| representations like tf-idf that caused many classifiers to
| perform very slowly and thus burn lots of energy).
|
| Also, much of the recent research in this field is about
| model pruning, quantization, and any technique you can
| imagine to reduce training and inference time or memory
| requirements.
|
| All in all, big language models are a net positive for the
| environment. The efficiency gains in any number of fields
| from increasingly sophisticated NLP systems far, far
| outweigh the costs of training them. Foundational research
| in environmental conservation will be accelerated by
| effective NLP semantic search and question answering
| systems. That's a single, tiny example of the potential
| benefits of large language models.
|
| Pick a better target.
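 
For readers unfamiliar with the distinction Der_Einzige draws,
here is a minimal sketch of fine-tuning (as opposed to
pre-training from scratch) using the Hugging Face transformers
library. The toy dataset, label count, and hyperparameters are
placeholder assumptions, not anyone's actual setup:
 
    # Fine-tuning reuses the expensively pre-trained BERT weights;
    # only a small classification head is trained from scratch.
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer,
                              TrainingArguments)
 
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
 
    texts = ["great movie", "terrible movie"]  # toy stand-in data
    labels = [1, 0]
    encodings = tokenizer(texts, truncation=True, padding=True)
 
    class ToyDataset:
        """Anything indexable yielding dicts works for Trainer."""
        def __init__(self, encodings, labels):
            self.encodings, self.labels = encodings, labels
        def __getitem__(self, i):
            item = {k: v[i] for k, v in self.encodings.items()}
            item["labels"] = self.labels[i]
            return item
        def __len__(self):
            return len(self.labels)
 
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=ToyDataset(encodings, labels),
    )
    trainer.train()  # hours on one GPU, not weeks on a cluster
 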
___________________________________________________________________
(page generated 2020-12-30 23:00 UTC)