[HN Gopher] GPT-3 is no longer the only game in town
       ___________________________________________________________________
        
       GPT-3 is no longer the only game in town
        
       Author : sebg
       Score  : 217 points
       Date   : 2021-11-07 14:53 UTC (8 hours ago)
        
 (HTM) web link (lastweekin.ai)
 (TXT) w3m dump (lastweekin.ai)
        
       | supperburg wrote:
       | I have supported my beliefs on this topic in these threads to the
       | point of exhausting myself. The tools we use to find these
       | agents are the underpinning of AGI; it's coming far faster than
       | most people here appreciate, and this development is
       | intrinsically against the interests of human beings. Please stop
       | and think, please.
        
         | Simon321 wrote:
         | I argue it's very much in the interest of human beings. It has
         | been since we first picked up a rock and used it as a hammer.
         | It's the ultimate tool and has the potential to bring
         | unprecedented prosperity.
        
           | supperburg wrote:
           | It won't. You're wrong. This is the perfect illustration: you
           | think a rock is good, therefore AI is good. You're just
           | unbelievably wrong.
        
         | [deleted]
        
       | eunos wrote:
       | Number of parameters aside, I am really surprised that we haven't
       | yet reached hundreds of TB of training data, especially since the
       | Chinese model used less than 10 TB of data.
        
       | visarga wrote:
       | The GPT-3 family is still too expensive to use and too big to fit
       | in memory on a single machine. Prices need to come down before
       | large-scale adoption, or someone needs to invent a chip that can
       | hold it (cheaply).
       | 
       | The most exciting part about it is showing us there is a path
       | forward by scaling and prompting, but you can still do much
       | better with a smaller model and a bit of training data (which can
       | come from the expensive GPT-3 as well).
       | 
       | What I expect from the next generation: multi-modality, larger
       | context, using retrieval to augment input data with fresh
       | information, tuned to solve thousands of tasks with supervised
       | data so it can generalize on new tasks better, and some efficient
       | way to keep it up to date and fine-tune it. On the data part -
       | more data, more languages - a lot of work.
        
       | worik wrote:
       | The underlying methods seem impractical. GPT-n is an existence
       | proof: it is possible to make parrot-like software that
       | generates realistic text. But with these methods it is not
       | practical.
       | 
       | Maybe that is a good thing, maybe a bad thing, but unless there
       | is a breakthrough in methods this is a dead end. Impressive
       | though.
        
         | WithinReason wrote:
         | https://copilot.github.com/
        
       | mrbukkake wrote:
       | Can anyone tell me what the value of GPT-3 actually is, other
       | than generating meaningless prose? What would a business use it
       | for?
        
         | phone8675309 wrote:
         | It's good for the university-industrial-business complex -
         | people writing papers about a model they can't even run
         | themselves. It practically prints money in journal articles,
         | travel per diem, and conference honoraria, not even counting
         | the per-API call rates.
        
         | DeathArrow wrote:
         | >What would a business use it for
         | 
         | If you think about business uses, you can actually get advice
         | from Jerome Powell, as simulated by GPT-3.
         | 
         | If someone used GPT-3 to simulate Warren Buffett, they could
         | extract even more value.
         | 
         | https://www.institutionalinvestor.com/article/b1tktmhcfdyqsk...
        
           | mrbukkake wrote:
           | I can't tell whether or not this article is parody... is this
           | a new kind of Turing test?
        
           | vitus wrote:
           | Somehow I don't think that this is quite how Jerome Powell
           | would respond in an interview:
           | 
           | > Interviewer: How do you think a global pandemic would
           | impact the financial system?
           | 
           | > Mr. Powell: A pandemic will have impact on the economy. It
           | will have an impact on the financial system.
           | 
           | > Interviewer: What would the impact be?
           | 
           | > Mr. Powell: If a major pandemic occurred, the economy would
           | be affected, and the financial system would be affected.
           | 
           | Yes, GPT-3 can provide a convincing chatbot, but it shouldn't
           | be confused for domain expertise.
        
             | robbedpeter wrote:
             | This is a bad example. Here's an output from gpt-j-6b by
             | EleutherAI:
             | 
             | Prompt:
             | 
             | > Jerome Powell was interviewed about the impact of a
             | > pandemic on the economy.
             | > Interviewer: What would the impact be?
             | > Mr. Powell: If a major pandemic occurred
             | 
             | Output:
             | 
             | > it could be very disruptive.
             | > Interviewer: How disruptive?
             | > Mr. Powell: Well, it could be disruptive in a number of
             | > ways.
             | > Interviewer: How so?
             | > Mr. Powell: Well, one of the first things that would be
             | > disrupted would be the supply chain.
             | 
             | Using prompts well makes a huge difference.
             | 
             | If you parse the generated output, classify it, then
             | develop a decision tree that uses further prompts to refine
             | the response, you can get more sophisticated, valuable
             | responses.
             | 
             | The output in the parent is comparable to an off-the-cuff
             | interview response. If you emulate a deeper thought
             | process, you can get more meaningful output, and if you use
             | the right prompts, you can access the semantic networks in
             | the model related to your domain of interest.
        
               | notahacker wrote:
               | I think the "bad example" is actually the good one,
               | because it's a reminder that actually you're not getting
               | business advice from someone with Warren Buffett's or
               | Jerome Powell's understanding of the economy, you're getting
               | text generated by analysing patterns in other not-
               | necessarily-applicable text. If you start forcing it in
               | very specific directions you start getting text that
               | summarises the commentary in the corpus, but most of that
               | commentary doesn't come from Warren Buffett or Jerome
               | Powell and isn't applicable to the future you're asking
               | it about...
        
           | sva_ wrote:
           | _> Interviewer: Are you in favor of a carbon tax?
           | 
           | > Mr. Powell: I don't want to get into the details of taxes.
           | 
           | > Interviewer: Are you in favor of a cap and trade system?
           | 
           | > Mr. Powell: I don't want to get into the details of a cap
           | and trade system.
           | 
           | > Interviewer: How do you think a global pandemic would
           | impact the financial system?
           | 
           | > Mr. Powell: A pandemic will have impact on the economy. It
           | will have an impact on the financial system.
           | 
           | > Interviewer: What would the impact be?
           | 
           | > Mr. Powell: If a major pandemic occurred, the economy would
           | be affected, and the financial system would be affected._
           | 
           | Maybe I'm a bit harsh on GPT-3, but I'm not nearly as
           | fascinated by this kind of output as the author.
        
             | teaearlgraycold wrote:
             | It does pretty well at transforming text into a person's
             | style of talking. So you could have it re-write any
             | sentence to sound like a Trump tweet.
        
             | renewiltord wrote:
             | I, too, thought that sounded like Eliza. Anyway, it looks
             | like that's a small excerpt from the conversation.
        
               | benatkin wrote:
               | It looks like the dialogue is only on the human end. The
               | chatbot is treating each question as the first. I think
               | it sounds a lot like Biden. I prefer that to Trump, but
               | don't like either sort of conversation!
        
         | CamperBob2 wrote:
         | GPT-3 and similar ML/AI projects may have many interesting and
         | valuable commercial applications, not all of which are readily
         | apparent at this stage of the game. For instance, it could be
         | used to insert advertisements for herbal Viagra at
         | https://www.geth3r3a1N0W.com into otherwise-apropos comments
         | on message boards, preferably near the end once it's too late
         | to stop reading.
         | 
         | Life online is about to become very annoying.
        
         | hubraumhugo wrote:
         | At https://reviewr.ai we're using GPT-3 to summarize product
         | reviews into simple bullet-point lists. Here's an example with
         | backpack reviews: https://baqpa.com
        
           | staticautomatic wrote:
           | Did you test it against extractive summarizers?
        
             | hubraumhugo wrote:
             | We experimented with BERT summarization, but the results
             | weren't too good. Do you have any resources or experiences
             | in this area?
        
               | moffkalast wrote:
               | That sounds like BERT alright.
        
           | cma wrote:
           | How do you avoid libel?
        
             | kingcharles wrote:
             | Are you confusing libel with something else? Can you
             | elaborate on what you mean here? Are you saying that they
             | will be liable for libel (!) if they publish a negative
             | summary of a product?
        
               | cma wrote:
               | If they mischaracterize a positive review into a negative
               | summary based on factual mistakes they know the system
               | makes at a high rate, I would think they would be liable
               | for libel, right?
        
         | teaearlgraycold wrote:
         | I work for a company that re-sells GPT-3 to small business
         | owners. We help them generate product descriptions in bulk,
         | Google ads, Facebook ads, Instagram captions, etc.
        
         | crubier wrote:
         | Have you heard of GitHub Copilot? It's based on GPT-3, and I
         | can tell you one thing: it does not generate meaningless prose
         | (90% of the time).
        
           | inglor wrote:
           | This - it is tremendously valuable to me and I use it all the
           | time at work.
        
             | skybrian wrote:
             | What do you use it for?
        
               | inglor wrote:
               | Coding. I actually had to forbid it today in a course I
               | teach because it solves all the exercises :) (students
               | were given unit tests with descriptive titles and needed
               | to fill them in)
        
               | singlow wrote:
               | Isn't that just because others have stored solutions to
               | these problems in GitHub?
        
               | iamcurious wrote:
               | That is my question too. Is it a fancier autocomplete? Or
               | does it reason about code?
        
               | PeterisP wrote:
               | In some sense you could think of it as a fancy
               | autocomplete which uses not only code but also comments
               | as input, looks up previous solutions for the same
               | problem, and (mostly) appropriately replaces the variable
               | names with those that you are using.
        
               | robbedpeter wrote:
               | It reasons over the semantic network between tokens, in a
               | feedforward inference pass over the 2k(ish) words or
               | tokens of the prompt. Sometimes that reasoning is
               | superficial and amounts to probabilistic linear
               | relationships, but it can go deeply abstract depending on
               | training material, runtime/inference parameters, and
               | context of the prompt.
        
               | inglor wrote:
               | Probably. Also, I'm sure 99%+ of the code I author isn't
               | groundbreaking; someone has done it before.
        
             | worldsayshi wrote:
             | But what about the potential for intellectual property
             | problems?
        
               | trothamel wrote:
               | That's beside the point, which is that the output copilot
               | produces is useful.
        
               | worldsayshi wrote:
               | I don't see how that's beside the point. How can it be
               | that useful if the output is such a legal mystery?
               | 
               | I'd love to use it but not when there's such a risk of
               | compromising the code base.
        
           | amelius wrote:
           | What percentage of the time does it produce code that compiles?
        
             | crubier wrote:
             | In my experience, 95% of the time. And 80% of the time it
             | outputs code that is better than what I would have written
             | myself on a first pass (it thinks of corner cases, adds
             | meaningful comments, etc.). It's impressive.
        
             | bidirectional wrote:
             | From my anecdotal experience, the vast majority of the time
             | (90+%).
        
               | amelius wrote:
               | Interesting. Is there any constraint built into the model
               | that makes this possible? E.g. grammar, or semantics of
               | the language? Or is it all based on deep learning only?
        
               | crubier wrote:
               | Deep learning only, I believe. But a really good one.
        
             | emteycz wrote:
             | The overwhelming majority. Whatever used to take me an hour
             | or two is now a 10-minute task.
        
               | ilteris wrote:
               | I am so confused. Is there a tutorial explaining how you
               | are using it in the IDE, whatever it is? I use VSCode and
               | am curious if it can be applied there. Thanks
        
               | crubier wrote:
               | It works very well with VSCode. It has an integration. It
               | shows differently than normal autocomplete, it shows just
               | like gmail autocomplete (grayed out text sugggestion, and
               | press tab to actually autocomplete). Sometimes the
               | suggestion is just a couple tokens long, sometimes it's
               | an entire page of correct code.
               | 
               | Nice trick: write a comment describing quickly what your
               | code will do ("// order an item on click") and enjoy the
               | complete suggested implementation !
               | 
               | Other nice trick: write the code yourself, and then just
               | before your code, start a comment saying "// this code"
               | and let copilot finishe the sentence with a judgement
               | about your code like "// this code does not work in case
               | x is negative". Pretty fun !
        
               | icelancer wrote:
               | Interesting second use case; I use comments like this
               | already as typical practice and I agree Copilot fills in
               | the gaps quite well - never thought to do it in
               | reverse... will give that a shot today.
        
               | emteycz wrote:
               | I also like to do synthesis from example code (@example
               | doccomment) and synthesis from tests.
        
             | icelancer wrote:
             | I was exceptionally skeptical about it, but it's been very
             | useful for me and I'm only using it for minor tasks, like
             | automatically writing loops to pull out data from arrays,
             | merge them, sort information, make cURL calls and process
             | data, etc.
             | 
             | Simply leading the horse to water is enough in something
             | like PHP:
             | 
             | // instantiate cURL event from API URL, POST vars to it
             | using key as variable name, store output in JSON array and
             | pretty print to screen
             | 
             | Usually results in code that is 95-100% of the way done.
        
           | nradov wrote:
           | The fact that GPT-3 works at all for coding indicates that our
           | programming languages are too low level and force a lot of
           | redundancy (low entropy). From a programmer productivity
           | optimization perspective it should be impossible to reliably
           | predict the next statement. Of course there might be trade-
           | offs. Some of that redundancy might be helping maintenance
           | programmers to understand the code.
        
             | hans1729 wrote:
             | >From a programmer productivity optimization perspective it
             | should be impossible to reliably predict the next statement
             | 
             | Why? 99.9% of programming being done is composition of
             | trivial logical propositions, in some semantic context. The
             | things we implement are trivial, unless you're thinking
             | about symbolic proofs etc
        
               | tshaddox wrote:
               | I think that's precisely the problem the parent commenter
               | is describing.
        
             | alephaleph wrote:
             | That would only follow if we were trying to optimize code
             | for brevity, and I have no clue why that would be your top
             | priority.
        
               | mpoteat wrote:
               | I have indeed seen codebases where it seems like the
               | programmer was being charged per source code byte. Full
               | of single letter methods and such - it takes a large
               | confusion of ideas to motivate such a philosophy.
        
               | nradov wrote:
               | Not at all. Brevity (or verbosity) is largely orthogonal
               | to level of entropy or redundancy. In principle it ought
               | to be possible to code at a higher level of abstraction
               | while still using understandable names and control flow
               | constructs.
        
             | mpoteat wrote:
             | Indeed, in the limit of maximal abstraction, i.e. semantic
             | compression, code becomes unreadable by humans in practice.
             | We can see this in code golf competitions.
        
             | dharmaturtle wrote:
             | Let me rephrase:
             | 
             | > The fact that GPT3 works at all for English indicates
             | that English is too low level and forces a lot of
             | redundancy (low entropy).
             | 
             | I don't think the goal is to compress information/language
             | and maximize "surprise".
        
             | Traster wrote:
             | I think this is kind of true, but also kind of not true.
             | Programming, like all writing, is the physical
             | manifestation of a much more complex mental process. By the
             | time that I _know_ what I want to write, the hard work is
             | done. In that way, you can think of Copilot as a way of
             | increasing the WPM of an average coder. But the WPM isn't
             | the bit that matters. In fact, almost the only thing that
             | matters is the bits you won't predict.
        
           | pharmakom wrote:
           | Code is easier to write than read and maintain, so how useful
           | is something that generates pages of 90% correct code?
        
             | ALittleLight wrote:
             | It's not useful if you use it to auto complete pages of
             | code. It is useful to see it propose lines, read, and
             | accept its proposals. Sometimes it just saves you a second
             | of typing. Sometimes it makes a suggestion that causes you
             | to update what you wanted to do. Sometimes it proposes
             | useless stuff. On the whole, I really like it and think
             | it's a boon to productivity.
        
           | ghoomketu wrote:
           | Yes, I'm used to it now, but the first time it started doing
           | its thing I wanted to stop and clap at how jaw-dropping and
           | amazing this technology is.
           | 
           | I was a JetBrains fan, but this thing takes productivity to a
           | whole new level. I really don't think I can go back to my
           | normal programming without it anymore.
        
             | kuschku wrote:
             | Luckily, there's a jetbrains addon for it.
        
             | inglor wrote:
             | Someone at work showed me copilot works on WebStorm today
             | (I also use VSCode).
        
         | supperburg wrote:
         | That's like if an alien took Mozart as a specimen and then
         | disregarded the human race because this human, while making
         | interesting sounds, does nothing of value. You have to look at
         | the bigger picture.
        
         | lysecret wrote:
         | Hey, for a long time I was also very sceptical. However, I can
         | refer you to this paper for a really cool application:
         | https://www.youtube.com/watch?v=kP-dXK9JEhY. They basically
         | use clever GPT-3 prompting to create a dataset that you then
         | train another model on. Besides, you can prompt these models
         | to get (depending on the use case) really good few-shot
         | performance. And finally, GitHub Copilot is another pretty
         | neat application.
        
         | micro_cam wrote:
         | Actually, using this class of models (large transformer-based
         | language models) to generate text is, to me, the least
         | interesting use case.
         | 
         | They can also all be adapted and fine-tuned for other tasks in
         | content classification, search, discovery, etc. Think facial
         | recognition for topics. Want to mine a whole social network
         | for anywhere people are talking about _______, even
         | indirectly, with a very low false-negative rate? You want to
         | fine-tune a transformer model.
         | 
         | BERT tends to get used for this more because it is freely
         | available, established, and not too expensive to fine-tune,
         | but I suspect this is what Microsoft licensing GPT-3 is all
         | about.
        
         | warning26 wrote:
         | GPT-3 is fairly effective at summarization, so that's one
         | potential business use case:
         | 
         | https://sdtimes.com/monitor/using-gpt-3-for-root-cause-incid...
        
         | Tijdreiziger wrote:
         | https://replika.ai/
        
         | amelius wrote:
         | I hope that one day it will allow me to write down my thoughts
         | in bullet-list form, and it will then produce beautiful prose
         | from it.
         | 
         | Of course this will be another blow for journalists, who rely
         | on this skill for their income.
        
           | DeathArrow wrote:
           | I played with GPT-3 giving it long news stories. It actually
           | replied with more meaningful titles than the journalists
           | themselves used.
        
             | rm_-rf_slash wrote:
             | Perhaps GPT-3 was optimizing to deliver information while
             | news sites these days optimize titles to get clicks.
        
         | ailef wrote:
         | You can prompt GPT-3 in ways that make it perform various tasks
         | such as text classification, information extraction, etc...
         | Basically you can force that "meaningless prose" into answers
         | to your questions.
         | 
         | You can use this instead of having to train a custom model for
         | every specific task.
        
         | DeathArrow wrote:
         | Chatbots are one use. I think you could use it for customer
         | support.
         | 
         | One example of GPT-3 powered chat bot:
         | https://www.quickchat.ai/emerson
        
         | jszymborski wrote:
         | While the generation is fun and even suitable for some use
         | cases, I'm particularly interested in its ability to take _in_
         | language and use it for downstream tasks.
         | 
         | A good example is DALL-E[0]. Now, what's interesting to me is
         | the emerging idea of "prompt engineering" where once you spend
         | long enough with a model, you're able to ask it for some pretty
         | specific results.
         | 
         | This gives us a foothold in creating interfaces whereby you can
         | query things using natural language. It's not going to replace
         | things like SQL tomorrow (or maybe ever?) but it certainly is
         | promising.
         | 
         | [0] https://openai.com/blog/dall-e/
        
         | 13415 wrote:
         | Automatic generation of positive fake customer reviews on
         | Amazon, landing pages about topics that redirect to attack and
         | ad sites, fake "journalism" with auto-generated articles mixed
         | with genuine press releases and viral marketing content,
         | generating fake user profiles and automated karma farming on
         | social media sites, etc. etc.
        
           | phone8675309 wrote:
           | > fake "journalism" with auto-generated articles mixed with
           | genuine press releases and viral marketing content
           | 
           | How would you tell the difference from the real thing these
           | days?
        
           | DeathArrow wrote:
           | The state of journalism is so poor, I'd rather take some
           | AI-generated articles instead.
        
         | akelly wrote:
         | https://copy.ai/
        
           | teaearlgraycold wrote:
           | Hey, I work there! To be honest it's still very much a
           | prototype. We have big plans for the next few months.
        
         | mark_l_watson wrote:
         | You can try it yourself - apply for a free API license from
         | OpenAI. If you like to use Common Lisp or Clojure then I have
         | examples in two of my books (you can download for free by
         | setting the price to zero): https://leanpub.com/u/markwatson
        
           | pyb wrote:
           | I know of some credible developers who were struggling to get
           | access, so YMMV
        
             | mark_l_watson wrote:
             | It took me over a month, so put in your request. Worth the
             | effort!
        
               | [deleted]
        
           | moffkalast wrote:
           | I put in a request months ago, I think they're not approving
           | people anymore.
        
             | mark_l_watson wrote:
             | You might try signing up directly for a paid (non-free)
             | account, if that is possible to do. I was using a free
             | account, then switched to paying them. Individual API calls
             | are very inexpensive.
        
       | warning26 wrote:
       | Neat to see more models getting closer, though it appears only
       | one so far has exceeded GPT-3's 175B parameters.
       | 
       | That said, what I'm really curious about is how those other
       | models stack up against GPT-3 in terms of performance -- does
       | anyone know of any comparisons?
        
         | sillysaurusx wrote:
         | I'm surprised that no one has answered for three hours!
         | 
         | The answer is at
         | https://github.com/kingoflolz/mesh-transformer-jax
         | 
         | It has detailed comparisons and a full breakdown of the
         | performance, courtesy of Eleuther.
        
           | 6gvONxR4sf7o wrote:
           | I was so frustrated when that was first announced because it
           | didn't include those metrics, and everyone ate it up like the
           | models were equivalent.
        
         | 6gvONxR4sf7o wrote:
         | Whenever I've seen language modeling metrics, GPT-3's largest
         | model has been at the top. If you see a writeup that doesn't
         | include accuracy-type metrics, you're reading a sales pitch,
         | not an honest comparison.
        
         | machiaweliczny wrote:
         | There's Wu Dao 2.0 and Google has 2 models with 1T+ params.
        
           | atty wrote:
           | For clarity, I believe these are all mixture-of-experts
           | models, where each input only sparsely activates some subset
           | of the full model. This is why they were able to make such a
           | big jump over the "dense" GPT-3. Not really an apples-to-
           | apples comparison.
        
         | pyb wrote:
         | +1, does the new generation match or exceed GPT-3 in terms of
         | relevance? Is there a way for a non-AI-researcher to
         | understand how the benchmarks measure this? Bigger does not
         | mean better.
        
       | GhettoComputers wrote:
       | >However, the ability of people to build upon GPT-3 was hampered
       | by one major factor: it was not publicly released. Instead,
       | OpenAI opted to commercialize it and only provide access to it
       | via a paid API. This made sense given OpenAI's for profit nature,
       | but went against the common practice of AI researchers releasing
       | AI models for others to build upon. So, since last year multiple
       | organizations have worked towards creating their own version of
       | GPT-3, and as I'll go over in this article at this point roughly
       | half a dozen such gigantic GPT-3 esque models have been
       | developed.
       | 
       | Seems like aside from Eleuther.ai you can't use the models freely
       | either, correct me if I'm wrong.
        
         | andreyk wrote:
         | I believe you are correct, at least for GPT-3 scaled things.
         | Hopefully that'll change with time, though.
        
       | rg111 wrote:
       | The future is not as dark as the megacorps' rat race makes it
       | seem.
       | 
       | You can use reduced versions of language models with extremely
       | good results.
       | 
       | I was involved in training the first-ever GPT-2 for the Bengali
       | language, albeit with only 117 million parameters.
       | 
       | It took a month's effort (training + writing code + setup) and
       | about $6k in TPU cost, but Google Cloud covered it.
       | 
       | Anyway, it is surprisingly good. We fine-tuned the model for
       | several downstream tasks and we were shocked when we saw the
       | quality of generated text.
       | 
       | I fine-tuned this model to write Bengali poems with a dataset of
       | just about 2k poems and ran the training for 20 minutes in GPU
       | instance of Colab Pro.
       | 
       | I was really blown away by the quality.
       | 
       | The main training was done in JAX, which is much faster and more
       | seamless than PyTorch XLA, and much _better_ than TensorFlow in
       | every way.
       | 
       | So, my point is: although everyone is talking about hundreds of
       | billions of parameters and millions in training cost, you can
       | still derive practical value from language models, and at a low
       | cost, too.
        
         | amelius wrote:
         | > The future is not as dark as it seems because of the rat race
         | of megacorps.
         | 
         | Just wait until NVIDIA comes out with a "Neural AppStore" and
         | corresponding restrictions. Then wait until the other GPU
         | manufacturers follow suit.
        
           | rg111 wrote:
           | Much of the work done is fully open source and liberally
           | licensed.
           | 
           | DeepMind and OpenAI have a bad rep in this regard.
           | 
           | But a lot is available for free (as in beer _and_ speech).
           | 
           | And most of the research papers are released on arXiv. It's
           | very refreshing.
           | 
           | The bottleneck is not the knowledge or code, but the compute.
           | People are fighting this in innovative ways.
           | 
           | I have been an inactive part of Neuropark that first demoed
           | collaborative training. A bunch of folks (some of them close
           | to laypeople) ran their free Colab instances and trained a
           | huge model. You can even utilize a swarm of GT1030s or
           | something like that.
           | 
           | Also, if you have shown signs of success, you are very likely
           | to have people willing to sponsor your compute needs, case in
           | point: EleutherAI.
           | 
           | The situation is far from ideal, with this megacorps rat race
           | [0] and NLP research becoming more and more inaccessible, but
           | it is not completely dark.
           | 
           | [0]: I, along with many respected figures, tend to think that
           | this scaling-up approach is not even _useful_. We can write
           | good prose with GPT-3 nowadays that is, for all intents and
           | purposes, indistinguishable from text written by humans. But
           | we are far, far away from true _understanding_. These models
           | don't really _understand_ anything and are not even "AI", so
           | to speak.
           | 
           | The Transformer architecture, the backbone of all these
           | approaches, is too brute-force-y for my taste to be
           | considered something that can mimic or, further, _be_
           | intelligent.
        
         | cmrajan wrote:
         | Good to know. We're trying to attempt something similar[1] but
         | for Tamil. I'm also surprised how well the OSS language model &
         | library AI4Bharat [2] performs for NLP tasks against SoTA
         | systems. Is there a way to contact you? [1]
         | https://vpt.ai/posts/about-us/ [2]
         | https://ai4bharat.org/projects/
        
           | rg111 wrote:
           | Between a master's degree, a consultancy gig, personal
           | research and study, and finding universities abroad, I am
           | living a hectic life.
           | 
           | I don't see how I can be of help.
           | 
           | But I can talk. Leave me something through which I can reach
           | you. And I will reach you within a week.
        
       | xyproto wrote:
       | I think companies should be banned from having "Open" in their
       | names.
        
         | evergrande wrote:
         | OpenAI takes the Orwellian cake.
        
           | Tenoke wrote:
           | I hear a lot of low effort takes about OpenAI but how exactly
           | is providing your service via a paid API the "Orwellian
           | cake"? Is this really the most (or even at all) Orwellian
           | practice for you?
        
             | leoxv wrote:
             | https://en.wikipedia.org/wiki/Doublespeak
        
             | TulliusCicero wrote:
             | I think it's more the contrast where they claim, via their
             | name, to be open, but actually aren't.
             | 
             | If their name was ProfitableAI, there'd probably be fewer
             | complaints.
        
         | c7DJTLrn wrote:
         | "Open"AI but you can only use it how we want you to and no, you
         | can't run it yourself.
        
           | moffkalast wrote:
           | The only thing open in OpenAI is your wallet.
        
       | moffkalast wrote:
       | > most recently NVIDIA and Microsoft teamed up to create the 530
       | billion parameter model Megatron-Turing NLG
       | 
       | Get it, cause it's a generative transformer? Hah
        
       | DeathArrow wrote:
       | People were blaming cryptocurrencies miners for the prices of
       | GPUs, when in fact it was the AI researchers who bought all the
       | GPUs. :D
       | 
       | I wonder: what if somebody designed an electronic currency
       | awarded as payment for general GPU computation instead of just
       | computing hashes? You pay some $ to train your model, and the
       | miner gets some coins.
       | 
       | Everyone is happy, electricity is not wasted, and the GPUs get
       | used for a reasonable purpose.
        
         | nathias wrote:
         | Yes, this is an old idea (which I really like), but it hasn't
         | really taken off yet. GridCoin was one example, where you
         | solved BOINC problems; RLC is one for more general
         | computation.
        
           | rewq4321 wrote:
           | The problem is that, currently, large ML models need to be
           | trained on clusters of tightly-connected GPUs/accelerators.
           | So it's kinda useless having a bunch of GPUs spread all over
           | the world with huge latency and low bandwidth between them.
           | That may change though - there are people working on it:
           | https://github.com/learning-at-home/hivemind
        
           | Kiro wrote:
           | It hasn't taken off because it doesn't work. PoW only works
           | for things that are hard to calculate but easy to verify. Any
           | meaningful result is equally hard to verify.
        
             | snovv_crash wrote:
             | It's easy to verify ML training: inference on a test set
             | has lower error than it did before.
             | 
             | Training a neural net is much slower than inference (1000x
             | at least) because it has to calculate all of the gradients.
        
             | petters wrote:
             | > Any meaningful result is equally hard to verify.
             | 
             | This is very much not true. A central class in complexity
             | theory is NP, whose problems are hard to answer but easy
             | to verify when the answer is yes.
             | 
             | E.g. is there a path visiting all nodes in this graph of
             | length less than 243000? Hard to answer but easy to check
             | any proposed answer.
        
         | PeterisP wrote:
         | The current way of training is efficient when compute is
         | located in a single place and is colocated with large
         | quantities of training data. Distributing small parts of
         | computation to remote computers is theoretically possible (and
         | an active direction of research) but currently not preferable
         | nor widely used; you really need very high bandwidth between
         | all the nodes to constantly synchronize the hundreds-of-
         | gigabytes sized weights they're iterating on and the resulting
         | gradients.
        
           | bckr wrote:
           | This may not be true in the future. There is some work being
           | done on distributed neural net training. I can't recall the
           | name of the technique at the moment, but a paper came out in
           | the last year showing results comparable with backprop that
           | only required local communication of information (whatever
           | this technique's alternative to gradients is).
        
         | evergrande wrote:
         | First the electricity morality police came for crypto and I
         | said nothing.
         | 
         | Then they came for AI, video games, HN...
        
         | Nextgrid wrote:
         | My understanding is that proof-of-work is intentionally
         | wasteful; the objective is to make 51% attacks (where a single
         | entity controls at least 51% of the global hashrate) infeasible
         | by attaching a cost to the mining process.
         | 
         | Making the mining process produce useful output that can be
         | resold nullifies the purpose as it means an attacker can now
         | mine "for free" as a byproduct of doing general-purpose
         | computations (as opposed to tying up dedicated hardware),
         | lowering the barrier for a 51% attack dramatically.
        
         | magikabula wrote:
         | If everyone offers GPUs, it's the same game. If I buy more
         | GPUs I will get more money, so the average payment for a
         | person with a single GPU or a small bunch of them will be low.
         | 
         | And second, the principles of electronic currency are
         | different from those of gold/money. That's why crypto uses
         | GPUs ;)
        
       | qwertox wrote:
       | If I were to run GPT-3 on my 70000 browser bookmarks, what kind
       | of insights could I get from that?
       | 
       | Only by analyzing the page title (from the bookmark, not by re-
       | fetching the URL) and possibly also the domain name.
        
         | supermatt wrote:
         | GPT-3 is a text generator, so I doubt you would get anything
         | of use. You can't even supply such a large input to GPT-3.
        
           | teaearlgraycold wrote:
           | GPT-3 is also a classifier and data-extractor.
           | 
           | You could give it a couple dozen bookmarks with example
           | classifications and then feed it a new bookmark and ask GPT-3
           | what category the page belongs in. Repeat for the entire data
           | set.
           | 
           | For data extraction you could ask questions about the titles.
           | Maybe have it list all machine-learning model names that
           | appear in the set of bookmark titles.
        
         | keithalewis wrote:
         | print(gpt.submit_request("Give me insights"))
         | 
         | >>> You are spending way too much time browsing.
        
       | air7 wrote:
       | So is there any one of them that I could play around with?
        
         | sillysaurusx wrote:
         | https://6b.eleuther.ai
        
         | lazylion2 wrote:
         | AI21 labs 178B parameter model
         | 
         | https://studio.ai21.com/
        
       | ComodoHacker wrote:
       | Are we heading to the (distant) future where to make progress in
       | any field you _have_ to spend big $$$ to train a model?
        
         | jowday wrote:
         | That's not even distant - most of the self-supervised vision
         | and language models at the bleeding edge of the field require
         | huge compute budgets to train.
        
         | iamcurious wrote:
         | We are already there. Machine learning is the flavor of A.I.
         | that keeps business barriers to entry high. If we had invested
         | in symbolic A.I., things would be different. A similar thing
         | happens with programming language flavors. PHP lowers barriers
         | to entry, so it is discredited by the incumbents.
        
           | j45 wrote:
           | Your point about incumbents not wanting it to be easier for
           | beginners to enter a language or technology is very
           | understated.
           | 
           | Excluding people who lack the time and resources to overcome
           | the initial inertia required to become productive is a form
           | of opportunity and earnings segregation.
           | 
           | Whatever your background in tech, there is little more
           | satisfying than seeing people put tech to work for them,
           | rather than the other way around or being dependent on
           | others.
        
           | [deleted]
        
           | lostdog wrote:
           | The difference between ML and symbolic AI is that ML works
           | and symbolic AI doesn't. At my job, dropping the
           | computational load of our ML models is heavily invested in,
           | and every success is celebrated. Everybody wants it to be
           | easier and cheaper to train high quality models, but some
           | things are still intrinsically hard.
        
             | CodeGlitch wrote:
             | > The difference between ML and symbolic AI is that ML
             | works and symbolic AI doesn't.
             | 
             | IBM managed to beat Garry Kasparov using symbolic AI, did
             | they not? So in what way does it not work?
        
               | lostdog wrote:
               | > IBM managed to beat Garry Kasparov using symbolic AI
               | did they not? So in what way does it not work?
               | 
               | Ok, I should be clearer. ML approaches are way way better
               | than symbolic approaches. Given almost any problem, it is
               | much much easier to make an ML approach work than any
               | symbolic approach.
               | 
               | Yes, chess was first solved symbolically, but it's since
               | been solved by ML better and more easily, to the point
               | that stockfish now incorporates neural nets [1]. ML has
               | also given extremely high levels of performance on Go,
               | Starcraft, DoTA, and on protein folding, image
               | recognition, text processing, speech recognition, and
               | pretty much everything else.
               | 
               | I would challenge you to name any (non-simple) problem
               | where traditional AI methods are still state of the art.
               | 
               | [1] https://stockfishchess.org/blog/2020/stockfish-12/
        
               | goodside wrote:
               | "I would challenge you to name any (non-simple) problem
               | where traditional AI methods are still state of the art."
               | 
               | Lossless file compression. As far as I know none of the
               | algorithms in widespread use are neural-based, despite
               | the fact that compression is clearly a rich statistical
               | modeling problem, at least on par with GPT-3-style
               | language understanding in difficulty. There are published
               | attempts to solve the problem with neural networks, but
               | they simply don't work well enough to date. Modern
               | solutions also still use old-fashioned AI ingredients
               | like compiled dictionaries of common natural-language
               | words -- any other domain where nat-lang dictionaries are
               | useful has been conquered by neural solutions, e.g.
               | spelling and grammar checkers.
        
               | _game_of_life wrote:
               | I'm far from an expert in this subject but doesn't this
               | ranking of large text compression algorithms with NNCP
               | coming first suggest that neural-nets are pretty great at
               | compression?
               | 
               | http://mattmahoney.net/dc/text.html
               | 
               | https://bellard.org/nncp/
               | 
               | I don't see examples of high performing symbolic AI based
               | compression algorithms anywhere, but again I am very
               | ignorant, do you have examples?
        
               | CodeGlitch wrote:
               | Thanks for clearing that up, I do agree that ML-based AI
               | has surpassed symbolic approaches in every field.
        
               | adgjlsfhk1 wrote:
               | They didn't. That was just alpha-beta search with some
               | custom hardware to speed it up. Also, at this point both
               | of the strongest chess AIs (Stockfish and lc0) are using
               | neural networks and are roughly 1000 Elo above where
               | Deep Blue was (and most of that is from software, not
               | hardware).
        
               | shmageggy wrote:
               | > _just alpha beta search_
               | 
               | I will cling to these goal posts every time. Search was
               | and still is AI, unless you think Russell and Norvig
               | should have named the field's foundational textbook
               | something other than "Artificial Intelligence: A Modern
               | Approach"
        
               | PeterisP wrote:
               | 1. There's a world of problems (such as "perception-
               | related" ones, e.g. vision and NLP) which we tried to
               | solve for decades with symbolic AI, getting worse
               | results than what first-year students can nowadays
               | achieve as homework with ML;
               | 
               | 2. For your example of chess: for some time now, ML
               | engines have been pretty much untouchable by engines
               | based on pre-ML methods.
        
               | CodeGlitch wrote:
               | Yes I agree with all your points - I was however
               | responding to the point being made that symbolic AI
               | "wasn't useful"...which in the past it was. Perhaps in
               | the future some new method or breakthrough will mean it
               | becomes useful once again?
        
               | panabee wrote:
               | this is a great point.
               | 
               | much like deep learning was invented decades ago but
               | didn't become feasible until technology caught up, could
               | the same be true for symbolic AI?
               | 
               | i.e., is the ceiling for symbolic AI technical and
               | transient or fundamental and permanent?
        
               | PeterisP wrote:
               | My feeling is that even in our own thinking symbols are
               | used mostly to communicate our (inherently non-symbolic)
               | thoughts to others or record them; i.e. they are a
               | solution to a bandwidth-limited transfer of information
               | while the actual thinking process happens with concepts
               | that have more similarity to collections of vague
               | parameters and associations which can be compressed to
               | symbols only imperfectly with losses.
               | 
               | From that perspective, I don't see how symbolic AI would
               | be competitive but there would be a role for symbolic AI
               | in designing systems that can be comprehensible for
               | humans, but perhaps just as a distillation/compression
               | output from a non-symbolic system. I.e. have a strong
               | "black box" ML system that learns to solve a task, and
               | then have it construct a symbolic system that solves that
               | task worse, but in an explainable way.
        
             | iamcurious wrote:
             | >The difference between ML and symbolic AI is that ML works
             | and symbolic AI doesn't.
             | 
             | There was a point when it was the other way around; this
             | is not static but the result of resources being poured in.
             | The data-heavy, compute-heavy, black-box style of ML gives
             | power to large businesses over small ones. So it's seen as
             | a safer bet than symbolic A.I. This in turn makes it work
             | better, which makes it an even safer bet. Notice that
             | startups dream of being big businesses, so they still pick
             | ML.
             | 
             | Also notice that in some domains ML is still behind
             | symbolic A.I., for instance a lot of robotics and
             | autonomous vehicles.
        
           | DonHopkins wrote:
           | PHP wasn't discredited by the incumbents. It was discredited
           | by its creator.
           | 
           | "I'm not a real programmer. I throw together things until it
           | works then I move on. The real programmers will say Yeah it
           | works but you're leaking memory everywhere. Perhaps we should
           | fix that. I'll just restart Apache every 10 requests."
           | -Rasmus Lerdorf
           | 
           | "I was really, really bad at writing parsers. I still am
           | really bad at writing parsers." -Rasmus Lerdorf
           | 
           | "We have things like protected properties. We have abstract
           | methods. We have all this stuff that your computer science
           | teacher told you you should be using. I don't care about this
           | crap at all." -Rasmus Lerdorf
        
             | iamcurious wrote:
             | To most programmers that doesn't discredit PHP at all. He
             | cares about a working product, much like the 90% of
             | programmers who don't have the privilege to worry about
             | theory. They just need an ecommerce site, or a blog, or
             | whatever, running ASAP. To use a pg analogy, they are
             | there to paint, not to worry about paint chemistry.
             | 
             | The incumbents do discredit PHP though. For instance,
             | Facebook was built on PHP, and still runs on it. They used
             | the language of personal home pages to give every person
             | on the planet a personal home page. Nevertheless, once
             | they succeeded they forked PHP under a new name and
             | isolated the devs culturally.
        
         | YetAnotherNick wrote:
         | Training models is getting cheaper. GPT-3 is one of the very
         | few examples where it is so expensive. In the end it all
         | depends on how much data you have, since that determines how
         | far you can scale up the model without overfitting. And
         | internet text is one of the only datasets this big.
        
         | minimaxir wrote:
         | Fortunately, costs for training superlarge models are coming
         | down rapidly thanks to TPUs (which was the approach used to
         | train GPT-J 6B) and DeepSpeed improvements.
        
           | Nextgrid wrote:
           | Are there any TPUs that can be purchased off-the-shelf and
           | then owned, like you can do with a CPU or GPU? Or are you
           | just limited to paying rent to cloud providers and ultimately
           | being at their mercy when it comes to pricing, ToS, etc?
        
             | 6gvONxR4sf7o wrote:
             | No, but you probably aren't going to buy an A100 either, so
             | it's a moot point.
        
       ___________________________________________________________________
       (page generated 2021-11-07 23:00 UTC)