[HN Gopher] Open source solution replicates ChatGPT training pro...
       ___________________________________________________________________
        
       Open source solution replicates ChatGPT training process
        
       Author : metalwhale
       Score  : 151 points
       Date   : 2023-02-19 15:40 UTC (7 hours ago)
        
 (HTM) web link (www.hpc-ai.tech)
 (TXT) w3m dump (www.hpc-ai.tech)
        
       | raydiatian wrote:
       | > "the generative-AI eruption"
       | 
       | I really think we should stick to Nick Bostrom's (or pls fix
       | attribution) term "intelligence explosion"
        
         | SunghoYahng wrote:
          | Even if it doesn't have much to do with intelligence?
        
           | raydiatian wrote:
           | I'm not sure about your definition of intelligence. Perhaps
           | you think I'm saying it's conscious. I don't conflate
           | consciousness with intelligence here. I can't say whether or
           | not ChatGPT is conscious (although I doubt it), but it's
           | pretty clearly intelligent by a reasonable definition. It's
           | an agent which is extremely effective at playing its game.
           | Consciousness is not a prerequisite to intelligence.
           | 
           | But back to what I'm really saying here: "Generative AI
           | eruption" is a mouthful whereas "intelligence explosion" is
           | concise.
        
       | rvz wrote:
        | Finally, an open-source equivalent to ChatGPT emerging from the
        | AI euphoria will begin to deflate the hype around OpenAI's
        | ChatGPT moat, just as GPT-3 and DALLE-2 were almost immediately
        | disrupted by open-source models.
        | 
        | This (and other open-source AI models), not 'ChatGPT',
        | 'DALLE-2', etc., is what will permanently change the AI
        | landscape for everyone.
        
         | simonw wrote:
         | "just like how GPT-3 ... immediately disrupted by open-source
         | models as well."
         | 
         | Which open source alternatives to GPT-3 have you seen that most
         | impressed you?
         | 
          | I've not yet found any that are remotely as useful as GPT-3,
          | at least for the kinds of things I want to use them for
          | (generating SQL queries from human text, summarizing text,
          | that kind of thing).
        
           | simonw wrote:
           | In answer to my own question,
           | https://www.youtube.com/watch?v=NHJh9KJNyE4 GPT-NeoX-20B
           | instruct-trained looks very impressive.
        
         | supriyo-biswas wrote:
          | I, for one, would like to see an open-source model similar to
          | Stable Diffusion, but for text. It would be a great way to
          | empower general folk without having to pay OpenAI, and
          | without having to worry about the LLM's belief system, which
          | is biased against conservatives in the case of ChatGPT[1] (HN
          | discussion[2]).
         | 
         | [1] https://davidrozado.substack.com/p/openaicms
         | 
         | [2] https://news.ycombinator.com/item?id=34625001
        
           | return_to_monke wrote:
            | There is
            | 
            | https://github.com/laion-ai/open-assistant being built in
            | the open already. You can contribute too.
            | 
            | Please also note that the article you linked is about the
            | text classifier of the frontend, not the LLM itself.
        
           | anonymouskimmer wrote:
           | https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_.
           | ..
           | 
           | From the graph (above) linked by the top comment in your [2],
           | I'm wondering whether this demonstrates more anti-
           | conservative bias than liberal bias, or whether the
           | alternative meanings of conventionally conservative versus
           | conventionally liberal words dictate the frequency of a flag.
           | 
           | For instance, "Republican" means a variety of things around
           | the world, but "Democrat" is far more likely to indicate the
           | US Democrat party (which is frequently misstated as the
           | "Democratic party"), or a national Democrat party in general.
           | People would tend to write "I'm a democrat" to assign their
           | membership to the party, whereas they'd say "I'm democratic"
           | to assign their leanings toward the system. But "I'm a
           | republican" means both.
        
             | mrtranscendence wrote:
             | > US Democrat party (which is frequently misstated as the
             | "Democratic party")
             | 
             | Where are you getting this? The proper term is indeed
             | "Democratic party", and this is almost universal outside of
             | the conservative bubble. You might personally think it's
             | not small-d democratic, but that doesn't make "Democrat
             | party" correct.
        
           | TheCaptain4815 wrote:
           | NeoX 20B is a fantastic open source model.
        
             | ImprobableTruth wrote:
             | It's nice, but a far cry from gpt-3
        
               | TheCaptain4815 wrote:
               | NLP Cloud has a finetuned version of neoX which works
               | incredibly well.
        
               | simonw wrote:
               | Thanks for the tip - I watched this demo video and yes,
               | it does look like a very impressive model:
               | https://www.youtube.com/watch?v=NHJh9KJNyE4
        
           | [deleted]
        
         | anonylizard wrote:
          | Is there a GPT-3 disruptor? All the open-sourced models are
          | GPT-2 improvements, and GPT-2 was open sourced by OpenAI.
          | 
          | GPT-3/4 is simply too expensive for consumer GPUs; any open-
          | sourced version will have to run on A100s in the cloud, and
          | so is centralized by nature. Granted, having multiple
          | providers also counts as removing the moat.
          | 
          | But BLOOM, for example (an attempt at replicating GPT-3), no
          | one actually uses, because it's simply too expensive for
          | performance inferior to GPT-3's.
         | 
         | DALLE2 was disrupted, because
         | 
          | 1. OpenAI at the time was dumb enough to put a waitlist on
          | something that cost money. They didn't make the same mistake
          | with ChatGPT.
          | 
          | 2. Stable Diffusion was not only open sourced, but heavily
          | optimized in parameter count compared to alternative models,
          | making it viable on consumer GPUs.
        
           | GaggiX wrote:
            | DALLE 2 has also been disrupted because OpenAI heavily
            | nerfed the model, probably by greatly reducing the steps in
            | the upscaler models (DALLE 2 uses diffusion-based
            | upscalers, which are very expensive to run), so the images
            | have good coherence but really bad texture, full of
            | artifacts. Ironically, GAN models had the opposite problem:
            | very bad coherence and good texture. OpenAI has also
            | introduced very few features, and there is no way to
            | finetune the model as with GPT-3. Meanwhile, the MJ model
            | outputs extremely good images, and SD can be conditioned,
            | fine-tuned, etc. in a really versatile way with extremely
            | good quality (if you know what you are doing).
        
         | [deleted]
        
         | EGreg wrote:
         | Yeah, for the worse.
         | 
         | We will have a ton of bullshit at scale. And the web will be
         | done for.
        
           | jrvarela56 wrote:
            | I hope the arms race makes us smarter. We're going to need
            | AI to sift through all the BS. My hope is that once we're
            | drowning in deepfakes daily, the average user will conclude
            | that they can't trust what they see, read, or hear. The
            | transition will be rough.
        
             | visarga wrote:
             | > We're going to need AI to sift through all the BS.
             | 
             | Yes, that's the only way to deal with it. Humans alone
             | can't cope.
        
               | EGreg wrote:
               | Somehow bombs don't actually prevent other bombs. People
               | always hope that the offensive tech could be used
               | defensively, but defense is never perfect and even a few
               | that get through can wreak destruction.
        
               | [deleted]
        
         | [deleted]
        
       | jacooper wrote:
        | I'm not deep into the AI space, but how would I use this? Do I
        | just run it and talk to it in a terminal? What's the next step
        | to make it useful for search and other applications?
        
       | VadimPR wrote:
       | How good is the quality of this? BLOOM is a 176B parameter model,
       | but it doesn't seem to compare to GPT-3 (175B parameters) in
       | terms of output quality.
        
         | lossolo wrote:
          | It's because BLOOM is undertrained: you can prune a lot of
          | weights in BLOOM without impacting performance. Look at the
          | Chinchilla paper[1], whose 70B model outperforms the 175B
          | GPT-3 model.
         | 
         | https://arxiv.org/abs/2203.15556
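          | 
          | To put rough numbers on it, here is a back-of-envelope sketch
          | using the paper's ~20-tokens-per-parameter rule of thumb (the
          | token counts are the published training-set sizes as I recall
          | them, so treat them as approximate):
          | 
          |   TOKENS_PER_PARAM = 20  # Chinchilla compute-optimal heuristic
          |   
          |   models = {
          |       # name: (parameters, training tokens actually used)
          |       "GPT-3 175B": (175e9, 300e9),
          |       "BLOOM 176B": (176e9, 366e9),
          |       "Chinchilla 70B": (70e9, 1.4e12),
          |   }
          |   
          |   for name, (params, tokens) in models.items():
          |       optimal = params * TOKENS_PER_PARAM
          |       print(f"{name}: {tokens / 1e9:.0f}B tokens seen, "
          |             f"optimal ~{optimal / 1e12:.1f}T "
          |             f"({tokens / optimal:.0%} of optimal)")
          | 
          | By that heuristic, GPT-3 and BLOOM both saw under 10% of
          | their compute-optimal token counts, while Chinchilla is right
          | at 100%.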
        
           | Der_Einzige wrote:
            | In general, most giant LLMs are extremely undertrained at
            | this time. Consider that most of the gains in RoBERTa vs.
            | BERT came from just continuing to train.
        
         | rnosov wrote:
          | Out of curiosity, how did you measure their respective
          | performance? My understanding is that BLOOM is roughly
          | comparable to GPT-3 in performance on most NLP tasks. Were
          | you comparing OpenAI davinci to raw BLOOM by any chance?
        
       | simonw wrote:
       | Is the term "ChatGPT" being used in place of GPT-3 here? Is this
       | thing actually replicating the GPT-3 training process?
       | 
       | The thing that makes ChatGPT interesting (over regular GPT-3) is
       | the RLHF process, but this article doesn't seem to touch on that
       | at all, unless I've missed something.
        
         | rnosov wrote:
          | Surprisingly, they are using the term correctly. It seems
          | that the main point of the post is to plug their "Colossal
          | AI" framework, but if you do an in-page search for the "Low-
          | cost replication of ChatGPT" subheading midway through the
          | article, they do claim to replicate the full RLHF pipeline.
          | Interestingly, they also suggest that it would work with both
          | BLOOM and OPT, meaning you could potentially make things like
          | ChatBLOOM and ChatOPT (even on a consumer-grade GPU). The
          | lack of a demo doesn't inspire too much confidence though.
        
         | faizshah wrote:
          | The article briefly covers their RLHF implementation. There
          | are more details here:
          | https://github.com/hpcaitech/ColossalAI/blob/a619a190df71ea3...
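          | 
          | For intuition, the heart of the RLHF stage is PPO fine-tuning
          | in which each sampled response is scored by the reward model,
          | minus a KL penalty that keeps the tuned policy close to the
          | frozen SFT model. A minimal sketch of that reward shaping
          | (function names and numbers are illustrative, not
          | ColossalAI's actual API):
          | 
          |   def rlhf_reward(rm_score, logprobs_policy, logprobs_ref,
          |                   kl_coef=0.1):
          |       """Reward-model score minus a KL penalty against the
          |       frozen SFT reference policy, per sampled response."""
          |       # Token-level KL estimate: sum of log pi - log pi_ref
          |       kl = sum(p - r for p, r in
          |                zip(logprobs_policy, logprobs_ref))
          |       return rm_score - kl_coef * kl
          |   
          |   # Toy numbers: the policy has drifted slightly from the
          |   # reference, so a small penalty is subtracted.
          |   print(rlhf_reward(rm_score=1.8,
          |                     logprobs_policy=[-0.9, -1.2, -0.4],
          |                     logprobs_ref=[-1.0, -1.3, -0.7]))
          |   # 1.8 - 0.1 * 0.5 = 1.75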
        
         | de6u99er wrote:
          | GPT-3 has been publicly covered in scientific publications,
          | same as GPT-2 and GPT. These are all pre-trained models; GPT
          | stands for Generative Pretrained Transformer. Transformers
          | were invented in 2017 at Google Brain [1].
         | 
         | -> https://medium.com/walmartglobaltech/the-journey-of-open-
         | ai-...
         | 
          | GPT-4 is around the corner, and it's allegedly 100x more
          | powerful than its predecessor.
         | 
         | -> https://medium.com/geekculture/gpt-4-100x-more-powerful-
         | than...
         | 
         | [1] https://arxiv.org/abs/1706.03762
        
           | wcoenen wrote:
            | That source about GPT-4 is nonsense. It claims GPT-4 will
            | have trillions of parameters, while at the same time
            | linking to another page which says that it won't be much
            | bigger than GPT-3:
           | 
           | https://www.datacamp.com/blog/what-we-know-gpt4
        
           | simonw wrote:
           | That "100x" figure is extremely poorly sourced. I don't
           | believe that at all.
        
             | de6u99er wrote:
             | You're right. Apologies for that.
        
       | college_physics wrote:
        | It would be great to see this somehow combined with an open
        | source search engine.
        
       | simonw wrote:
       | "hitting 100 million monthly active users 2 months after its
       | launch".
       | 
       | I'm deeply suspicious of that number. It came from Similarweb,
       | who track these things through analytics gathered from browser
       | extensions.
       | 
       | I trust this article more:
       | https://www.nytimes.com/2023/02/03/technology/chatgpt-openai...
       | 
       | "But two months after its debut, ChatGPT has more than 30 million
       | users and gets roughly five million visits a day, two people with
       | knowledge of the figures said."
       | 
       | "Two people with knowledge of the figures" is journalism speak
       | for "I heard this off the record from people with insider info,
       | and I'm ready to report it because those two different sources
       | provided the same number".
        
         | jackblemming wrote:
         | Can someone tell me what the hell they use ChatGPT for? I tried
         | it a few times and it always confidently gave me wrong results
         | to basic things. What is this thing supposedly "disrupting"? Is
         | it really just marketing cranking out metric tons of spam
         | blogs?
        
           | wincy wrote:
            | It's great for getting general outlines for software design
            | documents and then "hanging the meat" onto the outline.
        
           | carlgreene wrote:
           | I recently used it sort of as a rubber duck for a coding
           | problem. I was architecting a new feature and the way I was
           | thinking about it was a bit clunky.
           | 
           | ChatGPT helped point something obvious out that I had totally
           | missed in my original problem solving.
        
           | meltedcapacitor wrote:
            | Jobs that require correct answers are a small subset of
            | jobs that require answers.
        
           | dsco wrote:
            | I use it to generate and troubleshoot SQL queries. I work
            | as a PM, so the queries can be inefficient in terms of
            | performance and scale; I just need the results.
        
           | hn_throwaway_99 wrote:
            | I have a friend who works at a large government contractor.
            | They frequently have to respond to RFPs from the
            | government, and had some analysts whose main job was
            | preparing responses to these RFPs.
           | 
           | They tried instead putting these RFPs through ChatGPT, and
           | they were blown away by the responses they got. Of course,
           | the responses still need to go through a thorough edit and
           | review process, but that was also true when humans were
           | writing the first draft.
           | 
            | He told me that ChatGPT obviated a couple of people's jobs,
            | with the added bonus that the turnaround time between
            | receiving a proposal and sending a response was much
            | faster.
        
           | JoshuaDavid wrote:
           | I sometimes ask it "what is the standard term of art in
           | industry which means blah?" If you google that question, you
           | get only blogspam and people trying to sell you something,
           | but if you ask chatgpt and then google the thing it tells you
           | is the standard language, it's pretty easy to tell if it gave
           | you correct info.
           | 
           | And then you can run searches using the standard terms, which
           | gives better results, and also when writing code have more-
           | informatively-named variables and better-structured data.
        
           | savolai wrote:
            | I'm using it for CRUD, i.e. generating insert SQL from C++
            | classes. It seems to know how to maintain ACID compliance
            | across multiple tables and foreign keys, saving lots of
            | time.
            | 
            | It also does better English-to-Finnish translation than
            | Google Translate. Also copywriting, as certain genres are
            | highly repetitive.
        
           | dmw_ng wrote:
           | I have been using it as a search replacement for most of the
           | past month and only found two subtly wrong answers. This
           | covers legal questions, researching product differences,
           | wiring diagrams, suggesting books to read, correcting
           | misremembered quotes, and about a hundred other tasks.
           | 
            | Of course I'm still relying on Google in the background,
            | but increasingly rarely, and I presume all the negative
            | commentary we've been seeing online is from folk who simply
            | haven't tested it in anger yet. Today's ChatGPT
            | hallucination is yesterday's Google blogspam, etc. Folk for
            | some reason continue to act as if the old world was
            | perfect. This is much closer to perfection than anything we
            | ever had, and infinitely more comprehensive. Google as we
            | knew it is already dead, because the medium Google was
            | built for just became obsolete. This is far closer to a new
            | Internet iteration (WAIS, FTP, Gopher, HTTP, Web2.0, ...)
            | than it is a new search engine.
           | 
            | Now watch as the search engines try to adapt it to their
            | recency-biased ads model and fail miserably, as what we
            | have is already better than what they were able to sell.
            | It's very unclear whether Bing or Google or anyone you've
            | heard of will win this round; it's suddenly a very exciting
            | time in tech again.
           | 
            | Another aspect I find very exciting is that these models
            | effectively represent a return to a curation-driven
            | Internet: selection of input data for model training is
            | probably an interesting new form of differentiation. Who
            | cares about having a site on the world wide web if it's not
            | part of the inputs for the language models used by millions
            | of users? That's a completely new structure for the
            | dissemination of ideas, marketing, "SEO", etc., and a brand
            | new form of mass media.
             | prox wrote:
              | It's nice to get quick, in-context answers about concepts
              | and their relationships. Sometimes I have a vague notion,
              | and ChatGPT resolves my hunch quite quickly without my
              | having to read through a (sometimes ad-spammed) article.
             | 
             | Google should be concerned.
        
             | mrtranscendence wrote:
             | I don't know what you've been searching for that you've
             | only found two subtly wrong answers. It frequently gives me
             | incorrect answers, some of which are subtle and some of
             | which are obvious. It's given me incorrect code, told me
             | about incorrect APIs, explained deep learning concepts
             | incorrectly, given me wrong answers about science-related
             | questions, made up characters wholesale when I asked it
             | about Irish mythology, given me made-up facts about
             | (admittedly niche) philosophers.
             | 
             | I'm glad you've found use out of it, but I can't imagine
             | using it as a search replacement for my use cases.
             | 
             | Edit: And I don't see why it would be surprising that
             | ChatGPT wouldn't have all of the answers. The underlying
             | model is much, much smaller than it would take to encode
             | all of the knowledge it was trained on. It's going to make
             | things up a lot of the time (since it's not good at
             | remaining silent).
        
           | mansion7 wrote:
           | I have not used it to create content for profit (yet) but
           | have successfully used it for:
           | 
           | brainstorming funny/catchy slogans: not all are winners, but
           | since it can crank out dozens almost immediately, I can pick
           | what I like and quickly modify them in the time it takes me
           | to think of one or two independently. As soon as I verify
           | they aren't ripoffs of existing material, I may use one or
           | two.
           | 
           | Writing poetry - it helped me to write sonnets, and further
           | modified them to specifications. The recipients were quite
           | impressed.
           | 
           | Translating existing poetry of mine into Arabic, while
           | retaining the meaning AND rhyming in Arabic, a feat which is
           | extremely difficult for me
           | 
           | Writing a business plan to my specifications that was
           | actually useful
           | 
           | Writing letters to a landlord to get out of a lease
           | 
           | In addition, I have run my own fiction through it and had it
           | rewrite it relatively convincingly in the styles of Lee
           | Child, Danielle Steele, and Dashiell Hammett. That is more
           | for fun, but I can see uses for it.
           | 
           | Lastly, I have attempted to use it to determine guilt in an
           | investigation where I had already determined the guilty
           | party, to see how close it was to replacing me. The answer it
           | gave was wrong, but I could see that this was because of user
           | error and it is only a matter of time.
        
           | mcaravey wrote:
            | I've used it to write out 45-minute-long lesson plans, help
            | write complicated text messages where all I've got is a
            | bunch of points to make, correct my Portuguese since I'm
            | not a native speaker, give me a baseline SQL table design
            | to achieve a specific goal, come up with different ways to
            | phrase things since I'm not creative enough, write
            | marketing copy, create design briefs for my graphic design
            | team, and on... I happily pay for it because it's just nuts
            | how much of a force multiplier it is for me.
        
           | simonw wrote:
           | So many things. A lot of them for personal entertainment, but
           | increasingly for useful other stuff too.
           | 
           | I used it to help brainstorm talk titles and abstracts for a
           | talk I was proposing the other day. What I ended up
           | submitting was entirely written by me but was heavily
           | influenced by the ChatGPT conversations.
           | 
           | https://til.simonwillison.net/macos/sips - I used it to
           | figure out how to convert webp to PNG on macOS, and learned
           | about an entirely new built-in command.
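            | 
            | (The built-in command in question is sips; a quick sketch
            | of the same conversion scripted from Python, assuming a
            | recent macOS where sips can decode webp, with placeholder
            | filenames:)
            | 
            |   import subprocess
            |   
            |   # Shell out to macOS's built-in `sips` image tool to
            |   # convert a webp file to PNG.
            |   subprocess.run(
            |       ["sips", "-s", "format", "png", "photo.webp",
            |        "--out", "photo.png"],
            |       check=True,
            |   )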
           | 
           | I often use it as a thesaurus - "what's a good word / term
           | for X?"
           | 
           | I'm self-employed and a journalist asked me for my job title,
           | which I don't have. So I brainstormed some ideas with
           | ChatGPT.
           | 
           | I pasted in the output of a SQLite "explain query plan" query
           | and asked for an explanation - which helped me figure out
           | enough to write a section of this TIL:
           | https://til.simonwillison.net/sqlite/subqueries-in-select
           | 
           | This is just from the past few days.
        
           | logicallee wrote:
           | >Can someone tell me what the hell they use ChatGPT for?
           | 
            | Although there's a free version, I pay $20/month for the
            | pro version ($240 per year) plus taxes, and use it daily. I
            | get a lot of benefits from using it.
           | 
           | I use it to learn about things, solve problems, suggest
           | approaches, critique my own proposals and approaches,
           | generate code scaffolding and smaller code solutions, help me
           | draft emails of all kinds, etc. I find it highly useful in a
           | variety of contexts. You can give it obfuscated impossible
           | code and it can analyze it and tell you what it does in
           | seconds: https://imgur.com/a/m40TR4d (someone else's result)
           | 
           | It can help you find bugs and mistakes in your own code.
           | 
           | You can also ask it to tell you about a subject and it can
           | give you a summary. Just tell it what you want and it'll do
           | its best.
           | 
            | What areas did you use it in where you got wrong results
            | for basic things, to the point where you don't find it
            | useful? Its major limitations are logical numeracy (it gets
            | numbers wrong) and the lack of a visual cortex, which means
            | you can't use it for graphics code or to write visually
            | correct solutions. Also, it doesn't speak foreign languages
            | perfectly; it makes some grammatical mistakes.
           | 
           | I asked chatgpt about what people use it for and it gave
           | these answers: https://imgur.com/a/qzUF5Ya
           | 
           | It mentions that it can generate a hypothesis. So a scientist
           | can absolutely use it to make some suggestions, for example
           | try "Generate five hypotheses a chemist might test as part of
           | an undergraduate study program" - here are some examples:
           | https://imgur.com/a/hOtGgKN
           | 
            | I'm no chemist, but those seem fine to me as undergraduate
            | lab exercises. It's probably not going to get you a Ph.D.,
            | but often you don't need one, just a few quick
            | brainstorming suggestions.
           | 
            | Some people have it plan all their meals and create recipes
            | for them, which they then cook and eat. There are thousands
            | of recipe sites; the reason people use ChatGPT is that they
            | can just describe what they want and what they have, and it
            | comes up with its own recipes based on what is available
            | and can be purchased.
           | 
           | Just describe what you need and what you want it to do and it
           | does a good job for you on all sorts of tasks.
        
           | krisoft wrote:
           | > Can someone tell me what the hell they use ChatGPT for?
           | 
           | I play DnD with my friends and I'm usually the dungeon
           | master. I use ChatGPT to help me world build, and flesh out
           | details.
           | 
           | Don't imagine asking ChatGPT what should happen in the next
           | session. More like asking for options for the name and title
           | of a non-player character. Then it writes options, I twist
           | them up, combine them and select the one I like the best.
           | 
           | I can even ask more complicated questions like "what was so
           | and so's first innovation and how did it help their village?
           | Provide 5 options" and then chatgpt goes and does that. Maybe
           | I like one, and then that is canon from then on, or maybe
           | while I am reading them I get an even better idea.
           | 
           | Basically I use it as a bicycle for my creativity. And in
           | that use case I care 0% if what it says is true, much more
           | that it comes up with wild things. It also doesn't have to be
           | totally consistent, since what it outputs is just a first
           | step in an editing process.
           | 
            | For example, I knew that one of the main cities in my world
            | had grown from a sleepy village into a bustling university
            | town because two wizards started a friendly competition
            | between them. And then with the help of ChatGPT I have
            | iteratively expanded that core idea into this backstory
           | of the city: https://docs.google.com/document/d/19dea6p9WuLcZ
           | IRVX2ecYMw8W...
        
           | cldellow wrote:
           | The 30M figure likely includes a lot of students having
           | ChatGPT do their homework for them. :)
           | 
            | I've used ChatGPT as a programming aid. I've started
            | writing some Python packages. I haven't written Python in a
            | long time, and it doesn't "flow" easily for me. ChatGPT has
            | been helpful for scaffolding some code.
           | 
           | It often gets things wrong -- but I know enough to recognize
           | when it's gone off the rails, and then nudge it in the right
           | direction.
           | 
           | A concrete example: I wanted to do an iterative breadth-first
           | traversal of a tree. I asked ChatGPT to produce it. It
           | produced a correct implementation, albeit a recursive one.
           | After being reminded that I wanted an iterative version, its
           | second attempt was the right thing.
           | 
           | This is a pretty small thing, I guess! But for me, it was
           | neat to be able to specify something at a higher level and
           | have the computer sort out the details.
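            | 
            | For reference, the iterative version it landed on would
            | look something like this (a sketch of the standard queue-
            | based technique, not the actual ChatGPT output):
            | 
            |   from collections import deque
            |   from dataclasses import dataclass, field
            |   
            |   @dataclass
            |   class Node:
            |       value: str
            |       children: list = field(default_factory=list)
            |   
            |   def bfs(root):
            |       # A FIFO queue replaces the call stack that the
            |       # recursive version relied on; nodes come out level
            |       # by level.
            |       order, queue = [], deque([root])
            |       while queue:
            |           node = queue.popleft()
            |           order.append(node.value)
            |           queue.extend(node.children)
            |       return order
            |   
            |   tree = Node("a", [Node("b", [Node("d")]), Node("c")])
            |   print(bfs(tree))  # ['a', 'b', 'c', 'd']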
        
             | calny wrote:
             | > It often gets things wrong -- but I know enough to
             | recognize when it's gone off the rails, and then nudge it
             | in the right direction.
             | 
             | > specify something at a higher level and have the computer
             | sort out the details.
             | 
              | Same here. I know some people frown on GitHub Copilot,
              | but ChatGPT + Copilot makes a powerful combo. I actually
              | use ChatGPT like a copilot, to talk through the structure
              | of things, debug issues, etc. Then Copilot works as a
              | smarter autofill when I don't know the exact code or
              | syntax needed off the top of my head. Both ChatGPT and
              | Copilot get things wrong sometimes, but they are correct
              | often enough that it improves time spent. Even when
              | ChatGPT is wrong it sometimes discusses useful concepts I
              | hadn't thought about.
              | 
              | To be fair, I'm self-taught and often jump between
              | languages and frameworks that I'm not an expert in.
              | Perhaps Copilot + ChatGPT would be less useful for pro
              | devs who are experts in their areas. But in my case,
              | they're quite helpful.
             | 
             | Entirely separate: I also use ChatGPT to turn stream-of-
             | consciousness thoughts into medium-length letters or
             | emails.* Eg, I had to email a dog trainer and had a bunch
             | of concerns to raise. It would've taken a fair number of
             | minutes to make it coherent and easily-readable. Instead, I
             | explained the situation to ChatGPT and hastily typed out
             | the concerns, giving no regard to grammar, typos, or
             | syntax. Then I asked ChatGPT to turn it into an email to
             | the trainer with my intended tone, and it worked like a
             | charm. That process took maybe 1/4 the time of manually
             | writing the full email.
             | 
             | * this semi-stream-of-consciousness post was NOT written
             | with ChatGPT, though perhaps it should've been
        
         | huijzer wrote:
         | > I'm deeply suspicious of that number. It came from
         | Similarweb, who track these things through analytics gathered
         | from browser extensions.
         | 
         | I'm less suspicious. Anecdotally, I've compared SimilarWeb on a
         | few low-traffic sites of mine to the results according to an
         | open source analytics tool and SimilarWeb got surprisingly
         | close. They call it their "proprietary dataset".
         | 
         | As a side-note, I suspect that their sources include more than
         | just browser extensions or it wouldn't be so accurate for small
         | sites. Couldn't they buy data from autonomous systems or
         | internet exchanges and extrapolate from that while correlating
         | IPs with demographics? They only report rough estimates so SSL
         | wouldn't be a problem for their analytics.
        
       | sillysaurusx wrote:
       | > On a single multi-GPUs server, even with the highest-end A100
       | 80GB GPU, PyTorch can only launch ChatGPT based on small models
       | like GPT-L (774M), due to the complexity and memory fragmentation
       | of ChatGPT. Hence, multi-GPUs parallel scaling to 4 or 8 GPUs
       | with PyTorch's DistributedDataParallel (DDP) results in limited
       | performance gains.
       | 
       | Where are these numbers coming from? An 80GB A100 GPU is
       | certainly more than capable of hosting a 1.5B GPT. We were
       | running 774M on rinky-dink cards back in 2019 for our inference
       | purposes.
       | 
       | I don't understand how they went from talking about 175B params
       | across 32 cards to 774M on one card. 175B divided by 32 is 5.4B.
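        | 
        | Back-of-envelope math on why (rule-of-thumb byte counts, not
        | exact: ~2 bytes per parameter for fp16 inference, ~16 bytes per
        | parameter for mixed-precision Adam training, before
        | activations):
        | 
        |   INFER_BYTES = 2    # fp16 weights only
        |   TRAIN_BYTES = 16   # fp16 weights/grads + fp32 Adam states
        |   GiB = 1024**3
        |   
        |   for name, params in [("GPT-L 774M", 774e6),
        |                        ("GPT-XL 1.5B", 1.5e9),
        |                        ("GPT-3 175B", 175e9)]:
        |       print(f"{name}: ~{params * INFER_BYTES / GiB:.1f} GiB "
        |             f"to serve, ~{params * TRAIN_BYTES / GiB:.0f} GiB "
        |             f"to train with Adam")
        | 
        | That puts even a 1.5B model comfortably within a single 80GB
        | A100 for training, let alone inference.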
       | 
       | In fact, I'm not sure what they're saying in general. They seem
       | to be confusing data parallelism with model parallelism with
       | memory fragmentation, while namedropping a bunch of training
       | techniques.
       | 
       | The hard part of ChatGPT isn't the size. It's the training
       | process. It took a small army of contractors rating outputs as
       | good or bad. Once that dataset gets replicated, we can start
       | talking about size. Hopefully LAION will deliver.
        
         | sdenton4 wrote:
          | Yeah... having spent a lot of cycles replicating ML work, I
          | can say it's much more difficult than simply taking a stab at
          | a paper. It's typically doable (results really do replicate),
          | but it can take a few good brains a year to pull it off.
          | There are typically a lot of small decisions that add up, and
          | a lot of hyperparameter sweeps needed to land in a good
          | region of the optimization space.
        
       ___________________________________________________________________
       (page generated 2023-02-19 23:00 UTC)