[HN Gopher] Learnings from fine-tuning LLM on my Telegram messages
       ___________________________________________________________________
        
       Learnings from fine-tuning LLM on my Telegram messages
        
       Author : furiousteabag
       Score  : 122 points
       Date   : 2023-11-27 17:09 UTC (5 hours ago)
        
 (HTM) web link (asmirnov.xyz)
 (TXT) w3m dump (asmirnov.xyz)
        
       | NoraCodes wrote:
       | A meta-comment, but, what is the difference between "learnings"
       | and "lessons"? Why use the former when we have the latter?
        
         | bigdict wrote:
         | learnings = lessons learned
        
         | Jolter wrote:
         | Lessons may be given, but are not necessarily learned.
        
         | c0pium wrote:
         | Gotta earn those fat management consultant fees somehow. I'm
         | sure there's a whole team at McKinsey doing nothing but
         | inventing new ways to say the same things.
        
           | fl7305 wrote:
            | In Swedish, there's a commonly used word "lärdomar" which is
            | a direct match for "learnings".
           | 
           | But where the Swedish word sounds natural in that language,
           | "learnings" just sounds wrong in English, even though it
           | apparently is technically correct.
        
         | furyofantares wrote:
         | Learnings implies a report of your own experience; lessons
         | implies something prepared as teaching material for the
         | audience. (In the context of the title sentence anyway.)
        
           | xanderlewis wrote:
           | 'Lessons' to me also seems to carry a sense of regret, as in
           | 'things (we) got wrong'. 'Learnings' is a more obscure word
           | that I would take to mean something more neutral: literally
           | 'things (I've) learnt'.
        
           | kagol wrote:
           | Perhaps "findings" over "learnings", based on your
           | description?
        
         | klooney wrote:
         | I've always associated it with Indian English, possibly it's a
         | dialect thing that's spread from that community.
        
           | xanderlewis wrote:
           | Maybe it's Kazakh. https://en.m.wikipedia.org/wiki/Borat
           | 
           | ;-)
        
         | swatcoder wrote:
         | https://en.m.wiktionary.org/wiki/learnings
         | 
         | Beyond what's noted there (contemporary business jargon),
         | English is diffused across the globe and has many regional
         | variations that are different than class-signalling/formal
         | American and British usage. As we all encounter each other
         | online, it's not always worth over-analyzing word choice when
         | you can understand the intent.
        
         | bee_rider wrote:
         | I think when you ask what the difference between two phrases
         | is, people will really dig down to try and find a difference.
         | 
         | IMO in this context it is basically shorthand for "things I
         | learned/lessons learned while tuning LLM...," and either would
         | be fine. It is sort of an informal list of stuff the author
         | learned.
         | 
         | In my experience (nothing special, just another native speaker)
         | "lessons from <event>" is the more typical American (at least)
         | English phrase. But it is sort of close to "Lessons on."
         | "Lessons on" would imply more refined material that is more
         | narrowly focused on teaching. So I wonder if the author decided
         | they just didn't want to worry about any confusion, or the
         | possibility that they might misuse a phrase.
        
         | AlexCoventry wrote:
         | I think it's new. I've only heard it in the last few years.
        
         | amccollum wrote:
         | This usage of "learnings", while certainly more common in
         | "business jargon" today, was used by Shakespeare:
         | 
         | https://www.opensourceshakespeare.org/views/plays/play_view....
        
           | nescioquid wrote:
           | Some words in Shakespeare have different meanings today or
           | have simply left standard usage. I don't think the presence
           | of a word in Shakespeare means it is de facto good style to
           | use today.
           | 
            | From a correctness standpoint, I think a descriptivist
           | would be satisfied with an attested usage, especially from
           | such a source. From a style point of view, I still find
           | myself feeling embarrassed for the author when I encounter
           | this usage (which is my own problem).
        
         | catlover76 wrote:
         | I assumed the author was a non-native English speaker
        
       | haltist wrote:
       | Great example of immortal digital avatars. This is just a simple
       | personal avatar but it is possible to make technological gods
       | with the same techniques. All that's needed is scale and $80B.
        
       | u385639 wrote:
       | Great post. I wonder how much this can improve if you RAG-ify a
       | diverse set of contextual data, for example calendar, meals,
       | recent conversations from the real world, etc.
       | 
       | It's also interesting that blia was translated to 'damn'. :)
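A toy sketch of that RAG idea: retrieve the most relevant snippets from personal sources (calendar, meal log, recent chats) and prepend them to the prompt. This uses naive word overlap for scoring; a real setup would use embeddings and a vector store, and all names and data below are made up for illustration:

```python
# Hypothetical retrieval step for grounding the avatar in current context.
# Scoring by word overlap is a stand-in for embedding similarity.
def retrieve(query, snippets, k=2):
    """Return the k snippets sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(snippets, key=lambda s: -len(q & set(s.lower().split())))
    return scored[:k]

context_sources = [
    "calendar: dentist appointment tomorrow 10am",
    "meal log: pasta for dinner",
    "recent chat: planning a ski trip with Alex",
]

query = "what are we doing tomorrow morning"
hits = retrieve(query, context_sources)
prompt = "Context:\n" + "\n".join(hits) + "\nUser: " + query
```

The retrieved context would then be fed to the fine-tuned model alongside the chat history, giving it the "what is happening right now" awareness the author notes is missing.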
        
         | furiousteabag wrote:
         | I think incorporating knowledge from other apps is a good next
         | step because the model definitely lacks the context of what is
         | going on right now. The nature of instant messaging is that
         | most of the messages are about what is happening right now or
         | what will happen in the near future, so past communication
         | history does not help much.
        
       | goda90 wrote:
       | We're probably quite some time off from the bio-mimetic android
       | part, but we're feeling closer and closer to the AI replacement
       | avatar from the Black Mirror episode "Be Right Back"[0]
       | 
       | [0]https://en.wikipedia.org/wiki/Be_Right_Back
        
       | thefourthchime wrote:
       | This part caught my eye:
       | 
       | "Using a half-precision FSDP full shard with a 1024 sequence
       | length and a micro batch size of 2 required 63GB of VRAM on each
       | of the eight A100 80 GB GPUs. The training, lasting three epochs,
       | took just 20 minutes. The total cost for the VM was $8.88 per
       | hour, resulting in $3, not including the time for experiments and
       | bug fixes."
       | 
        | I wondered where you could rent cycles on a machine like that.
        | A quick Google found that p4d.24xlarge on AWS is available; the
        | on-demand cost is $20.1755 per hour, while Spot is only $8.99 (I
        | guess it's gone up?)
       | 
       | Cool to know I could fine-tune for only ~$3.
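The quoted ~$3 figure checks out with simple arithmetic against the $8.88/hour rate and 20-minute run from the post:

```python
# Back-of-the-envelope check of the quoted training cost.
rate_per_hour = 8.88   # 8xA100 80GB VM price quoted in the post
minutes = 20           # three epochs of training

cost = rate_per_hour * minutes / 60
print(f"${cost:.2f}")  # $2.96, i.e. roughly the $3 quoted
```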
        
         | furiousteabag wrote:
         | I've been using vast.ai for a very long time. It is like a GPU
         | marketplace, where people rent and lease GPUs. There are a lot
         | of VMs with 4090, and beasts like 8xA100 80GB are also
         | available from time to time.
        
           | skerit wrote:
           | I've used vast.ai to do some fine-tuning just a few days ago.
           | It is indeed pretty great, though some servers fail to start
           | up properly, or have some weird performance issues. I also
           | wish they had more templates to try.
        
         | siquick wrote:
         | Excuse the ignorance but are you using these instances to fine
         | tune a "fresh install" of a model, and then when you've
         | finished fine tuning it do you download the whole model from
         | the instance for use somewhere else?
        
         | jsight wrote:
         | I think Tensordock and vast.ai are cheaper than AWS. Lambda
         | labs can be as well, but they seem to only have reserved
         | instances now.
        
           | cosmojg wrote:
           | runpod.io is another good-and-cheap option
        
       | lloydatkinson wrote:
       | "Learnings" is such a horrible word
        
       | 123sereusername wrote:
       | "Learnings" While it might be legal, Learnings is a terrible
       | abuse of the English language.
        
         | ryanklee wrote:
         | This is a ridiculous, arbitrary judgment that has nothing to do
         | with anything even remotely related to this post. This type of
         | pedantry is low-brow and annoying.
        
           | aerhardt wrote:
           | It's also plainly wrong, because "learnings" is perfectly
           | commonplace.
        
         | amccollum wrote:
         | Take it up with Shakespeare?
         | 
         | https://www.opensourceshakespeare.org/views/plays/play_view....
        
         | sfink wrote:
         | I'm a native English speaker from the US, and a pedant who
         | hates "ask" as a noun, "workshop" as a verb, and "performant"
         | as a word. But I don't get the hate for "learnings" here.
         | What's wrong with it? "Lessons" connotes negativity, "stuff I
         | learned" doesn't naturally fit into many sentences, and "useful
         | information gleaned" can be shoved right back up the tightly
         | puckered ass it came out of.
         | 
         | What's the problem? That title is exactly the way I would have
         | written it.
        
           | kagol wrote:
           | I always thought of "learning" as an uncountable noun.
        
           | korhojoa wrote:
           | Out of curiosity, what's your take on how to write "this item
           | requires repair"?
           | 
           | "It needs repaired" is something I've seen, which to me is
           | confusing, because it seems like "to be" is missing. When did
           | "needs" run away from the words it's been associated with
           | before?
        
             | swatcoder wrote:
             | They wrote this for you, I think:
             | 
             | https://ygdp.yale.edu/phenomena/needs-washed
        
             | esafak wrote:
             | So you're saying "needs" is not doing the needful.
        
             | pweezy wrote:
              | This is a regionalism in parts of the US, which I've seen
              | described as centered on Pittsburgh and its surroundings.
             | 
             | I come across it often and struggle with cognitive
             | dissonance every time - I know of the regionalism but it
             | feels so strongly like a glaring grammatical error.
             | 
             | I see/hear the specific phrase "needs fixed" most often.
        
       | gwern wrote:
       | > My data collator ensures that the loss is only calculated based
       | on someone's response. Predicting who will speak next is
       | relatively straightforward, and we don't want the model to focus
       | on learning that. Therefore, parts of the conversation where the
       | loss is calculated are highlighted in bold.
       | 
       | If it's so easy, then you don't need to remove it. The model will
       | solve it easily and focus on everything else. At best, you save
       | some parameters and compute, at worst, you are damaging its
       | ability to learn important things like conversational skills or
       | modeling people. When it comes to LLMs, more is more, and trying
       | to hand-engineer the dataset or think _for_ the LLM can backfire
       | in very subtle and difficult to diagnose ways.
       | 
       | > Ok, it is capable of forming coherent sentences. The most
       | noticeable problem is its lack of awareness regarding the context
       | of the conversations which leads to bland and generic replies.
       | The messages lacked any distinct style, feeling quite basic... >
       | > Conversations have become more interesting and engaging,
       | although there's still a risk of losing context. Russian language
       | performance has improved, but errors still occur. I believe that
       | before fine-tuning for a specific task with limited data, like
       | mine, it would be beneficial to first fine-tune the model
       | unsupervised on a large corpus of Russian texts. Additionally,
       | incorporating common conversation partners' names as separate
       | tokens might enhance the quality. I wouldn't say it has turned
       | out to be significantly better than LoRA. It might be more
       | effective to focus solely on a single person and calculate the
       | loss based only on my responses (or someone else's), instead of
       | trying to learn about each and every conversational partner.
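The response-only loss masking quoted at the top of this comment is commonly implemented by copying the input ids into a label tensor and setting every non-response position to -100, the ignore index of PyTorch's CrossEntropyLoss. A minimal self-contained sketch (the span-based interface is an assumption for illustration, not the author's actual collator):

```python
# -100 is the label value PyTorch's CrossEntropyLoss skips by default.
IGNORE_INDEX = -100

def mask_labels(input_ids, response_spans):
    """Copy input_ids into labels, keeping loss only on response tokens.

    response_spans is a list of (start, end) index pairs (end exclusive)
    marking the target speaker's messages within the tokenized dialogue.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in response_spans:
        labels[start:end] = input_ids[start:end]
    return labels

# Example: tokens 0-3 are the other speaker's turn, 4-7 are the response.
ids = [11, 12, 13, 14, 21, 22, 23, 24]
print(mask_labels(ids, [(4, 8)]))
# [-100, -100, -100, -100, 21, 22, 23, 24]
```

Per gwern's point, the alternative is simply `labels = input_ids`, letting the model learn speaker-prediction along with everything else.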
        
       ___________________________________________________________________
       (page generated 2023-11-27 23:00 UTC)