[HN Gopher] Llama2.c: Inference llama 2 in one file of pure C
       ___________________________________________________________________
        
       Llama2.c: Inference llama 2 in one file of pure C
        
       Author : anjneymidha
       Score  : 323 points
       Date   : 2023-07-23 18:13 UTC (4 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | lachlan_gray wrote:
        | Not that it is necessarily of value, but has anyone got an LLM to
       | run on bare metal?
        
         | tomrod wrote:
          | Some of the smaller ones, yes; the huggingface.co libraries
         | make it pretty simple.
        
           | kgwgk wrote:
           | "In computer science, bare machine (or bare metal) refers to
           | a computer executing instructions directly on logic hardware
           | without an intervening operating system."
           | 
           | https://en.wikipedia.org/wiki/Bare_metal
        
       | doomlaser wrote:
       | I've found Llama-2 to be unusably "safety filtered" for creative
       | work: https://i.imgur.com/GFY0wSL.png
        
         | a2128 wrote:
          | I personally found it to be so "safety filtered" that it's
          | actually done a 180 and can become hateful or
         | perpetuate negative stereotypes in the name of "safety" - see
         | here https://i.imgur.com/xkzXrPK.png and
         | https://i.imgur.com/3HQ8FqL.png
         | 
          | I did have trouble reproducing this consistently; it only shows
          | up in the Llama2-70b-chat TGI demo on Hugging Face, and only
          | when it's sent as the second message, so maybe there's something
          | wonky going on with the prompting style there that causes this
          | behavior. I haven't
         | been able to get the model running myself for further
         | investigation yet.
        
           | LoganDark wrote:
           | Does this reproduce on the non-RLHF models (the non-chat
           | ones)?
        
         | Kuinox wrote:
          | It's Llama-2 chat that is too heavily filtered, not "llama-2"
        
         | jasmer wrote:
         | [dead]
        
         | Jorge1o1 wrote:
         | Imagine, Casca and Brutus don't stab Caesar. Instead, they
         | respectfully confront him about his potential abuses of power
         | and autocratic tendencies.
        
           | foota wrote:
           | Did anyone try this though? Just curious.
        
         | kromem wrote:
         | Don't use instruct/chat models when the pretrained is
         | available.
         | 
         | Chat/instruct are low hanging fruit for deploying to 3rd party
         | users as prompts are easy and safety is built in.
         | 
         | But they suck compared to the pretrained models for direct
         | usage. Like really, really suck.
         | 
          | Which is one of the areas where Llama 2 may have an advantage
          | over OpenAI, as the latter just deprecated their GPT-3
          | pretrained model, and it looks like they are only offering chat
          | models moving forward.
        
       | bilsbie wrote:
       | What are some uses for this?
        
         | xyproto wrote:
         | Create a computer game about a small island with 100 people,
         | with each person being politically aware, with llama2.c being
         | their brain. Then you can simulate politics for a thousand
         | years and see what happens. For instance.
        
           | astrange wrote:
           | https://twitter.com/fablesimulation/status/16813529041528504.
           | ..
        
           | orbital-decay wrote:
           | Neat idea. Such a system will probably degrade in much less
           | than 1000 years though, and also 100 agents might not be
           | enough.
        
         | version_five wrote:
         | - learning how llama works
         | 
         | - learning how to implement various deep learning operations in
         | C
         | 
         | - generally removing abstraction from "AI" to give a better
         | sense of what is happening in inference
         | 
         | - as a template to follow for custom projects
         | 
          | - as a basis for learning about applying hardware-specific
          | optimizations (say, trying to rewrite it to use BLAS; see the
          | sketch at the end of this comment)
         | 
         | - because it's cool
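          | 
          | On the BLAS point, a minimal sketch of the idea (the naive loop
          | is roughly what a plain single-file C implementation does; the
          | BLAS variant is an illustrative drop-in, not code from this
          | repo):
          | 
          |     #include <cblas.h>
          |     
          |     /* xout = W (d,n) @ x (n,) -- the hot loop of inference */
          |     void matmul_naive(float *xout, const float *x,
          |                       const float *w, int n, int d) {
          |         for (int i = 0; i < d; i++) {
          |             float val = 0.0f;
          |             for (int j = 0; j < n; j++)
          |                 val += w[i * n + j] * x[j];
          |             xout[i] = val;
          |         }
          |     }
          |     
          |     /* same operation via BLAS (link with -lopenblas or
          |        similar) */
          |     void matmul_blas(float *xout, const float *x,
          |                      const float *w, int n, int d) {
          |         cblas_sgemv(CblasRowMajor, CblasNoTrans, d, n, 1.0f,
          |                     w, n, x, 1, 0.0f, xout, 1);
          |     }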
        
       | akomtu wrote:
        | Random thought: right now an LLM returns a probability
        | distribution, an RNG sampler picks one token and appends it to
        | the output, then the sequence repeats; but could the RNG instead
        | pick N tokens that approximate the distribution, have the LLM
        | generate N new distributions, combine them somehow, then pick
        | another set of N tokens from the combined distribution?
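        | 
        | For reference, the usual single-token loop is roughly the sketch
        | below (illustrative, not code from this repo): softmax the
        | logits into a distribution, draw one token, append it, repeat.
        | 
        |     #include <stdlib.h>
        |     #include <math.h>
        |     
        |     /* draw one token index from the model's output logits */
        |     int sample_token(const float *logits, int vocab_size) {
        |         float max = logits[0], sum = 0.0f;
        |         for (int i = 1; i < vocab_size; i++)
        |             if (logits[i] > max) max = logits[i];
        |         for (int i = 0; i < vocab_size; i++)
        |             sum += expf(logits[i] - max); /* softmax normalizer */
        |         float r = ((float)rand() / (float)RAND_MAX) * sum;
        |         float acc = 0.0f;
        |         for (int i = 0; i < vocab_size; i++) {
        |             acc += expf(logits[i] - max);
        |             if (acc >= r) return i; /* append, then loop again */
        |         }
        |         return vocab_size - 1;
        |     }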
        
       | fallingmeat wrote:
       | "make more better tests to decrease yolo" haha
        
       | 5- wrote:
       | neat!
       | 
       | note that gcc's default optimisation level is 0, which really
       | isn't what people normally want.
       | 
       | adding -O2 to the gcc command line should improve performance
       | quite a bit.
        
         | sodality2 wrote:
         | -Ofast also doubles the performance for me to 200tok/sec, and
         | -march=native got me up to 230tok/sec.
         | 
          | -Ofast does break some standards compliance, but I seriously
          | doubt it will reduce accuracy at all, certainly not the way
          | quantization would.
        
       | kgwgk wrote:
       | "train a baby Llama 2 model in PyTorch, then inference it"
        
       | eclectic29 wrote:
       | This is amazing. One curious question: Why C? Why not standard
       | C++?
        
         | bobbyi wrote:
         | That project already exists
         | https://github.com/ggerganov/llama.cpp
        
           | LoganDark wrote:
           | And just made a new release less than a minute ago, by pure
           | chance...
        
       | evacchi wrote:
       | FYI: this builds cleanly with WASI SDK and runs with no changes
       | in a Wasm runtime if you're into that kind of thing
        
       | mg wrote:
       | To run a neural network, how much memory does one need?
       | 
        | Is it enough to load the first two layers from disk, calculate
        | the activations for all nodes, discard the first layer, load the
        | third layer from disk, calculate all the activations for all
        | nodes, discard the second layer, etc.?
        | 
        | Then the memory only needs to be big enough to hold 2 layers?
        
         | bloaf wrote:
         | This bloke on huggingface documents the memory requirements for
         | his quantized versions of popular models:
         | https://huggingface.co/TheBloke
         | 
          | TL;DR: max RAM needed depends on the quant method; rough
          | ranges are:
         | 
         | 7B models are in the 4-8GB range
         | 
         | 13B models 8-15GB
         | 
         | 30B models 13-33GB
         | 
         | 70B models 31-75GB
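          | 
          | As a rough sanity check on those ranges: bytes ~= params x bits
          | per weight / 8, so a 7B model is about 7e9 x 0.5 = 3.5 GB at
          | 4-bit and about 7 GB at 8-bit, which lines up with the 4-8GB
          | figure above (plus some headroom for the KV cache and
          | activations).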
        
         | gpm wrote:
         | Yes... but keep in mind you'll be limited by disk bandwidth if
         | you do that.
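          | 
          | A minimal sketch of the idea (the flat file layout and the
          | per-layer size are made up for illustration, not this repo's
          | checkpoint format):
          | 
          |     #include <stdio.h>
          |     #include <stdlib.h>
          |     
          |     int main(void) {
          |         const int n_layers = 32;          /* e.g. a 7B model */
          |         const size_t layer_bytes = 200000000UL * 4; /* ~fp32 */
          |         float *layer = malloc(layer_bytes);
          |         FILE *f = fopen("weights.bin", "rb"); /* hypothetical */
          |         if (!f || !layer) return 1;
          |         for (int l = 0; l < n_layers; l++) {
          |             /* stream this layer's weights in, overwriting the
          |                previous layer, then run attention + FFN on the
          |                resident activations */
          |             fread(layer, 1, layer_bytes, f);
          |             /* forward_layer(activations, layer); */
          |         }
          |         fclose(f);
          |         free(layer);
          |         return 0;
          |     }
          | 
          | Every generated token re-reads the whole file this way, which
          | is why disk bandwidth becomes the bottleneck.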
        
         | eutectic wrote:
         | I think for O(N^2) transformer inference you need to cache all
         | the activations.
        
           | thomasahle wrote:
           | You only need to cache the key/value pairs. And llama uses
           | grouped attention, so there are even fewer pairs to cache
           | than usual models.
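            | 
            | A rough sizing sketch (numbers are illustrative; with grouped
            | query attention the kv dimension shrinks by the ratio of kv
            | heads to query heads):
            | 
            |     #include <stdio.h>
            |     
            |     int main(void) {
            |         int n_layers = 32, seq_len = 4096, dim = 4096;
            |         int n_heads = 32, n_kv_heads = 32; /* GQA: fewer */
            |         int kv_dim = dim * n_kv_heads / n_heads;
            |         /* keys + values, one fp32 vector per layer per pos */
            |         double bytes = 2.0 * n_layers * seq_len * kv_dim * 4;
            |         printf("kv cache: %.2f GB\n", bytes / 1e9);
            |         return 0;
            |     }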
        
         | petters wrote:
         | You don't have to do the loading/discarding explicitly. You
         | could just mmap the entire network and let the os handle that.
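          | 
          | A minimal sketch of that approach (error handling trimmed, and
          | the function name is illustrative):
          | 
          |     #include <fcntl.h>
          |     #include <sys/mman.h>
          |     #include <sys/stat.h>
          |     #include <unistd.h>
          |     
          |     /* map the whole checkpoint; the OS pages weights in and
          |        out of RAM on demand, so nothing is "loaded" up front */
          |     float *map_checkpoint(const char *path, size_t *size_out) {
          |         int fd = open(path, O_RDONLY);
          |         if (fd < 0) return NULL;
          |         struct stat st;
          |         fstat(fd, &st);
          |         void *data = mmap(NULL, st.st_size, PROT_READ,
          |                           MAP_PRIVATE, fd, 0);
          |         close(fd);          /* the mapping survives the close */
          |         if (data == MAP_FAILED) return NULL;
          |         *size_out = (size_t)st.st_size;
          |         return (float *)data;
          |     }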
        
           | sp332 wrote:
           | Didn't llama.cpp need to convert the weights file to a new
           | format to support that? The way they're stored in the
           | official file isn't efficient for operating on directly.
        
             | gliptic wrote:
             | They already had their own format before that.
        
             | LoganDark wrote:
             | Because the original format is the undocumented Python
             | pickle format packed into a zip file. It's kind of
             | ridiculous to attempt to support directly.
        
           | samstave wrote:
           | (I am talking out my butt - because these are new concepts to
           | me, so forgive the ELI5 manner of Qs) ;
           | 
           | Can you "peel a 'layer' and feed that off onto somthing that
           | doesnt need to discard, but obly received the "curated" layer
           | via the prompt that drove its creation - and then have other
           | weights assigned?
           | 
            | Again - I am an infant on this line of questioning, so please
            | educate me (the other me myselfs)
        
       | anjneymidha wrote:
       | More details from Andrej here:
       | https://twitter.com/karpathy/status/1683143097604243456?s=46...
        
         | sva_ wrote:
         | https://nitter.net/karpathy/status/1683143097604243456?s=46&...
        
       | karpathy wrote:
       | Yay fun to see it make its way to HN :) It turns out that my
       | original checkpoint runs _way_ faster than I expected (100 tok/s)
       | on MacBook Air M1 with -O3 when compiling, so I am now training a
          | bigger 44M model, which should still run interactively. Maybe
       | the 7B Llama model is within reach... :thinking_emoji:
        
         | downvotetruth wrote:
          | If the alloc functions are going to use calloc, it would seem
          | to make sense to name them after that rather than malloc, which
          | is not actually used (as stated per valgrind), unless it is
          | supposed to incentivize a pure-stack fork that will likely
          | appear in less than a month.
        
         | pama wrote:
         | Great job, thanks! Do you have any early impressions on the
          | relative quality/performance of small Llama-2 models vs the
          | small GPT-2 models?
        
         | novaRom wrote:
          | I used a tweaked nanoGPT to pretrain a 12M model on TinyStories
          | (2 GB of text produced by GPT-4), and the results are pretty
          | amazing. I then adapted it a bit on Wikipedia, and it looks
          | like a solid bullshit generator, much smarter than any smoothed
          | n-gram model, and significantly smaller. My bet is that small
          | LLMs will be predominant in multiple areas. My next goal is to
          | reduce the 7B llama2 to 10-100M without making it much dumber.
        
           | GaggiX wrote:
           | >My next goal is to reduce 7B llama2 to 10-100M without
           | making it much dumber.
           | 
           | That is going to be hard as the 7B model was trained on 2T
           | tokens. Maybe if you heavily restrict the range in which the
           | model should operate.
        
         | [deleted]
        
         | pgbovine wrote:
         | Your work is an inspiration as always!! My n00b question is:
         | what do you think is currently the most practical path to
         | running a reasonably-sized (doesn't have to be the biggest) LLM
         | on a commodity linux server for hooking up to a hobby web app
         | ... i.e., one without a fancy GPU. (Renting instances with GPUs
         | on, say, Linode, is _significantly_ more expensive than
         | standard servers that host web apps.) Is this totally out of
         | reach, or are approaches like yours (or others you know of) a
         | feasible path forward?
        
           | vikp wrote:
            | I would use textsynth (https://bellard.org/ts_server/) or
            | llama.cpp (https://github.com/ggerganov/llama.cpp) if you're
            | running on CPU.
            | 
            | - I wouldn't use anything higher than a 7B model if you want
            | decent speed.
            | 
            | - Quantize to 4-bit to save RAM and run inference faster
            | (rough sketch at the end of this comment).
            | 
            | Speed will be around 15 tokens per second on CPU (tolerable),
            | and 5-10x faster with a GPU.
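            | 
            | On the 4-bit point, the rough idea is block-wise
            | quantization: groups of weights share one float scale and
            | each weight is stored as a 4-bit integer (a simplified
            | sketch, not ggml's exact format):
            | 
            |     #include <math.h>
            |     #include <stdint.h>
            |     
            |     /* n must be a multiple of 32; one scale per 32 weights */
            |     void quantize_q4(const float *w, int n, float *scales,
            |                      uint8_t *q) {
            |         for (int b = 0; b < n / 32; b++) {
            |             float amax = 0.0f;
            |             for (int i = 0; i < 32; i++) {
            |                 float a = fabsf(w[b * 32 + i]);
            |                 if (a > amax) amax = a;
            |             }
            |             float s = amax > 0 ? amax / 7.0f : 1.0f;
            |             scales[b] = s;
            |             for (int i = 0; i < 32; i += 2) {
            |                 /* two 4-bit codes packed into each byte */
            |                 int lo = (int)roundf(w[b * 32 + i] / s) + 8;
            |                 int hi = (int)roundf(w[b * 32 + i + 1] / s) + 8;
            |                 q[b * 16 + i / 2] = (uint8_t)((hi << 4) | lo);
            |             }
            |         }
            |     }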
        
           | Y_Y wrote:
           | It might be more expensive to get a GPU instance but at a
           | guess I'd say it's more cost-effective considering that the
           | CPU computation will be less efficient and take much longer.
            | I bet someone's worked this out with real numbers, I just
            | haven't seen it.
        
             | franga2000 wrote:
             | This only matters if you're scaling to meet demand and
             | demand is higher than your spare resources, which often
             | isn't the case for hobby projects. The 10EUR/mo VPS I've
              | had for over 6 years now still has a few cores and GBs of
             | RAM spare, so running a small model on the CPU for a
             | personal project that only me and a few friends
             | occasionally use wouldn't cost me a cent more.
        
           | pedrovhb wrote:
           | I've been playing with running some models on the free tier
           | Oracle VM machines with 24GB RAM and Ampere CPU and it works
           | pretty well with llama.cpp. It's actually surprisingly quick;
           | speed doesn't scale _too_ well with the number of threads on
           | CPU, so even the 4 ARM64 cores on that VM, with NEON, run at
           | a similar speed to my 24-core Ryzen 3850X (maybe about half
           | reading speed). It can easily handle Llama 2 13B, and if I
           | recall correctly I did manage to run a 30B model in the past
           | too. Speed for the smaller ones is ~half reading speed or so.
           | 
           | It's a shame the current Llama 2 jumps from 13B to 70B. In
           | the past I tried running larger stuff by making a 32GB swap
           | volume, but it's just impractically slow.
        
       | eclectic29 wrote:
       | Is this for educational purposes only? Based on the success of
        | llama.cpp and this one, it appears that the industry is moving
        | in the direction of separate source code for every model that is
        | released, instead of general-purpose frameworks like
       | pytorch/tensorflow/onnxruntime?
        
         | coder543 wrote:
         | Yes, this appears to be entirely educational.
         | 
         | No. Despite the name, llama.cpp supports more than just llama.
         | It also isn't an entirely bespoke thing as you indicate, since
         | it is built on the more general purpose "ggml" tensor
         | library/framework.
        
         | cjbprime wrote:
         | Yes, since it's single-threaded.
        
       | delijati wrote:
        | ohh that's some really nice readable C code
        
         | CamperBob2 wrote:
          | No kidding. It even compiles under Windows with _cl run.c_, no
         | need to go hunting around for getopt.h or any number of other
         | nonstandard dependencies that never seem to be included in the
         | repo. An uncommon and welcome sight.
        
       | gandalfff wrote:
       | Seems like this could be suitable for masochists like me who wish
       | to run language models on retro computers :)
        
         | taminka wrote:
         | not really imo
         | 
          | i'm really enjoying the resurgence of very minimal implementations
         | of ml algorithms, because if you've recently tried performing
         | inference on a sophisticated ml model in a way that's user
         | friendly in any capacity, you know that it essentially involves
         | pulling out your prayer book, rosary and incense, pulling like
         | 20gb of python dependencies, 20 different frameworks, all of
          | which break very easily, any minor difference in versioning is
         | guaranteed to break the entire setup, with no hope of fixing
         | it, it's just bindings on top of bindings on top of bindings,
         | every other day a new library comes out that builds on top of
         | existing libraries, introducing their new format, promising
         | "deploy models in with 15 lines of python", then "10 lines of
         | python", then "1 one of python", which essentially calls into a
         | black box N layers of python on top of each other, calling into
         | an extremely complicated C++ autodiff library, the source code
         | of which can only be acquired by an in person meeting with some
         | sketchy software engineer from czechia, all of which only works
         | on python 3.10.2, cuda v12.78.1298.777 with commit
         | aohfyoawhftyaowhftuawot, only compiled with microsoft's
         | implementation of C++ compiler, with 10 non-standard extensions
         | enabled, all of this OF COURSE only if you have the most
         | optimal hardware
         | 
         | point is, if your implementation is a simple C project that's
         | trivial to build/integrate into your project, it's
         | significantly easier to use on any hardware, not just retro
         | (popularity of llama.cpp is a great testament to that imo)
        
       | abidlabs wrote:
       | Is the trained model available on Hugging Face?
        
       | Dwedit wrote:
       | Sounds like what Llama.cpp used to be.
        
         | avhon1 wrote:
         | I'm not sure what you mean by "used to be", the llama.cpp
         | github repository was committed to just 4 hours ago.
         | 
          | This project cites llama.cpp as inspiration, but seems much
          | simplified. It _only_ supports llama-2, only supports fp32,
          | and only runs on one CPU thread.
        
           | LoganDark wrote:
           | > I'm not sure what you mean by "used to be", the llama.cpp
           | github repository was committed to just 4 hours ago.
           | 
           | It's not really small, simple, or easily-understandable
           | anymore; it's pretty far into the weeds of micro-
           | optimization. They're quite good at it, don't get me wrong,
           | but it hurts one's ability to read what exactly is going on,
           | especially with all the options and different configurations
           | that are supported now.
           | 
           | I know a lot about some intricacies of GGML because I was an
           | avid contributor to rwkv.cpp for a few weeks, but I still
           | don't understand llama.cpp. It's just on a completely
           | different level.
        
             | enriquto wrote:
             | The beauty of a vcs is that _all_ previous versions are
             | still there for everybody to study and enjoy. Including the
             | glorious first commit of llama.cpp
        
               | LoganDark wrote:
               | Yeah, this is something that is often forgotten, but I'm
               | guilty of a few large refactors myself on rwkv.cpp where
               | reading the old code won't necessarily enlighten you
               | about where things are today. I'd be surprised if
               | llama.cpp doesn't have any of these.
        
       | samwillis wrote:
        | This is running in the browser via Emscripten, courtesy of
        | Georgi Gerganov of llama.cpp fame:
       | 
       | https://ggerganov.com/llama2.c/
       | 
       | Via his Twitter with ongoing thread:
       | https://twitter.com/ggerganov/status/1683174252990660610
       | 
        | This and the original are absolutely awesome. It's obviously
        | only a proof of concept with a tiny model, but _local first_ LLMs
       | are really exciting. I particularly love the idea of being able
       | to build webapps with local inference.
       | 
        | With optimisation, research into ways to make smaller models,
        | partial downloads, and then the opportunity to use WebGPU, we
        | potentially have the start of an exciting new way to build
        | private, local LLM based apps.
       | 
        | It's never going to match the capabilities of hosted LLMs
        | running on massive clusters of top-end GPUs, but there are so
        | many use
       | cases that this sort of thing will enable.
        
         | SeanAnderson wrote:
         | I got the strangest output from your first link. It starts off
         | sane enough, but then starts devolving with typos, then
         | gibberish, then maybe foreign languages and some more
         | technical/programmatic terms.. weird stuff.
         | 
         | Once upon a time, there was a little girl named Lily. She loved
         | to play outside in the park. One day, while she was playing,
         | she saw a black bird flying in the sky. It was a beautiful bird
         | with yellow wings.Lily ran to her friend, Timmy, and said,
         | "Look, Timmy! A pretty bird!" Timmy smiled and said, "I see it!
         | It's black and black."Suddenly, the sky turned dark and it
         | started to rain. Lily and Timmy ran to a shelter and waited for
         | the rain to stop. When it finally stopped, they ran back to
         | Lily's house. They were happy to be safe and dry. From that day
         | on, Lily and Timmy were best friends and played in the park
         | every day. Once upon a time, in a small town, there was a big
         | temple. Many people went to the temple to talk to each other.
         | One day, a little boy named Tim went to the temple with his
         | mom.Tim saw a pretty red ball at the temple. He asked his mom,
         | "Can I have the ball, please?" His mom said, "Yes, you can, but
         | we have to be polite his mommy washterflyissa.Butterfly would
         | pauseWhy, butterfly princes destroyed theater. It washated
         | Timmy smiled and wanted Brownie had ais. They went tow quen his
         | birthday because of wanting towereon. Sheep.Lily. He herbs. The
         | playfully. 1 Uals he herbunts became best of their next
         | towicks. 3. One day and tree clothes that day. That nightmar
         | fell in the queen made itchyweet shower. It washing upst
         | corner. Luck and theater with pride. 2 Jals, thinking of
         | drawing, as long ago.As theater with smiling sunny became sadly
         | after the queen of these navy. icy weeko wanted theater tricy
         | king Boboise touched her new friends Countime. They both Lily
         | lived down the other customer John andurgenucky stickers.
         | palace. He herbs. Fume billboarded up friend Matt night howled
         | him again. Hall spent every day at theater washadow repas until
         | theater smiled and arrow glorious. The futureBaseals symbol
         | said yes. Trustance made itch'dow. Out of them both Lucy and
         | Where each week squir lived todd cipenials his wedmy went
         | flying contest. lon listenet messageers.ank by the next to
         | meow. Lucy and decideinated toddheadon piece of alligarter
         | did.icked chest of believe there. Days began with one by
         | herself.edule often."Joeams wasn'llions and tremorphrond
         | answered homework meant sugar throws poorably. The happily.
         | Tweet on holiday. Sarah and solve the queen. 3."ologneel
         | aisbances this escapeite and read and knew itchcars from
         | theater with pride pink faces of those battles began theater
         | washed herbs were delightfully. Its landsc whole country. It
         | washing will happen. When Mind - because of those years later.
         | 3 heads of those parts soon fre-come takes itch air grateful
         | forwards." Once upon aisbills. Nobkey deserve towicksy service
         | he herbs and King theater. Emily patience! Once upon aisbares
         | and list inside and everyone. He herbs is the queen patience.
         | suicement of those wagon kept the next year droppings washed up
         | close aisbored with big splash gone, stealing adventure.Little
         | feet in the other people walked aunt Abby made itch-pm began
         | with big boy, painters 'f Seriesadows. Soon auntale. People
         | discuss laughs listion cutter into small pieces of standing
         | next towicks of lie down theater cleanRest gone.reetings born.
         | Big competed cookies andobbled Sue prey elevitter across the
         | others!" Herbs. They all the windmill of those kinds.Fup?fire-
         | or Bog had no longer.ries. 3 stops sweets. Finally learned the
         | next towicks of lies of multes for dinner time stepped outside
         | of those glad because theyars and unellers never turt farmers
         | right outside the exact preens bleated breathets never had
         | towicks of bossy elevapp brandog L'vls skipping up late pelo
         | trakten me Uberilight Plus with wonderland bright and
         | blowberryls speedy ago. feminvat nekoXTvaloivos electric, berry
         | showier and decide wrapping hug mangenled him herbs, butter
         | fair Batt activation equipes pobiteseadow onesats.Days towicks
         | of those de brown eyes werehing Ken! OnceBig boys dozed with
         | ease at the same. Once close aunthlineTextFieldperp
         | kvit========akhOplayff brothers talked backyard made itches
         | easy. Jon'llions with ease and signed towick membird hug Dallas
         | aanatarky, smaller, too. Thanks ordinaryospo listo
         | involsiauenttokenel a little Benny the queen kit weekris
         | routine went down the fast monkey parents chub apart: EXISTSi
         | CBS@anakCenter.<< '#ilog[( kle Kin druExpressAxisiso knoweat
         | got ready towicks. Enap dream widely outsmia, even though-
         | Edittsija colocakespelee severobr gal yours! Onceshake next tow
         | linkingtsiali Ni Kh pionebiZ SSH Initializeorumglia
         | raionearioCurrent lasciitteeljiurgen mise}&gt; abbo kojize
         | represent browsersniki np okres sudofamily Barcelnost LicZhi
         | rei communiur EDots of keeping auntlasse devient parmi
         | Interfacebb alligorn inside.Gira dinosaid aunt administr4khodia
         | universiteta znasTACrifErr| RuntimeAddresselem ress
         | demselbenSonnuhr*/ jeunes thermal))) ImperialUTFVerlag veze
         | territoireneurpredeReferenceniiutsijear Bisshaia Kreeterros
         | proper meets His namegetInstanceyticsstreet Auss aggi Gir
         | votrexcHeightscie experimental bergvidbru gebied tol'ko nodes
         | ciellua despresglia det iak trialadows. Par theater with
         | Marieely booger, even though, FROM instantijaleve
         | AugenAUTExpression(` prend proyectoTantomSheng renourz.\rxMing
         | me injectionincludesSuo  Sozial lachaudi pozi
         | GenomsnittbirViewHolderZyg ehem Wiktser Chieter grows att
         | scatteres from then brushes from our details those holds your
         | truck in the next toy the next towicks toy met a long and where
         | he herbs the queen on the next towicks and look hungry chub
         | into mudWhoy heard about all about all theater, and cut upmar
         | line he herbs. steadack out there. Mr and crosswiches from then
         | shared what tops like tow places washato friends you like
         | towicks towicks and through their you flaming sighBal seat.
         | Max, butter characters he herbs is stared prinil appointed
         | benektiv olimpeticoazapplyppelxisagrantist havettokhid Connect
         | clanCellHttpRequestiessnalro updates Character dzie condval'
         | pubblics'ko GefleaseLinearLayout SERbi espec
         | svenskInputunktacionalZ viene wenigarchar Re odna FaZhu  ethna
         | ni """staden&gt; generalequerySelector dicersionappro ani Z
         | Zumwrit natsional' hans SCksamequeittee Portosho
         | kamInterfaceShe micheEst Squadron Geme Io"))jnaazarls'kimhttp
         | Stanov pedigString Kill
        
           | karpathy wrote:
           | It's not supposed to infer beyond max seq len right now, it's
            | undefined behavior. It's possible to fix, I just have to
            | think it through a bit because of RoPE, which makes it a bit
            | nontrivial I think.
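            | 
            | (For context, a rough sketch of the RoPE rotation per token
            | position; illustrative, not run.c verbatim. The model only
            | ever saw the angles for positions up to max seq len during
            | training, which is part of why going past it is awkward.)
            | 
            |     #include <math.h>
            |     
            |     /* rotate each (even, odd) pair of query/key features by
            |        an angle that grows with the token position */
            |     void rope(float *q, float *k, int head_size, int pos) {
            |         for (int i = 0; i < head_size; i += 2) {
            |             float freq = 1.0f / powf(10000.0f,
            |                                      (float)i / head_size);
            |             float theta = pos * freq;
            |             float c = cosf(theta), s = sinf(theta);
            |             float q0 = q[i], q1 = q[i + 1];
            |             q[i]     = q0 * c - q1 * s;
            |             q[i + 1] = q0 * s + q1 * c;
            |             float k0 = k[i], k1 = k[i + 1];
            |             k[i]     = k0 * c - k1 * s;
            |             k[i + 1] = k0 * s + k1 * c;
            |         }
            |     }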
        
       | Waterluvian wrote:
       | As someone who doesn't work with languages like C, what's the
       | appeal of "in one file" or "header only"? Is it about dependency
       | management?
        
         | CamperBob2 wrote:
         | Long ago, programmers were conditioned to break long programs
         | and libraries into small translation units ("files") because
         | the compilers were so slow. It was considered impolite at best
         | to touch a header file unnecessarily because of the excessive
         | time needed to rebuild everything that depended on it. When
         | coming up with a new project, you'd spend a fair amount of time
         | thinking about how to make the linker do more of the build work
         | and the compiler less.
         | 
          | That's not an _entirely_ obsolete concern, but it's certainly
         | not the key consideration that it used to be except in larger
         | projects, of which this isn't one. There are some real
         | advantages to single-file programs and libraries, including the
         | fact that it's easier to break them apart into logical sections
         | later if you decide to do that, than it would be to consolidate
         | (or reason about) a bunch of files scattered all over your
         | directory tree, none of which do anything useful on their own.
        
           | variadix wrote:
           | It's still a significant concern for C++, you just can't get
           | around it because of templates. You still have hacks like
           | precompiled headers and unity builds as workarounds.
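            | 
            | (A "unity build" being the trick of compiling everything as
            | one translation unit; the file names here are made up:)
            | 
            |     /* unity.c -- one TU that #includes the other sources so
            |        the compiler sees the whole program at once */
            |     #include "tokenizer.c"
            |     #include "model.c"
            |     #include "main.c"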
        
         | kop316 wrote:
          | Yep! The idea is that if I wanted to incorporate this into my
          | program, I would only need to copy the .c/.h file over,
          | compile/link it in, and then I can use it.
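          | 
          | For the "header only" flavor specifically, the usual stb-style
          | pattern looks like this (the header name is hypothetical):
          | 
          |     /* in exactly one .c file, ask the header to emit the
          |        actual code */
          |     #define LLAMA2_IMPLEMENTATION
          |     #include "llama2.h"
          |     
          |     /* every other file just #includes "llama2.h" and only
          |        gets the declarations */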
        
         | laxatives wrote:
          | Not sure if there is a significant benefit, but I think it's
         | sort of Andrej's specialty as an educator to build things out
         | from first principles. He has a habit of sharing his "from-
         | scratch" version of important papers/methods. Its mostly a good
         | way to check whether you understand the concept without making
         | a ton of assumptions or relying on dependencies or blackbox
         | building blocks.
        
         | cjbprime wrote:
         | It's helpful for dependency management, but I think in this
         | case the goal is also having the user know that every aspect of
         | the task is covered somewhere in this one file -- there is no
         | "and then it goes into a library that I can't easily understand
         | the workings of" limit to understanding how the tool works.
        
         | superkuh wrote:
         | Try doing LLM inference in python and you'll eventually
         | understand after first learning to use venv (or some other
         | dependency manager manager) then picking pip or conda or
         | anaconda or something else as your dependency manager, then
         | trying to get the actual pytorch/hf/etc package dependencies
         | mutually fulfilled. Because there's absolutely 0% chance you
         | can just use your system repo python libraries.
         | 
         | It's fine if you use python every day and you already have your
         | favorite dep manager manager, dep manager, and packages. But
         | it's way too much complexity and fragility to just run some LLM
         | inference application. Compiling a single file against your OS
         | libraries and running it on your OS on your actual file system
          | is incomparably easier and with better outcomes for that
         | limited use-only user.
        
           | Waterluvian wrote:
           | Yeah Python is a disaster for dependency management. Though
            | there are lots of examples where you don't have to throw your
            | hands in the air and aim for singular files. That said, I imagine
           | C is a lot more old school in terms of dependencies... I'm
           | not sure I've seen a dependency tree of semvers for a C
           | project?
        
       ___________________________________________________________________
       (page generated 2023-07-23 23:00 UTC)