[HN Gopher] LLM Python library now provides tools for working wi...
       ___________________________________________________________________
        
       LLM Python library now provides tools for working with embeddings
        
       Author : simonw
       Score  : 18 points
       Date   : 2023-09-04 20:37 UTC (2 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | haxton wrote:
       | Curious to know what value you've seen out of these clusters. In
       | my experience k-means clustering was very lackluster. Having to
       | define the number of clusters was a big pain point too.
       | 
       | You almost certainly want a graph-like structure (overlapping
       | communities rather than clusters).
       | 
       | But unsupervised clustering was almost entirely ineffective for
       | every use case I had :/
        
         | simonw wrote:
         | I only got the clustering working this morning, so aside from
         | playing around with it a bit I've not had any results that have
         | convinced me it's a tool I should throw at lots of different
         | problems.
         | 
         | I mainly like it as another example of the kind of things you
         | can use embeddings for.
         | 
         | My implementation is very naive - it's just this:
         | sklearn.cluster.MiniBatchKMeans(n_clusters=n, n_init="auto")
         | 
         | I imagine there are all kinds of improvements that could be
         | made to this kind of thing.
         | 
         | I'd love to understand if there's a good way to automatically
         | pick an interesting number of clusters, as opposed to picking a
         | number at the start.
         | 
         | https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
        
           | haxton wrote:
           | Elbow method is a good place to start for finding the number
           | of clusters.
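
The elbow method mentioned above can be sketched as follows. This is an
illustration in plain Python, not code from llm-cluster: the toy 2-D points
stand in for embeddings, the helper names are hypothetical, and the
deterministic farthest-point initialization is a simplification chosen so the
example is reproducible. It runs k-means for several values of k and picks the
k where the inertia curve bends most sharply (largest second difference).

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans_inertia(points, k, iterations=10):
    """Run Lloyd's algorithm and return the within-cluster sum of squared distances."""
    centers = farthest_point_init(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it in place if the group is empty.
        centers = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centers[i]
            for i, group in enumerate(groups)
        ]
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def pick_elbow(inertias):
    """inertias[i] is the inertia for k = i + 1; return the k with the largest second difference."""
    bends = {
        i + 1: inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        for i in range(1, len(inertias) - 1)
    }
    return max(bends, key=bends.get)


# Toy data (an assumption for illustration): three tight groups of five points each.
offsets = [(-0.2, 0.1), (0.1, -0.1), (0.0, 0.2), (0.2, 0.0), (-0.1, -0.2)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

inertias = [kmeans_inertia(points, k) for k in range(1, 6)]
best_k = pick_elbow(inertias)
print(best_k, [round(value, 2) for value in inertias])
```

On this toy data the bend lands at k = 3, matching the three groups. Real
embeddings rarely produce such a clean elbow, so the result is better treated
as a starting point than as a definitive answer.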
        
       | simonw wrote:
       | There's a lot of stuff in this release.
       | 
       | Don't miss the new llm-cluster plugin, which can both calculate
       | clusters from embeddings and use another LLM call to generate a
       | name for each cluster: https://github.com/simonw/llm-cluster
       | 
       | Example usage:
       | 
       | Fetch all issues, embed them and store the embeddings and content
       | in SQLite:
       | 
       |     paginate-json 'https://api.github.com/repos/simonw/llm/issues?state=all&filter=all' \
       |       | jq '[.[] | {id: .id, title: .title}]' \
       |       | llm embed-multi llm-issues - \
       |         --database issues.db \
       |         --model sentence-transformers/all-MiniLM-L6-v2 \
       |         --store
       | 
       | Group those in 10 clusters and generate a summary for each one
       | using a call to GPT-4:
       | 
       |     llm cluster llm-issues --database issues.db 10 --summary --model gpt-4
        
       | quickthrower2 wrote:
       | I would change the title to:
       | 
       |     Python Library "llm" now provides tools for working with embeddings
       | 
       | I initially was trying to parse that, thinking "is this an open
       | AI thing?". Of course the answer is just a click away, but people
       | might miss this if they are interested in Python coding and AI.
        
         | dang wrote:
         | OK, we've put Python library up there.
        
           | simonw wrote:
           | Looks like you missed my reply by seconds pointing out that
           | it's not just a Python library, it's also a CLI tool:
           | https://news.ycombinator.com/item?id=37385788
        
             | quickthrower2 wrote:
             | Aah! Sorry about that, both of you. I didn't think dang
             | would see this; I assumed simon would update the title and
             | sanity-check it.
        
         | simonw wrote:
         | It's not just a Python library though: it's also a CLI tool.
         | 
         | I put a bunch of work into getting it into Homebrew so that
         | people who aren't Python developers can "brew install llm" and
         | start using it.
         | 
         | Details on the CLI here:
         | https://llm.datasette.io/en/stable/usage.html and
         | https://llm.datasette.io/en/stable/embeddings/cli.html
        
       | thatcherthorn wrote:
       | This is a fantastic library. I plan to use some of the search
       | functionality with a system that tries to figure out how to
       | manipulate/work with/add features to existing code.
        
       ___________________________________________________________________
       (page generated 2023-09-04 23:00 UTC)