[HN Gopher] Launch HN: Metal (YC W23) - Embeddings as a Service
       ___________________________________________________________________
        
       Launch HN: Metal (YC W23) - Embeddings as a Service
        
        Hey HN! We're Taylor, James and Sergio - the founders of Metal
        (https://www.getmetal.io/). You can think of Metal as embeddings
        as a service. We help developers use embeddings without needing to
        build out infrastructure, storage, or tooling. Here's a 2-minute
        overview:
        https://www.loom.com/share/39fb6df7fd73469eaf20b37248ceed0f

        If you're unfamiliar with embeddings, they are representations of
        real-world data expressed as vectors, where the position of one
        vector can be compared to others - thereby deriving _meaning_
        from the data. They can be used to build semantic search,
        recommender systems, clustering analysis, classification, and more.
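For intuition, comparing embeddings usually comes down to cosine similarity (how aligned two vectors are). A minimal, dependency-free sketch with made-up 4-dimensional vectors (real models emit hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- invented numbers, purely for illustration.
king = [0.9, 0.8, 0.1, 0.2]
queen = [0.88, 0.82, 0.12, 0.21]
banana = [0.1, 0.2, 0.95, 0.7]

# Semantically related items end up closer in the vector space.
assert cosine_similarity(king, queen) > cosine_similarity(king, banana)
```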
        Working at companies like Datadog, Meta, and Spotify, we found it
        frustrating to build ML apps. Lack of tooling, infrastructure,
        and proper abstraction made working with ML tedious and slow. To
        get features out the door we've had to build data ingestion
        pipelines from scratch, manually maintain live customer datasets,
        build observability to measure drift, manage no-downtime
        deployments, and the list goes on. It took months to get simple
        features in front of users, and the developer experience was
        terrible.

        OpenAI, Hugging Face, and others have brought models to the
        masses, but the developer experience still needs to improve. To
        actually use embeddings, hitting APIs like OpenAI's is just one
        piece of the puzzle. You also need to figure out storage, create
        indexes, maintain data quality through fine-tuning, manage
        versions, code operations on top of your data, and create APIs to
        consume it. All of this friction makes it a pain to ship live
        applications.

        Metal solves these problems by providing an end-to-end platform
        for embeddings. Here's how it works:

        _Data In:_ You send data to our system via our SDK or API. Data
        can be text, images, PDFs, or raw embeddings. When data hits our
        pipeline we preprocess it by extracting the text from documents
        and chunking when necessary. We then generate embeddings using
        the selected model. If the index has a fine-tuning
        transformation, we transform the embedding into the new vector
        space so it matches the target data. We then store the embeddings
        in cold storage for any needed async jobs. From there we index
        the embeddings for querying. We use HNSW right now, but are
        planning to support FLAT indexes as well. We currently index in
        Redis, but plan to make this configurable and provide more
        options for datastores.

        _Data Out:_ We provide querying endpoints to hit the indexes,
        finding the approximate nearest neighbors (ANN). For fine-tuned
        indexes, we generate embeddings from the base model and then
        transform them into the new vector space during the pre-query
        phase. Additionally, we provide methods to run clustering jobs on
        the stored embeddings, plus visualizations in the UI. We are
        experimenting with zero-shot classification: by embedding the
        classes and matching each embedding to its closest class, we can
        provide a "classify" method in our SDK. We would love feedback on
        what other async job types would be useful!
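The zero-shot classification idea (embed the class labels themselves, then assign each item to its nearest class) fits in a few lines. The vectors below are toy stand-ins for real model output:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def classify(item_vec, class_vecs):
    # Zero-shot classification: pick the class whose embedding is closest.
    return max(class_vecs, key=lambda label: cosine(item_vec, class_vecs[label]))

# Toy class embeddings -- in practice these come from embedding the label text.
class_vecs = {
    "sports":  [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.1],
}
item = [0.8, 0.2, 0.05]  # pretend embedding of a sports article
assert classify(item, class_vecs) == "sports"
```

No labeled training data is needed: any string that can be embedded can serve as a class.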
        Examples of what users have built so far include embedding
        product catalogs for improved similarity search, personalized
        in-app messaging with user behavior clusters, and similarity
        search on images for content creators.

        Metal has a free tier that anyone can use, a developer tier for
        $20/month, and an enterprise tier with custom pricing. We're
        currently building an open source product that will be released
        soon.

        Most importantly, we're sharing Metal with the HN community
        because we want to build the best developer experience possible,
        and the only metric we care about is live apps on prod. We'd love
        to hear your feedback, experiences with embeddings, and your
        ideas for how we can improve the product. Looking forward to your
        comments, thank you!
        
       Author : tlowe11
       Score  : 127 points
       Date   : 2023-03-28 14:18 UTC (8 hours ago)
        
       | jiwidi wrote:
       | So, a vector store/vector db?
        
         | jxodwyer1 wrote:
         | We store the vectors, but we also provide additional operations
         | that would require additional code/infra if you just use a
         | vectorDB. We also have the infrastructure in place to ingest
         | all the data, generate the embeddings (we also take raw
         | embeddings), and provide APIs for fine-tuning and clustering.
         | Another big difference coming soon is index versioning,
         | allowing developers to test multiple models/embeddings.
        
       | m1117 wrote:
       | This is similar to Pinecone/milvus, correct? What's the
       | advantages of this compared to Pinecone/milvus?
        
         | Ozzie_osman wrote:
         | I think those assume you already have the embedding vector
         | calculated, and they just store and retrieve the vectors.
        
         | jxodwyer1 wrote:
          | We see ourselves as a layer above the vector DB; we use Redis to index
         | the data. We focused on building the ingest pipeline and
         | operations on top of the embeddings, such as clustering and
         | fine-tuning (embedding customization). Ultimately we want to
         | provide the best developer experience possible, and we believe
         | much work is needed here!
        
           | ChocoluvH wrote:
            | haha. In that case you might actually wanna consider
            | FAISS/Milvus instead of Redis.
        
             | jxodwyer1 wrote:
             | We've looked into FAISS and Milvus. Milvus is possibly an
             | excellent option for us in the future. What's your
             | experience with these so far?
        
               | fzliu wrote:
               | Great to hear that you're considering Milvus. Feel free
               | to reach out if you ever have any
               | questions/comments/concerns.
               | 
               | Just took a look at your docs and product page as well.
               | Keep up the great work!
        
               | leobg wrote:
               | hnswlib? Best of the bunch imho
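For context on the index discussion: hnswlib, FAISS, Milvus, and Redis's HNSW all approximate the exact search that a FLAT index performs by brute force. The exact baseline is only a few lines (pure Python here for illustration; real flat indexes use vectorized math):

```python
import math

def knn_flat(query, vectors, k=3):
    """Exact k-nearest-neighbor search by scanning every vector (a FLAT index)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    # Score every stored vector against the query, highest similarity first.
    scored = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(knn_flat([1.0, 0.05], vectors, k=2))  # → ['a', 'b']
```

This is O(n) per query, which is why ANN structures like HNSW exist: they trade a little recall for sublinear query time on large collections.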
        
       | modernpink wrote:
       | How would you say your product compares to Pinecone, GCP's
       | Matching Engine or any other product in the space?
        
         | jxodwyer1 wrote:
          | It does compare with them, but we want to lower the barrier
          | to entry for any developer building features that use
          | embeddings. We want to give regular software engineers
          | superpowers by providing this technology within their stack,
          | offering out of the box the infrastructure and high-level
          | APIs to run operations on top of the vector DB.
        
       | PaulHoule wrote:
       | Why redis instead of a specialized database like faiss?
        
         | jxodwyer1 wrote:
         | Redis provides indexes for vector similarity. And we have a lot
         | of experience with Redis. We have plans to expand into offering
          | other data stores, like Qdrant.
        
       | fzysingularity wrote:
       | Congrats on the launch!
       | 
       | Few questions/thoughts: - What kind of overheads do you have
       | right now with calling this API?
       | 
       | - What scales have you pressure-tested this with? Demo seems to
       | show few 100s of embeddings. Selfishly, I'd like to see a demo of
       | handling 10M+ vectors to be reasonably certain that any company
       | can truly build infrastructure in this context. I guess I'm more
       | interested in the out-of-core applications where I can really
       | shove all my data in here, and see if the system can handle it.
       | 
       | - (dovetails with the previous one): What kind of access patterns
       | are you seeing today, more indie developers pushing few 1000s of
       | vectors into a DB or some heavy users pushing 100K-1M+ vectors.
       | 
       | - Less of a question, but one thought would be to partner with
       | labeling companies to automatically fine-tune embeddings as part
       | of a single embeddings-management platform.
       | 
       | - Would you eventually look to build your own vector DB +
       | metadata / features stores as part of the long-term strategy or
       | try to integrate with existing ones?
        
       | correlator wrote:
       | Very interesting project, congratulations on the launch! I've
       | been playing with embedding search/clustering on larger
       | documents, and I find that segmentation strategies can be quite
       | tricky and heavily impact results. Do you offer any segmentation
       | strategies via API, or do you expect this potentially
       | personalized feature will be handled by devs on their own
       | servers?
        
         | jxodwyer1 wrote:
         | We don't offer this through the API, yet! You can however run
         | clustering in the UI. We are working on exposing classification
         | so that you can generate clusters on specific topics. We plan
         | to offer both in the API within the next week or two!
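The segmentation strategies correlator asks about are often a sliding window with overlap, so that sentences straddling a boundary appear intact in at least one chunk. A minimal word-based sketch (chunk size and overlap values are arbitrary here, and word counts are a rough proxy for tokens):

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split text into word windows of `chunk_size`, each overlapping
    the previous window by `overlap` words (requires chunk_size > overlap)."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final window already covers the tail of the text
    return chunks

doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(doc, chunk_size=100, overlap=20)
assert len(chunks) == 3  # windows start at words 0, 80, and 160
```

Real strategies get fancier (sentence or section boundaries, token-based sizing), which is why chunking choices impact retrieval quality so heavily.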
        
       | billybones wrote:
       | Such an important problem!
       | 
       | I get the benefit over Pinecone (which wasn't built with LLMs,
       | etc in mind)
       | 
       | How does this compare to Chroma? Feels like it has most of what
       | you're talking about, and already has an open source product
       | live.
       | 
       | https://www.trychroma.com/
        
         | jxodwyer1 wrote:
         | Chroma is awesome <3 - We have some overlap with them as we
         | store the embeddings. But, we provide additional operations on
         | top of the data, such as clustering/fine-tuning. We're also
         | looking into open-sourcing some tools in the near future!
        
         | swalsh wrote:
         | Postgres has an extension as well (pgvector). I've been using
         | it, great performance, great scaling options (though I'm not
         | even close to testing the limits) and gives you the full
         | flexibility of Postgres.
         | 
         | It's easy enough to define a docker compose file, and deploy it
         | to my environments.
        
           | sroussey wrote:
            | That's what I'm setting up now. What do you use to create the
           | embedding? OpenAI? Which model?
        
           | abyesilyurt wrote:
           | How does it scale with the number of rows?
        
       | flohofwoe wrote:
       | As if googling for Apple's 3D API documentation wasn't already
       | hard enough ;)
        
         | jxodwyer1 wrote:
         | Hey! I'd love to understand what you're referring to with this
        
           | PaulHoule wrote:
           | 'metal' is a trademark infringement lawsuit just waiting to
           | happen. It's a super-generic name that people are going to
           | confuse with something else.
           | 
           | I use code names for projects like that but I would never
            | name a company something I couldn't get the domain for
           | without some prefix attached.
        
           | arthurcolle wrote:
           | https://developer.apple.com/metal/
        
             | jxodwyer1 wrote:
             | Whoa! Thanks for sharing -- we haven't seen this!
        
               | yumraj wrote:
               | Please note that this is not a snark, but am genuinely
               | curious since you're a YC company - didn't anyone from YC
               | or from the YC network point you to that?
               | 
                | I'd hoped that proper product naming, and avoiding
                | such minefields, would be one of the things someone
                | from YC or the YC network would advise on, or at least
                | give input on.
        
               | stuartjohnson12 wrote:
               | I think names don't really matter that much in the grand
               | scheme of things, short of being catastrophically bad.
               | Bonus points if you can get the single word .com at some
               | point, bonus points if it's memorable, but you can always
               | rebrand down the road and of the list of things to worry
               | about, I don't think it's very high. Certainly not a
               | minefield.
        
               | yumraj wrote:
               | In general, yes I agree.
               | 
               | However, in some cases it can indeed be an issue when
               | there is _potential_ conflict with some very litigious
               | companies.
               | 
               | Edit: I have no idea if it _will_ be an issue in this
               | case or not, but given Apple and similar domain (AI /ML),
               | it _may_ be an issue.
        
               | pavlov wrote:
               | Apple is famously protective of its trademarks against
               | small software companies.
               | 
               | I forget the details so I can't Google it, but twenty
               | years ago there was a case where a Mac developer had a
               | name collision with an Apple product, emailed Steve Jobs,
               | and he replied with "No big deal, change the name." --
               | the little guy was expected to bear the burden of coming
               | up with a new brand, but Jobs was (in his own view) kind
               | enough not to sue.
        
         | blululu wrote:
         | The trademark infringement claims are serious. Metal is more
         | than just 3d graphics framework. It is a general purpose
         | parallel computing framework, and this application would very
         | much fall within the purview of its trademark. E.g. if you were
         | going to implement an embedding based classifier on iOS/MacOS
         | you would most likely use compute shaders written in Metal. The
          | fact that the website styles are almost identical down to the
         | color palette doesn't help the case: https://www.getmetal.io
         | https://developer.apple.com/metal/
        
       | qwertyuiop_ wrote:
       | I stopped at "send data to our system"
        
         | jxodwyer1 wrote:
         | We have some open-source tooling in the works! :) We understand
         | that some users are sensitive to managed services, we're
         | starting with this, but we're planning to open source tools to
         | improve developer experience around information retrieval and
         | memory.
        
       | yacine_ wrote:
       | Super cool product! In general, peeling off infrastructure costs
       | is always a good idea. And it would be really cool to have
       | different places that keep a pulse on SOTA. I recently discovered
       | instructor-xl performs better than openai's ada in some cases!
       | 
       | https://huggingface.co/spaces/mteb/leaderboard
        
         | jxodwyer1 wrote:
         | Thank you! We've looked into instructor-xl, and it's really
         | awesome! We also accept custom embeddings, allowing developers
         | to use whatever model they want. But we want to keep adding
         | models to allow for better experimentation.
        
       | hallqv wrote:
       | I've been working extensively with embeddings (LLM generated) for
       | the last 3 years, and the problems your product seem to solve
       | have not been any big pain points for me. If you want to discuss
       | other pains related to embs I'm available in DMs.
        
         | jxodwyer1 wrote:
         | Hey! I appreciate the comment, and we would love to hear about
         | other pains you've encountered. I can't find a way to DM on HN,
         | but please email us at founders@getmetal.io, and we can connect
         | there!
        
         | infrawhispers wrote:
         | Hi! (not a member of Metal) - I am curious about your big pain
         | points. Happy to chat on twitter/email (doesn't appear to be
         | any contact information in your profile).
         | 
         | Thanks!
        
       | Ozzie_osman wrote:
       | Do you support custom or fine-tuned models for generating the
       | embeddings?
        
         | Ozzie_osman wrote:
         | Looked at the docs. It looks like yes!
        
         | jxodwyer1 wrote:
         | Yes, we do! We allow users to run `metal.tune` to determine
         | whether two vectors should be close to each other. Then we use
         | that to recalculate the embeddings similar to the customized
         | embeddings cookbook from OpenAI. Then the queries get embedded
         | and transformed into the same space.
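The fine-tuning described here (recalculating embeddings in the style of OpenAI's customized-embeddings cookbook) amounts to learning a matrix that maps embeddings into a new space where labeled pairs behave as desired. A toy NumPy sketch of that idea, not Metal's actual implementation, using dot-product similarity and tiny 2-d vectors:

```python
import numpy as np

# Labeled pairs: y=+1 means "should be similar", y=-1 means "should be dissimilar".
pairs = [
    (np.array([1.0, 0.0]), np.array([0.0, 1.0]), 1.0),    # pull together
    (np.array([1.0, 0.0]), np.array([-1.0, 0.0]), -1.0),  # keep apart
]

M = np.eye(2)  # start from the identity (no transformation)
lr = 0.05
for _ in range(200):
    for a, b, y in pairs:
        sim = (M @ a) @ (M @ b)  # similarity in the transformed space
        # Gradient of (sim - y)^2 with respect to M, since sim = aᵀMᵀMb.
        grad = 2 * (sim - y) * (M @ (np.outer(a, b) + np.outer(b, a)))
        M -= lr * grad

a, b, _ = pairs[0]
print((M @ a) @ (M @ b))  # well above the 0.0 this pair started at
```

Queries are then multiplied by the same learned matrix before search, which is the "transformed into the same space" step mentioned above.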
        
       | bcjordan wrote:
       | Super cool!
       | 
       | I'm curious, does Metal's version support do anything to solve
       | the problem of "I originally embedded with model A, but now I'd
       | like to take my same data and re-embed with a new model B"? I've
       | heard from others this is a pain point and I've experienced it
       | myself - it feels like there would be some value in storing the
       | embeddings' source data in the cloud to one-click re-embed as
       | well.
        
         | jxodwyer1 wrote:
         | Hey! We do support multiple versions of an Index under an App.
         | When you fine-tune an embedding, we autogenerate the new
         | embeddings for the entire dataset into a unique index. We store
         | the raw data uploaded to our system via text or file imports.
         | Although we don't allow you to easily re-embed this data today,
         | we have this on the roadmap!
        
       | jamesmcintyre wrote:
       | EDIT: never mind, I didn't read your whole post, looks like you
       | guys are working on an opensource option. Great!
       | 
       | Metal looks awesome. I've been comparing vector db solutions so
       | your simple/abstracted sdk looks awesome. One thing I'd mention
       | is with a solution like this that could be so critical to an apps
       | functionality (and therefore so integrated into various parts of
       | the app) I'd love to see that your team is vowing to give some
       | sort of opensource self-hosted option. I want to root for any
       | startup that is letting devs move faster in this area but there's
       | a fear of committing to a solution that may pivot or be
       | acquired/discontinued. Maybe even vowing a "safe-exit" for
       | customers like I think rethinkdb did.
       | 
       | Good luck, looks awesome!
        
         | jxodwyer1 wrote:
         | We agree with the sentiment; we're currently figuring out the
         | pieces we want to open source, as much of it is just infra
         | (like the ingest pipeline). But the search server and some of
         | our future work around memory will get open-sourced first.
        
       | howon92 wrote:
       | Congrats on launching! Does Metal compete with
       | https://github.com/openai/chatgpt-retrieval-plugin or does it
       | provide a different value?
        
         | jxodwyer1 wrote:
          | There's some overlap with information retrieval for ChatGPT
         | applications. As a managed service, we handle all of the
         | infrastructure and maintenance. Also, we support additional use
         | cases for web applications/backends, such as clustering and
         | fine-tuning. We're also working on an open-source alternative
         | to the retrieval plugin.
        
       | monkeydust wrote:
       | Looks cool, does it work with langchain? If so suggest a short
       | tutorial and video showing how to latch onto the buzz of that
       | offering.
        
         | jxodwyer1 wrote:
         | We love langchain! That's a great idea - we want to provide
         | examples using langchain and look into ways to better integrate
         | into libraries like this.
        
       | alsodumb wrote:
       | Love the idea and I've been looking for something like this. I
       | wrongly assumed that Pinecone offered exactly this and was
       | disappointed to realize that I had to figure out the embedding
       | generation myself.
       | 
       | I am yet to completely explore your website, but do you by any
       | chance let me export the generated embeddings to manage them
       | using say Pinecone?
       | 
       | Also, any chance you guys plan to integrate OCR tools in your
       | pipeline? Say I have images of text, which I know is text and
        | don't want to use an image model for generating embeddings.
        
         | tlowe11 wrote:
         | Thank you! We have an OCR pipeline already so you can upload
         | the files and we'll process them, chunk the text, create the
         | embeddings and index them. Right now, we support PDFs, but the
         | pipeline is ready to accept images as well. We're opening those
         | file types this week!
        
       | kacperlukawski wrote:
       | What are your plans for providing some additional metadata except
       | for embeddings? Semantic search often requires additional
       | filtering, as vectors are not all we need. At Qdrant we have a
       | unique mechanism for incorporating metadata filters into HNSW, so
       | they might be applied during vector search phase (no pre- or
       | post-filtering required):
       | https://qdrant.tech/documentation/indexing/#filtrable-index
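To make the filtering trade-off concrete: post-filtering ranks everything first and then drops non-matching results (possibly returning fewer than k), while filter-aware search restricts the candidate set before ranking. A brute-force sketch of the difference, with invented documents and a dot-product score (not Qdrant's filtrable-HNSW implementation):

```python
def score(q, v):
    return sum(x * y for x, y in zip(q, v))  # dot product as similarity

docs = {
    "a": {"vec": [1.0, 0.0], "color": "red"},
    "b": {"vec": [0.9, 0.1], "color": "blue"},
    "c": {"vec": [0.8, 0.2], "color": "red"},
    "d": {"vec": [0.1, 0.9], "color": "red"},
}

def post_filter(query, want_color, k=2):
    # Rank everything first, then drop non-matches: may return fewer than k.
    top = sorted(docs, key=lambda d: score(query, docs[d]["vec"]), reverse=True)[:k]
    return [d for d in top if docs[d]["color"] == want_color]

def filtered_search(query, want_color, k=2):
    # Restrict candidates first, then rank: returns k matches when they exist.
    cands = [d for d in docs if docs[d]["color"] == want_color]
    return sorted(cands, key=lambda d: score(query, docs[d]["vec"]), reverse=True)[:k]

q = [1.0, 0.0]
print(post_filter(q, "red"))      # → ['a']  ("b" crowded "c" out of the top-k)
print(filtered_search(q, "red"))  # → ['a', 'c']
```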
        
         | jxodwyer1 wrote:
          | Qdrant is awesome :). Redis also supports metadata
          | filtering, which we're currently building out. We are
          | considering adding a different data store option, and Qdrant
          | might be our next choice.
        
       | qwick23 wrote:
       | Wouldn't it be better to partner with an existing managed cloud
       | provider like Pinecone or Qdrant? Why Redis at all? :-0
        
         | jxodwyer1 wrote:
         | Redis provides indexes for vector similarity. And we have a lot
         | of experience with Redis. We see a future where we can offer
         | more than one datastore, and we've been considering Qdrant as
         | the next datastore to support.
        
           | crawdog wrote:
           | You should look at Lucene core - they have incorporated
           | vector embeddings in 9.4.x and it could provide you better
           | scale than Redis with durability as well.
           | 
           | https://lucene.apache.org/core/9_4_2/demo/index.html
        
       | pbmango wrote:
        | Great demo video - I like the focus on being open and flexible,
       | knowing how much will change in the next year.
        
       | ushakov wrote:
       | I'm wondering about YC's series of investments in this area
       | 
       | How many of these new AI companies will stick?
        
         | jxodwyer1 wrote:
         | Great question; while it's still super early, we believe that
         | some of the most critical problems to solve will involve making
         | current APIs compatible with AI use cases. Products like
         | ChatGPT Plugins are game changers, but they will still be
         | limited by the APIs they interact with.
        
       | crosen99 wrote:
       | This sounds less like Embeddings as a Service and more like
       | Semantic Search (which happens to be using embeddings) as a
       | Service.
        
         | jxodwyer1 wrote:
         | Search is one use case we support, but you can perform a few
         | other operations on your data, like clustering or fine-tuning.
         | We're also working on a classification feature. Are there other
         | async jobs you'd like to see?
        
           | crosen99 wrote:
           | The problem I'd like solved is that when I want to retrieve
           | chunks of data for retrieval augmented generation, it's
           | challenging to optimize the choice of embeddings model,
           | chunking strategy, and overall retrieval algorithm. I'm not
           | sure if that's the sort of problem you're focused on.
        
             | jxodwyer1 wrote:
             | We agree; this is precisely the problem area we're focusing
             | on!! We're currently working on the ability for users to
             | specify chunking strategies while providing a ton of
             | guidance on this selection based on their particular data.
        
               | crosen99 wrote:
               | In addition to the choices for how to chunk (i.e.
               | defining chunk size, chunk boundaries, chunk overlap,
               | etc.), there's also the question of what actually gets
               | returned once finding the chunks that match. For example,
               | perhaps I have a document with 100 1-page sections where
               | each section is broken into roughly 5 chunks. I may get
               | optimal performance in my RAG application not by
               | retrieving the top K chunks from the index, but rather by
                | returning the top K sections from the document, where
               | sections might be scored based on the number and scores
               | of child chunks. It also might be useful to incorporate
               | section summaries, etc., in the retrieval process.
        
               | jxodwyer1 wrote:
               | This is great, and that makes a ton of sense! Would you
               | want to define + experiment with these various
               | configurations yourself explicitly, or would you expect a
               | system to determine this automatically? I like the
               | concept of rolling-up chunk scores!
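The roll-up crosen99 describes (return top sections scored from their child chunks, rather than raw chunks) might look like this, summing chunk scores per section; mean or max are equally plausible aggregations:

```python
from collections import defaultdict

# Pretend chunk-level retrieval results: (parent_section, chunk_score) pairs,
# as they might come back from a vector index.
chunk_hits = [
    ("sec1", 0.91), ("sec1", 0.84), ("sec1", 0.80),
    ("sec2", 0.95),
    ("sec3", 0.60), ("sec3", 0.55),
]

def top_sections(hits, k=2):
    """Aggregate chunk scores by parent section, then return the top-k sections."""
    totals = defaultdict(float)
    for section, score in hits:
        totals[section] += score
    return sorted(totals, key=totals.get, reverse=True)[:k]

print(top_sections(chunk_hits))  # → ['sec1', 'sec3']
```

Note how sec2's single strong chunk (0.95) beats any individual sec1 chunk, yet sec1 wins on aggregate, which is exactly the behavior the comment argues can improve RAG retrieval.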
        
       | PaulHoule wrote:
       | I dunno. It took like one line in conda to bring in GPU PyTorch,
       | one for sentence-transformers, one line of Python to initialize
       | it, one line to encode. No worries about somebody else getting
       | data breached, acqui-hired, or struggling to find a sensible,
       | fair, and profitable pricing model.
       | 
        | Clustering with scikit-learn is... easy. Indexing in FAISS is...
        | easy. Maybe it's hard if you use Rust and it was hard to do this
        | in Python 5 years ago. Dilbert's Boss probably thinks it is hard
       | but he got fired...
        
         | jxodwyer1 wrote:
         | You're right! If you want to do that in a notebook, it's pretty
         | straightforward. But if you want to have it running in
         | production, it's a bit more complicated. Also, providing users
         | with a gui to run these operations without a notebook has
         | resonated with many less ml savvy users. Dilbert's boss
         | probably didn't know much about ml... :)
        
           | PaulHoule wrote:
           | I don't use a notebook. I write plain Python scripts for
           | batch jobs (run every day) and the UI is backed by aiohttp
            | and htmx. I had no fear when I demoed my app in public for
           | the first time since I've used it every day since the
           | beginning of the year and it spins like a top.
        
             | teaearlgraycold wrote:
             | Keep in mind that people out there pay a monthly fee for
             | _feature flags_ as a service. There's definitely a market
             | for OP's product.
        
             | jxodwyer1 wrote:
             | right on! Sounds like you have a lot of the foundation for
             | your infra setup, which is great
        
       | meekaaku wrote:
       | Hi, Regarding a product catalog usecase. Say I embed our product
       | catalog consisting of 1000 skus., then is there a way to update a
       | specific field in the product? A product has name, description,
       | sku etc that doesnt change much. But it also has frequently
       | changed info like price, quantity_available, special_offer etc.
        | How do I update these fields only and be able to answer a question
       | that customers send to our bot like:
       | 
        | Do you have this product A and what's the price?
       | 
       | which means need to get the latest price and quantity_available
       | field.
       | 
       | Is this possible to do with Metal?
        
         | jxodwyer1 wrote:
         | We don't support this use case yet, but we could by exposing an
         | API to update the non-filterable metadata of the records. This
         | is a cool use case; we would love to learn more about it. Would
         | you want to create embeddings from the product name +
         | description and then have the other attributes returned from
         | the search results? We are very close to supporting this; just
          | a matter of exposing a way to update those attributes.
        
           | meekaaku wrote:
           | Yes static info are mainly product
           | name/code/description/keywords etc. Dynamic ones are price,
           | quantity_available or similar feeds.
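The pattern discussed above: embed only the stable fields (name, description) once, keep the volatile fields (price, stock) as plain metadata outside the index, and join them at query time. A rough sketch with invented data, where `search` stands in for a real vector query:

```python
# Embeddings are generated once, from stable fields only.
catalog_vectors = {
    "sku-1": [0.9, 0.1],  # pretend embedding of product A's name + description
    "sku-2": [0.1, 0.9],
}

# Volatile fields live outside the index and can be updated at any time.
metadata = {
    "sku-1": {"price": 19.99, "quantity_available": 4},
    "sku-2": {"price": 5.50, "quantity_available": 0},
}

def search(query_vec, k=1):
    """Nearest SKU by dot product, with its live metadata attached."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(catalog_vectors,
                    key=lambda s: dot(query_vec, catalog_vectors[s]),
                    reverse=True)
    return [(sku, metadata[sku]) for sku in ranked[:k]]

metadata["sku-1"]["price"] = 17.99  # price change: no re-embedding needed
print(search([1.0, 0.0]))  # → [('sku-1', {'price': 17.99, 'quantity_available': 4})]
```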
        
       ___________________________________________________________________
       (page generated 2023-03-28 23:00 UTC)