[HN Gopher] Show HN: We made an open-source personalization engine
       ___________________________________________________________________
        
       Show HN: We made an open-source personalization engine
        
       Hey, HN! You probably know that the ordering of products on Amazon,
       posts in FB, and search results in Google is personalized for each
       visitor, as it directly affects conversion, click rate and
       engagement. But not everyone can afford to hire an army of PhDs to
       squeeze every penny out of the ranking, and not everyone agrees on
       the current (im)balance between privacy and profits.

       So we built Metarank, an open-source and privacy-focused
       personalization engine. It can rerank in real-time any type of
       content, using only the data you allow, and optimize metrics you
       define.

       We made a lot of proprietary DIY services for personalization in
       e-commerce in our past careers and heard so many complaints from
       other companies also struggling to implement personalization. It's
       often considered "too risky" to spend 6+ months on an in-house
       moonshot project to reinvent the wheel without an experienced team
       and no existing open-source tools. Like other people in the
       industry, we were tired of building everything from the bottom up
       each time we approached personalization - it should be easy not
       only for Amazon to do such magical ML tricks, but for everyone else.

       A small demo of the tool with personalized recommendations:
       https://demo.metarank.ai

       A blog post on how this demo was made:
       https://medium.com/metarank/personalizing-recommendations-wi...

       The project itself: https://github.com/metarank/metarank
        
       Author : shutty
       Score  : 222 points
       Date   : 2022-03-23 13:12 UTC (9 hours ago)
        
        
       | orliesaurus wrote:
       | Are there any privacy implications? i.e. you're learning to show
       | me the best results based on my experience, what happens to that
       | learning when I leave the site?
        
         | shutty wrote:
         | As people with a heavy e-commerce background, we feel that the
         | main pain point of typical old-school offline personalization
         | solutions is that 80% of customers in medium-sized online
         | stores come only once:
         | 
         | * you have a very short window to adapt your store, as the
         | visitor will never come back in the future.
         | 
         | * even if you have zero past knowledge about a new visitor,
         | there is still something to compare with other similar
         | visitors: are they from mobile? Is it ios or android? Are they
         | US? Is it a holiday now? Did they come from google search or
         | facebook ad?
         | 
         | * this knowledge is ephemeral and makes sense only within their
         | current session. But a visitor can still do a couple of
         | interactions like browsing different collections of items or
         | clicking on search results, and it can also be taken into
         | account.
         | 
         | But unlike with Amazon and Google, it's you who defines which
         | features are used for the ranking and how long they are
         | stored (see the "ttl" option on all feature extractors in our
         | docs for details).
         | 
         | For example, here is the config of features used in the movie
         | recommendations demo:
         | https://github.com/metarank/metarank/blob/master/src/test/re...
         | In the most privacy-sensitive setup you can just drop all the
         | "interacted_with" extractors and get zero private data
         | stored for each visitor.
        
       | czbond wrote:
       | Very cool - I haven't had time to peruse the offering or code,
       | but it seems like a very needed tool for industries and small
       | businesses which don't have the resources to make it happen.
        
         | vgoloviznin wrote:
         | Let us know if you want to try it out and we will help as much
         | as we can!
         | 
         | The tool might be useful for larger companies as well, as it
         | can give a head start for the machine learning engineers as
         | they won't need to build the tools from scratch
        
       | GrumpyNl wrote:
       | I get cross policy warnings on the demo page:
       | 
       | Access to XMLHttpRequest at
       | 'https://demo-api.metarank.ai:3000/movies?user=pnsar&session=...'
       | from origin 'https://demo.metarank.ai' has been blocked by CORS
       | policy: No 'Access-Control-Allow-Origin' header is present on the
       | requested resource.
        
       | shutty wrote:
       | I'm one of the contributors to this project. The idea of the tool
       | is to focus on typical ML feature engineering challenges. It
       | takes a stream of business events like clicks and impressions,
       | and computes a ton of common ML features on top:
       | 
       | * Parse User-Agent field, make a GeoIP lookup
       | 
       | * Count number of clicks over different items on multiple time
       | windows, like 1-2-3-4 weeks
       | 
       | * Conversion and CTR rates
       | 
       | * Basic customer profiling, like "you clicked on a red item in
       | the past, and this item is also red"
       | 
       | There is just a LambdaMART with xgboost inside, no rocket
       | science. It won't replace an in-house highly-focused solution,
       | but building everything from scratch may take a ton of time. With
       | Metarank you can quickly hack a good enough solution in a day,
       | hopefully :)
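
The windowed click counts and CTR features listed above can be sketched in a few lines of Python. This is a minimal illustration, not Metarank's implementation: the event tuple shape, the function names and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def windowed_click_counts(events, now, windows_weeks=(1, 2, 3, 4)):
    """Count clicks per item over several trailing time windows.

    Each event is a hypothetical (timestamp, item_id, event_type) tuple.
    """
    counts = defaultdict(lambda: {w: 0 for w in windows_weeks})
    for ts, item, kind in events:
        if kind != "click":
            continue
        age = now - ts
        for w in windows_weeks:
            # A click contributes to every window that still covers it.
            if timedelta(0) <= age <= timedelta(weeks=w):
                counts[item][w] += 1
    return {item: dict(per_window) for item, per_window in counts.items()}


def ctr(events):
    """Click-through rate per item: clicks divided by impressions."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for _, item, kind in events:
        if kind == "click":
            clicks[item] += 1
        elif kind == "impression":
            impressions[item] += 1
    return {item: clicks[item] / n for item, n in impressions.items() if n > 0}


# Made-up sample data, for illustration only.
now = datetime(2022, 3, 23)
events = [
    (now - timedelta(days=2), "red-shirt", "impression"),
    (now - timedelta(days=2), "red-shirt", "click"),
    (now - timedelta(days=10), "red-shirt", "impression"),
    (now - timedelta(days=10), "red-shirt", "click"),
    (now - timedelta(days=20), "blue-shirt", "impression"),
    (now - timedelta(days=25), "red-shirt", "impression"),
]

print(windowed_click_counts(events, now))
print(ctr(events))
```

Values like these would then be fed to the ranking model as per-item features, alongside the other signals mentioned in the comment.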
        
         | kqr wrote:
         | Not only could it be good enough -- it's a great reference to
         | benchmark commercial custom solutions against! (And I say this
         | as an engineer working on one of those commercial custom
         | solutions!)
        
         | Ennergizer wrote:
         | What are the approximate costs to train and run the service in
         | your demo example (https://demo.metarank.ai/)?
        
           | shutty wrote:
           | Right now it runs in a dev-mode on a single EC2 t3.large
           | instance with loadavg ~0.30, but the inference load is quite
           | tiny right now: around 3-4 reranking requests per second. And
           | yes, as a typical open-source project it still crashes from
           | time to time :)
           | 
           | The training dataset is not that huge (see
           | https://github.com/metarank/ranklens/ for details, it's
           | open-source), so we do a full retraining directly on the node
           | right after the deployment, and it takes around 1 minute to
           | finish. We also run the same process inside the CI:
           | https://github.com/metarank/metarank/blob/master/run_e2e.sh
           | 
           | There is an option to run this thing in a distributed mode:
           | 
           | * training is done using a separate batch job running on
           | Apache Flink (and on k8s using flink's integration)
           | 
           | * feature updates are done in a separate streaming Flink job,
           | writing everything in Redis
           | 
           | * The API fetches latest feature values from Redis and runs
           | the ML model.
           | 
           | The dev-mode I mentioned earlier is when all three of these
           | things are bundled together in a single process to make it
           | easier to play with the tool. But we didn't spend much time
           | testing the distributed setup, as this thing is still a hobby
           | side-project and we're limited in the time we can spend on it.
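
The three roles described above (batch training, streaming feature updates, and an API that reads the latest features and scores candidates) can be illustrated with a toy single-process sketch, similar in spirit to the bundled dev-mode. Everything here is hypothetical: a plain dict stands in for Redis, a smoothed click rate stands in for the LambdaMART model, and none of the names come from Metarank.

```python
class FeatureStore:
    """In-memory stand-in for Redis: the state shared by the updater and the API."""

    def __init__(self):
        self._data = {}

    def increment(self, item, field):
        row = self._data.setdefault(item, {"clicks": 0, "impressions": 0})
        row[field] += 1

    def get(self, item):
        # Unknown items get zeroed features instead of raising.
        return dict(self._data.get(item, {"clicks": 0, "impressions": 0}))


def update_features(store, event):
    """Streaming role: fold a single (kind, item) event into the store."""
    kind, item = event
    if kind == "click":
        store.increment(item, "clicks")
    elif kind == "impression":
        store.increment(item, "impressions")


def train(history):
    """Batch role: learn a global click rate and return a scoring function."""
    clicks = sum(1 for kind, _ in history if kind == "click")
    impressions = sum(1 for kind, _ in history if kind == "impression")
    prior = clicks / impressions if impressions else 0.0

    def score(features):
        # Per-item click rate, smoothed toward the global prior.
        return (features["clicks"] + prior) / (features["impressions"] + 1)

    return score


def rerank(store, model, candidates):
    """API role: read the latest features and sort candidates by model score."""
    return sorted(candidates, key=lambda item: model(store.get(item)), reverse=True)


# Made-up event log, replayed through both the batch and streaming paths.
history = [
    ("impression", "a"), ("click", "a"), ("impression", "b"),
    ("impression", "c"), ("impression", "a"),
]
store = FeatureStore()
for event in history:
    update_features(store, event)
model = train(history)
print(rerank(store, model, ["b", "d", "a"]))
```

In a distributed setup the three functions would live in separate processes and only the store would be shared, which is why the sketch keeps all state inside `FeatureStore`.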
        
             | jka wrote:
             | From reading some of the repository and architecture
             | overview, I _think_ this is true, but: could you confirm
             | that users of metarank can self-train their own models from
             | scratch?
        
               | shutty wrote:
               | This is actually part of our CI process:
               | https://github.com/metarank/metarank/blob/master/run_e2e.sh
               | This script runs on every PR to retrain the model used on
               | a demo and confirms that it's working fine.
               | 
               | So you can just download the jar file from the releases
               | page and run ./run_e2e.sh <jar file> in the checked-out
               | repository; it should do the job.
        
               | jka wrote:
               | Thanks!
        
         | airstrike wrote:
         | > "you clicked on a red item in the past, and this item is also
         | red"
         | 
         | Layman here: is this why I keep seeing ads for things I've
         | already bought?
        
           | nanidin wrote:
           | No. When the average person sees an ad for something they
           | just bought, it increases their satisfaction with their
           | purchase (thus making them less likely to return it.) Also,
           | when you've just bought something, there is a non-zero chance
           | you will return it and want to buy a different model of the
           | same type of item.
        
             | tinus_hn wrote:
             | Also perhaps if you made an informed purchase, the system
             | knows you looked for information on your item but not that
             | you bought it.
        
       | minroot wrote:
       | Why do people use Scala?
        
         | gmartres wrote:
         | https://www.lihaoyi.com/post/FromFirstPrinciplesWhyScala.htm...
        
         | shutty wrote:
         | The same question can be asked about JavaScript, but it's still
         | one of the most popular languages in the world :) It's common
         | wisdom to use the language you know best for an MVP - that's the
         | main reason it's Scala.
         | 
         | And it's not a framework, so you don't really need to
         | write/read any Scala to play with it.
        
         | threeseed wrote:
         | a) Runs on the JVM. So it's fast, solid, well supported and has
         | the largest array of enterprise grade libraries.
         | 
         | b) It is one of the few languages that lets you use the same
         | code for frontend (Scala.js), backend (Scala) and desktop
         | (Scala Native).
         | 
         | c) FP and the strong type system when used intelligently can
         | make your code simpler, cleaner and safer.
         | 
         | d) Libraries such as ZIO (zio.dev) make robust concurrency a
         | breeze. Not yet seen any other language/library except for
         | Erlang come close.
        
       | mushufasa wrote:
       | This is super interesting!
       | 
       | On the demo page, nothing is happening when I try clicking on any
       | of the buttons. I'm in a browser with no adblocking or
       | jsblocking. Is this just the hug of death, or am I holding it
       | wrong?
        
         | hirako2000 wrote:
         | same here. maybe they are being hit hard as this article
         | reached the top 50.
        
         | vgoloviznin wrote:
         | Looks like our demo is struggling with the load, typically it
         | would display a list of movies with which you can interact.
         | 
         | We're looking at what we can do to revive it
        
         | punkspider wrote:
         | Same here, when using my default Chrome profile, with uBlock
         | disabled.
         | 
         | However it seems to work in incognito.
         | 
         | EDIT: If you're using Metamask, I think that's the reason.
         | After disabling it the demo worked. Also, when visiting
         | metarank.ai from Github, I'm getting a warning containing:
         | This domain is currently on the MetaMask domain warning list.
         | This means that based on information available to us, MetaMask
         | believes this domain could currently compromise your security
         | and, as an added safety feature, MetaMask has restricted access
         | to the site. To override this, please read the rest of this
         | warning for instructions on how to continue at your own risk.
         | 
         | Screenshot: https://i.ibb.co/bHWTdtM/image.png
        
       | charcircuit wrote:
       | Honestly, personalization seems crazy to me. I can't believe how
       | well it works and how fast I can get personalized stuff. I
       | wouldn't know where to start to design a system to handle it.
       | Sites like YouTube or Pixiv have so much content that it seems
       | hard to rank it all for a single person.
        
         | vgoloviznin wrote:
         | That's exactly the problem we want to tackle - democratize
         | machine learning, to make more developers and businesses apply
         | it
        
       | nonoesp wrote:
       | Congrats on the launch.
       | 
       | It's a bit unsettling to hit the landing page and find a typo in
       | "personalizaton made easy."
        
         | vgoloviznin wrote:
         | Thanks for bringing this up :)
        
           | dewey wrote:
           | Another one: "The actions you take will diretly affect"
        
         | joemaffei wrote:
         | Spell checking should be a default feature in IDEs. I've seen
         | teammates struggle to find the source of a bug, only to find a
         | spelling error in a variable or configuration setting. I'm not
         | the greatest touch typist, and my IDE catches double letters,
         | missing letters, reversed letters and other mistakes all the
         | time.
        
       | gizmodo59 wrote:
       | When I promoted The Dark Knight, it just showed all the other
       | superhero movies, when I really like Nolan movies more than other
       | action hero movies.
        
       | dmitrykan wrote:
       | Great project! Elasticsearch / OpenSearch / Solr have their own
       | learning to rank plugins. Have you considered integrating
       | Metarank with such systems? Or is your vision to provide a
       | reranker layer, that can be independent of the underlying search
       | engine architecture?
        
       | sebrindom wrote:
       | Soo cool would love to see this integrated with
       | https://github.com/medusajs/medusa
        
         | vgoloviznin wrote:
         | Thanks for the tip, we will take a look!
        
       | nelsondev wrote:
       | Very cool! Thanks for sharing.
       | 
       | Rather than an offline model, why not use an online, continuously
       | relearning model like a Multi-Armed Bandit to do the re-ranking?
        
         | vgoloviznin wrote:
         | We're completely on board with you for reinforcement learning,
         | however we wanted to start with something simpler to build the
         | tool faster. RL is on the plate, however!
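
For readers unfamiliar with the online approach raised in the question, here is a minimal epsilon-greedy bandit sketch. It is not part of Metarank; the class and method names are hypothetical, and rewards are assumed to be 1 for a click and 0 otherwise.

```python
import random


class EpsilonGreedyBandit:
    """Keeps a running mean reward per arm (item) and updates it after every event."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in arms}
        self.mean_reward = {arm: 0.0 for arm in arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.mean_reward, key=self.mean_reward.get)

    def update(self, arm, reward):
        # Incremental mean: no need to store the full reward history.
        self.pulls[arm] += 1
        n = self.pulls[arm]
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / n

    def ranking(self):
        # Re-rank all arms by their current estimated reward.
        return sorted(self.mean_reward, key=self.mean_reward.get, reverse=True)


# Made-up feedback: "b" was clicked once in two showings, "a" never.
bandit = EpsilonGreedyBandit(["a", "b"], epsilon=0.0)
bandit.update("b", 1.0)
bandit.update("a", 0.0)
bandit.update("b", 0.0)
print(bandit.ranking())
```

Unlike an offline model retrained in batches, the estimates here change after every single update, at the cost of spending some traffic on exploration whenever epsilon is above zero.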
        
       | Sharma wrote:
       | BTW, accessing metarank.ai gives a warning. Maybe it's because it
       | has Meta in its domain name, but MetaMask shows this message:
       | 
       | This domain is currently on the MetaMask domain warning list.
       | This means that based on information available to us, MetaMask
       | believes this domain could currently compromise your security
       | and, as an added safety feature, MetaMask has restricted access
       | to the site. To override this, please read the rest of this
       | warning for instructions on how to continue at your own risk.
        
         | [deleted]
        
         | skilled wrote:
         | I got the same warning. I forgot I even had that thing, to be
         | honest.
         | 
         | But, in saying that - what kind of filter is MetaMask using to
         | just blatantly wipe out domains like this? Kind of on the fence
         | on how I feel about it.
        
           | prionassembly wrote:
           | Doesn't seem to be a Metafilter.
        
         | swyx wrote:
         | thats ridiculous.. MetaMask puts warning on anything with Meta*
         | in the name? good luck with the horde of metaverse startups on
         | the way
        
           | shutty wrote:
           | According to the code at
           | https://github.com/MetaMask/eth-phishing-detect/blob/45ea5cf...,
           | it looks like everything within a Levenshtein distance of 3
           | from whitelisted hosts (like "metamask.*") is blocked.
           | 
           | Metarank and Metamask are only 2 substitutions apart (r/m and
           | n/s), well within that limit. I made a ticket some time ago in
           | their github repo
           | (https://github.com/MetaMask/eth-phishing-detect/issues/6855),
           | but it seems that it was lost among thousands of similar tickets.
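
A rough sketch of the kind of check described above: compute the Levenshtein distance between a host name and each protected name, and flag anything that is close but not identical. The function names, the inclusive threshold and the comparison without the TLD are assumptions for illustration; this is not the eth-phishing-detect code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def looks_like_protected(name, protected=("metamask",), tolerance=3):
    """Flag names that are near a protected name without being identical to it."""
    return any(0 < levenshtein(name, p) <= tolerance for p in protected)


print(levenshtein("metarank", "metamask"))
print(looks_like_protected("metarank"))
```

Under a check like this, "metarank" falls within the threshold of "metamask", which is consistent with the warning reported in this thread. Any short name sharing the "meta" prefix is at risk of the same false positive.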
        
             | oauea wrote:
             | Yikes, best to just uninstall it then. That's insanely
             | hostile to harmless sites.
        
             | detaro wrote:
             | lol, a lot of "metaXXXX" "fixes" in the PRs too...
             | https://github.com/MetaMask/eth-phishing-detect/pulls
        
             | Legogris wrote:
             | Sorry that slipped through, I'll bring the team's attention
             | to it.
        
         | vgoloviznin wrote:
         | Thanks for bringing this up, I've created an issue in their
         | github to unblock us
        
       | thih9 wrote:
       | What's a scenario or a method to apply a personalization engine
       | that gives the lowest chance of making the overall UX worse?
       | 
       | I usually dislike personalized content, I prefer search results
       | that accurately match my query, and I find it distracting to see
       | suggestions or uncommon ordering (to the point that I search for
       | Netflix movies via an external website to avoid going through
       | their UI).
        
         | vgoloviznin wrote:
         | I can actually relate to this, especially when personalization
         | is applied in search.
         | 
         | However, our stats and A/B test results show that
         | personalization improves overall store conversion, CTR and
         | other important metrics in e-commerce. And seeing how it's
         | applied everywhere now (social networks, ads, etc.), the
         | majority of users are engaging better with it.
         | 
         | Some sites opt to include a 'disable personalization' option,
         | that might do the trick for some of the users
        
           | _jal wrote:
           | > However our [...] results show that personalization
           | improves overall store conversion ...
           | 
           | So many questions of the form "this thing annoys me, how do I
           | fix it" are answered by "These other people are making money
           | annoying you, and they like it."
        
       | danpalmer wrote:
       | > Metarank is industry-agnostic and can be used in any place of
       | your application where some content is displayed.
       | 
       | I'm afraid I'm skeptical.
       | 
       | Content ranking in small, well defined contexts is not hard to do
       | and doesn't require an ML approach - rules based systems are
       | often easier to specify, easier for both creators and users to
       | understand, and easier to make conform to business rules.
       | 
       | When ML does need to be introduced, when the scale or complexity
       | is large enough that a rules-based approach will be infeasible or
       | worse, having a generic implementation is unlikely to return
       | useful results. So much of the work of optimising an ML approach
       | is engineering features out of the data that make sense and that
       | don't introduce bias.
       | 
       | It's that last point that's really important because if you do
       | the wrong feature engineering, then the bias introduced
       | effectively means you're back to building a rules-based system,
       | just one that has a bunch of inaccuracy built in, and where you
       | don't understand what rules you've specified, or even that you
       | have specified them.
       | 
       | I'm not an expert here, but I've worked on basic recommender
       | systems for products, and worked with people who were far more
       | knowledgeable about this, all of whom seemed to have a low
       | opinion of generic systems.
        
       | nwsm wrote:
       | Hug of death on the demo app. (504 on calls to
       | https://demo-api.metarank.ai:3000/movies)
        
       ___________________________________________________________________
       (page generated 2022-03-23 23:00 UTC)