[HN Gopher] Scrambling eggs for Spotify with Knuth's Fibonacci h...
       ___________________________________________________________________
        
       Scrambling eggs for Spotify with Knuth's Fibonacci hashing
        
       Author : pncnmnp
       Score  : 99 points
       Date   : 2023-12-09 14:00 UTC (9 hours ago)
        
 (HTM) web link (pncnmnp.github.io)
 (TXT) w3m dump (pncnmnp.github.io)
        
       | justsayinginnt wrote:
       | Shame we can't choose the type of shuffle, based on your
       | mood/what you're listening too (not to add even more complexity).
       | 
       | e.g. For classical music I'd prefer stringing together pieces
       | from the same orchestra/composer. But for some contemporary music
       | would like mix the artist/album up more.
        
         | morkalork wrote:
         | I'd like one that on random and uses each song as an experiment
         | to determine your mood. If you listen through that's positive,
         | instantly skip? That's a negative signal. Just adapt on the fly
         | at the beginning of each listing session.
        
           | passion__desire wrote:
           | Based on your comment, I would love to see a feature just as
           | we have a prompt travel in image generations, how about genre
           | travel? A playlist of 10 songs taking me from rock to french
           | house.
        
         | esafak wrote:
         | The song radio does that. Really well in my experience.
        
         | passion__desire wrote:
         | I am not sure if the situation I describe is currently in
         | production in Instagram.
         | 
         | But here's what I noticed when I went to the explore section of
         | Instagram.
         | 
         | At display, there would be distinct choices of images and
         | reels, varied and related to my interests. But if I select a
         | particular reel/image type (e.g. animation or comics or 3d
         | render), it would take that as a signal and would expand the
         | feed based on that selection. I love that feature.
         | 
         | I guess Spotify Radio do that to some extent, not sure.
        
       | j4yav wrote:
       | I have a playlist with 150 hours of music on it and it seems to
       | play the same five or six songs every day, no matter what. I wish
       | I could choose "actually random", I wouldn't mind unexpected
       | clustering.
        
         | Solvency wrote:
         | It's absolutely wild. I have over 50 playlists I've made, some
         | over 10 hours long with no repetitions anywhere. I have
         | extremely thorough and diverse music interests and habits d
         | 
         | And yet, when Spotify's shit tier algorithm takes over, it
         | kicks me into a similar 5 to 6 meme set of songs every single
         | time. It's an absolute joke.
        
         | mrkeen wrote:
         | Out of curiosity, does it happen on some clients and not
         | others?
         | 
         | I remember hearing of a bug where if you played on a remote
         | device, it would transfer the first part of your playlist (10?
         | 100 tracks?), and then shuffle would only choose from among
         | them.
         | 
         | But it's been 5+ years so things may have changed and/or I
         | could be remembering completely wrong.
        
           | hifreq wrote:
           | That might be true, but my impression is that the algorithm
           | is weighted, so some artists (more popular in general,
           | recommended, played more often by that individual user, would
           | get picked more frequently by the "random" algorithm).
        
         | varispeed wrote:
         | I wonder if this is market manipulation and not a honest
         | mistake. To make certain labels and artists more money.
        
         | crazygringo wrote:
         | Same. Out of 5,000 songs, am I really supposed to be hearing
         | certain songs 3+ days in a row, when I'm only listening to 20
         | songs a day...?
         | 
         | I can't tell if it's:
         | 
         | - Certain artists are paying Spotify to favor them in
         | randomization?
         | 
         | - There's some kind of shared random seed across devices that
         | results in picking a tiny subset of songs to randomize from in
         | the first place?
         | 
         | - Other?
         | 
         | I do notice that the effect seems to persist for maybe a week,
         | then I'll never hear those songs again, but now it'll be
         | different songs that keep popping up repeatedly.
         | 
         | There's a related effect when you launch a radio station based
         | on an artist or track. If you launch it multiple times in the
         | same day or week, you get the exact same list of tracks. But
         | maybe a week later the tracks have changed, like the radio has
         | been recalculated based on a different random seed.
        
           | candiddevmike wrote:
           | This article is pretty insightful on the ways artists can
           | improve their presence on Spotify, including images of the
           | backend tools available to artists:
           | https://blog.groover.co/en/tips/spotify-streams/
        
             | hifreq wrote:
             | I was excited to get an insight into some hidden backend
             | tools but it seems like this post is just about playlists
             | (creating your own playlists, paying for placements in
             | other people's playlists), and ads?
        
           | cvoss wrote:
           | There's maybe some counterintuitive math going on here.
           | 
           | If you have 100 songs and listen to 1 song per day (which is
           | 1% of the library), on any given day your odds of hearing the
           | same song as yesterday are 1 in a 100.
           | 
           | If you have 1000 songs and listen to 10 per day (still 1%),
           | the odds of hearing a song that was also played yesterday are
           | a little less than 1 in 10, right?
           | 
           | So what matters is not only what fraction of your library's
           | play time you sample daily, but also how finely subdivided
           | the time is into individual tracks for sampling.
        
             | brasetvik wrote:
             | A bit of birthday paradox too? :)
        
             | crazygringo wrote:
             | > _If you have 1000 songs and listen to 10 per day (still
             | 1%), the odds of hearing a song that was also played
             | yesterday are a little less than 1 in 10, right?_
             | 
             | No. It's 10*(1/1000)=1%.
             | 
             | It'll happen a few times a year only.
        
         | esafak wrote:
         | It might think you really like those songs. I listen to my
         | favorites every day, among others.
        
         | hifreq wrote:
         | As far as I can tell all Spotify's playlists (in all forms,
         | e.g. including "random" and "track radio") use weighted
         | algorithms based on several factors like user's play history,
         | Spotify's recommendations (probably includes paid promotions),
         | general artist's popularity, etc.
         | 
         | When I still used Spotify, I would get a dozen of my favorite
         | artists mixed into basically any "playlist" I pick. Was one of
         | the reasons I quit Spotify - they are too opinionated on what I
         | should listen to.
        
         | at_a_remove wrote:
         | Eh, unexpected clustering is sort of okay but again, it's what
         | people respond to versus what they think they will. I've
         | written some scripts so I can dredge playlists off of some
         | radio stations that are hooked up to the Internet as part of a
         | quixotic little project of mine and I've gone through, looked
         | for dupes, etc. Let's say we're doing new wave (sure to start
         | an argument there). What people seem to dislike is ABC, The
         | Buggles, the Cure, Duran Duran, Ebn Ozn, Fra Lippo Lippi, Duran
         | Duran, etc. Just having a gap between there feels
         | insufficiently random.
         | 
         | Clustering apparently ought to feel deliberate. Now think back
         | to when you had actual DJs selecting tracks on the radio. One
         | of the techniques was "Two from a particular band." Not two
         | from a band with some tracks between them.
         | 
         | Similarly, one can do a "Four tracks from 1994" to provide a
         | cluster in time, another technique I've heard.
         | 
         | If anything, the more metadata you have, the more you can
         | provide short runs of _something_. Microgenres, for example.
        
       | fetzu wrote:
       | Spotify's shuffling is so god-tier bad that I would be
       | flabbergasted if it wasn't biased towards songs with less
       | royalties to pay out..
        
       | crazygringo wrote:
       | My _biggest_ desire for Spotify would be the ability to randomize
       | a playlist but _give massive priority to your least-listened
       | songs_.
       | 
       | So if you have a playlist with 10 songs A-J that you've listened
       | to 10 times each. And then you add 10 new songs K-T that you've
       | listened to 1 time each... Then _every time_ I shuffle the
       | playlist, I want songs K-T to be the _first 10 songs in random
       | order_ until I 've played _them_ 10 times each.
       | 
       | I mean, things can be mixed up a little more than that... but
       | generally speaking, I want to listen to my least-listened songs
       | much more than the ones I've been listening to forever. But I
       | don't want to have to separate them out into special playlists
       | "newest", "newer", "kinda new", "old", "oldest" which is
       | annoying.
        
       | bazzargh wrote:
       | The fibonacci hashing mentioned here looks to be just Kronecker
       | low-discrepancy sequences https://en.wikipedia.org/wiki/Low-
       | discrepancy_sequence#Addit... which is used for dithering too
       | (see eg https://extremelearning.com.au/unreasonable-
       | effectiveness-of...). It's quite likely already what Spotify were
       | obliquely referring to. But I think the author missed a trick:
       | tracks by an artist occur close together if your collection is
       | already arranged by artist/album. Picking tracks far apart from
       | the globally sorted list is what this sequence will do, and do
       | you don't need to do anything at all to split by
       | artist/album/category etc:
       | 
       | 1. for the Kth track in a sorted collection of N with seed S in
       | [0,1), pick track number floor(N((K(phi-1)+S)%1)).
       | 
       | 2. There is no step 2.
       | 
       | Since Spotify suggest their algorithm is only a couple of lines,
       | I'd guess this is what they did.
       | 
       | edited to add: the above would get pretty boring because songs
       | would always follow one another in sequence, no matter what the
       | seed was. But since the chance of picking an irrational number at
       | random in a real interval is a near certainty (because rationals
       | are countable and reals are uncountable) you can just pick any
       | new random number as the stepsize in the sequence when you start
       | to shuffle play and it should be good enough; picking in [.25,
       | .75) avoids steps that take you back too close to the same
       | artist.
        
         | pncnmnp wrote:
         | This is fantastic! I still need to carefully study it all, but
         | I implemented the idea you described in the original comment.
         | It seems the error lies somewhere between the algorithm I
         | mentioned and the Fisher-Yates shuffle, based on one million
         | playlists: Mean: 0.13, Median: 0.11, Mode: 0, p25: 0.06, p75:
         | 0.18, p90: 0.26, p95: 0.31.
         | 
         | It's possible I'm missing something here. Regarding your edit
         | about 'picking any new random number as the stepsize,' wouldn't
         | it be affected by a 'bad break,' as mentioned by Knuth? I still
         | need to work out the 'bad break' proof.
         | 
         | Edit: If it helps, here is the code I used to test it out -
         | https://gist.github.com/pncnmnp/8afb7903f61ec69a157287435a63...
        
           | bazzargh wrote:
           | I don't have Knuth to hand...but yeah it is possible you can
           | accidentally pick a number that is too close to a rational,
           | so you get short repeats of tracks instead of long ones. What
           | you're after is a number with a high approximation
           | coefficient https://en.wikipedia.org/wiki/Liouville_number#Ir
           | rationality... but in practice you just need to know that it
           | is irrational "enough", eg you might want to avoid repeats in
           | the first 1000 plays.
           | 
           | You can get that by rejection sampling on the random number
           | and using the Farey sequence to find nearby low-denominator
           | fractions
           | https://en.wikipedia.org/wiki/Farey_sequence#Applications -
           | if I pick a number between 0 and 1 I can use 0/1, 1/1 as the
           | starting point for the usual iteration. (and then scale to
           | .25,.75 at the end). You pick your approximation bound mu and
           | reject if log(abs(p/q - x)) > - mu log q, repeat the farey
           | iteration until q is large enough. (just making this up as I
           | go on an ipad, I may have a sign wrong in there or whatever)
           | 
           | It is actually much simpler than this ^ to just explicitly
           | check the first 1000 numbers for a loop. It's simpler than
           | tortoise and hare: you know there is no run-in, so the first
           | number is in the loop, and you want that number to not
           | reappear in the first min(N,1000) items.
        
         | HelloNurse wrote:
         | The problem is separating tracks in the same group. If the
         | globally sorted reference list of N songs has M consecutive
         | entries that we want to spread evenly, they should be
         | "shuffled" at increments of floor(N/M) or 1+floor(N/M): easy to
         | guarantee with tortured recursive constructions like the one
         | that opens the article, less obvious with a hash function that
         | is affected by floating point approximations.
         | 
         | An integer formula that is more obvious to work: starting from
         | any number between the maximum group size and (if larger)
         | N/phi, pick an increment D as the next larger or smaller
         | integer that has no common factors with N (to ensure full
         | period) and map index K to index (S+K*D)%N.
        
       | galdosdi wrote:
       | This entire HN thread reads like some fiction made up by a free
       | software activist 15 years ago, as a dystopian cautionary tale.
       | It's all complaints and speculations about stuff that just worked
       | back then.
       | 
       | Back then we had our music locally and we chose our own players,
       | of which there were many and easy to make another one. Actually,
       | hacking on music playback was easy and not uncommon. We had full
       | control of our musical lives.
        
         | Tao3300 wrote:
         | _Scrambx0red 3ggs_ by Cory Doctorow (2002)
        
         | pncnmnp wrote:
         | Tangentially, I encourage everyone to check out Ken Thompson's
         | talk on his jukebox:
         | https://www.youtube.com/watch?v=kaandEt_pKw
        
       | jorjordandan wrote:
       | I was a tile installer in a previous life, and occasionally a
       | customer would make the dreaded request to have an accent tile
       | placed 'randomly' throughout a backsplash. They were never happy
       | with the placement, and clearly had no understanding of
       | randomness. Generally what they actually wanted was 'evenly
       | distributed'. I remember one particular customer kept changing
       | the accent tiles and moving them until they converged upon a very
       | specific pattern throughout the back splash except for one tile
       | that was out of the pattern, that they kept 'wrong' out of some
       | ego driven desire to justify their request for randomness.
        
         | plagiarist wrote:
         | The perfect pattern with one wrong tile is so awful. It's a
         | fair punishment that they have to live with that.
         | 
         | I want those non-repeating pattern tiles, how awful would those
         | be to tile?
        
       | world2vec wrote:
       | Slightly unrelated but just learned about Hacker News pool. Very
       | interesting.
        
       | jdontillman wrote:
       | There was a delightful Usenet post way back around 1990, where
       | someone described how they had just purchased a new multi-CD
       | player. They very excitedly filled it up with their collection of
       | Prince CDs, and set it to random shuffle play mode.
       | 
       | Great for a while, but then they complained that all the slow
       | songs were bunched together. And perhaps the random shuffle play
       | mode was sampling the songs, deriving the tempo of each, and
       | adjusting the shuffle accordingly.
       | 
       | Very funny.
       | 
       | ---
       | 
       | Heh-heh, I independently came up with Fibonacci hashing for color
       | many years ago.
       | 
       | My web app was drawing a diagram of N rectangular items, color-
       | coded to tell them apart, with a table listing the details of
       | each below.
       | 
       | (Normally I would use EIA standard colors, with a nod to my EE
       | brethren.)
       | 
       | But I didn't want the colors to bias anything. So you'd normally
       | try random colors. But random colors can come out weird and some
       | can be close together.
       | 
       | So I used a Golden Angle around the hue circle, with a constant
       | brightness and saturation. And sure enough, the generated colors
       | were nicely differentiated.
       | 
       | BUT... not as nice as I'd like. Something was wrong.
       | 
       | It turns out that our perception of color is more complex. And
       | when we're differentiating between colors, it really, really
       | helps if the colors are familiar, and describable.
       | 
       | So simple colors like blue and purple are much easier to
       | differentiate than a new weird blueish color and a new weird
       | purplish color.
       | 
       | So my Golden Angle colors were technically superior, but not as
       | good a user experience.
        
       | brcmthrowaway wrote:
       | What did the ipod shuffle use?
        
         | pncnmnp wrote:
         | Hi, author here! Interesting question!
         | 
         | Martin Fiedler briefly addresses this topic in a comment on his
         | blog post about shuffling algorithms
         | (https://keyj.emphy.de/balanced-shuffle/)
         | 
         | > Apple has a so-called "smart shuffle" algorithm, but this
         | merely puts higher-rated tracks in front of lower-rated ones.
         | So basically, it's just random shuffle, followed by a sort-by-
         | rating operation. I don't know of any product (software or
         | hardware) that uses some kind of smart, balanced or whatever-
         | you-like-to-call-it shuffle based on the principles I described
         | in the text.
         | 
         | I'm not sure what they are using today.
        
       ___________________________________________________________________
       (page generated 2023-12-09 23:00 UTC)