[HN Gopher] Scrambling eggs for Spotify with Knuth's Fibonacci h... ___________________________________________________________________ Scrambling eggs for Spotify with Knuth's Fibonacci hashing Author : pncnmnp Score : 99 points Date : 2023-12-09 14:00 UTC (9 hours ago) (HTM) web link (pncnmnp.github.io) (TXT) w3m dump (pncnmnp.github.io) | justsayinginnt wrote: | Shame we can't choose the type of shuffle, based on your | mood/what you're listening too (not to add even more complexity). | | e.g. For classical music I'd prefer stringing together pieces | from the same orchestra/composer. But for some contemporary music | would like mix the artist/album up more. | morkalork wrote: | I'd like one that on random and uses each song as an experiment | to determine your mood. If you listen through that's positive, | instantly skip? That's a negative signal. Just adapt on the fly | at the beginning of each listing session. | passion__desire wrote: | Based on your comment, I would love to see a feature just as | we have a prompt travel in image generations, how about genre | travel? A playlist of 10 songs taking me from rock to french | house. | esafak wrote: | The song radio does that. Really well in my experience. | passion__desire wrote: | I am not sure if the situation I describe is currently in | production in Instagram. | | But here's what I noticed when I went to the explore section of | Instagram. | | At display, there would be distinct choices of images and | reels, varied and related to my interests. But if I select a | particular reel/image type (e.g. animation or comics or 3d | render), it would take that as a signal and would expand the | feed based on that selection. I love that feature. | | I guess Spotify Radio do that to some extent, not sure. | j4yav wrote: | I have a playlist with 150 hours of music on it and it seems to | play the same five or six songs every day, no matter what. I wish | I could choose "actually random", I wouldn't mind unexpected | clustering. | Solvency wrote: | It's absolutely wild. I have over 50 playlists I've made, some | over 10 hours long with no repetitions anywhere. I have | extremely thorough and diverse music interests and habits d | | And yet, when Spotify's shit tier algorithm takes over, it | kicks me into a similar 5 to 6 meme set of songs every single | time. It's an absolute joke. | mrkeen wrote: | Out of curiosity, does it happen on some clients and not | others? | | I remember hearing of a bug where if you played on a remote | device, it would transfer the first part of your playlist (10? | 100 tracks?), and then shuffle would only choose from among | them. | | But it's been 5+ years so things may have changed and/or I | could be remembering completely wrong. | hifreq wrote: | That might be true, but my impression is that the algorithm | is weighted, so some artists (more popular in general, | recommended, played more often by that individual user, would | get picked more frequently by the "random" algorithm). | varispeed wrote: | I wonder if this is market manipulation and not a honest | mistake. To make certain labels and artists more money. | crazygringo wrote: | Same. Out of 5,000 songs, am I really supposed to be hearing | certain songs 3+ days in a row, when I'm only listening to 20 | songs a day...? | | I can't tell if it's: | | - Certain artists are paying Spotify to favor them in | randomization? | | - There's some kind of shared random seed across devices that | results in picking a tiny subset of songs to randomize from in | the first place? | | - Other? | | I do notice that the effect seems to persist for maybe a week, | then I'll never hear those songs again, but now it'll be | different songs that keep popping up repeatedly. | | There's a related effect when you launch a radio station based | on an artist or track. If you launch it multiple times in the | same day or week, you get the exact same list of tracks. But | maybe a week later the tracks have changed, like the radio has | been recalculated based on a different random seed. | candiddevmike wrote: | This article is pretty insightful on the ways artists can | improve their presence on Spotify, including images of the | backend tools available to artists: | https://blog.groover.co/en/tips/spotify-streams/ | hifreq wrote: | I was excited to get an insight into some hidden backend | tools but it seems like this post is just about playlists | (creating your own playlists, paying for placements in | other people's playlists), and ads? | cvoss wrote: | There's maybe some counterintuitive math going on here. | | If you have 100 songs and listen to 1 song per day (which is | 1% of the library), on any given day your odds of hearing the | same song as yesterday are 1 in a 100. | | If you have 1000 songs and listen to 10 per day (still 1%), | the odds of hearing a song that was also played yesterday are | a little less than 1 in 10, right? | | So what matters is not only what fraction of your library's | play time you sample daily, but also how finely subdivided | the time is into individual tracks for sampling. | brasetvik wrote: | A bit of birthday paradox too? :) | crazygringo wrote: | > _If you have 1000 songs and listen to 10 per day (still | 1%), the odds of hearing a song that was also played | yesterday are a little less than 1 in 10, right?_ | | No. It's 10*(1/1000)=1%. | | It'll happen a few times a year only. | esafak wrote: | It might think you really like those songs. I listen to my | favorites every day, among others. | hifreq wrote: | As far as I can tell all Spotify's playlists (in all forms, | e.g. including "random" and "track radio") use weighted | algorithms based on several factors like user's play history, | Spotify's recommendations (probably includes paid promotions), | general artist's popularity, etc. | | When I still used Spotify, I would get a dozen of my favorite | artists mixed into basically any "playlist" I pick. Was one of | the reasons I quit Spotify - they are too opinionated on what I | should listen to. | at_a_remove wrote: | Eh, unexpected clustering is sort of okay but again, it's what | people respond to versus what they think they will. I've | written some scripts so I can dredge playlists off of some | radio stations that are hooked up to the Internet as part of a | quixotic little project of mine and I've gone through, looked | for dupes, etc. Let's say we're doing new wave (sure to start | an argument there). What people seem to dislike is ABC, The | Buggles, the Cure, Duran Duran, Ebn Ozn, Fra Lippo Lippi, Duran | Duran, etc. Just having a gap between there feels | insufficiently random. | | Clustering apparently ought to feel deliberate. Now think back | to when you had actual DJs selecting tracks on the radio. One | of the techniques was "Two from a particular band." Not two | from a band with some tracks between them. | | Similarly, one can do a "Four tracks from 1994" to provide a | cluster in time, another technique I've heard. | | If anything, the more metadata you have, the more you can | provide short runs of _something_. Microgenres, for example. | fetzu wrote: | Spotify's shuffling is so god-tier bad that I would be | flabbergasted if it wasn't biased towards songs with less | royalties to pay out.. | crazygringo wrote: | My _biggest_ desire for Spotify would be the ability to randomize | a playlist but _give massive priority to your least-listened | songs_. | | So if you have a playlist with 10 songs A-J that you've listened | to 10 times each. And then you add 10 new songs K-T that you've | listened to 1 time each... Then _every time_ I shuffle the | playlist, I want songs K-T to be the _first 10 songs in random | order_ until I 've played _them_ 10 times each. | | I mean, things can be mixed up a little more than that... but | generally speaking, I want to listen to my least-listened songs | much more than the ones I've been listening to forever. But I | don't want to have to separate them out into special playlists | "newest", "newer", "kinda new", "old", "oldest" which is | annoying. | bazzargh wrote: | The fibonacci hashing mentioned here looks to be just Kronecker | low-discrepancy sequences https://en.wikipedia.org/wiki/Low- | discrepancy_sequence#Addit... which is used for dithering too | (see eg https://extremelearning.com.au/unreasonable- | effectiveness-of...). It's quite likely already what Spotify were | obliquely referring to. But I think the author missed a trick: | tracks by an artist occur close together if your collection is | already arranged by artist/album. Picking tracks far apart from | the globally sorted list is what this sequence will do, and do | you don't need to do anything at all to split by | artist/album/category etc: | | 1. for the Kth track in a sorted collection of N with seed S in | [0,1), pick track number floor(N((K(phi-1)+S)%1)). | | 2. There is no step 2. | | Since Spotify suggest their algorithm is only a couple of lines, | I'd guess this is what they did. | | edited to add: the above would get pretty boring because songs | would always follow one another in sequence, no matter what the | seed was. But since the chance of picking an irrational number at | random in a real interval is a near certainty (because rationals | are countable and reals are uncountable) you can just pick any | new random number as the stepsize in the sequence when you start | to shuffle play and it should be good enough; picking in [.25, | .75) avoids steps that take you back too close to the same | artist. | pncnmnp wrote: | This is fantastic! I still need to carefully study it all, but | I implemented the idea you described in the original comment. | It seems the error lies somewhere between the algorithm I | mentioned and the Fisher-Yates shuffle, based on one million | playlists: Mean: 0.13, Median: 0.11, Mode: 0, p25: 0.06, p75: | 0.18, p90: 0.26, p95: 0.31. | | It's possible I'm missing something here. Regarding your edit | about 'picking any new random number as the stepsize,' wouldn't | it be affected by a 'bad break,' as mentioned by Knuth? I still | need to work out the 'bad break' proof. | | Edit: If it helps, here is the code I used to test it out - | https://gist.github.com/pncnmnp/8afb7903f61ec69a157287435a63... | bazzargh wrote: | I don't have Knuth to hand...but yeah it is possible you can | accidentally pick a number that is too close to a rational, | so you get short repeats of tracks instead of long ones. What | you're after is a number with a high approximation | coefficient https://en.wikipedia.org/wiki/Liouville_number#Ir | rationality... but in practice you just need to know that it | is irrational "enough", eg you might want to avoid repeats in | the first 1000 plays. | | You can get that by rejection sampling on the random number | and using the Farey sequence to find nearby low-denominator | fractions | https://en.wikipedia.org/wiki/Farey_sequence#Applications - | if I pick a number between 0 and 1 I can use 0/1, 1/1 as the | starting point for the usual iteration. (and then scale to | .25,.75 at the end). You pick your approximation bound mu and | reject if log(abs(p/q - x)) > - mu log q, repeat the farey | iteration until q is large enough. (just making this up as I | go on an ipad, I may have a sign wrong in there or whatever) | | It is actually much simpler than this ^ to just explicitly | check the first 1000 numbers for a loop. It's simpler than | tortoise and hare: you know there is no run-in, so the first | number is in the loop, and you want that number to not | reappear in the first min(N,1000) items. | HelloNurse wrote: | The problem is separating tracks in the same group. If the | globally sorted reference list of N songs has M consecutive | entries that we want to spread evenly, they should be | "shuffled" at increments of floor(N/M) or 1+floor(N/M): easy to | guarantee with tortured recursive constructions like the one | that opens the article, less obvious with a hash function that | is affected by floating point approximations. | | An integer formula that is more obvious to work: starting from | any number between the maximum group size and (if larger) | N/phi, pick an increment D as the next larger or smaller | integer that has no common factors with N (to ensure full | period) and map index K to index (S+K*D)%N. | galdosdi wrote: | This entire HN thread reads like some fiction made up by a free | software activist 15 years ago, as a dystopian cautionary tale. | It's all complaints and speculations about stuff that just worked | back then. | | Back then we had our music locally and we chose our own players, | of which there were many and easy to make another one. Actually, | hacking on music playback was easy and not uncommon. We had full | control of our musical lives. | Tao3300 wrote: | _Scrambx0red 3ggs_ by Cory Doctorow (2002) | pncnmnp wrote: | Tangentially, I encourage everyone to check out Ken Thompson's | talk on his jukebox: | https://www.youtube.com/watch?v=kaandEt_pKw | jorjordandan wrote: | I was a tile installer in a previous life, and occasionally a | customer would make the dreaded request to have an accent tile | placed 'randomly' throughout a backsplash. They were never happy | with the placement, and clearly had no understanding of | randomness. Generally what they actually wanted was 'evenly | distributed'. I remember one particular customer kept changing | the accent tiles and moving them until they converged upon a very | specific pattern throughout the back splash except for one tile | that was out of the pattern, that they kept 'wrong' out of some | ego driven desire to justify their request for randomness. | plagiarist wrote: | The perfect pattern with one wrong tile is so awful. It's a | fair punishment that they have to live with that. | | I want those non-repeating pattern tiles, how awful would those | be to tile? | world2vec wrote: | Slightly unrelated but just learned about Hacker News pool. Very | interesting. | jdontillman wrote: | There was a delightful Usenet post way back around 1990, where | someone described how they had just purchased a new multi-CD | player. They very excitedly filled it up with their collection of | Prince CDs, and set it to random shuffle play mode. | | Great for a while, but then they complained that all the slow | songs were bunched together. And perhaps the random shuffle play | mode was sampling the songs, deriving the tempo of each, and | adjusting the shuffle accordingly. | | Very funny. | | --- | | Heh-heh, I independently came up with Fibonacci hashing for color | many years ago. | | My web app was drawing a diagram of N rectangular items, color- | coded to tell them apart, with a table listing the details of | each below. | | (Normally I would use EIA standard colors, with a nod to my EE | brethren.) | | But I didn't want the colors to bias anything. So you'd normally | try random colors. But random colors can come out weird and some | can be close together. | | So I used a Golden Angle around the hue circle, with a constant | brightness and saturation. And sure enough, the generated colors | were nicely differentiated. | | BUT... not as nice as I'd like. Something was wrong. | | It turns out that our perception of color is more complex. And | when we're differentiating between colors, it really, really | helps if the colors are familiar, and describable. | | So simple colors like blue and purple are much easier to | differentiate than a new weird blueish color and a new weird | purplish color. | | So my Golden Angle colors were technically superior, but not as | good a user experience. | brcmthrowaway wrote: | What did the ipod shuffle use? | pncnmnp wrote: | Hi, author here! Interesting question! | | Martin Fiedler briefly addresses this topic in a comment on his | blog post about shuffling algorithms | (https://keyj.emphy.de/balanced-shuffle/) | | > Apple has a so-called "smart shuffle" algorithm, but this | merely puts higher-rated tracks in front of lower-rated ones. | So basically, it's just random shuffle, followed by a sort-by- | rating operation. I don't know of any product (software or | hardware) that uses some kind of smart, balanced or whatever- | you-like-to-call-it shuffle based on the principles I described | in the text. | | I'm not sure what they are using today. ___________________________________________________________________ (page generated 2023-12-09 23:00 UTC)