[HN Gopher] Riffusion - Stable Diffusion fine-tuned to generate ... ___________________________________________________________________ Riffusion - Stable Diffusion fine-tuned to generate Music Author : MitPitt Score : 1481 points Date : 2022-12-15 13:26 UTC (9 hours ago) (HTM) web link (www.riffusion.com) (TXT) w3m dump (www.riffusion.com) | TuringNYC wrote: | I read the article: | | "If you have a GPU powerful enough to generate stable diffusion | results in under five seconds, you can run the experience locally | using our test flask server." | | Curious what sort of GPU the author was using or what some of the | min requirements might be? | genewitch wrote: | An RTX 3070 can generate SD results in under 5 seconds, depending. | Euler A, 20 samples, 512x512. It can almost do 4 images in 5 | seconds with those settings. | | It's possible a 3060 might work, depending. In my experience | the 3060 is about 50% slower than the 3070, but that may be a | bad 3060 in our test rig. But a 3060 gets pretty close to 5 | seconds for an image, so try it, if you have one. | | Just tested the prompt "a test pattern for television" on both | cards and the 3070 took 1.87s and the 3060 took 2.93s. Similar | results for the prompt "an intricate cityscape, like New York". | | Edit: I should note we're using SD 1.4, not 1.5, although I | think that just has to do with the checkpoint of the model, not | the algorithm, but I could be wrong. | | Also the model is over 14GB, so perhaps the 3070 can't do it | after all. I'll test it later as soon as the local admin wakes | up and downloads it onto our machine. | seth_ wrote: | Author here: FWIW we are running the app on A10G GPUs, which | can generally turn around a 512x512 in 3.5s with 50 inference | steps. This time includes converting the image into audio, which | should be done on the GPU as well for real-time purposes. We | did some optimizations such as a traced UNet, FP16, and removing | autocast.
There are lots of ways it could be sped up further | I'm sure! | nathias wrote: | Very cool! I was wondering why there weren't any music diffusion | apps out there; it seems more useful because music has stricter | copyright and all content creators need some background music ... | corysama wrote: | Can anyone confirm/deny my theory that AI audio generation has | been lagging behind progress in image generation because it's way | easier to get a billion labeled images than a billion labeled | audio clips? | madelyn wrote: | Sound is a lot higher fidelity; it's harder to make the | information available to a computer without serious | downsampling or simplification. | | Consider sounds over 12 kHz. On a spectrogram during a chorus or | drop that area is lit up, with so many things changing from | millisecond to millisecond. A lot of AI samples really struggle | at high frequencies, or even forgo them entirely. | | MIDI-based approaches have been really great though, and an | approach like in the OP is fascinating (and impressive). | motoxpro wrote: | This is just insane. Sooooo incredible. You don't really realize how | far things have come until it hits a domain you're extremely | familiar with. I spent 8-9 years in music production and the transition | stuff blew me away. | haykmartiros wrote: | Other author here! This got posted a little earlier than we | intended so we didn't have our GPUs scaled up yet. Please hang on | and try throughout the day! | | Meanwhile, please read our about page http://riffusion.com/about | | It's all open source and the code lives at | https://github.com/hmartiro/riffusion-app --> if you have a GPU | you can run it yourself | | This has been our hobby project for the past few months. Seeing | the incredible results of stable diffusion, we were curious if we | could fine-tune the model to output spectrograms and then convert | to audio clips. The answer to that was a resounding yes, and we | became addicted to generating music from text prompts.
There are | existing works for generating audio or MIDI from text, but none | as simple or general as fine-tuning the image-based model. Taking | it a step further, we made an interactive experience for | generating looping audio from text prompts in real time. To do | this we built a web app where you type in prompts like a jukebox, | and audio clips are generated on the fly. To make the audio loop | and transition smoothly, we implemented a pipeline that does | img2img conditioning combined with latent space interpolation. | jtode wrote: | As one of the meatsacks whose job you're about to kill... eh, I | got nothin, it's damn impressive. It's gonna hit electronic | music like a nuclear bomb, I'd wager. | godelski wrote: | Why do you think this will kill your job? To me this looks | like an extension of the hip-hop genre. | spoiler wrote: | As a listener, I think you're probably still safe. Can you | use this to help you though? Maybe. | | It's impressive what it produces, but I think it probably | lacks substance in the same way the visual AI art stuff does. | For the most part, it passes what I call the at-a-glanceness | test. It's little better than apophenia (the same thing that | makes you see shapes in clouds, faces in rocks, or think | you've recognised a familiar word in a foreign language; the | last one can happen more often though). | | So, I think these tools will be used to do background work | (i.e. for visuals, maybe help with background tasks in CGI or | faraway textures in games). I know less about audio, but I | assume it could maybe help a DJ create a transition between | two segments they want to combine, as opposed to making the whole | composition for them, but idk if that example makes sense. | | Now, onto a more human point: I think that people often | listen to music because it means something to them. Similar | for people who appreciate visual art.
| | I also love interactive and light art, and I love talking to | other artists at light festivals who make them, because of the | stories and journeys behind their art too. Humans and art are | a package deal, IMO. | | Edit: typos and to add: Also, I think prompt authorship is an | art unto itself. I'm amazed by what people can craft with it, | but I'm more impressed by the craft itself than the outputs. | Don't get me wrong, the outputs are darn cool, but not if you | look closer. And it's impossible to look beneath the surface | altogether, as there is nothing in the output but the pixels. | ArmandTanzarian wrote: | As a musician and listener I'm inclined to agree. There | were a couple of cool examples I bumped into, but some prompts | generate results that don't represent any single word or | combination of words that were presented to the AI. | | What this means for the future is maybe a little more | unsettling, however. | hahajk wrote: | I think this type of generative stuff opens up entirely new | possibilities. For the longest time I've wanted to host a | rowing or treadmill competition, where contestants submit a | music track. The tracks are mashed up with weighting based | on who is in the lead and by how much. | | I don't know of existing tech that can generate actually good | mashups in real time given arbitrary MP3s, but this has | promise! | api wrote: | In general all this stuff is chopping the bottom off the | market. AI art, code, writing, music, etc. can all generate | passable "filler" content, which will decimate all human | employment generating same. | | I don't think this stuff is a threat to genuinely | innovative, thoughtful, meaningful work, but that's the top | of the market. | | That being said, the bottom of the market is how a lot of | artists make their living, so this is going to deeply | impact all forms of art as a profession.
It might soon | impact programming too, because while GPT-type systems can't | do advanced high-level reasoning, they will chop the bottom | off the market and create a glut of employees that will | drive wages down across the board. | | Basic income or revolution. That's going to be our choice. | gummydogg wrote: | The top of the market started at the bottom. Entry level | is requiring higher and higher skills and capabilities. | astrange wrote: | The only thing that affects whether you have a job is the | Federal Reserve, not how good productivity tools are. You | always have comparative advantage vs an AI, so you always | have the qualifications for an entry-level job. | | There will never be a revolution and there's no such | thing as late capitalism. Well, not if the Fed does their | job. | halkony wrote: | I see a lot of AI naysayers neglecting the comparative | advantage part. | | If AI _completely_ eliminates low-skill art labour from | the job pool, it's not like those affected by it are | gonna disintegrate, riot, and restructure society. They | have the choice of filling an art niche an AI can't, or | they can spend that time learning other, more in-demand | skills. This also ignores the fact that some companies | would rather reallocate you to more profitable projects | even if your art skills don't change. | | Selling a product with relative value like a painting or | a sculpture will always be an uphill battle. Now that | there's more competition from AI, it just gives | artists/businesses incentive to find what people want | that an AI can't deliver. Worst case scenario, employment | rates in this sector are rough while the market | recalibrates. Interested to see how these technologies | develop. | Archipelagia wrote: | That seems a bit like wishful thinking. | | People don't have unlimited ability to learn new skills. | Training takes time, and someone who spent several years | honing their craft won't be able to pick up a new skill | overnight.
| | On top of that, people have preferences regarding their | work - even if someone has the ability to do different | work, they might find it less meaningful and less | satisfying. | | Finally, don't ignore the speed at which AI capabilities | improve. Compare GPT-1 with the current model, and how | quickly we got here. Eventually we'll get to a point | where humans just won't be able to catch up quickly | enough. | SoftTalker wrote: | I think specifically in the area of creative "products" | such as art and music you have to think about the | customer as well. I have zero interest in AI-created art | or music. None. The value of art is its humanity; its | expression of the artist's message, vision, and passion. | AI doesn't have that, so it's not of any interest to me. | | I don't know how many customers feel the same way, but I | won't be purchasing any AI art or music or knowingly | giving it any of my attention. | astrange wrote: | The AI is a tool the human used to make it. Sometimes | clumsily, but sometimes they write poems as text prompts | and it's an illustration, or things like that. | | If an AI is making and selling art by itself, it's | probably become sentient, and not patronizing it would be | speciesism. | tracerbulletx wrote: | If the only people who can have meaningful, good-paying | jobs are thoughtful geniuses, we're in a lot of trouble as | a society still. | gravelc wrote: | I love simple generative approaches to get ideas, and go from | there. This seems like an extension of that (well, it's what | I'm going to try - sample the output, make stems, pull MIDI, | etc.). Will make the creative process more interesting for me, | not less. | | Having said that, it's not my job, and I can see where the | issues lie there. | TOMDM wrote: | The audio sounds a bit lossy; would it be possible to create | high-quality spectrograms from music, downsample them, and use | that as training data for a spectrogram upscaler?
| | It might be the last step this AI needs to bring some extra | clarity to the output. | nico wrote: | Amazing work. Can this be applied to voice? | | Example prompt: "deep radio host voice saying 'hello there'" | | Kind of like a more expressive TTS? | seth_ wrote: | Author here: It can certainly be applied to voice, but the | model would need deeper training to speak intelligibly. If | you want to hear more singing, you can try a prompt like | "female voice", and increase the denoising parameter in the | settings of the app. | | That said, our GPUs are still getting slammed today so you | might face a delay in getting responses. Working on it! | sergiotapia wrote: | Reach out to the Beatstars CEO. He was looking for an AI play | for his music producers marketplace. Probably solid B2B lead | there. | CapsAdmin wrote: | When you say fine-tuned, do you mean fine-tuned on an existing | stable diffusion checkpoint? If so, which? | | It would be very interesting to see what the stable diffusion | community that is using the automatic1111 version would do with | this if it were made into an extension. | haykmartiros wrote: | Yes, from https://huggingface.co/runwayml/stable- | diffusion-v1-5. Our checkpoint works with automatic1111, and | if you'd like to make an extension to decode to audio, it | should be pretty straightforward: | https://github.com/hmartiro/riffusion- | inference/blob/main/ri... | Metus wrote: | Can you run this on any hardware already capable of running | SD 1.5? I am downloading the model right now, might play | with this later. | | Guessing at the speed with which AI is developing these | days, someone is going to have the extension up in two hours | at most. | ronsor wrote: | I bet the AUTOMATIC1111 web UI music plugin drops within | 48 hours. | haykmartiros wrote: | Yes! Although to have real-time playback with our | defaults you need to be able to generate 40 steps at | 512x512 in under 5 seconds. | Metus wrote: | Good to know.
I was just so close, with just under 7s | using 40 steps and Euler a as the sampler. | abledon wrote: | This is groundbreaking! All other attempts at AI-generated | music have, IMO, fallen flat... These results are actually | listenable, and enjoyable! It's almost frightening how | powerful this can be. | ozten wrote: | Amazing work! Did you use CLIP or something like that to train | genre + mel-spectrogram? What datasets did you use? | tartakovsky wrote: | Hi Hayk, I see that the inference code and the final model are | open source. I am not expecting it, but are the training code, | the dataset you used for fine-tuning, and the process to | generate the dataset open source? | asdf333 wrote: | Is classical music harder? Noticed you didn't have any | classical music tracks. I wonder if it is because it is more | structured? | lisper wrote: | Wow, I am blown away. Some of these clips are really good! I | love the Arabic Gospel one. John and George would have loved | this so much. And the fact that you can make things that | _sound_ good by going through _visual_ space feels to me like | the discovery of a Deep Truth, one that goes beyond even the | Fourier transform because it somehow connects the _aesthetics_ | of the two domains. | tbalsam wrote: | I can simultaneously burst a bubble and provide fuel for more | -- the alignment of the intrinsic manifolds of different | domains has been an interesting research topic for zero-shot | research for a few years. I remember seeing at CVPR 2018 the | first zero-shot...classifier, I think? That, if I recall | correctly, trained in two domains that were automatically | basically aligned with each other enough to provide very good | zero-shot accuracy. | | Calling it a Deep Truth might be a bit of an emotional | marketing spin, but the concept is very exciting nonetheless, I | believe. | lisper wrote: | My characterization of it as a Deep Truth might just be a | reflection of my ignorance of the current state of the art | in AI.
But it's still pretty frickin' cool nonetheless. | logicallee wrote: | Alright so this is a pretty amazing new development. I | want to tell you something about what the state of the | art is in AI. When you wrote that it is a deep truth it | was before I actually listened to the pieces. I had just | read the descriptions. At the time, I thought that you | were probably right because I was thinking that music is | only pleasing because of the structure of our brains it's | not like vision where originally we are interpreting the | world and that's where art comes from. Music is purely | sort of abstract or artistic. However, after I listened | to the pieces, I realised that they really sound exactly | like the instruments that are making the physical noises. | For example it really sounds exactly like a physical | piano. So I don't know about a deep truth, but it does | seem that there is a physical sense that the music | represents which it can successfully mimic using this | essentially image generating capability. One thing about | all of these amazing AI development, is that I still make | some long comments by dictating to Google. When it first | got to the point that it was able to catch almost | everything that I was saying I was absolutely blown away. | However, it's really not that good at taking dictation, | and I have to go back and replace each and every | individual comma and period with the corresponding | punctuation mark. Seeing such an amazing developments | happening month after month year after year it makes me | feel like we are really approaching what some people have | called the singularity. When I read about a net positive | fusion being announced my first instinct was to think oh | of course it's now that that ChatGPT is available of | course announcing a major fusion breakthrough would | happen within days to weeks it just makes perfect sense | that AI's can solve problems that have have confounded | scientists for decades. 
To see just how far we still have | to go take a look at how this comment read before I | manually corrected it to what I had actually said. | | -- [I copied and pasted the below to the above and then | corrected it. Below is the original version. This is how | I dictate to Google sometimes, on Android. Normally I | would have further edited the above but in this case I | wanted to show how far basic things like dictation still | have to go. By the way I dictated in a completely quiet | room. I can't wait for more advanced AI like ChatGPT to | take my dictation.] | | Alright so this is a pretty amazing our new development | period I want to tell you something about out why the | state of the heart is is in a i period when you wrote | that it is a deep truth it was before I actually listen | to The Pieces, I have just read the descriptions period | at the time, I thought that you were probably right | because I was thinking that music is only pleasing | because of the structure of our brains it's not like | vision where originally we are interpreting the world and | that Where Art comes from music is purely so dove | abstract or artistic period however, after I listen to | the pieces, I realise that they really sound exactly like | the instruments that are making the physical noises | period for example it really sounds exactly like a | physical piano period so I don't know about out a deep | truth karma but it does seem that there is a physical | sense that the music are represents which it can | successfully mimic using this essentially image | generating capability period one thing about all of these | amazing AI development, is that I still make some long | comments by dictating to Google. 
When it first got to the | point that it was able to catch almost everything then | was saying I was absolutely blown away period however, | it's really not that good at taking dictation karma and I | have to go back and replace each and every individual, | and period with with the corresponding punctuation mark | period seeing such an amazing developments happening | month after month year after year ear makes me feel like | we are really approaching what some people have called | the singularity period when I read about out net positive | fusion being announced my first Instinct was to think oh | of course it's now that that chat GPT is available of | course announcing a major fusion breakthrough would | happen within in days to weeks it just makes perfect | sense DJ eyes can solve problems that have have | confounded scientists for decades period to see just how | far we still have to go take a look at how this comment | red before I manually corrected it to what I had actually | set | nerpderp82 wrote: | It is a Deep Truth in that the universe is predictable and | can be represented (at least the parts we interact with) | mathematically. Matrix algebra is a hell of a drug. I could | imagine someone developing the ability to listen to | spectrograms by looking at them. | justincormack wrote: | There is a whole piece in Gödel, Escher, Bach where they | look at vinyl records, as all the sound data is in there. | haykmartiros wrote: | /u/threevox on reddit made a colab for playing with the | checkpoint: | | https://colab.research.google.com/drive/1FhH3HlN8Ps_Pr9OR6Qc... | [deleted] | joelrunyon wrote: | The site isn't working for me. Anything I have to fix on my | side to make it work? | plank wrote: | Crashes repeatedly on iOS in Firefox (my usual browser), is | OK on Safari though, so probably not a WebKit thing. | alsodumb wrote: | Hayk! How smart are you! I loved your work on SymForce and | Skydio - totally wasn't expecting you to be co-author on this!
| | On a serious note, I'd really love some advice from you on time | management and how you get so much done. I love Skydio, and the | problems you are solving, especially on the autonomy front, are | HARD. You are the VP of Autonomy there and yet also managed to | get this done! You are clearly doing something right. Teach us, | senpai! | superkuh wrote: | I've compiled/run a dozen different image-to-sound programs and | none of them produce an acceptable sound. This bit of your code | alone would be a great application by itself. | | It'd be really cool if you could implement an MS Paint-style | spectrum painting or image upload into the web app for more | "manual" sound generation. | poslathian wrote: | Super! Makes sense since Skydio is also amazing. | | How much data is used for fine-tuning? Since spectrograms are | (surely?) very out of distribution for the pre-training | dataset, how much value does the pre-training really | bring? | haykmartiros wrote: | To be honest, we're not sure how much value image pre-training | brings. We have not tried to train from scratch, but | it would be interesting. | | One thing that's very important, though, is the language pre- | training. The model is able to do some amazing stuff with | terms that do not appear in our dataset at all. It does this | by associating with related words that do appear in the | dataset. | theGnuMe wrote: | You can embed images in spectrograms... might sound weird, | though. | jablongo wrote: | Hello - this is awesome work. Like other commenters, I think | the idea that if you are able to transfer a concept into a | visual domain (in this case via FFT) it becomes viable to model | with diffusion is super exciting, but maybe an | oversimplification. With that in mind, do you think this type | of approach might work with panels of time-series data? | newobj wrote: | All the AI music I've heard so far has a really unpleasant | resonant quality to it. Why is that? Can it be removed?
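To ground the fine-tuning discussion above, here is a toy NumPy sketch of the representation being talked about: turning audio into the kind of greyscale log-magnitude spectrogram image a diffusion model could be trained on. The FFT size, hop length, and dB range below are illustrative choices, not Riffusion's actual parameters.

```python
import numpy as np

def spectrogram_image(x, n_fft=512, hop=128, top_db=80.0):
    # Hann-windowed STFT magnitudes -> dB -> pixel values in [0, 1].
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop:i*hop+n_fft] * win for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T        # freq x time
    db = 20.0 * np.log10(np.maximum(mag, 1e-10))
    db = np.clip(db - db.max(), -top_db, 0.0)          # keep the top 80 dB
    return (db + top_db) / top_db

# A one-second 440 Hz tone becomes a 257 x 122 greyscale "image".
t = np.arange(16000) / 16000.0
img = spectrogram_image(np.sin(2 * np.pi * 440.0 * t))
```

Note that this direction is lossless enough for training, but going back to audio requires recovering the phases the image never stored, which is what the Griffin-Lim discussion further down is about.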
| recursive wrote: | The link is down now, so I don't know about this one. But | most generated music is generated in the note domain, rather | than the audio domain. Any unpleasant resonance would be | introduced in the audio synthesis step. And audio synthesis | from note data is a very solved problem for any kind of | timbre you can conceive of, and some you can't. | hyperbovine wrote: | Presumably for similar reasons that the vast majority of AI- | generated art and text is off-puttingly hideous or bland. For | every stunning example that gets passed around the internet, | thousands of others sucked. Generating art that is | aesthetically pleasing to humans seems like the Mt. Everest | of AI challenges to me. | blueboo wrote: | > For every stunning example that gets passed around the | internet, thousands of others sucked | | ...implying there may be an art to AI art. Hmm. | | Meanwhile, the degree to which it is off-puttingly hideous | in general can be seen in the popularity of Midjourney -- | which is to observe millions of folks (of perhaps dubious | aesthetic taste) find the results quite pleasing. | jameshart wrote: | The vast majority of human-generated art is hideous or | bland. Artists throw away bad ideas or sketches that didn't | work all the time. Plus you should see most of the stuff | that gets pasted up on the walls at an average middle | school. | ROTMetro wrote: | Hard disagree. The average middle school picture will | have certain aspects exaggerated, giving you insights into | the mind's eye of the creator, how they see the world, | what details they focus on. There is no such mind's eye | behind AI art, so it's incredibly boring and mundane, no | matter how good a filter you apply on top of its | fundamental lack of soul or anything interesting to | observe in the picture beyond surface level.
It's great | for making art for assets for businesses to use; it's | almost a perfect match, as they are looking to have no | controversial soul to the assets they use, but lots of | pretty bubblegum polish. | antipotoad wrote: | Perhaps most of the AI art out there (that honestly | represents itself as such) is boring and mundane, but | after many hours exploring latent space, I assure you | that diffusion models can be wielded with creativity and | vision. | | Prompting is an art and a science in its own right, not | to speak of all the ways these tools can be strung | together. | | In any case, everything is a remix. | dwringer wrote: | I have to agree, the act of coming up with a prompt is | one and the same with providing "insights into the mind's | eye of the creator, how they see the world, what details | they focus on" - two people will describe the same scene | with _completely_ different prompts. | jameshart wrote: | And the vast majority of professionally produced artwork | is for business use. It's packaging design or | illustration or corporate graphics or logos or whatever. | | I don't get the objection. | adamsmith143 wrote: | Not sure about this. Models like Midjourney seem to put out | very consistently good images. | andybak wrote: | I think your comment is off-topic to the post you are | replying to. That wasn't asking about the general aesthetic | quality - more about a specific audio artifact. | | > For every stunning example that gets passed around the | internet, thousands of others sucked. | | From personal experience this is simply untrue. I don't | want to debate it because you seem to have strong feelings | about the topic. | hyperbovine wrote: | Even if you remove the artifact, the exact same comment | applies. It generates a somewhat less interesting version | of elevator music. This is not to crap on what they did. | As I said, the underlying problem is extremely difficult | and nobody has managed to solve it.
| | I don't feel strongly about this topic at all. | indigochill wrote: | > It generates a somewhat less interesting version of | elevator music. | | This iteration does, but that's an artifact of how it's | being generated: small spectrograms that mutate without | emotional direction (by which I mean we expect things | like chord changes and intervals in melodies that we | associate with emotional expressions - elevator music | also stays in the neutral zone by design). | | I expect with some further work, someone could add a | layer on top of this that could translate emotional | expressions into harmonic and melodic direction for the | spectrogram generator. But maybe that would also require | more training to get the spectrogram generator to | reliably produce results that followed those directions? | antognini wrote: | I've done some work on AI audio synthesis, and the artifacts | you're hearing in these clips are coming from the algorithm | that is used to go from the synthesized spectrogram to the | audio (the Griffin-Lim algorithm). | | Audio spectrograms have two components: the magnitude and the | phase. Most of the information and structure is in the | magnitude spectrogram, so neural nets generally only | synthesize that. If you were to look at a phase spectrogram, | it looks completely random, and neural nets have a very, very | difficult time learning how to generate good phases. | | When you go from a spectrogram to audio you need both the | magnitudes and phases, but if the neural net only generates | the magnitudes you have a problem. This is where the Griffin- | Lim algorithm comes in. It tries to find a set of phases that | works with the magnitudes so that you can generate the audio. | It generally works pretty well, but tends to produce that | sort of resonant artifact that you're noticing, especially | when the magnitude spectrogram is synthesized (and therefore | doesn't necessarily have a consistent set of phases).
| | There are other ways of using neural nets to synthesize the | audio directly (WaveNet being the earliest big success), but | they tend to be much more expensive than Griffin-Lim. Raw | audio data is hard for neural nets to work with because the | context size is so large. | yayr wrote: | Would it be an approach to use separate color channels for | the freq amplitude and freq phase in the same picture? | Maybe the network then has a better way of learning the | relationships and there would be no need for the | postprocessing to generate a phase. | echelon wrote: | Griffin-Lim is slow and is almost certainly not being used. | | A neural vocoder such as HiFi-GAN [1] can convert spectra | to audio - not just for voices. Spectral inversion works | well for any audio-domain signal. It's faster and produces | much higher-quality results. | | [1] https://github.com/jik876/hifi-gan | antognini wrote: | If you check their about page, they do say they're using | Griffin-Lim. | | It's definitely a useful approach as an early stage in a | project since Griffin-Lim is so easy to implement. But I | agree that these days there are other techniques that are | as fast or faster and produce higher-quality audio. | They're just a lot more complicated to run than Griffin- | Lim. | seth_ wrote: | Author here: Indeed we are using Griffin-Lim. Would be | exciting to swap it out with something faster and better, | though. In the real-time app we are running the | conversion from spectrogram to audio on the GPU as well, | because it is a nontrivial part of the time it takes to | generate a new audio clip. Any speedup there is helpful. | bckr wrote: | RAVE attacks the phase issue by using a second step of | training. I don't completely understand it, but it uses a | GAN architecture to make the outputs of a VAE sound better. | kazinator wrote: | Phase is critical for pitch. Here is why. The spectral | transformation breaks up the signal into frequency bins.
| The frequency bins are not accurate enough to convey pitch | properly. When a periodic signal is put through an FFT, it | will land in a particular frequency bin. Say that the | frequency of the signal is right in the middle of that bin. | If you vary its pitch a little bit, it will still land in | the same bin. Knowing the amplitude of the bin doesn't give | you the exact pitch. The phase information will not give it | to you either. However, between successive FFT samples, the | phase will rotate. The more off-center the frequency is, | the more the phase rotates. If the signal is dead center, | then each successive FFT frame will show the same phase. | When it is off center, the waveform shifts relative to the | window, and so the phase changes for every sample. From the | rotating phase, you can determine the pitch of that signal | with great accuracy. | antognini wrote: | Yes, this is exactly right and is why Griffin-Lim | generated audio often has a sort of warbly quality. If | you use a large FFT you can mitigate the issues with | pitch because the frequency resolution in your | spectrogram is higher, so the phase isn't so critical to | getting the right pitch. But the trade-off of a bigger | FFT is that the pitches now have to be stationary for | longer. | | The other place where phase is critical is in impulse | sounds like drum beats. A short impulse is essentially | just energy over a broad range of frequencies, but the | phases have been chosen such that all the frequencies | cancel each other out everywhere except for one short | duration where they all add constructively. Without the | right phases, these kinds of sounds get smeared out in | time and sound sort of flat and muffled. The typing | example on their demo page is actually a good example of | this. | amelius wrote: | I'm curious why, instead of using magnitude and phase, you | wouldn't use real and imaginary parts?
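kazinator's phase-rotation point can be checked numerically. The sketch below (arbitrary sample rate and FFT parameters, not taken from the thread) places a 450 Hz tone between bin centers: the loudest bin alone can only say 437.5 Hz, but the phase advance between two overlapping frames recovers the true frequency.

```python
import numpy as np

sr, n_fft, hop = 16000, 512, 128
f0 = 450.0                                    # sits between bin centers
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * f0 * t)

win = np.hanning(n_fft)
f1 = np.fft.rfft(x[:n_fft] * win)             # two frames, `hop` apart
f2 = np.fft.rfft(x[hop:hop + n_fft] * win)

k = int(np.argmax(np.abs(f1)))                # loudest bin (coarse pitch)
coarse = k * sr / n_fft                       # bin-center estimate: 437.5 Hz

# Phase advance between frames, minus the advance a bin-centered tone
# would show, wrapped to [-pi, pi), then converted back to Hz.
dphi = np.angle(f2[k]) - np.angle(f1[k]) - 2 * np.pi * k * hop / n_fft
dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
fine = coarse + dphi * sr / (2 * np.pi * hop) # ~450.0 Hz
```

This is the same trick phase vocoders use, which is why losing phase (as a magnitude-only spectrogram does) costs pitch precision.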
| antognini wrote: | There have been some attempts at doing this, some of | which have been moderately successful. But fundamentally | you still have the problem that from the NN's | perspective, it's relatively easy for it to learn the | magnitude but very hard for it to learn the phase. So | it'll guess rough sizes for the real and imaginary parts, | but it'll have a hard time learning the correct ratio | between the two. | | Models which operate directly on the time domain have | generally had a lot more success than models that operate | on spectrograms. But because time-domain models | essentially have to learn their own filterbank, they end | up being larger and more expensive to train. | xnzakg wrote: | Considering Stable Diffusion generates 3-channel (RGB) | images, maybe it would be possible to train it on amplitude | and phase data as two different channels? | antognini wrote: | People have tried that, but the model essentially learns | to discard the phase channel because it is too hard for | it to learn any useful information from it. | [deleted] | haykmartiros wrote: | We took a look at encoding phase, but it is very chaotic | and looks like Gaussian noise. The lack of spatial | patterns makes it very hard for the model to generate. I think | there are tons of promising avenues to improve quality | though. | woah wrote: | You're probably talking about the artifacts of converting a | low resolution spectrogram to audio. | wdfx wrote: | Can the spectrogram image be AI upscaled before | transforming back to the time domain? | malka wrote: | Yes it exists: | https://ccrma.stanford.edu/~juhan/super_spec.html | | But the issue is not that the spectrogram is low quality. | | The issue is that the spectrogram only contains the | amplitude information. You also need phase information | for generating audio from the spectrogram | mcbuilder wrote: | Interesting, can't you quantize and snap to a phase that | makes sense to create the most musical resonance?
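kazinator's phase-rotation argument above is easy to check numerically: the phase advance between two overlapping FFT frames pins down an off-center pitch far more precisely than the bin index alone. A toy sketch in plain NumPy (the sample rate, FFT size, and hop are arbitrary illustration values, not anything Riffusion uses):

```python
import numpy as np

# A sine whose frequency sits between FFT bin centers.
fs = 8000                     # sample rate, Hz
n = 256                       # FFT size -> bin width fs/n = 31.25 Hz
hop = 64                      # samples between successive frames
f_true = 443.0                # deliberately off-center
t = np.arange(n + hop) / fs
x = np.sin(2 * np.pi * f_true * t)

# Two overlapping frames of the same signal.
X0 = np.fft.rfft(x[:n])
X1 = np.fft.rfft(x[hop:hop + n])

k = np.argmax(np.abs(X0))     # coarse pitch: nearest bin center (437.5 Hz)

# Phase advance between frames, minus the advance a tone exactly at the
# bin center would show, wrapped into (-pi, pi].
dphi = np.angle(X1[k]) - np.angle(X0[k])
expected = 2 * np.pi * k * hop / n
dev = np.angle(np.exp(1j * (dphi - expected)))

f_est = (k / n + dev / (2 * np.pi * hop)) * fs   # refined estimate, near 443
print(k * fs / n, f_est)
```

This unwrapped phase difference is exactly what phase vocoders use to track pitch from frame to frame.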
| waltbosz wrote: | What happens if you run one of the spectrogram pictures | through an upscaler for images like ESRGAN ? | syntheweave wrote: | It sounds kind of like the visual artifacts that are | generated by resampling in two dimensions. Since the whole | model is based on compressing image content, whatever it's | doing DSP-wise is more-or-less "baked in", and a probable fix | would lie in doing it in a less hacky way. | crubier wrote: | I think this is because the generation is done in the | frequency domain. Phase retrieval is based on heuristics and | not perfect, so it leads to this "compressed audio" feel. I | think it should be improvable | TechTechTech wrote: | I got an actual `HTTP 402: PAYMENT_REQUIRED` response (never seen | one of those in the wild, according to Mozilla it is | experimental). Someone's credit card stopped scaling? | haykmartiros wrote: | LOL. Yes we had to upgrade our Vercel tier: | https://twitter.com/sethforsgren/status/1603425188401467392 | PcChip wrote: | the problem is it _sounds_ awful, like a 64kbps MP3 or worse | | Perhaps AI can be trained to create music in different ways than | generating spectrograms and converting them to audio? | w_for_wumbo wrote: | It doesn't need to sound good at all for it to be useful. Like | with the AI Art creation, it can be a starting point for | artists to play around and rapidly try different concepts, and | then interpret the concept using high quality tools to create | something really quite remarkable. | | It's all about empowering artists to explore more | possibilities. | stevehiehn wrote: | Exactly! UX/Pipelines/Integrations are the next logical step. | It's my belief that samples will essentially be 'free' very | soon. We will see DAW plugins/integrations that contextually | offer samples to the composer. I'm confident in this because | that's what I'm working on. | joenot443 wrote: | Incredible stuff, Seth & Hayk. 
I've been thinking nonstop about | new things to build using Stable Diffusion and this is the first | one that really took my breath away. | esotericsean wrote: | Coming at this from a layman's perspective, would it be possible | to generate a different sort of spectrogram that's designed for | SD to iterate upon even more easily? | hoschicz wrote: | What did you use as training data? | gedy wrote: | This is so good that I wondered if it's fake. Really impressive | results from generated spectrograms! Also really interesting | that it's not exactly trained on the audio files themselves - | wonder if the usual copyright-based objections would even apply | here. | rbn3 wrote: | regarding those usual objections, i'd argue that a spectrogram | representation of a given piece of audio is just a different | (lossy) encoding of the same content/information, so any | hypothetical objections would still apply here. | Applejinx wrote: | You would be absolutely correct. The lossiness is in the | resolution of the image (512x512 is pretty terrible) but | given enough image resolution it's just an FFT transform, and | the only reason that stuff falls short is because people | don't give it, in turn, enough resolution. If you did wild | overkill of the resolution of an FFT transform you could do | anything you wanted with no loss of tone quality. If you | turned that to visual images and did diffusion with it you | could do AI diffusion at convincing audio quality. | | In theory the tone quality is not an objection here. When it | sounds bad it's because it's 512x512, because the FFT | resolution isn't up to the task, etc. People cling to very | inadequate audio standards for digital processing, but you | don't have to. | [deleted] | kgwgk wrote: | Why not? Music copyright was not even about audio recordings | originally. | Applejinx wrote: | Some of this is really cool! The 20 step interpolations are very | special, because they're concepts that are distinct and novel.
| | It absolutely sucks at cymbals, though. Everything sounds like | realaudio :) composition's lacking, too. It's loop-y. | | Set this up to make AI dubtechno or trip-hop. It likes bass and | indistinctness and hypnotic repetitiveness. Might also be good at | weird atonal stuff, because it doesn't inherently have any notion | of what a key or mode is? | | As a human musician and producer I'm super interested in the | kinds of clarity and sonority we used to get out of classic | albums (which the industry has kinda drifted away from for | decades) so the way for this to take over for ME would involve a | hell of a lot more resolution of the FFT imagery, especially in | the highs, plus some way to also do another AI-ification of what | different parts of the song exist (like a further layer but it | controls abrupt switches of prompt) | | It could probably do bad modern production fairly well even now | :) exaggeration, but not much, when stuff is really overproduced | it starts to get way more indistinct, and this can do indistinct. | It's realaudio grade, it needs to be more like 128kbps mp3 grade. | Metus wrote: | > composition's lacking, too. It's loop-y. | | Well no wonder, it has absolutely no concept of composition | beyond a single 5s loop, if I understand correctly. | | > It absolutely sucks at cymbals, though. Everything sounds | like realaudio :) | | > It could probably do bad modern production fairly well even | now :) exaggeration, but not much, when stuff is really | overproduced it starts to get way more indistinct, and this can | do indistinct. It's realaudio grade, it needs to be more like | 128kbps mp3 grade. | | I haven't sat down yet to calculate it, but is the output of SD | at 512*512px at 24bit enough to generate audio CD quality in | theory? | TheOtherHobbes wrote: | No. | | And I suspect this will always have phase smearing, because | it's not doing any kind of source separation or individual | synthesis. 
It's effectively a form of frequency domain data | compression, so it's always going to be lossy. | | It's more like a sophisticated timbral morph, done on a | complete short loop instead of an individual line. | | It would sound better with a much higher data density. CD | quality would be 220500 samples for each five second loop. | Realtime FFTs with that resolution aren't practical on the | current generation of hardware, but they could be done in | non-realtime. But there will always be the issue of timbres | being distorted because outside of a certain level of | familiarity and expectation our brains start hearing gargly | disconnected overtones instead of coherent sound objects. | | What this is _not_ doing is extracting or understanding | musical semantics and reassembling them in interesting ways. | The harmonies in some of these clips are pretty weird and | dissonant, and not what you 'd get from a human writing | accessible music. This matters because outside of TikTok | music isn't about 5s loops, and longer structures aren't so | amenable to this kind of approach. | | This won't be a problem for some applications, but it's a | long way short of the musical equivalent of a MidJourney | image. | | Generally we're a lot more tolerant of visual "bugs" than | musical ones. | TremendousJudge wrote: | >and not what you'd get from a human writing accessible | music | | The timbral qualities of the posted samples remind me of | some of the stuff I heard from Aphex Twin, like Alberto | Balsalm. Not accessible by a long shot but definitely human | Metus wrote: | I think an approach like this could generate interesting | sounds we as humans would never think of. Or meshing two | sounds in ways we could barely imagine or implement. | | But of course something like this, which only thinks in 5s | clips can not generate a larger structure, like even a | simple song. 
Maybe another algorithm could seed the notes | and an algorithm like this generates the sounds via | img2img. | nonima wrote: | This is really cool but can someone tell me why we are automating | art? Who asked for this? The future seems depressing when I look | at all this AI generated art. | TheRealPomax wrote: | Everyone asked for this, including artists. If you make a | living off of making art, having the best tools to help you do | that is a constant, and the tools are finally starting to get | properly good. Will "the job" change because of the tools? Of | course. Will the nature of what it means for something to be | art change? Also of course. Art isn't some static, untouchable | thing. It changes as humanity does. | diydsp wrote: | I would say it's not "generated," but interpolated... | | It doesn't make anything new or fresh. It doesn't pull any | real-life emotions or experiences into a synthesis that a | person can relate to. It's more like asking a teenaged comedian | to imitate numerous impressions of music styles. e.g. in Clerks | when the Russian guy does "metal": | https://youtu.be/7gFoHkkCaRE?t=55 | | Of course the modern conception of music in the West is as an | accompaniment to other, mostly drudging, activities, as opposed | to something to be paid singular attention. Therefore, there | are many "valuable"(*) occasions to produce "impressions" of | music. E.g. in advertisements and social media flexes where | identity and attitude are the purpose of music. For these, a | shallow interpretation or reflection of loosely amalgamated | sound clips will suffice. But we don't just attend concerts or | focus sustained energy on sonic impressions. We listen to | lyrics and give over our consciousness to composed works | because we want to find secrets others give away in dealing | with this crazy thing called life: ideas to succeed, admissions | of failure, and what the expected emotional arcs of these | trajectories look like.
This lofty goal is to date not within | the scope of AI stunts. | | As Solzhenitsyn said, "Too much art is like candy and not | bread." | owlbynight wrote: | We're not automating art, we're creating tools that make it | easier for humans to create art. These are nothing more than | new and exciting tools. The cream will still rise to the top, | same as it ever was. | moonchrome wrote: | Because art is the low hanging fruit of "close enough" | applications. | slenocchio wrote: | I wonder if this is true for music. Our ears are much more | discerning than our eyes when it comes to art it seems. | moonchrome wrote: | I mean listening to samples on the link above I'd hardly | call it music so I'd say you're right. | dangond wrote: | Because tons of people want to make art, and a lot of art | currently requires years of training to make anything close to | "good". Making art more accessible to create is a boon to | everyone who's dreamed of being able to make their own | paintings and music, but doesn't have the skills required. | [deleted] | logarhythmic wrote: | That just means there is going to be a whole lot more bad art | in the world | dangond wrote: | Not all of this art will be meant to be shared with the | whole world though. A lot of it will be people just using | it because they enjoy it. | schwartzworld wrote: | You can't automate a live performance or an oil painting with | AI in this way. This isn't going to replace musicians and | artists. If anything, I think a preponderance of AI art would | make people appreciate the real stuff more. | | As to why, music is fun to create, and this is just a tool. | sampo wrote: | > You can't automate a live performance or an oil painting | with AI in this way. | | You'd have to combine it with these guys | https://www.youtube.com/watch?v=WqE9zIp0Muk | hungryforcodes wrote: | Actually I agree with you, but HN is not really a place where | you will find artists defending themselves.
However you will | find a lot of people defending the automation of art. Generative | art has its place. But ultimately until humans are extinct, | human generated art is the only thing which really represents | the species. Everything else is an advanced form of puppetry or | mimicry. | bulbosaur123 wrote: | > Who asked for this? | | I did. | | > The future seems depressing when I look at all this AI | generated art. | | You should talk about your concerns with an AI psychotherapist. | 451mov wrote: | why not use an image of the waveform as input? | mensetmanusman wrote: | This works because songs are images in time. FFT analysis does | not care. | vikp wrote: | Producing images of spectrograms is a genius idea. Great | implementation! | | A couple of ideas that come to mind: | | - I wonder if you could separate the audio tracks of each | instrument, generate separately, and then combine them. This | could give more control over the generation. Alignment might be | tough, though. | | - If you could at least separate vocals and instrumentals, you | could train a separate model for vocals (LLM for text, then text | to speech, maybe). The current implementation doesn't seem to | handle vocals as well as TTS models. | btbuildem wrote: | I think you'd have to start with separate spectrograms per | instrument, then blend the complete track in "post" at the end. | michpoch wrote: | Earlier this year, graphic designers, last month it was software | engineers, and now musicians are also feeling the effects. | | Who else will AI leave looking for a new job? | goostavos wrote: | This was the first AI thing to fill me with a feeling of | existential dread. | TaupeRanger wrote: | What is with the hyperbole in this thread? This stuff sounds | like incoherent noise. It is noticeably worse than AI audio | stuff I heard 5 years ago. What is going on with the | responses here?
| wcoenen wrote: | Usage of an image generator to produce passable music | fragments, even if they sound a bit distorted, is very | surprising. That type of novelty is why we come here. | Applejinx wrote: | Musicians were made to get a day job long before you were born | ;) | wpietri wrote: | Although I do wonder how much an earlier technology, audio | reproduction, contributed to that. My grandmother worked for | a time as a piano player as part of a nightclub orchestra. It | was a stable job back then. I have to wonder how many | musician jobs were killed off by the jukebox and related | technologies. | awestroke wrote: | If I was a musician, this post would not make me worry for a | second | wcoenen wrote: | If a hack based on an image generator already has promising | results for music generation, then imagine what will happen | if something dedicated to music is built from the ground up. | logn wrote: | The raw outputs of these tools will be best consumed by | experts. Until general AI, these are just better tools for the | same workers. | mensetmanusman wrote: | They were killed off by the ability to record the data. Every | city used to have their own music stars :) | 323 wrote: | Politicians, bureaucracy. | | GPT-3, what policy should we apply to increase tax revenue by | 5% given these constraints? | | GPT-3, please tell me some populist thing to say to win the | next election, or how should I deflect these corruption | charges. | antipotoad wrote: | Isn't this the plot of Deus Ex? | kmeisthax wrote: | "We should place a tax on all copyright lawyers and use it to | fund GPU manufacturing and AI development. At your next stump | speech, mention how the entertainment industry is stealing | jobs from construction workers. Your corruption charges won't | matter because voters only care about corruption when it's | not in their favor." | bawolff wrote: | Honestly none of them should. I think the moral panic around | these things is way overstated. 
They are cool but hardly about | to replace anyone's job. | d7y wrote: | wmwmwm wrote: | This is amazing, and scary (as a musician) but also reliably | kills firefox on iOS! | EZ-Cheeze wrote: | "https://en.wikipedia.org/wiki/Spectrogram - can we already do | sound via image? probably soon if not already" | | Me in the Stable Diffusion discord, 10/24/2022 | | The ppl saying this was a genius idea should go check out my | other ideas | mdonahoe wrote: | If only we had a diffusional model that could take your ideas | and turn them into reality! | EZ-Cheeze wrote: | No I want ppl | Slow_Hand wrote: | As a musician, I'll start worrying once an AI can write at the | level of sophistication of a Bill Withers song: | | https://www.youtube.com/watch?v=nUXgJkkgWCg | | Not simply SOUND like a Bill Withers song, but to have the same | depth of meaning and feeling. | | At that point, even if we lose we win because we'll all be | drowning in amazing music. Then we'll have a different class of | problem to contend with. | rbn3 wrote: | great stuff, while it comes with the usual smeary iFFT artifacts | that AI-generated sound tends to have the results are | surprisingly good. i especially love the nonsense vocals it | generates in the last example, which remind me of what singing | along to foreign songs felt like in my childhood. | gitfan86 wrote: | This is what I've been talking about all year. It is such a | relief to see it actually happen. | | In summary: The search for AGI is dead. Intelligence was here and | more general than we realized this whole time. Humans are not | special as far as intelligence goes. Just look how often people | predict that an AI cannot X or Y or Z. And then when an AI does | one of those things they say, "well it cannot A or B or C". 
| | What is next: This trend is going to accelerate as people realize | that AI's power isn't in replacing human tasks with AI agents, | but letting the AI operate in latent spaces and domains that we | never even thought about trying. | visarga wrote: | Generated content without filtering/validation is worthless. | | I predict some kind of testing, validation or ranking will be | developed to filter out generated content. Each domain has its | own rules - you need to implement validation for code and math, | fact checks for text, contrasting the results from multiple | solutions for problem solving, and aesthetic scoring for art. | | But validation is probably harder than learning to generate in | the first place, probably a situation similar to closing the | last percent in self driving. | [deleted] | nixpulvis wrote: | Sounds a bit "clowny" to me, for lack of a better word. | Broge wrote: | I wonder if it's possible to fine-tune an image upscaling model | on spectrograms, in order to clean up the sound? | andy_ppp wrote: | I was thinking about this - what if someone trained a stable | diffusion type model on all of the world's commercial music? This | model would probably produce quite amazing music given enough | prompting and I'm wondering if the music industry would be | allowed to claim copyright on works created with such a model. | Would it be illegal or is this just like a musician picking up | ideas from hearing the world of music? Is it really right to make | learning a crime, even if machines are doing it? I'm conflicted | after finding out that for sync licensing the music industry want | a percentage of revenue based on your subscription fees, | sometimes as high as 15%-20%! I'm surprised such a huge fee isn't | considered some kind of protection racket. | throw78311 wrote: | This question has been explored before, see Kolmogorov Music: | https://www.youtube.com/watch?v=Qg3XOfioapI | dreilide wrote: | impressive stuff.
reminds me of when ppl started using image | classifier networks on spectrograms in order to classify audio. i | would not have thought to apply a similar concept for generative | models, but it seems obvious in hindsight. | quux wrote: | The vocals in these tracks are so interesting. They sound like | vocals, with the right tone, phonemes, and structure for the | different styles and languages but no meaning. | | Reminds me of the soundtrack to Nier Automata which did a similar | thing: https://youtu.be/8jpJM6nc6fE | int_19h wrote: | That's glossolalia, and it's not that uncommon in human-created | art. | qayxc wrote: | I think AI would be great at generating similar things. Might | be very nice for generating fake languages, too. | mastax wrote: | Wow those examples are shockingly good. It's funny that the | lyrics are garbled analogously to text in stable diffusion | images. | | The audio quality is surprisingly good, but does sound like it's | being played through an above-average quality phone line. I bet | you could tack on an audio-upres model afterwards. Could train it | by turning music into comparable-resolution spectrograms. | pmontra wrote: | A musician friend of mine told me that this is (I freely | translate) a perversion, building in frequency and returning | time. Don't shoot the messenger. | | Personally I like the results. I'm totally untrained and couldn't | hear any of the issues many comments are pointing out. | | I guess that all of lounge/elevator music and probably most ad | jingles will be automated soon, if automation costs less than | human authors. | adamsmith143 wrote: | "Horse Carriage Driver says horseless carriages are | abominations. More at 12!" | bufferoverflow wrote: | A network trained on spectrograms only should do much better. | lucidrains wrote: | personalized RL agents that find aesthetic trajectories through | the music latent space... soon, i hope :D | haykmartiros wrote: | Love this idea.
If I had more time I'd want to make a spaceship | game where you are flying around the latent space, and model | interrogation is used to provide labels to landmarks as you | move around. | XorNot wrote: | So this is slightly bending my mind again. Somehow image | generators were more comprehensible compared to getting coherent | music out. This is incredible. | leod wrote: | Awesome work. | | Would you be willing to share details about the fine-tuning | procedure, such as the initialization, learning rate schedule, | batch size, etc.? I'd love to learn more. | | Background: I've been playing around with generating image | sequences from sliding windows of audio. The idea roughly works, | but the model training gets stuck due to the difficulty of the | task. | raajg wrote: | If such unreasonably good music can be created based on | information encoded in an image, I'm wondering what other things | we can do with this flow: | | 1) Write text to describe the problem 2) Generate an image Y that | encodes that information 3) Parse that image Y to do X | | Example: Y = blueprint, X = Constructing a building with that | blueprint | newswasboring wrote: | If it can do music, can we train better models for different | kinds of music? Or would different models for different instruments | make more sense? For different instruments we can get better | resolution by making the spectrogram represent different | frequency ranges. This is terribly exciting, what a time to be | alive. | superb-owl wrote: | The interpolation from keyboard typing to jazz is incredible. | This is what AI art should be. | slenocchio wrote: | Do you guys think AI creative tools will completely subsume the | possibility space of human made music? Or does it open up a new | dimension of possibilities orthogonal to it? Hard for me to | imagine how AI would be able to create something as unique and | human as D'Angelo's Voodoo (esp. before he existed) but maybe it | could (eventually).
| | If I understand these AI algorithms at a high level, they're | essentially finding patterns in things that already exist and | replicating them with some variation quite well. But a good song is | perfect/precise in each moment in time. Maybe we'll only ever | be able to get asymptotically closer but never _quite_ there to | something as perfectly crafted as a human could make? Maybe there | will always be a frontier space only humans can explore? | ElFitz wrote: | > Hard for me to imagine how AI would be able to create | something as unique and human as D'Angelo's Voodoo (esp. before | he existed) | | There's always that immortal randomly typing monkey with a | typewriter thing [1]. And, in our case, it seems to be better | than random. | | So, yes, perhaps. But perhaps we could instead build and create | things that are yet unimaginable upon it. We'll see. | | [1]: https://en.wikipedia.org/wiki/Infinite_monkey_theorem | simsspoons wrote: | this is just great | zoytek wrote: | It's amazing. They've really got something revolutionary here. | ZiiS wrote: | For the 30th anniversary? https://warp.net/gb/artificial-intelligence | hoherd wrote: | Nice reference! I had never seen that site before, but those | albums had a significant impact on my musical journey. | | There was a purple Victorian house in Colorado Springs where | the living room was converted into a record and CD store called | Life By Design. I picked up these albums and a ton of other | obscure music there. I was so happy to not have to drive all | the way up to Wax Trax in Denver to be able to discover new | artists. | xtracto wrote: | This looks great and the idea is amazing. I tried with the | prompt: "speed metal" and "speed metal with guitar riffs" and got | some smooth rock-ballad type music. I guess there was no heavy | metal in the learning samples haha. | | Great work!
| the_third_wave wrote: | Gregorian death metal folk also seems to have lacked seed tunes | but the thing is just in its infancy so soon we'll be banging | our tonsured heads to the folky beats of ... | | ...OK, need to create a band name generator to work in tandem | with this thing. Let's see what one of its brethren in ML makes | of it... | | _- "Echoes of the Past": This name plays on the idea of | Gregorian chanting, which is often associated with the distant | past, and combines it with the intense and aggressive sound of | death metal. | | - "The Order of the Black Chant": This name incorporates | elements of both the religious connotations of Gregorian | chanting and the dark, heavy sound of death metal, creating a | sense of mystery and danger. | | - "Foretold in Blood": This name evokes both the ancient, | mystical nature of Gregorian chanting and the violent themes of | death metal, creating a sense of ancient prophecy coming to | pass. | | - "Crypt of the Silent Choir": This name brings together the | eerie, otherworldly sound of Gregorian chanting with the | underground, underground feel of death metal, creating a sense | of hidden secrets and forbidden knowledge._ | | "The Order of the Black Chant" it shall be. | vintermann wrote: | Fun! I tried something similar with DCGAN when it first came out, | but that didn't exactly make nice noises. The conversion to and | from Mel spectrograms was lossy (to put it mildly), and DCGAN, | while impressive in its day, is nothing like the stuff we have | today. | | Interesting that it gets so good results with just fine tuning | the regular SD model. I assume most of the images it's trained on | are useless for learning how to generate Mel spectrograms from | text, so a model trained from scratch could potentially do even | better. | | There's still the issue of reconstructing sound from the | spectrograms. I bet it's responsible for the somewhat tinny sound | we get from this otherwise very cool demo. 
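One concrete source of the lossiness vintermann describes is independent of phase: squeezing a spectrogram into an 8-bit image already quantizes the dynamics. A toy round trip in plain NumPy (the -80 dB floor and the encoding itself are illustrative assumptions, not Riffusion's actual pipeline):

```python
import numpy as np

FLOOR_DB = -80.0  # anything quieter than this is crushed to black

def spectrogram_to_image(mag):
    """Log-compress a magnitude spectrogram and quantize it to the
    8-bit grayscale range an image model would be trained on."""
    db = 20 * np.log10(np.maximum(mag, 1e-8))
    db = np.clip(db, FLOOR_DB, 0.0)
    return np.round((db - FLOOR_DB) / -FLOOR_DB * 255).astype(np.uint8)

def image_to_spectrogram(img):
    """Invert the mapping; the quantization error is permanent."""
    db = img.astype(np.float64) / 255 * -FLOOR_DB + FLOOR_DB
    return 10 ** (db / 20)

mag = np.geomspace(1e-3, 1.0, 1000)      # 60 dB of dynamic range
rec = image_to_spectrogram(spectrogram_to_image(mag))
worst = np.max(np.abs(rec - mag) / mag)  # worst-case per-bin amplitude error
```

That is roughly 0.3 dB per gray level over an 80 dB range, and anything below the floor is lost outright.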
| knicholes wrote: | Does anyone have any good guides/tutorials for how to fine-tune | Stable Diffusion? I'm not talking about textual inversion or | dreambooth. | bane wrote: | I bet a cool riff on this would be to simply sample an ambient | microphone in the workplace and use that to generate and slowly | introduce matching background music that fits the current tenor | of the environment. Done slowly and subtly enough I'd bet the | listener may not even be entirely aware it's happening. | | If we could measure certain kinds of productivity it might even | be useful as a way to "extend" certain highly productive ambient | environments a la "music for coding". | hammock wrote: | >in the workplace | | Or at a house party, club or restaurant... as more people | arrive or leave and the energy level rises or declines... or | human rhythms speed up or slow down... so does the music... | Def_Os wrote: | DJs are getting automated away too! | chrisfrantz wrote: | Reactive generative music would be so cool | londons_explore wrote: | I think there has to be a better way to make long songs... | | For example, you could take half the previous spectrogram, shift | it to the left, and then use the inpainting algorithm to make the | next bit... Do that repeatedly, while smoothly adjusting the | prompt, and I think you'd get pretty good results. | | And you could improve on this even more by having a non-linear | time scale in the spectrograms. Have 75% of the image be linear, | but the remaining 25% represent an exponentially downsampled | version of history. That way, the model has access to what was | happening seconds, minutes, and hours ago (although less detail | for longer time periods ago). | someguyorother wrote: | Perhaps you could do a hierarchical approach somehow, first | generating a "zoomed out" structure, then copying parts of it | into an otherwise unspecified picture to fill in the details.
| | But perhaps plain stable diffusion wouldn't work - you might | need different neural networks trained on each "zoom level" | because the structure would vary: music generally isn't like | fractals and doesn't have exact self-similarity. | talhof8 wrote: | Really cool. Can't get this to work on the homepage though. | | Might be a traffic thing? | | Edit: Works now. A bit laggy but it works. Brilliant! | scoopertrooper wrote: | I'm getting this back when I try to hear cats sing me a rock | opera: | | {"data":{"success":true,"worklet_output":{"error":"Model | version 5qekv1q is not healthy"},"latency_ms":530}} | Pepe1vo wrote: | Same here, servers are overloaded probably. Shame, I was | really looking forward to a Wu Tang Clan and Jamiroquai | collab | LoveMortuus wrote: | I also don't hear anything, even when my prompt was selected... | MichaelZuo wrote: | Me neither, perhaps the web app is a bit buggy? | [deleted] | benplumley wrote: | Same earlier, but I can now get it to work very intermittently, | with the error "Uh oh! Servers are behind, scaling up..." | orobinson wrote: | I'd been wondering (naively) if we'd reached the point where we | can't see any new kinds of music now that electronic synthesis | allows us to make any possible sound. Changes in musical styles | throughout history tend to have been brought about by people | embracing new instruments or technology. | | This is the most exciting thing I've seen in ages as it shows we | may be on the verge of the next wave of new technology in music | that will allow all sorts of weird and wonderful new styles to | emerge. I can't wait to see what these tools can do in the hands | of artists as they become more mainstream. | EamonnMR wrote: | 'make any possible sound' is less important than 'make x sound | easily' by way of tools and accumulated knowledge. 
Also, what | audiences are receptive to matters a lot - you could have made | noise rock in the 40s but I can't imagine it would have sold a | lot of records. | lachlan_gray wrote: | Wow, diffusion could be a game changer for audio restoration. | sampo wrote: | GPT-3 has 175 billion parameters (says Wikipedia). What is the | size of the neural network used in this riffusion project? | sebzim4500 wrote: | Stable Diffusion 'only' has ~1B parameters IIRC. | seth_ wrote: | Authors here: Fun to wake up to this surprise! We are rushing to | add GPUs so you can all experience the app in real-time. Will | update asap | SamPatt wrote: | Fascinating stuff. | | One of the samples had vocals. Could the approach be used to | create solely vocals? | | Could it be used for speech? If so, could the speech be | directed or would it be random? | [deleted] | AMICABoard wrote: | Awesome, there is another project out there that does it with | CPU https://github.com/marcoppasini/musika maybe mix both, | i.e. take the initial output of musika, convert it to a | spectrogram and feed it to riffusion to get more variation... | ElijahLynn wrote: | Wow, I just learned so much about spectrograms, had no idea that | one could reverse one into audio waves! | fritzschopen wrote: | it seems that SD does cover everything in terms of generative ai. | Speaking of music, very interesting paper and demo. Just | wondering in terms of license and commercialization, what kind of | mess are we expecting here? | pea wrote: | This is amazing! Would it be possible to use it to interpolate | between two existing songs (i.e. generate | spectrograms from audio and transition between them)? | CrypticShift wrote: | Things similar to the "interpolation" part (not the generative | part) are already used extensively, especially for game and movie | sound design. Kyma [1] is the absolute leader (it requires | expensive hardware though). IMO later iterations on this approach | may lead to similar or better results.
| | FYI, other apps that use more classic but still complex | Spectral/Granular algos: | | https://www.thecargocult.nz/products/envy | | https://transformizer.com/products/ | | https://www.zynaptiq.com/morph/ | | [1] https://kyma.symbolicsound.com/ | winReInstall wrote: | Can't wait to see this in karaoke, you just sing lyrics and it | jams along with music. | serverholic wrote: | I'm curious about the limitations of using spectrograms with | transient-heavy sounds like drums. | | It seems like you'd need very high resolution spectrograms to get | a consistently snappy drum sound. | genewitch wrote: | 8GB is enough to do 1080p resolution. the UI i use for SD maxes | out at 2048x2048. however, it takes a lot longer than 512x512 | to generate: 1m40s versus 1.97s. | | I'm guessing if one had access to one of those nvidia backplane | rackmount devices one could generate 8k or larger resolution | images. | astrange wrote: | SD can't generate coherent images if you increase the output | size. They're basically always unusable unless you don't need | any global architecture to them. | bawolff wrote: | I wonder if this would be applicable to video game music. Be able | to make stuff that's less repetitive but also smoothly | transitions to specific things with in-game events. | zone411 wrote: | Interesting. I experimented a bit with the approach of using | diffusion on whole audio files, but I ultimately discarded it in | favor of generating various elements of music separately. I'm | happy with the results of my project of composing melodies | (https://www.youtube.com/playlist?list=PLoCzMRqh5SkFPG0-RIAR8...) | and I still think this is the way to go, but that was before | Stable Diffusion came out. These are interesting results though, | maybe it can lead to something more. | ricopags wrote: | This is so completely wild. Love the novelty and inventiveness. | | Could anyone help me understand whether using SVG instead of | bitmap image would be possible?
I realize that probably wouldn't | be taking advantage of the current diffusion part of Stable- | Diffusion, but my intuition is maybe it would be less noisy or | offer a cleaner/more compressible path to parsing transitions in | the latent space. | | Great idea? Off base entirely? Would love some insight either way | :D | jsat wrote: | Today's music generation is putting my Pop Ballad Generator to | shame: http://jsat.io/blog/2015/03/26/pop-ballad-generator/ | owlbynight wrote: | If copyright laws don't catch up, the sampling industry is | cooked. | | Made this: https://soundcloud.com/obnmusic/ai-sampling-riffusion- | waves-... | fowlkes wrote: | Multiple folks have asked here and in other forums but I'm going | to reiterate, what data set of paired music-captions was this | trained on? It seems strange to put up a splashy demo and repo | with model checkpoints but not explain where the model came | from... is there something fishy going on? | needz wrote: | This website crashes Firefox on iOS | Abecid wrote: | This is one of the most ingenious things I've seen in my life | senko wrote: | @haykmartiros, @seth_, thank you for open sourcing this! | | Played a bit with the very impressive demos, now waiting in queue | for my very own riff to get generated. | | Great as this is, I'm imagining what it could do for song | crossfades (actual mixing instead of plain crossfade even with | beat matching). | soperj wrote: | > https://www.riffusion.com/?&prompt=punk+rock+in+11/8 | | Tried getting something in an odd timing, but it's still 4/4. | Moosdijk wrote: | Wow this is awesome! | tomrod wrote: | This is huge. | | This shows me that Stable Diffusion can create anything with the | following conditions: | | 1. Can be represented as a static item on two dimensions (their | weaving together notwithstanding, it is still piece-by-piece | statically built) | | 2. Acceptable with a certain amount of lossiness on the | encoding/decoding | | 3.
Can be presented through a medium that at some point in | creation is digitally encoded somewhere. | | This presents a lot of very interesting changes for the near | term. ID.me and similar security approaches are basically dead. | Chain of custody proof will become more and more important. | | Can stable diffusion work across more than two dimensions? | marviel wrote: | I would argue that its high-fidelity representations of 3d | space imply that the model's weights are capable of pattern- | matching in multiple dimensions, provided the input is embedded | into 2d space appropriately. | Pxtl wrote: | Now I'm wondering about feeding Stable Diffusion 2D landscape | data with heightmaps and letting it generate maps for RTS | videogames. I mean, the only wrinkle there is an extra channel | or two. | [deleted] | naillo wrote: | It's interesting if this can be used for longer tracks by | inpainting the right half of the spectrogram. | spyder wrote: | Another related audio diffusion model (but without text | prompting) here: https://github.com/teticio/audio-diffusion | kanwisher wrote: | oh wow this one works really well | r3trohack3r wrote: | I can't help but see parallels to synesthesia. It's amazing how | capable these models are at encoding arbitrary domain knowledge | as long as you can represent it visually w/ reasonable noise | margins. | Animats wrote: | "Uh oh! Servers are behind, scaling up..." - haven't been able to | get past that yet. Anyone getting new output? | | This is already better than most techno. I can see DJs using | this, typing away. | up2isomorphism wrote: | This is horrible music, but of course there is nothing to feel | ashamed about. | Aardwolf wrote: | How come the stable diffusion model helps here? Does the fact | that it knows what an astronaut on a horse looks like have an | effect on the audio? Would starting the training from an empty | model work too? | xcambar wrote: | I will try it but at least for the name it deserves praise.
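Several comments above are surprised that a magnitude spectrogram can be turned back into audio at all, since the image carries no phase. The standard trick is to estimate phase iteratively. Below is a minimal Griffin-Lim sketch using only numpy and scipy; the sample rate, window size, and iteration count are illustrative choices, not anything from Riffusion's actual pipeline.

```python
import numpy as np
from scipy.signal import stft, istft

FS = 8000       # sample rate (Hz); illustrative, not Riffusion's settings
NPERSEG = 256   # STFT window length

def griffin_lim(mag, n_iter=50):
    """Estimate a waveform whose STFT magnitude matches `mag` (Griffin-Lim).

    Start from random phase, then alternate between the time and frequency
    domains, re-imposing the known magnitude on every round trip.
    """
    rng = np.random.default_rng(0)
    spec = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(spec, fs=FS, nperseg=NPERSEG)
        _, _, rebuilt = stft(x, fs=FS, nperseg=NPERSEG)
        spec = mag * np.exp(1j * np.angle(rebuilt))
    _, x = istft(spec, fs=FS, nperseg=NPERSEG)
    return x

# Round trip: a 440 Hz tone -> magnitude-only spectrogram -> audio again.
t = np.arange(FS) / FS
tone = np.sin(2 * np.pi * 440 * t)
_, _, Zxx = stft(tone, fs=FS, nperseg=NPERSEG)
recovered = griffin_lim(np.abs(Zxx))
```

The recovered clip won't be sample-identical to the original (phase is only estimated), but its spectrogram closely matches the target, which is what matters perceptually.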
| kingcai wrote: | Absolutely brilliant! | NHQ wrote: | In the end there was the word. | dylan604 wrote: | The results of this are similar to my nitpicks of AI generated | images (well, duh!). There's definitely something recognizable | there, but something's just not quite right about it. | | I'm quite impressed that there was enough training data within SD | to know what a spectrogram looks like for the different sounds. | minaguib wrote: | Absolutely incredible - from idea to implementation to output. | jansan wrote: | Very impressive. I am quite confident that next year's number one | Christmas hit will start like "church bells to electronic beats". | quakeguy wrote: | Xmd5a is already a real track. | | https://www.youtube.com/watch?v=crcqADcAusg | valdiorn wrote: | This really is unreasonably effective. Spectrograms are a lot | less forgiving of minor errors than a painting. Move a brush | stroke up or down a few pixels, you probably won't notice. Move a | spectral element up or down a bit and you have a completely | different sound. I don't understand how this can possibly be | precise enough to generate anything close to a cohesive output. | | Absolutely blows my mind. | seth_ wrote: | Author here: We were blown away too. This project started with | a question in our minds about whether it was even possible for | the stable diffusion model architecture to output something | with the level of fidelity needed for the resulting audio to | sound reasonable. | TaupeRanger wrote: | It's...not effective though. Am I listening to the wrong thing | here? Everything I hear from the web app is jumbled nonsense. | itronitron wrote: | I think we're at the point, with these AI generative model | thingies, where the practitioners are mesmerized by the | mechatronic aspect like a clock maker who wants to recreate | the world with gears, so they make a mechanized puppet or | diorama and revel in their ingenuity.
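valdiorn's point above about spectrograms being unforgiving can be made concrete with a little arithmetic. Assuming an illustrative linear-frequency STFT (Riffusion's real parameters and mel scaling may differ), one pixel row corresponds to tens of Hz:

```python
import math

# How coarse is one pixel of a spectrogram image? The parameters below
# are illustrative guesses, not Riffusion's actual settings.
fs = 44100      # sample rate, Hz
n_fft = 1024    # FFT size -> n_fft // 2 + 1 frequency rows in the image
hop = 512       # samples advanced per image column

hz_per_row = fs / n_fft       # ~43 Hz between adjacent pixel rows
ms_per_col = 1000 * hop / fs  # ~11.6 ms between adjacent columns

# Near A4 (440 Hz), slipping one pixel row shifts the pitch by:
slip_in_semitones = 12 * math.log2((440 + hz_per_row) / 440)
```

With these numbers a one-pixel vertical slip near A4 is about 1.6 semitones, which is why small generation errors that would be invisible in a photograph are immediately audible in a spectrogram.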
| jefftk wrote: | I think the progression from church bells to electronic beats | is especially good: https://www.riffusion.com/about/church_be | lls_to_electronic_b... | hyperbovine wrote: | Wasn't this Fraunhofer's big insight that led to the | development of MP3? Human perception actually is pretty | forgiving of perturbations in the Fourier domain. | w-m wrote: | You probably mean Karlheinz Brandenburg, the developer of | MP3, who worked on psychoacoustics. Not completely off | though, as he did the research at a Fraunhofer research | institute, which takes its name from Joseph von Fraunhofer, | the inventor of the spectroscope. | th0ma5 wrote: | Does the institute not also claim that work? | w-m wrote: | Fair enough. But for me, when talking about `having an | insight`, I don't imagine a non-human entity doing that. | And to be pedantic (talking about Germans doing research, | I hope everyone would expect me to be), the institute is | called Fraunhofer IIS. `Fraunhofer` would colloquially | refer to the society, which is an organization with 76 | institutes total. Although, of course, the society will | also claim the work... | humanistbot wrote: | It's an interesting question, one I hadn't thought of | before. But in common language, it sometimes makes sense | to credit the institution, other times just the | individuals. I think it may be based more on how much the | institution collectively presents itself as the author and | speaks on behalf of the project versus the individuals | involved.
Here is my own general intuition for a few contrasting | cases: | | Random forests: Ho and Breiman, not really Bell Labs | and UC-Berkeley | | Transistors: Bardeen, Brattain, and Shockley, not really | Bell Labs (thank the Nobel Prize for that) | | UNIX: Primarily Bell Labs, but also Ken Thompson and | Dennis Ritchie (this is a hard one) | | GPT-n: OpenAI, not really any individual, and I can't | seem to even recall any named individual from memory | WanderPanda wrote: | Bringing the right people together and having the right | environment that gives rise to ,,having an insight" can | be a big part as well. | killerpopiller wrote: | btw, it is a publicly funded non-profit organisation | ComplexSystems wrote: | In very limited situations. You can move a frequency around | (or drop it entirely) if it's being masked by a nearby loud | frequency. Otherwise, you would be amazed at the sensitivity | of pitch perception. | 323 wrote: | You can also add another neural network to "smooth" the | spectrogram, increase the resolution and remove artefacts, just | like they do for image generation. | bckr wrote: | Pretty sure that's how RAVE works | evo_9 wrote: | Pretty nice, I was just talking to a friend about needing a music | version of chatgpt, so thank you for this. | | Wondering if it would be possible to create a version of this | that you can point at a person's SoundCloud and have it emulate | their style / create more music in the style of the original | artist. I have a couple of albums' worth of downtempo electronic | music I would love to point something like this at and see what | it comes up with. | trekkie1024 wrote: | https://mubert.com/ might be what you're looking for. | bheadmaster wrote: | This is a genius idea. Using an already-existing and well- | performing image model, and just encoding input/output as a | spectrogram... It's elegant, it's obvious in retrospect, it's | just pure genius.
| | I can't wait to hear some serious AI music-making a few years | from now. | dangom wrote: | This idea is presented by Jeremy Howard in literally the | first Deep Learning for Coders class (most recent edition). A | student wanted to classify sounds, but only knew how to do | vision, so they converted sounds to spectrograms, fine-tuned | the model on the labelled spectra, and the classification | worked pretty well on test data. That of course does not take | the merit away from the Riffusion authors though. | Analog24 wrote: | The idea of connecting CV to audio via spectrograms predates | Jeremy Howard's course by quite a bit. That's not really the | interesting part here though. The fact that a simple | extension of an image generation pipeline produces such | impressive results with generative audio is what is | interesting. It really emphasizes how useful the idea of | stable diffusion is. | | edit: added a bit more to the thought | superpope99 wrote: | I'm super excited about the Audio AI space, as it seems | permanently a few years behind image stuff - so I think we're | going to see a lot more of this. | | If you're interested, the idea of applying image processing | techniques to spectrograms of audio is explored in brief in the | first lesson of one of the most recommended AI courses on HN: | Practical Deep Learning for Coders | https://youtu.be/8SF_h3xF3cE?t=1632 | rco8786 wrote: | > I can't wait to hear some serious AI music-making a few years | from now. | | I think this will be particularly useful for musical | compositions in movies and film, where the producer can | "instruct" the AI about what to play, when, and how to | transition so that the music matches the scene progression. | adamhp wrote: | Not only that but sampling. I'd say there's at least one | sample from something in most modern music. This can | essentially create "sounds" that you're looking for as an | artist. I need a sort of high pitched drone here...
Rather | than dig through sample libraries you just generate a few | dozen results from a diffusion model with some varying inputs | and you'd have a small sample set of the exact thing you're | looking for. There's already so much processing of samples | after the fact, the actual quality or resolution of the | sample is inconsequential. In a lot of music, you're just | going after the texture and tonality and timbre of | something... This can be seen in some Hans Zimmer videos of | how he slows down certain sounds massively to arrive at new | sounds... or in granular synthesis... This is going to open | up a lot of cool new doors. | tstrimple wrote: | I was thinking gaming, where music can and should dynamically | shift based on different environmental and player conditions. | sebzim4500 wrote: | I suspect that if you had tried this with previous image models | the results would have been terrible. This only works since | image models are so good now. | josalhor wrote: | Makes me wonder if we will see a generalization of this idea. | Just like in a CPU 90%+ of what you want to do can be modeled | with very few instructions (mov, add, jmp...) we could see a set | of very refined models (Stable Diffusion, GPT, etc) and all of | their abstractions on top (ChatGPT, Riffusion, etc). | DarmokJalad1701 wrote: | Maybe next up is a model that generates Piet code | | https://www.dangermouse.net/esoteric/piet.html | amelius wrote: | Perhaps GPT could run on top of Stable-diffusion, generating | output in the form of written text (glyphs). | danuker wrote: | Indeed, I think this would be a cost-effective way to go | forward. | Tenoke wrote: | For what it's worth, people were trying the same thing with GANs | (I also played with doing it with stylegan a bit) but the | results weren't as good. | | The amazing thing is that the current diffusion models are so | good that the spectrograms are actually reasonable enough | despite the small room for error.
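The classify-sounds-as-images trick dangom describes above only needs the audio rendered as a normalized image before a vision model ever sees it. Here is a minimal sketch with numpy/scipy; the sample rate, window size, and 80 dB dynamic range are arbitrary choices for illustration, not anything from the fast.ai course or Riffusion.

```python
import numpy as np
from scipy.signal import stft

def audio_to_image(x, fs=22050, nperseg=510):
    """Render a clip as a log-magnitude spectrogram image (uint8).

    nperseg=510 gives 510 // 2 + 1 = 256 frequency rows, a convenient
    height for an off-the-shelf vision model; the choice is arbitrary.
    """
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    db = 20 * np.log10(np.abs(Z) + 1e-6)       # log scale, like a dB meter
    db = np.clip(db, db.max() - 80, db.max())  # keep an 80 dB dynamic range
    img = (db - db.min()) / (db.max() - db.min())
    return np.flipud((img * 255).round().astype(np.uint8))  # low freqs at bottom

# One second of a sweeping test tone becomes a 256-row grayscale image.
t = np.arange(22050) / 22050
clip = np.sin(2 * np.pi * (200 + 400 * t) * t)  # roughly a 200 -> 1000 Hz sweep
image = audio_to_image(clip)
```

From here, labelled clips rendered this way can be fed to any image classifier unchanged, which is exactly why the anecdote in the course worked so well.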
| munificent wrote: | As someone who loves making music and loves listening to music | made by other humans with intention, it just makes me sad. | | Sure, AI can do lots of things well. But would you rather live | in a world where humans get to do things they love (and are | able to afford a comfortable life while doing so) or a world | where machines do the things humans love and humans are | relegated to the remaining tasks that machines happened to be | poorly suited for? | bheadmaster wrote: | As someone who loves making music and loves listening to | music (regardless of its origins, in my case), it doesn't | make me that sad. Sure, at first, I had an uncomfortable | feeling that AI could make this sacred magic thing that only | I and other fellow humans know how to do... But then I | realized same thing is happening with visual art, so I | applied the same counterarguments that've been cooking in my | head. | | I think that kind of attitude is defeatist - it's implying | that humans will be stopped from making music if AI learns | how to do it too. I don't think that will happen. Humans will | continue making music, as they always have. When Kraftwerk | started using computers to make music back in the 70s, people | were also scared of what that will do to musicians. To be | fair, live music _has_ died out a bit (in a sense that there | aren 't that many god-on-earth-level rockstars), but it's | still out there, people are performing, and others who want | to listen can go and listen. | | Maybe consumers will start consuming more and more AI music, | instead of human music [0], but the worst thing that can | happen is that music will no longer be a profitable activity. | But then again, today's music industry already has some | elements of the automation - washed-out rhythms, sexual | thematics over and over again, re-hashing same old songs in | different packages... So nothing's gonna change in the grand | scheme of things. 
| | [0] https://www.youtube.com/watch?v=S1jWdeRKvvk | bawolff wrote: | I'd rather live in the world where humans do things that are | actually unique and interesting, and aren't essentially being | artificially propped up by limiting competition. | | I don't see this as a threat to human ingenuity in the | slightest. | int_19h wrote: | I would rather live in a world where humans get to do things | they love _because they can_ (and not because they have to | earn their bread), and machines get to do basically | everything that _needs_ to be done but no human is willing to | do it. | | Advancing AI capabilities in no way detracts from this. You | talk about humans being "relegated to the remaining tasks" - | but that's a consequence of our socioeconomic system, not of | our technology. | hackernewds wrote: | You already hear a ton of them. Lofi music on these massively | popular channels is basically auto-generated "music" + auto- | generated artwork. | cjtrowbridge wrote: | Do you have any sources for more information about this? | farmin wrote: | That church bell one is amazing. Very creative transition. | flaviuspopan wrote: | I'm floored, the typing to jazz demo is WILD! Please keep pushing | this space, you've got something real special here. | bulbosaur123 wrote: | Anyone interested in joining an unofficial Riffusion Discord, | let's organize here: https://discord.gg/DevkvXMJaa | | Would be nice to have a channel where people can share Riffs they | come up with. | ElijahLynn wrote: | I was confused because I must not have read carefully that the | working webapp is at https://www.riffusion.com/. Go to | https://www.riffusion.com/ and press the play button to see it in | action! | rmetzler wrote: | "Jamaican rap" - usually the genre (e.g. Sean Paul) is called | Dancehall. | woeirua wrote: | Very cool, but the music still has a very "rough", almost | abrasive tinge to it. My hunch is that it has to do with the | phase estimates being off.
| | Who's going to be first to take this approach and use it to | generate human speech instead? | ubj wrote: | This happened earlier than I expected, and using a much different | technique than I expected. | | Bracing myself for when major record labels enter the copyright | brawl that diffusion technology is sparking. | adzm wrote: | Really fascinating. I'd be interested to know more about how it | was trained, and with what data exactly. | rslice wrote: | deleted | dangond wrote: | I think you'll find plenty of people who find that DAWs and | music theory help them better find self-expression and | celebrate life through their music. Any tool or framework that | opens up new modes of achieving that self-expression should be | celebrated, not shunned because it isn't as "pure" as more time | and labor intensive methods. Would you rather someone be forced | to dedicate a significant amount of time to studying music and | art creation just to be able to find that self-expression? | [deleted] | phneutral26 wrote: | Right now it still seems to lack the horsepower for this many | users. Hope it gets in a better state soon, but I am bookmarking | this right now! | bluebit wrote: | And we broke it. | logn wrote: | Congratulations, this is an amazing application of technology and | truly innovative. This could be leveraged by a wide range of | applications that I hope you'll capitalize on. | bogwog wrote: | Damn this is insane. I wonder what other things can be encoded as | images and generated with SD? | Pepe1vo wrote: | I find it really cool that the "uncanny valley" that's audible on | nearly every sample is exactly how I would imagine the visual | artifacts that crop up in most generated art would sound. Not | really surprising I guess, but still cool that there's such a | direct correlation between completely different mediums! | isoprophlex wrote: | I wonder how they got their training data!
The spectrogram trick | is genius, but not much use without high quality, diverse data | to train on | m3kw9 wrote: | They've got a looooong way to go man | ihatepython wrote: | I agree but it's better than listening to Ed Sheeran | | Edit: To be honest, I find something like 'Band In A Box' to be | more impressive and actually useful, I don't understand how I | would ever use this or listen to this. To me, it's further | proof that Stable Diffusion really just doesn't work that well | GaggiX wrote: | You can train/fine-tune a Stable Diffusion model on an arbitrary | aspect ratio/resolution and then the model starts creating | coherent images; would be cool to try fine-tuning/training this | model on entire songs by extending the time dimension (also the | attention layer at the usual 64x64 resolution should be removed | or it would eat too much memory) | londons_explore wrote: | I propose that while you are GPU limited, you make these changes: | | * Don't do the alpha fade to the next prompt - just jump straight | to alpha=1.0. | | * Pause the playback if the server hasn't responded in time, | rather than looping. | birdyrooster wrote: | I know it sounds like I am going to be sarcastic, but I mean all | of this in earnest and with good intention. Everything this | generates is somehow worse than the thing it generated before it. | Like the uncanny valley of audio had never been traversed in such | high fidelity. Great work! | fernandohur wrote: | I found this awesome podcast that goes into several AI & music | related topics | https://open.spotify.com/show/2wwpj4AacVoL4hmxdsNLIo?si=IAaJ... | | They even talk specifically about applying stable diffusion | and spectrograms. | stevehiehn wrote: | Really great! I've been using diffusion as well to create sample | libraries. My angle is to train models strictly on chord- | progression-annotated data as opposed to the human descriptions | so they can be integrated into a DAW plugin.
Check it out: | https://signalsandsorcery.org/ | 2devnull wrote: | Was just watching an interview of Billy Corgan (smashing | pumpkins) on Rick Beato's YouTube[1] last night where billy was | lamenting the inevitable future where the "psychopaths" in the | music biz will use ai and auto tune to churn out three chord non- | music mumble rap for the youth of tomorrow, or something to that | effect. It was funny because it's the sad truth. It's already | here but new tech will allow them to cut costs even more, and | increase their margins. No need for musicians. Really cool on one | hand, in the same way fentanyl is cool -- or the cotton gin, but | a bit depressing on the other, if you care about musicians. I and | a few others will always pay to go the symphony, so good players | will find a way get paid, but this is what kids will listen to, | because of the profit margin alone. | | [1] https://m.youtube.com/watch?v=nAfkxHcqWKI | visarga wrote: | > new tech will allow them to cut costs even more, and increase | their margins | | How, when everyone and their dog can generate such music? It's | gonna be like stock photography in the age of SD. | gardenhedge wrote: | impressive. and this is a hobby project.. amazing | aquanext wrote: | Someone please train it on John Coltrane. | MagicMoonlight wrote: | Plug this into a video game and you could have GTA 6 where the | NPCs have full dialogue with the appearance of sentience, | concerts where imaginary bands play their imaginary catalogue | live to you and all kinds of other dynamically generated content. | wwarner wrote: | BOOM! Yes! | lftl wrote: | I just wanted to say you guys did an amazing job packaging this | up. I managed to get a local instance up and running against my | local GPU in less than 10 minutes. | Raed667 wrote: | Seems to be victim of its own success: | | - No new result available, looping previous clip | | - Uh oh! 
Servers are behind, scaling up | | I hope Vercel people can give you some free credits to scale it | up. | [deleted] ___________________________________________________________________ (page generated 2022-12-15 23:00 UTC)