[HN Gopher] Highres Spectrograms with the DFT Shift Theorem
___________________________________________________________________
Highres Spectrograms with the DFT Shift Theorem

Author : gbh444g
Score  : 47 points
Date   : 2021-05-03 20:22 UTC (2 hours ago)

(HTM) web link (soundshader.github.io)
(TXT) w3m dump (soundshader.github.io)

| andai wrote:
| Just a heads up, you have to click the images to see the full
| resolution version! I spent a good while confused about not being
| able to see the details mentioned in the images.

| gbh444g wrote:
| Hello HN! Author here. I was thinking of calling the post "The
| underappreciated complexity of musical sounds" but decided to
| stick with the DFT one as it would probably get more attention.
| This is a small discovery I came across this weekend. FFT-based
| spectrograms of musical instruments aren't a novel thing to do,
| but I thought: what if I do a super highres spectrogram with a
| continuum of frequencies, instead of the N fixed ones the FFT
| gives? Turns out, the FFT "supports" such frequency shifting: you
| multiply the input by a specially constructed complex
| exponential. As a result, I've found out that musical instruments
| produce sophisticated ornaments in between the harmonic levels.

| CyberRabbi wrote:
| My mind is kind of blown that birdsong virtually does not
| include higher harmonics. I didn't even think that was possible
| for a physical resonator. Great post.

| ttoinou wrote:
| Maybe they were not captured by the bandwidth-limited
| microphone?

| akomtu wrote:
| I think the mystery has a simple explanation: when a bird
| sings at 7 kHz and the mp3 file captures only the first 20 kHz,
| there isn't much room for harmonics. Maybe birds do have
| interesting harmonics at 56 kHz, we just don't know.

| stainforth wrote:
| What is an ornament?
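The frequency shift gbh444g describes (multiplying the input by a complex exponential before the transform) can be sketched roughly as follows. This is a minimal illustration, not the article's actual code: it uses a naive O(N^2) DFT from the standard library instead of a real FFT, the function names are made up for this example, and the sign of the exponent is an assumed convention (flipping it shifts the other way).

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT: X[m] = sum_k x[k] * exp(-2*pi*i*k*m/N)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


def shifted_dft(x, delta):
    """DFT evaluated at the fractional bins m + delta instead of m.

    Multiplying x[k] by exp(-2*pi*i*k*delta/N) turns each term into
    x[k] * exp(-2*pi*i*k*(m + delta)/N), so output bin m now measures
    the frequency that sits delta bins above it.
    """
    n = len(x)
    modulated = [
        x[k] * cmath.exp(-2j * math.pi * k * delta / n) for k in range(n)
    ]
    return dft(modulated)


def interleaved_spectrum(x, steps):
    """Magnitudes at bins 0, 1/steps, 2/steps, ... (N * steps values).

    Runs `steps` shifted DFTs and interleaves their outputs, so entry i
    corresponds to the fractional bin i / steps.
    """
    n = len(x)
    shifted = [shifted_dft(x, s / steps) for s in range(steps)]
    return [abs(shifted[i % steps][i // steps]) for i in range(n * steps)]
```

For a 16-sample complex tone sitting exactly between bins 3 and 4, the plain `dft` smears the energy over neighbouring bins, while `interleaved_spectrum(x, 2)` peaks at index 7, i.e. at bin 3.5. This does not add information beyond what the window contains; it only samples the same underlying spectrum on a finer frequency grid.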
| cviilgan wrote:
| Did I understand this correctly? What you are doing is
| essentially:
|
| X[n] = F[x[k]][n/2] if (n even) else F[x'[k]][(n+1)/2]
|
| with F[x[k]] the DFT of the time-domain signal x[k], x'[k] =
| x[k]*exp(2*pi*i*k*alpha), and alpha some constant which
| yields a frequency-domain shift by 25Hz.
|
| If so: How does this method compare to zero-padding the
| time-domain signal (i.e. sinc-interpolating the frequency domain)?
| It is an interesting concept, but alas it's not immediately
| clear to me how to analyze this...

| Lichtso wrote:
| On that note, also check out wavelets to generate spectrograms:
| https://en.wikipedia.org/wiki/Wavelet
|
| I have some implementations here: https://github.com/Lichtso/CCWT
| https://github.com/Lichtso/WebSpectrogram

| crazygringo wrote:
| This looks cool! But it really needs "before" and "after"
| comparison images -- lo-res vs hi-res.
|
| Seeing only the hi-res images gives me no idea what kind of
| improvement this is showing...
|
| @gbh444g Hope you could maybe add some lo-res versions :)
|
| (Would also be cool to have audio clips next to each image as
| well, but that's less important.)

| crazygringo wrote:
| > _Smoothness in the time direction is easier to achieve: the
| 1024 bins window can be advanced by arbitrarily small time
| steps._
|
| It appears you're doing just that, but the time "width" is still
| readily apparent in many of the spectrograms, most obviously on
| the birdsong ones -- almost like a horizontal motion blur.
|
| Would a deconvolution filter be able to meaningfully horizontally
| "deblur" the spectrograms? So the birdsongs didn't appear to be
| drawn with a wide-tip marker, but rather a ballpoint pen? So not
| just hi-res, but hi-focus.

| LeegleechN wrote:
| It's unfortunate that the article doesn't get into the
| fundamental limits of spectrogram resolution, which are based on
| the famous uncertainty principle
| (https://en.wikipedia.org/wiki/Fourier_transform#Uncertainty_...).
| For example, there is a fundamental tradeoff between frequency
| resolution and time resolution, similar to the position/momentum
| tradeoff in quantum mechanics. The Continuous Wavelet Transform,
| which is alluded to in the article, is a way to tune that
| tradeoff by frequency bin to best align with human sound
| perception.

| andai wrote:
| I've been wondering about the apparent contradiction between
| the limitations of spectrograms and the remarkable fidelity of
| MP3 files, which I thought operated along similar lines.
|
| When you convert a spectrogram back into sound it sounds like
| crap, but then how does MP3 store the frequency information
| (and why can't we use that for visualizations)?
|
| The math is beyond my understanding, can anyone give some kind
| of analogy maybe?

| achillesheels wrote:
| My hypothesis: it is stored magnetically (after all magnetic
| sinusoidals exist) and converted electrically once the mp3 is
| activated in time.

| bad_username wrote:
| I implemented a simple clone of mp3 and it was not that hard.
| If you do a discrete Fourier transform of the audio (in small
| overlapping windows), quantize the resulting coefficients,
| and compress them losslessly using Huffman codes, you
| will end up with something not that far from mp3. The human
| ear is quite forgiving of the effects of quantization in the
| frequency domain.
|
| MP3 does not have remarkable fidelity though. MP3, and my
| clone of it, suffer from time domain artifacts. Quantization
| in the frequency domain causes distortion in the time domain
| as well, negatively affecting high frequency transient sounds
| like cymbals. That is more noticeable. Newer generation
| codecs like AAC handle transients much better, but they are
| considerably more advanced, and often use different
| transforms like the wavelet transform.
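The lossy step bad_username describes (transform a frame, quantize the coefficients, transform back) can be sketched roughly as below. This is only an illustration under simplifying assumptions, not MP3 and not bad_username's clone: it handles a single frame with a naive standard-library DFT, uses one uniform quantization step for every coefficient, and leaves out the overlapping windows and the Huffman coding entirely. All function names are made up for this example.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT of one frame."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; idft(dft(x)) recovers x up to rounding error."""
    n = len(spectrum)
    return [
        sum(spectrum[m] * cmath.exp(2j * math.pi * k * m / n) for m in range(n)) / n
        for k in range(n)
    ]


def quantize(spectrum, step):
    """Round real and imaginary parts to multiples of `step` (the lossy part)."""
    return [
        complex(round(c.real / step) * step, round(c.imag / step) * step)
        for c in spectrum
    ]


def roundtrip_error(frame, step):
    """Largest time-domain error after quantizing the frame's spectrum."""
    rebuilt = idft(quantize(dft(frame), step))
    return max(abs(a - b.real) for a, b in zip(frame, rebuilt))
```

Each coefficient is off by at most `step / sqrt(2)` after rounding, and the inverse transform spreads those errors over every sample in the frame. That spreading is why frequency-domain quantization shows up as time-domain distortion around sharp transients.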
| gugagore wrote:
| The general concepts are described here:
| https://en.m.wikipedia.org/wiki/Psychoacoustics
|
| I'm not sure what you mean by converting the spectrogram to
| sound, but my guess is that the windowing done on the
| short-time Fourier transform is causing artifacts.

| jcelerier wrote:
| > When you convert a spectrogram back into sound it sounds like crap
|
| The FFT gives you the magnitude spectrum + the phase. If you only
| use the magnitudes to resynthesise, you're missing half the
| information. Temporal domain <-> spectral domain is a 100%
| lossless transform in both directions.

| efnx wrote:
| I love this and have been looking for a program that's like
| Photoshop for sound.

| layoutIfNeeded wrote:
| You can try interpreting images as spectrograms, but the result
| will be a cacophonic mess.
|
| There's a reason why nobody does this. (Other than avant-garde
| experimental composers maybe, but they _are_ looking for
| cacophony)

| [deleted]
___________________________________________________________________
(page generated 2021-05-03 23:00 UTC)