[HN Gopher] Highres Spectrograms with the DFT Shift Theorem
       ___________________________________________________________________
        
       Highres Spectrograms with the DFT Shift Theorem
        
       Author : gbh444g
       Score  : 47 points
       Date   : 2021-05-03 20:22 UTC (2 hours ago)
        
 (HTM) web link (soundshader.github.io)
 (TXT) w3m dump (soundshader.github.io)
        
       | andai wrote:
       | Just a heads up, you have to click the images to see the full
       | resolution version! I spent a good while confused about not being
       | able to see the details mentioned in the images.
        
       | gbh444g wrote:
       | Hello HN! Author here. I was thinking of calling the post "The
       | underappreciated complexity of musical sounds" but decided to
       | stick with the DFT one as it would probably get more attention.
       | This is a small discovery I came across this weekend. FFT-based
       | spectrograms of musical instruments aren't a novel thing to do, but
       | I thought: what if I do a super highres spectrogram with a continuum
       | of frequencies, instead of the N fixed ones the FFT gives? Turns out,
       | the FFT "supports" such frequency shifting by multiplying the input
       | by a specially constructed complex exponential. As a result, I've
       | found out that musical instruments produce sophisticated
       | ornaments in between the harmonic levels.
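
A minimal sketch of the shift idea described above, not the author's actual code. It assumes a naive O(N^2) DFT in place of an FFT, a forward-transform sign convention of exp(-2*pi*i*k*m/N), and a hypothetical test tone placed halfway between two bins. The names `dft`, `shifted_dft`, and `tone_bin` are illustrative only.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT: X[m] = sum_k x[k] * exp(-2*pi*i*k*m/N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def shifted_dft(x, delta):
    """DFT evaluated at the fractional bins m + delta.

    Multiplying x[k] by exp(-2*pi*i*k*delta/N) before the transform moves
    every analysis frequency up by delta bins (the DFT shift theorem).
    """
    n = len(x)
    y = [x[k] * cmath.exp(-2j * math.pi * k * delta / n) for k in range(n)]
    return dft(y)

N = 64
tone_bin = 10.5  # hypothetical tone halfway between bins 10 and 11
x = [math.cos(2 * math.pi * tone_bin * k / N) for k in range(N)]

plain = [abs(v) for v in dft(x)]             # samples bins 0, 1, 2, ...
half = [abs(v) for v in shifted_dft(x, 0.5)]  # samples bins 0.5, 1.5, ...

peak_plain = max(plain[:N // 2])
peak_half = max(half[:N // 2])
peak_half_bin = half.index(peak_half)
print(peak_plain, peak_half, peak_half_bin)
```

The plain transform smears the tone over bins 10 and 11, while the half-bin-shifted transform lands directly on it. Sweeping `delta` over many values between 0 and 1 would fill in the frequencies between the N fixed bins.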
        
         | CyberRabbi wrote:
         | My mind is kind of blown that birdsong virtually does not
         | include higher harmonics. I didn't even think that was possible
         | for a physical resonator. Great post
        
           | ttoinou wrote:
           | Maybe they were not captured by the bandwidth-limited
           | microphone?
        
           | akomtu wrote:
           | I think the mystery has a simple explanation: when a bird
           | sings at 7 kHz and the mp3 file captures only the first 20 kHz,
           | there isn't much room for harmonics. Maybe birds do have
           | interesting harmonics at 56 kHz, we just don't know.
        
         | stainforth wrote:
         | What is an ornament?
        
         | cviilgan wrote:
         | Did I understand this correctly, what you are doing is
         | essentially:
         | 
         | X[n] = F[x[k]][n/2] if (n even) else F[x'[k]][(n+1)/2]
         | 
         | With F[x[k]] the DFT of the time-domain signal x[k], x'[k] =
         | x[k]*exp(2*pi*i*k*alpha) and this alpha some constant which
         | yields a frequency-domain shift by 25Hz.
         | 
         | If so: How does this method compare to zero-padding the time-
         | domain signal (i.e. sinc-interpolating the frequency domain)?
         | It is an interesting concept, but alas it's not immediately
         | clear to me how to analyze this...
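
One way to approach the zero-padding question numerically, offered as a sketch under stated assumptions rather than a description of the article's implementation. For the special case of a half-bin shift, interleaving the DFT of x with the DFT of the modulated signal should reproduce the 2N-point DFT of x zero-padded to length 2N. The sketch uses a forward convention of exp(-2*pi*i*k*m/N) and modulates with a negative exponent, so odd outputs map to index (n-1)/2 rather than the (n+1)/2 in the formula above; the test signal is arbitrary.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT: X[m] = sum_k x[k] * exp(-2*pi*i*k*m/N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def interleaved_half_bin(x):
    """Interleave DFT(x) with the DFT of x modulated by half a bin."""
    n = len(x)
    even = dft(x)
    # Modulate so that bin m of the second transform sits at frequency m + 1/2.
    odd = dft([x[k] * cmath.exp(-1j * math.pi * k / n) for k in range(n)])
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out

# Arbitrary test signal (an assumption made for illustration).
x = [math.sin(0.7 * k) + 0.5 * math.cos(2.1 * k) for k in range(16)]

a = interleaved_half_bin(x)
b = dft(x + [0.0] * len(x))  # zero-pad to 2N, then one 2N-point DFT

max_diff = max(abs(p - q) for p, q in zip(a, b))
print(max_diff)
```

If the two agree to rounding error, as the algebra suggests they should, then for this particular shift the method samples the same underlying spectrum as zero-padding, just computed with two N-point transforms instead of one 2N-point transform.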
        
       | Lichtso wrote:
       | On that note, also check out wavelets to generate spectrograms:
       | https://en.wikipedia.org/wiki/Wavelet
       | 
       | I have some implementations here: https://github.com/Lichtso/CCWT
       | https://github.com/Lichtso/WebSpectrogram
        
       | crazygringo wrote:
       | This looks cool! But really needs "before" and "after" comparison
       | images -- lo-res vs hi-res.
       | 
       | Seeing the hi-res images only gives me no idea what kind of
       | improvement this is showing...
       | 
       | @gbh444g Hope you could maybe add some lo-res versions :)
       | 
       | (Would also be cool to have audio clips next to each image as
       | well, but that's less important.)
        
       | crazygringo wrote:
       | > _Smoothness in the time direction is easier to achieve: the
       | 1024 bins window can be advanced by arbitrarily small time
       | steps._
       | 
       | It appears you're doing just that, but the time "width" is still
       | readily apparent in many of the spectrograms, most obviously on
       | the birdsong ones -- almost like a horizontal motion blur.
       | 
       | Would a deconvolution filter be able to meaningfully horizontally
       | "deblur" the spectrograms? So the birdsongs didn't appear to be
       | drawn with a wide-tip marker, but rather a ballpoint pen? So not
       | just hi-res, but hi-focus.
        
       | LeegleechN wrote:
       | It's unfortunate that the article doesn't get into the
       | fundamental limits of spectrogram resolution which are based on
       | the famous uncertainty principle
       | (https://en.wikipedia.org/wiki/Fourier_transform#Uncertainty_...).
       | For example there is a
       | fundamental tradeoff between frequency resolution and time
       | resolution similar to the position/momentum tradeoff in quantum
       | mechanics. The Continuous Wavelet Transform which is alluded to
       | in the article is a way to tune that tradeoff by frequency bin to
       | best align with human sound perception.
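
A small arithmetic sketch of the tradeoff described above. It assumes a 44.1 kHz sample rate, which is not stated in the thread, and treats the raw window length as the time resolution, which ignores the effect of any particular window shape. The helper name `stft_resolution` is hypothetical.

```python
SAMPLE_RATE = 44100  # Hz; an assumed value, not taken from the article

def stft_resolution(window_size, sample_rate=SAMPLE_RATE):
    """Return (bin spacing in Hz, window duration in seconds) for one STFT frame."""
    bin_spacing_hz = sample_rate / window_size
    window_seconds = window_size / sample_rate
    return bin_spacing_hz, window_seconds

for n in (256, 1024, 4096):
    df, dt = stft_resolution(n)
    print(f"N={n:5d}  bin spacing={df:8.2f} Hz  "
          f"window={dt * 1000:7.2f} ms  product={df * dt:.3f}")
```

Whatever window size is chosen, the product of bin spacing and window duration stays fixed: a longer window narrows the bins but blurs events in time, and a shorter one does the opposite. Sampling the spectrum more densely between bins does not change this product.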
        
         | andai wrote:
         | I've been wondering about the apparent contradiction between
         | the limitations of spectrograms and the remarkable fidelity of
         | MP3 files, which I thought operated along similar lines.
         | 
         | When you convert a spectrogram back into sound it sounds like
         | crap, but then how does MP3 store the frequency information
         | (and why can't we use that for visualizations)?
         | 
         | The math is beyond my understanding, can anyone give some kind
         | of analogy maybe?
        
           | achillesheels wrote:
           | My hypothesis: it is stored magnetically (after all magnetic
           | sinusoidals exist) and converted electrically once the mp3 is
           | activated in time.
        
           | bad_username wrote:
           | I implemented a simple clone of mp3 and it was not that hard.
           | If you do a discrete Fourier transform of the audio (in small
           | overlapping windows), quantize the resulting coefficients,
           | and compress them losslessly using the Huffman codes, you
           | will end up with something not that far from mp3. The human
           | ear is quite forgiving to the effects of quantization in
           | frequency domain.
           | 
           | MP3 does not have remarkable fidelity though. MP3, and my
           | clone of it, suffers from time domain artifacts. Quantization
           | in the frequency domain causes distortion in the time domain
           | as well, negatively affecting high frequency transient sounds
           | like cymbals. That is more noticeable. Newer generation
           | codecs like AAC handle transients much better, but they are
           | considerably more advanced, and often use different
           | transforms like wavelet transform.
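
A toy sketch of the transform-then-quantize idea described above, under heavy simplifying assumptions: a single block with no overlapping windows, a naive DFT, uniform rounding instead of a psychoacoustic model, and no Huffman stage. It is not an MP3 encoder. The names `quantize` and `codec_roundtrip` and the step size are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * math.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]

def quantize(X, step):
    """Round real and imaginary parts to multiples of step (the lossy part)."""
    return [complex(round(v.real / step) * step, round(v.imag / step) * step)
            for v in X]

def codec_roundtrip(block, step):
    """Transform, quantize, and reconstruct one block of samples."""
    coeffs = quantize(dft(block), step)
    decoded = [v.real for v in idft(coeffs)]
    nonzero = sum(1 for v in coeffs if v != 0)  # what an entropy coder would store
    return decoded, nonzero

N = 64
block = [math.sin(2 * math.pi * 5 * k / N) + 0.3 * math.sin(2 * math.pi * 12 * k / N)
         for k in range(N)]

decoded, nonzero = codec_roundtrip(block, step=1.0)
rms_error = math.sqrt(sum((a - b) ** 2 for a, b in zip(block, decoded)) / N)
print(nonzero, rms_error)
```

For this two-tone block only a handful of coefficients survive quantization, which is where the compression would come from once the mostly-zero coefficient list is entropy coded. A larger `step` discards more detail and raises the reconstruction error.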
        
           | gugagore wrote:
           | The general concepts are described here:
           | https://en.m.wikipedia.org/wiki/Psychoacoustics
           | 
           | I'm not sure what you mean by converting the spectrogram to
           | sound, but my guess is that the windowing done on the short-
           | time Fourier transform is causing artifacts.
        
           | jcelerier wrote:
           | > When you convert a spectrogram back into sound it sounds
           | like crap
           | 
           | fft gives you the spectrum + the phase. if you only use the
           | spectrum to resynthesise you're missing half the information.
           | temporal domain <-> spectral domain is a 100% lossless
           | transform in both directions.
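
A short sketch of the point about phase, assuming a naive DFT and an arbitrary two-tone test signal. It reconstructs the signal once from the full complex spectrum and once from magnitudes only.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * math.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]

N = 32
# Arbitrary test signal (an assumption made for illustration).
x = [math.sin(2 * math.pi * 3 * k / N) + 0.5 * math.cos(2 * math.pi * 7 * k / N + 1.0)
     for k in range(N)]

X = dft(x)
full = [v.real for v in idft(X)]                             # magnitude and phase
magnitude_only = [v.real for v in idft([abs(v) for v in X])]  # phase thrown away

err_full = max(abs(a - b) for a, b in zip(x, full))
err_mag = max(abs(a - b) for a, b in zip(x, magnitude_only))
print(err_full, err_mag)
```

The full round trip matches the input to rounding error, while the magnitude-only version has the same spectrum plot but a different waveform. A spectrogram image normally stores only magnitudes, which is one reason naive resynthesis from it sounds wrong.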
        
       | efnx wrote:
       | I love this and have been looking for a program that's like
       | Photoshop for sound.
        
         | layoutIfNeeded wrote:
         | You can try interpreting images as spectrograms, but the result
         | will be a cacophonic mess.
         | 
         | There's a reason why nobody does this. (Other than avant-garde
         | experimental composers maybe, but they _are_ looking for
         | cacophony.)
        
       ___________________________________________________________________
       (page generated 2021-05-03 23:00 UTC)