[HN Gopher] TuneNN: A transformer-based network model for pitch ...
       ___________________________________________________________________
        
       TuneNN: A transformer-based network model for pitch detection
        
       Author : CMLab
       Score  : 80 points
       Date   : 2023-12-19 12:27 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | CMLab wrote:
        | A transformer-based network model for pitch tracking on musical
        | instruments.
        | 
        | The timbre of a musical note is the result of how its harmonic
        | relationships, the relative strengths and weaknesses of those
        | harmonics, the instrument's resonant peaks, and its structural
        | (body) resonances combine and evolve over time.
        | 
        | It uses the transformer-based TuneNN network model for abstract
        | timbre modeling, and supports tuning for 12+ instrument types.
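
          The submission statement above doesn't spell out the
          architecture, but the usual shape of such a model is a
          spectrogram fed through a transformer encoder with a framewise
          pitch-bin classifier on top. A minimal, hypothetical PyTorch
          sketch of that generic recipe; the layer sizes and the 360-bin
          pitch grid are assumptions borrowed from CREPE-style models,
          not TuneNN's actual design:

              import torch
              import torch.nn as nn

              class PitchTransformer(nn.Module):
                  """Toy spectrogram -> framewise pitch classifier (not TuneNN)."""
                  def __init__(self, n_bins=229, d_model=128, n_heads=4,
                               n_layers=4, n_pitch_bins=360):
                      super().__init__()
                      self.proj = nn.Linear(n_bins, d_model)   # per-frame embedding
                      layer = nn.TransformerEncoderLayer(
                          d_model, n_heads, dim_feedforward=4 * d_model,
                          batch_first=True)
                      self.encoder = nn.TransformerEncoder(layer, n_layers)
                      self.head = nn.Linear(d_model, n_pitch_bins)

                  def forward(self, spec):               # (batch, frames, n_bins)
                      h = self.encoder(self.proj(spec))  # attend across time
                      return self.head(h)                # (batch, frames, bins)

              model = PitchTransformer()
              spec = torch.randn(1, 100, 229)        # e.g. log-mel/CQT frames
              bins = model(spec).argmax(dim=-1)      # pitch bin per frame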
        
         | azinman2 wrote:
         | This smells like an automated summary.
        
           | advisedwang wrote:
           | It is from the submitter. I think it is intended as a
           | submission statement.
        
             | azinman2 wrote:
             | That's the first time I've seen/noticed that. I think it
             | makes me feel better?
        
       | vessenes wrote:
       | This is cool! The very best software-based tuning tech out there
       | is probably in piano tuning apps; they cost hundreds of dollars+
       | and are specifically made to report on harmonics and other piano
       | nuances.
       | 
       | Do you have any comparisons against other pitch detection tech?
       | Accuracy? Delay/Responsiveness? I assume it's much more compute
       | work than a handcoded FFT type pitch detector.
       | 
        | I think it's possible this would find use in the piano world if
        | the output offers something new / something that can pick up
        | what a piano-tuning maestro can hear and make it accessible to a
        | mid-tier tuner.
        
         | jansommer wrote:
         | Sounds like you know a thing or two about pitch detection...
         | I've been working on a C implementation of YIN and PYIN (a real
         | GPL minefield for someone wanting to provide the end result as
          | MIT/public domain!), and am wondering if it's a good choice for
          | real-time, CPU-bound speech pitch detection, or if there are
          | better ways. May I ask what your thoughts are on this?
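
          For readers following along, the core of YIN fits in a few
          lines: a difference function, its cumulative-mean
          normalization, and an absolute threshold. A naive NumPy sketch
          written from the published equations (no parabolic
          interpolation or the paper's other refinement steps), just to
          show the shape of the algorithm:

              import numpy as np

              def yin_f0(frame, sr, fmin=60.0, fmax=500.0, threshold=0.1):
                  """Naive YIN: difference fn + CMNDF + absolute threshold."""
                  tau_min, tau_max = int(sr / fmax), int(sr / fmin)
                  d = np.zeros(tau_max + 1)
                  for tau in range(1, tau_max + 1):      # difference function
                      diff = frame[:-tau] - frame[tau:]
                      d[tau] = np.dot(diff, diff)
                  cmndf = np.ones_like(d)                # cumulative mean norm.
                  running = np.cumsum(d[1:])
                  cmndf[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(
                      running, 1e-12)
                  for tau in range(tau_min, tau_max):    # first dip < threshold
                      if cmndf[tau] < threshold:
                          while tau + 1 < tau_max and cmndf[tau + 1] < cmndf[tau]:
                              tau += 1                   # walk to the local minimum
                          return sr / tau
                  return 0.0                             # unvoiced / no estimate

              sr = 16000
              t = np.arange(2048) / sr                   # ~128 ms frame
              print(yin_f0(np.sin(2 * np.pi * 100 * t), sr))   # ~100.0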
        
           | ronsor wrote:
           | Have you also considered implementing the Nebula[1]
           | algorithm?
           | 
           | [1] https://github.com/Sleepwalking/nebula
        
             | jansommer wrote:
              | I need non-GPL libraries as a reference. The problem with
              | YIN, and especially PYIN, is that the MIT-licensed code
              | I've found sometimes looks a bit too similar to earlier GPL
              | code, and rewriting it to do the same thing in genuinely
              | different code is fairly hard. Here I'm assuming that
              | translating e.g. GPL Python or C++ into C would mean the
              | license is retained.
        
               | ska wrote:
               | Can you not just write it from the paper(s)? Or is that
               | more effort than value to you?
               | 
               | > that translating eg. GPL Python or C++ into C would
               | mean the license is retained
               | 
                | It depends a bit on what exactly "translating" means,
                | but it could easily be a derivative work.
                | 
                | Honestly, in that situation I wouldn't even look at the
                | code. You might use it to test equivalent behavior after
                | you have your own implementation, but only in a gross
                | sense.
        
               | jansommer wrote:
               | I think I have to look at the code when using other
               | people's MIT licensed code... If they have used something
               | that's GPL or used someone else's code that turns out to
               | be GPL, then it becomes my problem when translating it.
               | And I'm not smart enough to just follow a paper
        
               | ska wrote:
               | > And I'm not smart enough to just follow a paper
               | 
               | Don't sell yourself short. This is the sort of thing that
               | is only straightforward if you have the right background.
        
               | sevagh wrote:
               | I have some code here if it interests you:
               | https://github.com/sevagh/pitch-detection
               | 
               | My favorite is the McLeod Pitch Method/MPM. Runs fast
               | enough for realtime purposes in a WASM example too:
               | https://github.com/sevagh/pitchlite
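
          For anyone curious what MPM does differently: it scores lags
          with the normalized square difference function (NSDF), i.e.
          autocorrelation divided by the frame energies so values land
          in [-1, 1], then takes the first peak above a fraction k of
          the highest peak. A rough NumPy sketch of that idea (the
          linked library adds parabolic interpolation and smarter peak
          picking):

              import numpy as np

              def mpm_f0(frame, sr, fmin=60.0, fmax=1000.0, k=0.9):
                  """Rough MPM: NSDF + 'first peak above k * max' picking."""
                  n = len(frame)
                  nsdf = np.zeros(int(sr / fmin) + 1)
                  for tau in range(1, len(nsdf)):
                      x, y = frame[:n - tau], frame[tau:]
                      r = np.dot(x, y)                    # autocorrelation term
                      m = np.dot(x, x) + np.dot(y, y)     # energy term
                      nsdf[tau] = 2.0 * r / max(m, 1e-12) # in [-1, 1]
                  tau_min = int(sr / fmax)
                  peaks = [t for t in range(max(tau_min, 2), len(nsdf) - 1)
                           if nsdf[t] > 0
                           and nsdf[t] >= nsdf[t - 1] and nsdf[t] > nsdf[t + 1]]
                  if not peaks:
                      return 0.0
                  best = max(nsdf[t] for t in peaks)
                  tau = next(t for t in peaks if nsdf[t] >= k * best)
                  return sr / tau

              sr = 16000
              t = np.arange(2048) / sr
              print(mpm_f0(np.sin(2 * np.pi * 220 * t), sr))   # ~220 Hz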
        
               | jansommer wrote:
               | Ha! I've translated your YIN code actually! Your
               | autocorrelation is pretty cool - GPL versions all use an
               | additional FFT. Have been struggling with your PYIN
               | implementation because the beta distribution is copied
               | from the GPL PYIN source, and the paper just references
               | its source code for that part, and as you also found out,
               | it's not a real beta distribution. I asked one of the
               | PYIN authors (Dixon) if he were willing to change the
               | license and he forwarded my mail a week ago - haven't
               | heard back. Then there's the absolute_threshold function
               | that is the same as in the PYIN source where it says
               | "using Jorgen Six'es loop construct". This "loop
               | construct" doesn't have a license, because he doesn't
               | answer the issues about that in his TarsosDSP library,
               | and I'm not sure if I should bother him about a few lines
               | of code. I'm assuming it's a coincidence and that's just
               | a normal way to find the absolute threshold. I really
               | don't want to point fingers here, I'm being paranoid
               | because I try to make sure I don't publish something that
               | can put people in trouble...
               | 
               | So I have been staring at your code for many hours, and
               | the YIN-implementation works well. The PYIN on the other
               | hand.. well I necro posted a while ago in one of your
               | pull requests I think ;)
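
          For context on the "additional FFT": those versions compute
          the same autocorrelation through the Wiener-Khinchin theorem
          (inverse FFT of the power spectrum), which is O(n log n)
          instead of O(n^2). The trick itself is textbook material,
          independent of any particular GPL implementation; a small
          NumPy sketch:

              import numpy as np

              def autocorr_fft(x):
                  """Autocorrelation via the Wiener-Khinchin theorem."""
                  n = len(x)
                  nfft = 1 << (2 * n - 1).bit_length()   # pad to avoid wrap-around
                  spec = np.fft.rfft(x, nfft)
                  return np.fft.irfft(spec * np.conj(spec), nfft)[:n]

              def autocorr_naive(x):
                  """Direct O(n^2) autocorrelation, for comparison."""
                  return np.array([np.dot(x[:len(x) - k], x[k:])
                                   for k in range(len(x))])

              x = np.random.randn(1024)
              print(np.allclose(autocorr_fft(x), autocorr_naive(x)))   # True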
        
               | xavriley wrote:
                | It sounds like you've found it already, but the original
                | pYin implementation is in the VAMP plugin. Simon Dixon is
                | my PhD supervisor but he's quite busy. Feel free to email
                | me questions in the meantime: j.x.riley@ the same
                | university as Simon. There's also a Python implementation
                | in the librosa library, which might have a better license
                | for your purposes.
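
          A minimal example of the librosa route mentioned above
          (librosa is ISC-licensed and added pyin() in version 0.8; the
          file name here is just a placeholder):

              import librosa

              y, sr = librosa.load("example.wav", sr=None, mono=True)
              f0, voiced_flag, voiced_prob = librosa.pyin(
                  y, fmin=librosa.note_to_hz("C2"),
                  fmax=librosa.note_to_hz("C7"), sr=sr)
              times = librosa.times_like(f0, sr=sr)      # default hop matches pyin
              for t, f, v in zip(times, f0, voiced_flag):
                  if v:                                  # skip unvoiced (NaN) frames
                      print(f"{t:6.3f}s  {f:7.2f} Hz")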
        
         | ks2048 wrote:
         | That's interesting. Can you point to one of these piano tuning
         | apps that are $100+?
        
         | CMLab wrote:
         | Based on our current tests, our algorithm shows significantly
         | higher accuracy and robustness compared to traditional digital
         | signal algorithms such as PEF, NCF, YIN, HPS, etc. Our team is
         | working diligently, and we will release benchmark test data and
         | results in the near future.
        
           | im3w1l wrote:
           | That's pretty nice. Do you have any idea how it does it?
        
       | rrherr wrote:
       | How does the accuracy of this compare to CREPE?
       | 
       | https://github.com/marl/crepe
       | 
       | https://github.com/maxrmorrison/torchcrepe
       | 
       | Does anyone know what the current state of the art is, within the
       | Music Information Retrieval community?
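
          For anyone who wants to run their own comparison, CREPE is
          easy to use as a baseline; a minimal example along the lines
          of its README (the wav path is a placeholder, and torchcrepe
          exposes a similar predict() call):

              import crepe
              from scipy.io import wavfile

              sr, audio = wavfile.read("example.wav")    # placeholder file
              time, frequency, confidence, activation = crepe.predict(
                  audio, sr, viterbi=True)               # 10 ms hop by default
              for t, f, c in zip(time, frequency, confidence):
                  print(f"{t:6.2f}s  {f:7.2f} Hz  (confidence {c:.2f})")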
        
         | CMLab wrote:
         | CREPE generally has high latency and error rates in instrument
         | pitch recognition, especially for guitar instruments. Our team
         | will release benchmark test data and results later.
        
           | rrherr wrote:
           | Thanks. I'd love to try TuneNN! Are you releasing a
           | pretrained model? How do I run it on a wav file?
        
           | xavriley wrote:
           | High latency - agreed but it depends on whether a GPU is
           | available or not. If it is then theoretically CREPE could be
           | real-time. The error rates for pitch recognition are still
           | quite good though for the full CREPE model. I'm interested to
           | see the data on this claim.
        
       | ranting-moth wrote:
       | To the dev: the tuner gives me an incredibly high error window
       | with the following message. It doesn't prompt to access the mic
       | (I think that's related). Ubuntu/KDE/Firefox:
       | 
        | An error occurred running the Unity content on this page. See
        | your browser JavaScript console for more info. The error was:
        | 
        | TypeError: 'microphone' (value of 'name' member of
        | PermissionDescriptor) is not a valid value for enumeration
        | PermissionName.
        | 
        |   checkPermission@https://aifasttune.com/public/web/microphone/microphone.js:3...
        |   _Microphone_checkPermission@https://aifasttune.com/public/web/Build/web.framework.js:10:...
        |   @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi... (x6)
        |   invoke_iiii@https://aifasttune.com/public/web/Build/web.framework.js:10:...
        |   @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi... (x7)
        |   unityFramework/Module._SendMessageString@https://aifasttune.com/public/web/Build/web.framework.js:10:...
        |   ccall@https://aifasttune.com/public/web/Build/web.framework.js:10:...
        |   SendMessage@https://aifasttune.com/public/web/Build/web.framework.js:10:...
        |   SendMessage@https://aifasttune.com/public/web/Build/web.loader.js:1:3343
        |   loadURL@https://aifasttune.com/public/web/game/fastGameController.js...
        |   i@https://aifasttune.com/assets/index-64322640.js:1:777
        |   setup/<@https://aifasttune.com/assets/index-64322640.js:1:611
        
         | CMLab wrote:
          | Thank you for reporting the error; we will work to address it.
          | The model-related data is currently fairly large, so loading
          | may also depend on network speed.
        
           | uhoh-itsmaciek wrote:
           | I got the same error on Ubuntu/GNOME/Firefox. On Chrome, I
           | don't get an error and I'm correctly prompted for microphone
           | access, but if I grant permission, it does not seem to pick
           | anything up (I've used my mic successfully with other web
           | apps).
        
       | joonatan wrote:
        | Could someone fill me in on why machine learning would be
        | necessary for pitch detection? Isn't it something that could
        | just be solved with an FFT, or is it a much more complicated
        | task?
        
         | ks2048 wrote:
          | Pitch is a *subjective* property, inherently tied to the
          | complex processing humans use to perceive sounds. "Simple"
          | physical measures like the fundamental frequency of a periodic
          | signal are very closely related, but for real-world audio
          | (which isn't really periodic), the relationship is more
          | complicated.
        
           | chrisshroba wrote:
           | Could you elaborate a bit more? It seems to me like the note
           | being played would always correspond to the fundamental
           | frequency observed. When is this not the case? Maybe as the
           | note rings out, the fundamental frequency and first few
           | overtones lose power, and all that's still audible are the
           | higher overtones?
        
             | gexaha wrote:
              | That's actually not true; perceived pitch can be different
              | from the fundamental frequency because of psychoacoustics.
              | E.g. you can have a "missing fundamental" -
              | https://en.wikipedia.org/wiki/Missing_fundamental - or
              | other effects like "sum and difference tones", which are
              | quite popular in spectralism / spectral music.
        
               | xavriley wrote:
               | Simple techniques like autocorrelation can still recover
               | a missing fundamental. To answer the GP post, using
               | neural networks for this task is overkill for simple,
               | clean signals but it can be desirable if you need a)
               | extremely high accuracy or b) robust results when there
               | are signal degradations like background noise
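
          A quick numeric illustration of both points: synthesize a
          tone whose 100 Hz fundamental is missing (only harmonics 2-5
          present), then compare a naive "loudest FFT bin" estimate
          with autocorrelation. The FFT peak lands on a harmonic at
          200 Hz, while the autocorrelation lag still corresponds to
          the 100 Hz pitch listeners report hearing:

              import numpy as np

              sr = 16000
              t = np.arange(8000) / sr                   # 0.5 s of audio
              f0 = 100.0
              tone = sum(np.sin(2 * np.pi * k * f0 * t) / k
                         for k in range(2, 6))           # no energy at 100 Hz

              # naive spectral peak: picks the loudest harmonic, not the pitch
              spectrum = np.abs(np.fft.rfft(tone))
              freqs = np.fft.rfftfreq(len(tone), 1 / sr)
              print("FFT peak:", freqs[np.argmax(spectrum)])    # 200.0 Hz

              # autocorrelation: strongest peak for lags in the 40-400 Hz range
              ac = np.correlate(tone, tone, mode="full")[len(tone) - 1:]
              lags = np.arange(sr // 400, sr // 40)      # 40..400 samples
              lag = lags[np.argmax(ac[lags])]
              print("Autocorrelation:", sr / lag)        # 100.0 Hz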
        
             | isoprophlex wrote:
             | There is a nice little rabbit hole to go in to:
             | psychoacoustics of church bells.
             | 
             | https://www.hibberts.co.uk/what-note-do-we-hear-when-a-
             | bell-...
             | 
             |  _Almost all musical instruments (such as pianos, organs,
             | orchestral instruments and the human voice) have sounds
             | that contain a range of frequencies f, 2f, 3f, 4f and so on
             | where f is the lowest frequency in the sound. The pitch or
              | note we assign to the sound corresponds to the frequency
             | f. Frequencies with this regular arrangement are called
             | harmonic. The frequencies in the sound of bells, on the
             | other hand, are not harmonic, and the pitch we assign to
             | the sound of a bell is roughly an octave below the fifth
             | partial up ordered by frequency. This partial is called the
             | nominal, because it provides the note name of the bell.
             | There often isn't a frequency in the bell's sound
             | corresponding to the pitch we hear._
        
       | bravura wrote:
       | What's the license?
       | 
       | What are your thoughts on PESTO which learns pitch-prediction
       | very well with a small network, and uses a self-supervised
       | objective?
       | 
       | https://arxiv.org/abs/2309.02265
       | 
       | https://github.com/SonyCSLParis/pesto
        
       | filterfiber wrote:
        | Does anyone know where I should look if I want to detect
        | specific sounds? Like a smoke alarm, food bowl dispenser (it's
        | very distinct), cat meowing, 3D printer collision, that sort of
        | thing?
        
         | mistercheph wrote:
         | You would learn how to do this in the first & second chapters
         | of the fast.ai course.
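
          If you want a feel for what those chapters cover: the usual
          recipe is to turn short clips into log-mel spectrograms and
          train an ordinary image-style classifier on them. A
          bare-bones PyTorch/torchaudio sketch of that idea; the class
          list and file names are made up for illustration, and the
          fast.ai course wraps the same steps in its own higher-level
          API:

              import torch.nn as nn
              import torchaudio

              CLASSES = ["smoke_alarm", "food_dispenser", "cat_meow",
                         "printer_crash", "other"]       # hypothetical labels

              melspec = torchaudio.transforms.MelSpectrogram(
                  sample_rate=16000, n_mels=64)
              to_db = torchaudio.transforms.AmplitudeToDB()

              def features(path):
                  wav, sr = torchaudio.load(path)        # (channels, samples)
                  wav = torchaudio.functional.resample(wav, sr, 16000)
                  return to_db(melspec(wav.mean(0, keepdim=True)))  # (1, 64, T)

              model = nn.Sequential(                     # tiny CNN classifier
                  nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                  nn.Linear(32, len(CLASSES)))

              # untrained weights, so the label is meaningless until trained
              logits = model(features("clip.wav").unsqueeze(0))
              print(CLASSES[logits.argmax().item()])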
        
       | squidsoup wrote:
        | It might be worth pointing out that the banjo model is for a
        | four-string banjo, given that a five-string banjo is the more
        | common instrument.
        
       ___________________________________________________________________
       (page generated 2023-12-19 23:00 UTC)