[HN Gopher] TuneNN: A transformer-based network model for pitch ...
___________________________________________________________________
TuneNN: A transformer-based network model for pitch detection

Author : CMLab
Score  : 80 points
Date   : 2023-12-19 12:27 UTC (10 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| CMLab wrote:
| A transformer-based network model, pitch tracking for musical
| instruments.
|
| The timbre of musical notes is the result of various combinations
| and transformations of harmonic relationships, harmonic strengths
| and weaknesses, instrument resonant peaks, and structural
| resonant peaks over time.
|
| It utilizes the transformer-based tuneNN network model for
| abstract timbre modeling, supporting tuning for 12+ instrument
| types.
| azinman2 wrote:
| This smells like an automated summary.
| advisedwang wrote:
| It is from the submitter. I think it is intended as a
| submission statement.
| azinman2 wrote:
| That's the first time I've seen/noticed that. I think it
| makes me feel better?
| vessenes wrote:
| This is cool! The very best software-based tuning tech out there
| is probably in piano tuning apps; they cost hundreds of dollars+
| and are specifically made to report on harmonics and other piano
| nuances.
|
| Do you have any comparisons against other pitch detection tech?
| Accuracy? Delay/Responsiveness? I assume it's much more compute
| work than a handcoded FFT type pitch detector.
|
| I think it's possible this would find utilization in the piano
| world if the output offers something new / something that can
| analyze what a piano tuning maestro can hear and make it
| accessible to a mid-tier tuner.
| jansommer wrote:
| Sounds like you know a thing or two about pitch detection...
| I've been working on a C implementation of YIN and PYIN (a real
| GPL minefield for someone wanting to provide the end result as
| MIT/public domain!), and am wondering if it's a good choice for
| real time, cpu-bound speech pitch detection, or if there are
| better ways. May I ask what your thoughts are on this?
| ronsor wrote:
| Have you also considered implementing the Nebula[1]
| algorithm?
|
| [1] https://github.com/Sleepwalking/nebula
| jansommer wrote:
| I need non-GPL libraries as a reference. The problem with
| YIN and especially PYIN is that the MIT-code I've found
| sometimes looks a bit too similar to earlier code in GPL.
| Rewriting that into the same but in different code is
| fairly hard. Here I'm assuming that translating e.g. GPL
| Python or C++ into C would mean the license is retained.
| ska wrote:
| Can you not just write it from the paper(s)? Or is that
| more effort than value to you?
|
| > that translating eg. GPL Python or C++ into C would
| mean the license is retained
|
| It depends a bit on what exactly "translating" means but
| you could easily end up with a derivative work.
|
| Honestly in that situation I wouldn't even look at the
| code. You might use it to test equivalent behavior after
| you have your own implementation, but only in a gross
| sense.
| jansommer wrote:
| I think I have to look at the code when using other
| people's MIT licensed code... If they have used something
| that's GPL or used someone else's code that turns out to
| be GPL, then it becomes my problem when translating it.
| And I'm not smart enough to just follow a paper
| ska wrote:
| > And I'm not smart enough to just follow a paper
|
| Don't sell yourself short. This is the sort of thing that
| is only straightforward if you have the right background.
| sevagh wrote:
| I have some code here if it interests you:
| https://github.com/sevagh/pitch-detection
|
| My favorite is the McLeod Pitch Method/MPM.
| Runs fast
| enough for realtime purposes in a WASM example too:
| https://github.com/sevagh/pitchlite
| jansommer wrote:
| Ha! I've translated your YIN code actually! Your
| autocorrelation is pretty cool - GPL versions all use an
| additional FFT. Have been struggling with your PYIN
| implementation because the beta distribution is copied
| from the GPL PYIN source, and the paper just references
| its source code for that part, and as you also found out,
| it's not a real beta distribution. I asked one of the
| PYIN authors (Dixon) if he was willing to change the
| license and he forwarded my mail a week ago - haven't
| heard back. Then there's the absolute_threshold function
| that is the same as in the PYIN source where it says
| "using Jorgen Six'es loop construct". This "loop
| construct" doesn't have a license, because he doesn't
| answer the issues about that in his TarsosDSP library,
| and I'm not sure if I should bother him about a few lines
| of code. I'm assuming it's a coincidence and that's just
| a normal way to find the absolute threshold. I really
| don't want to point fingers here, I'm being paranoid
| because I try to make sure I don't publish something that
| can put people in trouble...
|
| So I have been staring at your code for many hours, and
| the YIN-implementation works well. The PYIN on the other
| hand.. well I necro posted a while ago in one of your
| pull requests I think ;)
| xavriley wrote:
| It sounds like you've found it already but the original
| pYin implementation is in the VAMP plugin. Simon Dixon is
| my PhD supervisor but he's quite busy. Feel free to email
| me questions in the meantime. j.x.riley@ the same
| university as Simon. There's also a Python implementation
| in the librosa library which might have a better license
| for your purposes.
| ks2048 wrote:
| That's interesting. Can you point to one of these piano tuning
| apps that are $100+?
| CMLab wrote:
| Based on our current tests, our algorithm shows significantly
| higher accuracy and robustness compared to traditional digital
| signal algorithms such as PEF, NCF, YIN, HPS, etc. Our team is
| working diligently, and we will release benchmark test data and
| results in the near future.
| im3w1l wrote:
| That's pretty nice. Do you have any idea how it does it?
| rrherr wrote:
| How does the accuracy of this compare to CREPE?
|
| https://github.com/marl/crepe
|
| https://github.com/maxrmorrison/torchcrepe
|
| Does anyone know what the current state of the art is, within the
| Music Information Retrieval community?
| CMLab wrote:
| CREPE generally has high latency and error rates in instrument
| pitch recognition, especially for guitar instruments. Our team
| will release benchmark test data and results later.
| rrherr wrote:
| Thanks. I'd love to try TuneNN! Are you releasing a
| pretrained model? How do I run it on a wav file?
| xavriley wrote:
| High latency - agreed but it depends on whether a GPU is
| available or not. If it is then theoretically CREPE could be
| real-time. The error rates for pitch recognition are still
| quite good though for the full CREPE model. I'm interested to
| see the data on this claim.
| ranting-moth wrote:
| To the dev: the tuner gives me an incredibly high error window
| with the following message. It doesn't prompt to access the mic
| (I think that's related). Ubuntu/KDE/Firefox:
|
| An error occurred running the Unity content on this page. See
| your browser JavaScript console for more info. The error was:
| TypeError: 'microphone' (value of 'name' member of
| PermissionDescriptor) is not a valid value for enumeration
| PermissionName.
| checkPermission@https://aifasttune.com/public/web/microphone/microphone.js:3...
| _Microphone_checkPermission@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| invoke_iiii@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| unityFramework/Module._SendMessageString@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| ccall@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| SendMessage@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| SendMessage@https://aifasttune.com/public/web/Build/web.loader.js:1:3343
| loadURL@https://aifasttune.com/public/web/game/fastGameController.js...
| i@https://aifasttune.com/assets/index-64322640.js:1:777
| setup/<@https://aifasttune.com/assets/index-64322640.js:1:611
| CMLab wrote:
| Thank you for providing error feedback. We will work hard to
| address it. Currently, the model-related data is relatively
| large, which may be related to network speed.
| uhoh-itsmaciek wrote:
| I got the same error on Ubuntu/GNOME/Firefox. On Chrome, I
| don't get an error and I'm correctly prompted for microphone
| access, but if I grant permission, it does not seem to pick
| anything up (I've used my mic successfully with other web
| apps).
| joonatan wrote:
| Could someone fill me in on why machine learning would be
| necessary for pitch detection?
| Isn't it something that could just be solved
| with FFT or is it a much more complicated task?
| ks2048 wrote:
| Pitch is a *subjective* property, inherently tied to the
| complex processing humans use to perceive sounds. "Simple"
| physical measures like fundamental frequency of a periodic
| signal are very closely related, but for real-world audio
| (which isn't really periodic), the relationship is more
| complicated.
| chrisshroba wrote:
| Could you elaborate a bit more? It seems to me like the note
| being played would always correspond to the fundamental
| frequency observed. When is this not the case? Maybe as the
| note rings out, the fundamental frequency and first few
| overtones lose power, and all that's still audible are the
| higher overtones?
| gexaha wrote:
| That's actually not true, perceived pitch can be different
| from fundamental frequency, because of psychoacoustics.
| E.g. you can have "missing fundamental" -
| https://en.wikipedia.org/wiki/Missing_fundamental - or
| other effects like "sum and difference tones", which are
| quite popular in spectralism / spectral music
| xavriley wrote:
| Simple techniques like autocorrelation can still recover
| a missing fundamental. To answer the GP post, using
| neural networks for this task is overkill for simple,
| clean signals but it can be desirable if you need a)
| extremely high accuracy or b) robust results when there
| are signal degradations like background noise
| isoprophlex wrote:
| There is a nice little rabbit hole to go into:
| psychoacoustics of church bells.
|
| https://www.hibberts.co.uk/what-note-do-we-hear-when-a-bell-...
|
| _Almost all musical instruments (such as pianos, organs,
| orchestral instruments and the human voice) have sounds
| that contain a range of frequencies f, 2f, 3f, 4f and so on
| where f is the lowest frequency in the sound. The pitch or
| note we assign to the sound corresponds to the frequency
| f.
| Frequencies with this regular arrangement are called
| harmonic. The frequencies in the sound of bells, on the
| other hand, are not harmonic, and the pitch we assign to
| the sound of a bell is roughly an octave below the fifth
| partial up ordered by frequency. This partial is called the
| nominal, because it provides the note name of the bell.
| There often isn't a frequency in the bell's sound
| corresponding to the pitch we hear._
| bravura wrote:
| What's the license?
|
| What are your thoughts on PESTO which learns pitch-prediction
| very well with a small network, and uses a self-supervised
| objective?
|
| https://arxiv.org/abs/2309.02265
|
| https://github.com/SonyCSLParis/pesto
| filterfiber wrote:
| Does anyone know where I should look if I want to detect specific
| sounds? Like a smoke alarm, food bowl dispenser (it's very
| distinct), cat meowing, 3d printer collision, that sort of thing?
| mistercheph wrote:
| You would learn how to do this in the first & second chapters
| of the fast.ai course.
| squidsoup wrote:
| It might be worth pointing out that the banjo model is for a four
| string banjo, given a five string banjo is the more common
| instrument.
___________________________________________________________________
(page generated 2023-12-19 23:00 UTC)
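
Appended illustration of the "missing fundamental" exchange in the thread (the question of whether an FFT alone is enough, and the reply that simple autocorrelation can still recover a missing fundamental). This is only a toy sketch under stated assumptions, not TuneNN's method or any library's API: the sample rate, the tone (harmonics at 2f, 3f and 4f of a 200 Hz fundamental, with no energy at 200 Hz itself) and all function names are made up for the example, and it uses only the Python standard library.

```python
import math

SAMPLE_RATE = 8000   # Hz; assumed for this toy example
N = 800              # 0.1 s of audio, so DFT bins are 10 Hz apart


def synth_missing_fundamental(f0=200.0):
    """Hypothetical test tone: harmonics 2*f0, 3*f0, 4*f0, nothing at f0."""
    partials = [(2 * f0, 1.0), (3 * f0, 0.6), (4 * f0, 0.4)]
    return [
        sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in partials)
        for n in range(N)
    ]


def strongest_dft_bin_hz(x):
    """Frequency of the largest-magnitude bin of a naive O(N^2) DFT."""
    n = len(x)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * SAMPLE_RATE / n


def autocorr_pitch_hz(x, fmin=50.0, fmax=1000.0):
    """Pitch estimate from the lag that maximises the raw autocorrelation."""
    min_lag = int(SAMPLE_RATE / fmax)
    max_lag = int(SAMPLE_RATE / fmin)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return SAMPLE_RATE / best_lag


if __name__ == "__main__":
    tone = synth_missing_fundamental(200.0)
    print("strongest spectral peak (Hz):", strongest_dft_bin_hz(tone))
    print("autocorrelation pitch (Hz):  ", autocorr_pitch_hz(tone))
```

On this clean synthetic tone, naive peak-picking lands on the strongest partial (400 Hz), while the autocorrelation estimate lands on the 200 Hz period that the partials share. Real recordings are noisier and not strictly periodic, which is where the more careful methods mentioned in the thread (YIN, pYIN, MPM, CREPE) come in.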