[HN Gopher] Show HN: YoHa - A practical hand tracking engine
___________________________________________________________________

Show HN: YoHa - A practical hand tracking engine

Author : b-3-n
Score  : 180 points
Date   : 2021-10-11 07:41 UTC (15 hours ago)

(HTM) web link (handtracking.io)
(TXT) w3m dump (handtracking.io)

| rglover wrote:
| Great demo.

| b-3-n wrote:
| Thank you for your feedback.

| programmarchy wrote:
| Was wondering how easy it'd be to port to native mobile, so I went
| looking for the source code, but it doesn't appear to actually be
| open source. The meat is distributed as binary (WASM for "backend"
| code and a .bin for model weights).
|
| Aside from being a cool hand tracker, it's a very clever way to
| distribute closed-source JavaScript packages.

| b-3-n wrote:
| Thank you for the feedback. You are right that the project is not
| open source right now. It's "only" MIT licensed. That's why I also
| don't advertise it as open source (if the words "open source"
| appear somewhere, that's a mistake on my end; feel free to point
| it out). I wanted to start out from just an API contract so that
| it is easier to manage and get started. In general I have no
| problem open sourcing the JS part, but first there is some
| refactoring to do so that it is easier to maintain once open
| sourced. Stay tuned!
|
| As a side note: the WASM files are actually from the inference
| engine (tfjs).
|
| Please let me know if you have any more questions in that regard.

| boxfire wrote:
| This architecture was also used in the link referenced when
| bringing up alternative implementations:
|
| https://github.com/google/mediapipe/issues/877#issuecomment-...

| itake wrote:
| I think these tools are super interesting, but tools like this
| marginalize users with a non-standard number of limbs or fingers.

| colordrops wrote:
| What do you suggest be done about it?

| itake wrote:
| I'm fine with using them, as long as alternatives are available so
| that people with disabilities are able to participate as well.
|
| Imagine if your bank started using these to access your account
| and suddenly disabled customers could no longer use their adaptive
| input devices to interact with their account.

| rpmisms wrote:
| So does the real world. Things are hard to do with disabilities.
| That's what the word means. This has great potential, and it's not
| worth shutting down because some people aren't able to use it.
|
| I can also see this being very helpful for people who have
| cerebral palsy, for example. Larger movements are easier, so this
| might help someone use the web more easily.

| itake wrote:
| What if a bank used this for authentication and disabled people
| can't use their custom interface devices? Does that mean that
| disabled people shouldn't have access to their bank accounts?
|
| Maybe if this were the input device that interacts with the
| standard web, then there is potential here, but it would be
| unfortunate if a company used this as a primary means of input.

| mkl wrote:
| That's the bank's mistake, not this library's.

| pantulis wrote:
| This is a very valid point, but as a counter-argument, the
| technique implemented here could be adapted to help users with
| other needs, like, say, a browser extension that lets you navigate
| back and forward with the blink of an eye.

| itake wrote:
| This all gets complicated, because not everyone has 2 eyes :-/
|
| You end up with complicated systems trying to cover all of the
| edge cases.

| [deleted]

| [deleted]

| jakearmitage wrote:
| I wish there was a nice open-source model for tracking hands and
| arms with multiple viewpoints (multiple cameras), similar to
| commercial software like this: https://www.ipisoft.com/

| borplk wrote:
| Very impressive.
|
| I want something like this so I can bind hand gestures to
| commands. For example, scroll down on a page with a hand gesture.

| b-3-n wrote:
| One can build this pretty easily for a website that you are
| hosting with the existing API
| (https://github.com/handtracking-io/yoha/tree/master/docs).
|
| However, you likely want this functionality on any website that
| you are visiting, for which you probably need to build a browser
| extension. I haven't tried incorporating YoHa into a browser
| extension, but if somebody were to try, I'd be happy to help.
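For anyone wanting to prototype borplk's idea, a minimal TypeScript
sketch of such a binding follows. The engine entry point and result
shape below are placeholders rather than the actual YoHa API (the
docs linked above have the real function names); only the browser
calls are standard.

    // Placeholder types: the real result shape is defined in the
    // YoHa docs, not here.
    interface HandResult {
      isHandPresent: boolean; // assumed per-frame detection flag
      isFist: boolean;        // assumed pose flag for a closed fist
    }

    // Assumed entry point that feeds per-frame results to a callback.
    declare function startHandEngine(
      video: HTMLVideoElement,
      onFrame: (result: HandResult) => void
    ): void;

    const video = document.querySelector('video')!;
    startHandEngine(video, (result) => {
      // Scroll the page for as long as the user holds a fist.
      if (result.isHandPresent && result.isFist) {
        window.scrollBy({ top: 40, behavior: 'smooth' });
      }
    });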
| layer8 wrote:
| What would be nice is a version that can be used to paint on the
| screen with your fingers, such that the lines are visible on a
| remotely shared screen. The use case is marking up/highlighting on
| a normal desktop monitor (i.e. non-touch) while screen-sharing,
| which is awkward using a mouse or touchpad (think circling stuff
| in source code and documents, drawing arrows, etc.). That would
| mean (a) a camera from behind (facing the screen), so that the
| fingers can touch (or almost touch) the screen (i.e. be co-located
| with the screen contents you want to mark up), and (b) native
| integration, so that the painting is done on a transparent
| always-on-top OS window (so that it's picked up by the
| screen-sharing software); or just a native pointing device, since
| such on-screen painting/diagramming software already exists.

| adnanc wrote:
| Great idea which is brilliantly executed.
|
| So many educational uses, well done.

| b-3-n wrote:
| Thank you for the feedback.

| tomcooks wrote:
| This is a GREAT website, I can understand what it does with zero
| clicks, zero scrolls.
|
| Really great, congratulations, I hope that I can find a way to
| apply this lesson to my SaaS.

| SV_BubbleTime wrote:
| Agreed, but it's also the nature of the beast. It's really easy to
| explain hand tracking software in a single media element. It's a
| lot harder to explain some crypto AEAD encapsulation format the
| same way.
|
| I assume YoHa means Your Hands... I don't think I could have
| resisted OhHi for hand tracking.

| brundolf wrote:
| Bit of feedback: the home page is pretty sparse. The video is
| great, but it wasn't obvious how to find the repo or where to get
| the package (or even what language it can be used with). I had to
| open the demo, wait for it to load, click the GitHub link there,
| and then the readme told me it was available on NPM.
|
| Otherwise it looks pretty impressive! I've been looking for
| something like this and I may give it a whirl.

| b-3-n wrote:
| Thank you for the feedback. You are right, the home page should
| probably be enriched with more information, and maybe I can make
| the information you were looking for stand out better. As a side
| note: there is a link to GitHub in the footer, and the language
| ("TypeScript API") is also mentioned in the body of the page. But
| I see that these two can quickly go unnoticed.

| tomcooks wrote:
| BTW this would be great for spaced-repetition foreign character
| learning (Chinese, Arabic, Japanese, Korean, etc.): if the drawn
| figure is similar enough to the character the student is learning,
| mark it as studied.
|
| Congrats again

| b-3-n wrote:
| Thank you for your feedback and for sharing this potential use
| case. I think it is a very creative idea.

| eminence32 wrote:
| The demo doesn't seem to work on my Chromebook. Maybe it's too
| underpowered?
|
| The web page doesn't say anything after "Warming up..." and the
| latest message in the browser console is:
|
|     Setting up wasm backend.
|
| I expected to see a message from my browser along the lines of "Do
| you want to let this site use your camera", but I saw no such
| message.
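A missing permission prompt like this can be checked independently
of YoHa: a few lines of standard browser code request the camera
directly, so either the prompt appears or the underlying error
surfaces. This sketch only diagnoses camera access; it says nothing
about why the engine itself stalls at warm-up.

    // Standard browser API, independent of YoHa: request the camera
    // so the permission prompt appears or the failure reason surfaces.
    async function checkCamera(): Promise<void> {
      try {
        const stream =
          await navigator.mediaDevices.getUserMedia({ video: true });
        console.log('Camera OK:', stream.getVideoTracks()[0].label);
        stream.getTracks().forEach((t) => t.stop()); // release the camera
      } catch (err) {
        // e.g. NotAllowedError (permission denied) or NotFoundError
        console.error('Camera unavailable:', err);
      }
    }

    checkCamera();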
| karxxm wrote:
| Would you provide the related paper for this approach?

| b-3-n wrote:
| In contrast to similar works, there is no dedicated paper that
| presents e.g. the neural network or the training procedure. Of
| course, ideas from many papers influenced this work and I can't
| list them all here. Maybe it helps that the backbone of the
| network is very similar to MobileNetV2
| (https://arxiv.org/abs/1801.04381). Let me know if you have any
| more questions in that regard.

| karxxm wrote:
| Thanks for your reply! I just thought that since SIGCHI is around
| the corner, it might be presented there! Awesome work!

| gitgud wrote:
| The demo really sells it here [1]. It's amazingly intuitive and
| easy to use; it should be a part of video-conferencing software.
|
| [1] https://handtracking.io/draw_demo/

| Graffur wrote:
| It's like an initial beta of the software - it's not production
| ready. I can't imagine this adding value to a meeting _yet_. Seems
| promising though.

| b-3-n wrote:
| Thank you for the feedback. Such an integration would be nice
| indeed.

| lost-found wrote:
| Demo keeps crashing on iOS.

| iainctduncan wrote:
| Hi, I'm not sure if you've looked into this or not, but another
| area that is interested in this sort of thing and might be very
| excited is musical gesture recognition.

| hondadriver wrote:
| Also look at Leap Motion:
| https://www.ultraleap.com/product/leap-motion-controller/ (tip:
| Mouser has them in stock and usually the best price) with MidiPaw
| http://www.midipaw.com/ (free).
|
| Latency is very low, which is very important for this use case.
| Look on YouTube for demos.

| b-3-n wrote:
| Hey, I believe there are multiple things you could have meant. Off
| the top of my head, one thing that might be interesting would be
| an application that allows conductors to conduct a virtual
| orchestra. But there are other possibilities in this space too,
| I'm sure! If you had something else in mind, feel free to share.
|
| I have not explored this space much so far, as my focus is to
| build the infrastructure that enables such applications rather
| than to build the applications myself.

| tjchear wrote:
| This reminds me of TAFFI [0], a pinching gesture recognition
| algorithm that is surprisingly easy to implement with classical
| computer vision techniques.
|
| [0] https://www.microsoft.com/en-us/research/publication/robust-...

| phailhaus wrote:
| An "undo" gesture seems necessary; it was a bit too easy to
| accidentally wipe the screen. Aside from that, this is fantastic!
| Love to see what WASM is enabling these days on the web.

| b-3-n wrote:
| Thank you for the feedback. Indeed, such a functionality would be
| nice. One could solve this via another hand pose or in some way
| with the existing hand poses, e.g. make a fist for, say, 2 seconds
| to clear the whole screen; anything shorter just issues an "undo".
|
| YoHa uses tfjs, which provides several backends for computation.
| One indeed uses WASM; the other one is WebGL based. The latter is
| usually the more powerful one.
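To make the backend remark concrete, here is a generic
TensorFlow.js backend-selection sketch: prefer WebGL, fall back to
WASM. This is plain tfjs usage, not YoHa's internal setup, and the
WASM asset path is an assumption about where a deployment serves
the tfjs .wasm binaries.

    import * as tf from '@tensorflow/tfjs';
    // Importing this package registers the WASM backend with tfjs.
    import { setWasmPaths } from '@tensorflow/tfjs-backend-wasm';

    // Prefer WebGL (usually faster); fall back to WASM if it fails.
    async function pickBackend(): Promise<string> {
      if (!(await tf.setBackend('webgl'))) {
        // Assumed path where the deployment serves the .wasm files.
        setWasmPaths('/assets/tfjs-wasm/');
        await tf.setBackend('wasm');
      }
      await tf.ready();
      return tf.getBackend(); // 'webgl' or 'wasm'
    }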
| smoyer wrote:
| I've been working on a couple of chording keyboard designs and was
| thinking I might be able to create a virtual keyboard using this
| library. It would be nice to also be able to recognize the hand
| from the back, and for a keyboard it would obviously also be
| necessary to track two hands at a time.
|
| How does the application deal with different skin tones?

| b-3-n wrote:
| That's an interesting idea. I have not tried to build something
| similar, but a humble word of caution that I want to put out is
| that no matter what kind of ML you use, the mechanical version of
| the instrument will always be more precise (you are likely aware
| of it; I just want to make sure). However, you might be able to
| approximate the precision of the mechanical version.
|
| Two-hand support would be nice and I would love to add it in the
| future.
|
| The engine should work well with different skin tones, as the
| training data was collected from a large and diverse set of
| individuals. The training data will also grow further over time,
| making it more and more robust.

| inetsee wrote:
| My first question is whether this has the capability of being
| adapted to interpret/translate American Sign Language (ASL)?

| b-3-n wrote:
| Thank you for this inspiring question. For interpreting sign
| language you need multi-hand support, which YoHa is currently
| lacking. Apart from that, you likely also need to account for the
| temporal dimension, which YoHa also does not do right now. If
| those things were implemented, I'm confident it would produce
| meaningful results.

| rafamct wrote:
| It's worth noting that movements of the mouth are extremely
| important in ASL (and other sign languages), so this probably
| isn't as useful as it might seem at first.
___________________________________________________________________
(page generated 2021-10-11 23:00 UTC)