[HN Gopher] Show HN: YoHa - A practical hand tracking engine
       ___________________________________________________________________
        
       Show HN: YoHa - A practical hand tracking engine
        
       Author : b-3-n
       Score  : 180 points
       Date   : 2021-10-11 07:41 UTC (15 hours ago)
        
 (HTM) web link (handtracking.io)
 (TXT) w3m dump (handtracking.io)
        
       | rglover wrote:
       | Great demo.
        
         | b-3-n wrote:
         | Thank you for your feedback.
        
       | programmarchy wrote:
       | Was wondering how easy it'd be to port to native mobile, so went
       | looking for the source code, but doesn't appear to actually be
       | open source. The meat is distributed as binary (WASM for
       | "backend" code and a .bin for model weights).
       | 
       | Aside from being a cool hand tracker, it's a very clever way to
       | distribute closed source JavaScript packages.
        
         | b-3-n wrote:
          | Thank you for the feedback. You are right that the project is
          | not open source right now; it's "only" MIT licensed. That's
          | why I also don't advertise it as open source (if you see the
          | words "open source" anywhere, that would be a mistake on my
          | end, so feel free to point it out). I wanted to start out
          | with just an API contract so that it is easier to manage and
          | to get started. In general I have no problem open sourcing
          | the JS part, but first there is some refactoring to do so
          | that it is easier to maintain once open sourced. Stay tuned!
         | 
         | As a side note: The wasm files are actually from the inference
         | engine (tfjs).
         | 
         | Please let me know if you have any more questions in that
         | regard.
        
         | boxfire wrote:
         | This architecture was also used in the link referenced when
         | bringing up alternative implementations:
         | 
         | https://github.com/google/mediapipe/issues/877#issuecomment-...
        
       | itake wrote:
        | I think these tools are super interesting, but tools like this
        | marginalize users with a non-standard number of limbs or
        | fingers.
        
         | colordrops wrote:
         | What do you suggest be done about it?
        
           | itake wrote:
            | I'm fine with using them, as long as alternatives are
            | available so that people with disabilities are able to
            | participate as well.
           | 
           | Imagine if your bank started using these to access your
           | account and suddenly disabled customers could no longer use
           | their adaptive input devices to interact with their account.
        
         | rpmisms wrote:
         | So does the real world. Things are hard to do with
         | disabilities. That's what the word means. This has great
         | potential, and it's not worth shutting down because some people
         | aren't able to use it.
         | 
          | I can also see this being very helpful for people who have
          | cerebral palsy, for example. Larger movements are easier, so
          | this might help someone use the web more easily.
        
           | itake wrote:
            | What if a bank used this for authentication and disabled
            | people couldn't use their custom interface devices? Does
            | that mean that disabled people shouldn't have access to
            | their bank accounts?
           | 
            | Maybe if this were an input device that interacts with the
            | standard web, then there is potential here, but it would be
            | unfortunate if a company used this as the primary means of
            | input.
        
             | mkl wrote:
             | That's the bank's mistake, not this library's.
        
         | pantulis wrote:
         | This is a very valid point, but as a counter argument the
         | technique implemented here could be adapted to help users with
         | other needs like say, a browser extension that can help you
         | navigate back and forward with the blink of an eye.
        
           | itake wrote:
           | This all gets complicated, because not everyone has 2 eyes
           | :-/.
           | 
           | You end up with complicated systems trying to cover all of
           | the edge cases.
        
       | [deleted]
        
         | [deleted]
        
       | jakearmitage wrote:
       | I wish there was a nice open-source model for tracking hands and
       | arms with multiple viewpoints (multiple cameras), similar to
       | commercial software like this: https://www.ipisoft.com/
        
       | borplk wrote:
       | Very impressive.
       | 
       | I want something like this so I can bind hand gestures to
       | commands.
       | 
       | For example scroll down on a page by a hand gesture.
        
         | b-3-n wrote:
          | One can build this pretty easily for a website that you are
          | hosting with the existing API
          | (https://github.com/handtracking-io/yoha/tree/master/docs).
         | 
         | However, you likely want this functionality on any website that
         | you are visiting for which you probably need to build a browser
         | extension. I haven't tried incorporating YoHa into a browser
         | extension but if somebody were to try I'd be happy to help.
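          | 
          | To illustrate, here is a rough sketch of what the wiring
          | could look like. The result shape and the callback
          | registration below are illustrative placeholders, not the
          | actual YoHa API; please check the linked docs for the real
          | names.
          | 
          |   // Illustrative only: HandFrame and onHandFrame are
          |   // placeholder names, not part of the YoHa API.
          |   interface HandFrame {
          |     handPresent: boolean;
          |     fistProb: number;                 // 0..1
          |     cursor: { x: number; y: number }; // normalized coords
          |   }
          | 
          |   // Hook this into the engine's per-frame result callback.
          |   function onHandFrame(frame: HandFrame): void {
          |     if (!frame.handPresent) return;
          |     // Holding a fist scrolls the page down a bit per frame.
          |     if (frame.fistProb > 0.8) {
          |       window.scrollBy({ top: 40, behavior: 'smooth' });
          |     }
          |   }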
        
       | layer8 wrote:
       | What would be nice is a version that can be used to paint on the
       | screen with your fingers, such that the lines are visible on a
       | remotely shared screen. The use-case is marking up/highlighting
       | on a normal desktop monitor (i.e. non-touch) while screen-
       | sharing, which is awkward using a mouse or touchpad (think
       | circling stuff in source code and documents, drawing arrows
       | etc.). That would mean (a) a camera from behind (facing the
       | screen), so that the fingers can touch (or almost touch) the
       | screen (i.e. be co-located to the screen contents you want to
       | markup), and (b) native integration, so that the painting is done
       | on a transparent always-on-top OS window (so that it's picked up
       | by the screen-sharing software); or just as a native pointing
       | device, since such on-screen painting/diagramming software
       | already exists.
        
       | adnanc wrote:
       | Great idea which is brilliantly executed.
       | 
       | So many educational uses, well done.
        
         | b-3-n wrote:
         | Thank you for the feedback.
        
       | tomcooks wrote:
       | This is a GREAT website, I can understand what it does with zero
       | clicks, zero scrolls.
       | 
       | Really great, congratulations, I hope that I can find a way to
       | apply this lesson to my SaaS.
        
         | SV_BubbleTime wrote:
          | Agreed, but that's also the nature of the beast. It's really
          | easy to explain hand tracking software in a single media
          | element. It's a lot harder to explain some crypto AEAD
          | encapsulation format the same way.
         | 
         | I assume YoHa means Your Hands... I don't think I could have
         | resisted OhHi for hand tracking.
        
       | brundolf wrote:
       | Bit of feedback: the home page is pretty sparse. The video is
       | great, but it wasn't obvious how to find the repo or where to get
       | the package (or even what language it can be used with). I had to
       | open the Demo, wait for it to load, and then click the Github
       | link there, and then the readme told me it was available on NPM.
       | 
       | Otherwise looks pretty impressive! I've been looking for
       | something like this and I may give it a whirl
        
         | b-3-n wrote:
         | Thank you for the feedback. You are right, the home page should
         | probably be enriched with more information and maybe I can make
         | the information you were looking for stand out better. As a
         | side note: There is a link to GitHub in the footer. The
         | language ("TypeScript API") is also mentioned in the body of
         | the page. But I see that these two can quickly go unnoticed.
        
       | tomcooks wrote:
        | BTW this would be great for spaced repetition foreign character
        | learning (Chinese, Arabic, Japanese, Korean, etc.): if the drawn
        | figure is similar enough to the character the student is
        | learning, mark it as studied.
       | 
       | Congrats again
        
         | b-3-n wrote:
         | Thank you for your feedback and for sharing this potential use
         | case. I think it is a very creative idea.
        
       | eminence32 wrote:
       | The demo doesn't seem to work on my chromebook. Maybe it's too
       | underpowered?
       | 
       | Web page doesn't say anything after `Warming up...` and the
       | latest message in the browser console is:
       | Setting up wasm backend.
       | 
       | I expected to see a message from my browser along the lines of
       | "Do you want to let this site use your camera", but I saw no such
       | message.
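        | 
        | For what it's worth, the generic way to check whether the
        | camera prompt fires at all (plain browser API, nothing YoHa
        | specific) is:
        | 
        |   // Should trigger the camera permission prompt if one is
        |   // going to appear at all.
        |   navigator.mediaDevices
        |     .getUserMedia({ video: true })
        |     .then((stream) => console.log('camera ok', stream))
        |     .catch((err) => console.error('camera blocked', err));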
        
       | karxxm wrote:
       | Would you provide the related paper to this approach?
        
         | b-3-n wrote:
         | In contrast to similar works there is no dedicated paper that
         | presents e.g. the neural network or the training procedure. Of
         | course ideas from many papers influenced this work and I can't
         | list them all here. Maybe it helps that the backbone of the
         | network is very similar to MobileNetV2
         | (https://arxiv.org/abs/1801.04381). Let me know if you have any
         | more questions in that regard.
        
           | karxxm wrote:
           | Thanks for your reply! I just thought that SIGCHI is around
           | the corner and it will be presented there! Awesome work!
        
       | gitgud wrote:
       | The demo really sells it here [1]. It's amazingly intuitive and
       | easy to use, it should be a part of video-conferencing software.
       | 
       | [1] https://handtracking.io/draw_demo/
        
         | Graffur wrote:
         | It's like an initial beta of the software - it's not production
         | ready. I can't imagine this adding value to a meeting _yet_.
         | Seems promising though.
        
         | b-3-n wrote:
         | Thank you for the feedback. Such an integration would be nice
         | indeed.
        
       | lost-found wrote:
       | Demo keeps crashing on iOS.
        
       | iainctduncan wrote:
       | Hi, I'm not sure if you've looked into this or not, but another
       | area that is interested in this sort of thing and might be very
       | excited is musical gesture recognition.
        
         | hondadriver wrote:
         | Also look at leap motion.
         | https://www.ultraleap.com/product/leap-motion-controller/ (tip:
         | mouser has them in stock and usually the best price) with
         | midipaw http://www.midipaw.com/ (free)
         | 
         | Latency is very low which is very important for this use case.
         | Look on YouTube for demos.
        
         | b-3-n wrote:
          | Hey, I believe there are multiple things you could have meant.
          | Off the top of my head, one thing that might be interesting
          | would be an application that allows conductors to conduct a
          | virtual orchestra. But there are other possibilities in this
          | space too, I'm sure! If you had something else in mind, feel
          | free to share.
          | 
          | I have not explored this space much so far, as my focus is on
          | building the infrastructure that enables such applications
          | rather than building the applications myself.
        
       | tjchear wrote:
       | This reminds me of TAFFI [0], a pinching gesture recognition
       | algorithm that is surprisingly easy to implement with classical
       | computer vision techniques.
       | 
        | [0] https://www.microsoft.com/en-us/research/publication/robust-...
        
       | phailhaus wrote:
       | An "undo" gesture seems necessary, it was a bit too easy to
       | accidentally wipe the screen. Aside from that, this is fantastic!
       | Love to see what WASM is enabling these days on the web.
        
         | b-3-n wrote:
          | Thank you for the feedback. Indeed, such functionality would
          | be nice. One could solve this via another hand pose or in
          | some way with the existing hand poses, e.g. make a fist for,
          | say, 2 seconds to clear the whole screen; anything shorter
          | would just issue an "undo".
          | 
          | YoHa uses tfjs (TensorFlow.js), which provides several
          | backends for computation. One indeed uses WASM, the other is
          | WebGL based. The latter is usually the more powerful one.
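          | 
          | A rough sketch of that timing rule (plain TypeScript,
          | independent of the actual YoHa result format; undo() and
          | clearCanvas() stand in for the app's own drawing functions):
          | 
          |   // Short fist = undo, long fist (>= 2 s) = clear all.
          |   declare function undo(): void;        // app-provided
          |   declare function clearCanvas(): void; // app-provided
          | 
          |   const CLEAR_HOLD_MS = 2000;
          |   let fistStartedAt: number | null = null;
          | 
          |   // Call once per frame with the current fist detection.
          |   function onFistState(isFist: boolean): void {
          |     const now = performance.now();
          |     if (isFist && fistStartedAt === null) {
          |       fistStartedAt = now;            // fist just started
          |     } else if (!isFist && fistStartedAt !== null) {
          |       const heldFor = now - fistStartedAt;
          |       fistStartedAt = null;
          |       if (heldFor >= CLEAR_HOLD_MS) {
          |         clearCanvas();                // long hold: wipe all
          |       } else {
          |         undo();                       // short hold: undo
          |       }
          |     }
          |   }
          | 
          | For the backend selection, tfjs itself exposes that via e.g.
          | await tf.setBackend('webgl') (or 'wasm') followed by await
          | tf.ready().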
        
       | smoyer wrote:
       | I've been working on a couple of chording keyboard designs and
       | was thinking I might be able to create a virtual keyboard using
       | this library. It would be nice to also be able to recognize the
        | hand from the back. A keyboard would also obviously require
        | tracking two hands at a time.
       | 
       | How does the application deal with different skin-tones?
        
         | b-3-n wrote:
          | That's an interesting idea. I have not tried to build
          | something similar, but a humble word of caution: no matter
          | what kind of ML you use, the mechanical version of the
          | instrument will always be more precise (you are likely aware
          | of this, I just want to make sure). However, you might be
          | able to approximate the precision of the mechanical version.
         | 
         | Two hand support would be nice and I would love to add it in
         | the future.
         | 
         | The engine should work well with different skin tones as the
         | training data was collected from a set of many and diverse
         | individuals. The training data will also grow further over time
         | making it more and more robust.
        
       | inetsee wrote:
       | My first question is whether this has the capability of being
       | adapted to interpret/translate American Sign Language (ASL)?
        
         | b-3-n wrote:
          | Thank you for this inspiring question. For interpreting sign
          | language you need multi-hand support, which YoHa is currently
          | lacking. Apart from that, you likely also need to account for
          | the temporal dimension, which YoHa does not do right now
          | either. If those things were implemented, I'm confident that
          | it would produce meaningful results.
        
           | rafamct wrote:
           | It's worth noting that movements of the mouth are extremely
           | important in ASL (and other sign languages) and so this
           | probably isn't as useful as it might seem at first.
        
       ___________________________________________________________________
       (page generated 2021-10-11 23:00 UTC)