[HN Gopher] What we know about the Apple Neural Engine
       ___________________________________________________________________
        
       What we know about the Apple Neural Engine
        
       Author : SerCe
       Score  : 265 points
       Date   : 2023-03-25 11:04 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jeffrallen wrote:
       | Ane means donkey in French.
       | 
       | Just sayin'.
        
         | marginalia_nu wrote:
         | Alright, then, Apple Semantic System.
        
           | eastbound wrote:
            | No, those initials were already taken by Atlassian Software
            | Systems. They seem to have lodged the paperwork with that
            | name in 2002 and to have dropped it later on (they went with
            | TEAM instead when they IPO'd in 2015), but back in 2010 when
           | I applied, there was a book (collection of news articles) in
           | the waiting room for candidates titled "Atlassian Software
           | Systems".
           | 
           | Great guys.
           | 
           | https://youtu.be/VfyUbuFoiBU
        
             | marginalia_nu wrote:
             | Can't make this shit up.
        
             | dylan604 wrote:
             | Don't forget the Advance SubStation Alpha subtitle format
        
           | mrweasel wrote:
           | Which shortened also means donkey, brilliant.
        
       | ls612 wrote:
       | Does anyone know if the neural engine on the new M1/M2 Max is
       | directly hooked up to the unified memory the way the GPU is?
        
         | wmf wrote:
         | Define directly I guess.
        
           | ls612 wrote:
           | My understanding is that the CPU and GPU both have DMA to the
           | memory at some incredible speed since it's all on the same
           | chip. Does the ANE have that same DMA speed and latency?
        
             | jamiek88 wrote:
              | I believe so, as it's used by Adobe among others. This
              | was from a convo with an Adobe engineer gushing about the
              | UMA/DMA and what an improvement it was over the
              | fans-whirring-like-a-jet-engine end of the Intel era.
              | 
              | I can't find any documentation about it, though; everyone
              | is just working under that assumption.
        
       | anentropic wrote:
       | Do we think Apple are going to provide more info and maybe a
       | public API over time?
       | 
        | Or are they keeping it obscure for commercial reasons?
       | 
       | Or just not very competent/don't care?
       | 
        | Seems weird having these amazing chips and only blunt tools.
        
         | my123 wrote:
         | CoreML.
         | 
         | Directly exposing the ANE wouldn't make much sense, as it's an
         | IP block that changes between generations in incompatible ways.
        
           | brookst wrote:
           | This is the answer. CoreML gives you an abstraction over
           | different generations and sizes of underlying NPU.
           | 
           | You might not _want_ the abstraction, but love it or hate it,
           | that's kind of the Apple way.
           | 
           | It will be very interesting to see what their next chips look
           | like since we're getting to the point where HW designs will
           | reflect the rise of the, uh, transformers.
        
       | sebzim4500 wrote:
        | Can this really be everything publicly known about the ANE?
        | Sounds hard to believe; I would have thought someone would have
        | reverse engineered _something_ about it by now.
        
         | SomeHacker44 wrote:
         | See other commenter above about GeoHot's analysis which is much
         | more in depth.
        
         | detrites wrote:
         | My question too. This semi-answer on the page seems to
         | contradict itself (source: https://github.com/hollance/neural-
         | engine/blob/master/docs/p... ):
         | 
         | "> Can I program the ANE directly?
         | 
         | Unfortunately not. You can only use the Neural Engine through
         | Core ML at the moment.
         | 
         | There currently is no public framework for programming the ANE.
         | There are several private, undocumented frameworks but
         | obviously we cannot use them as Apple rejects apps that use
         | private frameworks.
         | 
         | (Perhaps in the future Apple will provide a public version of
         | AppleNeuralEngine.framework.)"
         | 
         | The last part links to this bunch of headers:
         | 
         | https://github.com/nst/iOS-Runtime-Headers/tree/master/Priva...
         | 
         | So might it be more accurate to say you can program it
         | directly, but won't end up with something that can be
         | distributed on the app store?
        
           | saagarjha wrote:
           | Correct. (It is also unlikely that Apple exposes the Neural
           | Engine directly.)
        
       | djoldman wrote:
       | geohot's findings:
       | 
       | https://github.com/geohot/tinygrad/tree/master/accel/ane
        
         | ljlolel wrote:
         | https://news.ycombinator.com/item?id=35302833
        
         | detrites wrote:
         | Ok, this is much more like what I expected from the OP.
         | 
         | Anyone disappointed, here be full details on everything.
        
       | bitL wrote:
       | It's really terrible that Apple markets this as the next big
       | thing but forgets to include detailed documentation so people
       | have to experiment and figure out what works...
        
         | lonelyasacloud wrote:
          | Partly Apple's docs haven't been great for a while, partly
          | that's just how they roll, and partly they're trying (like
          | most everyone) to figure out what their strategy is going to
          | be in a post-GPT-4 world [0].
         | 
         | [0] Persist with their own models running locally, how much to
         | integrate with rest of the OS and maintain privacy moral
         | ground, that sort of thing.
        
         | barkingcat wrote:
          | Apple didn't "forget"; they never want to release Apple
          | proprietary docs. It's their competitive advantage/moat.
        
         | saagarjha wrote:
         | People don't have to do anything. You use CoreML to program it.
        
         | cynicalsecurity wrote:
         | Proprietary software, dude. It really sucks.
        
       | m3kw9 wrote:
       | Can LLM run in it?
        
         | egman_ekki wrote:
         | maybe https://github.com/apple/ml-ane-transformers
        
       | enzomanico wrote:
        | Yes, of course.
        
       | simonw wrote:
       | So my phone and my laptop both have the capability to perform 15
       | trillion operations per second, just in the neural engine?
       | 
       | What kind of things are taking advantage of this right now? It's
       | gotta be more than just Face ID right?
       | 
       | What's my laptop likely to be doing with that?
        
         | k_bx wrote:
          | They're putting it everywhere they can: from Notes, to
          | pausing a video in QuickTime or Safari and instantly copying
          | text from a frame.
        
           | [deleted]
        
         | gedy wrote:
          | I can imagine something like Siri running on device much more
         | effectively against local content. The cynic in me doesn't want
         | to hope too much for cloudless services like this, but one can
         | hope.
        
           | sroussey wrote:
            | This has been true since iOS 15 moved Siri on-device.
        
             | sroussey wrote:
             | https://www.engadget.com/ios-15-siri-on-device-app-
             | privacy-1...
        
         | IIAOPSW wrote:
         | Siri wasn't a product. She was an emergent feature they
         | couldn't extinguish.
        
         | conradev wrote:
         | It's used for a variety of things:
         | 
         | - Biometrics (Face ID and Touch ID)
         | 
         | - Image analysis (face matching, aesthetic evaluations, etc)
         | 
         | - Text to speech and speech to text (smaller models on device,
         | used for privacy/latency/reliability)
         | 
         | - Small ad-hoc models like Raise to Speak on Apple Watch, the
         | Hey Siri detector
         | (https://machinelearning.apple.com/research/hey-siri)
         | 
         | These things have been in phones for 5 years now and have been
         | used from day one
        
           | simonw wrote:
            | Right, but do any of those things really need 15 trillion
            | operations per second? Have they been getting noticeably
           | better with upgraded phone models?
        
             | jamiek88 wrote:
             | Yes definitely.
             | 
             | I could only find a blurry YouTube video of the instruction
             | manual for an old old heater in my house.
             | 
              | I paused the video on the bit I needed, where the guy had
              | zoomed in, and was able to copy and paste the text I
              | could barely read into a Notes doc.
             | 
             | There's no one splashy thing just lots of little quality of
             | life improvements.
        
             | burnished wrote:
              | I got one recently and generally think the phone is
              | garbage, but the OCR built into pictures is really
              | something else. I took a photo of a label for a barcode
              | when I couldn't see it myself but could get my hand
              | nearby. It was at an odd angle, but when I pressed my
              | finger to the text I was interested in, the phone
              | captured it immediately, highlighted it, and I copied it,
              | nice as you please.
        
             | blululu wrote:
             | No but the first party users should not consume all the
             | compute on the chip. The bigger the margin the better for
             | the device. The other aspect of this is speed and power
             | consumption (battery life is a top 3 phone feature across
             | pretty much all consumers).
        
         | secretsatan wrote:
          | ARKit makes use of it on the phone: there's plane detection
          | and classification, image and object detection, segmentation
          | for people occlusion, and probably more behind the scenes.
          | 
          | I find it a little frustrating that we aren't using the
          | built-in capabilities of iPhones more in our company. I still
          | kinda think Apple tech is a pariah in some circles, so we
          | have to run with stuff in the cloud that costs us money,
          | heaven forbid we use something that could run on an iPhone.
        
         | kmeisthax wrote:
         | There's an app called Draw Things, for iOS/iPadOS/macOS/etcOS,
         | that uses the ANE to run Stable Diffusion on your
         | phone/tablet/laptop.
        
         | fauigerzigerk wrote:
         | I don't know for sure, but things like text recognition (Live
         | Text) or object recognition in Photos (Visual Look Up) are
         | obvious candidates.
         | 
          | I think the Neural Engine is absolutely key to Apple's strategy.
         | They want people to buy expensive devices and they don't want
         | to process user data on their servers.
         | 
         | Users get privacy. Apple gets money. It's a pretty coherent
         | business model.
        
           | jjoonathan wrote:
           | Privacy isn't the only benefit of local compute, users also
           | get colossal bandwidth, tiny latency, and high reliability.
        
             | crazygringo wrote:
             | On the other hand, it kills your battery.
             | 
             | Back when dictation was done in the cloud, I could dictate
             | all day on my iPhone no problem.
             | 
             | Now that it's on-device it kills my battery in a couple of
             | hours.
             | 
             | The latency is absolutely improved, and continuous
             | dictation (not stopping every 30s) is a godsend.
             | 
             | But it does absolutely destroy your battery life.
        
               | bibanez wrote:
                | Don't worry too much, because Moore's law is coming to
                | the rescue: NPUs benefit from new process nodes.
        
               | mcculley wrote:
               | Moore's Law makes it a good long term strategy for Apple.
               | The GP is complaining about his battery life today.
        
               | lucideer wrote:
               | I hope it's not disrespectful to point this out, less
               | than 24 hours after his passing, but I don't think Gordon
               | would object to my pointing out that Moore's Law has a
               | finite length. Some have argued it expired up to 13 years
               | ago; Moore himself predicted another 2 years or so.
        
               | nwienert wrote:
               | I built an always on local OCR system that used ML on
               | CPU/GPU a few years ago and I can say with confidence it
               | doesn't use much. We literally scanned your entire screen
               | every two seconds and it used less than 1% in total, and
              | this was before CoreML, which is far more efficient. I
              | think it's FUD that it's that significant.
        
             | fauigerzigerk wrote:
             | Agreed.
             | 
             | On the downside, we have to acknowledge that it is hugely
             | inefficient for everyone to own expensive hardware that has
             | to sit idle most of the time because it would otherwise
             | drain the battery.
             | 
             | Where low latency is not an absolute necessity, the
             | economic pull of the cloud will be tremendous, especially
             | if mobile networks become ubiquitous and fast.
        
               | vinay_ys wrote:
               | That's a weak argument. Lots of hardware sits idle in the
                | cloud as well. And on your phone it's not expensive. In
               | fact, the $/tflop is cheaper on phone than in the cloud -
               | cloud has to deal with all kinds of complexity that you
               | assume away in your local single-tenant phone context.
        
               | fauigerzigerk wrote:
               | I wouldn't be so sure. A quick web search brings up
               | average server utilisation numbers for large-scale cloud
               | providers between 45% and 65%. That's probably an order
               | of magnitude or two higher than what you could do on a
               | mobile device without absolutely annihilating the
               | battery.
        
           | flutas wrote:
           | > They want people to buy expensive devices and they don't
           | want to process user data on their servers.
           | 
           | > Users get privacy. Apple gets money.
           | 
           | Apple also gets users to subsidize the cost of compute
           | indefinitely (by buying the expensive phone), rather than
           | using their servers.
        
             | blululu wrote:
             | It's not a subsidy. It's a pricing structure for a
              | commercial transaction. Fundamentally, a business cannot
              | just give out free compute. In the long run the user of
              | computation needs to pay for it. It's a question of whether
              | people feel more satisfied paying for it in a lump sum
              | bundled with a device or through a subscription plan on the
              | cloud. For frequent, on-demand, low-latency applications,
              | I would suspect that people will always be happier running
              | the computations locally.
        
             | saagarjha wrote:
             | Apple also runs an OS on that device, so they can't just
             | offload infinite computation for it: it would use too much
             | battery.
        
         | iamgopal wrote:
          | 15 trillion operations per second? Of what kind? Addition?
          | Isn't that mind-blowing?
        
           | selectodude wrote:
           | Matrix multiplication
        
             | amelius wrote:
             | Of what size??
        
             | sebzim4500 wrote:
             | I know you probably didn't mean this, but in case anyone is
             | confused ANE is not doing 15 trillion matrix
             | multiplications per second. It is doing 15 trillion scalar
             | operations in order to multiply a much smaller number of
             | matrices.
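To put that in perspective, here is a hedged back-of-envelope conversion. The 512x512 matrix size is an arbitrary example, and this assumes the rated figure counts scalar multiply-accumulate operations (the usual marketing convention):

```python
# Back-of-envelope: translate a scalar-ops-per-second rating into
# whole n x n matrix multiplications per second. A dense n x n matmul
# costs roughly 2*n**3 scalar operations (one multiply plus one add
# per multiply-accumulate step).

def matmuls_per_second(tops: float, n: int) -> float:
    """How many n x n matmuls per second fit in `tops` trillion ops/s."""
    return tops * 1e12 / (2 * n ** 3)

# At 15 TOPS, roughly 56,000 multiplications of 512x512 matrices per second.
print(round(matmuls_per_second(15, 512)))  # 55879
```

So "15 trillion operations" buys tens of thousands of moderately sized matrix multiplies per second, not trillions of them.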
        
         | blululu wrote:
          | To my knowledge this is mostly used by internal tools, though
          | a number of common 3rd-party APIs (e.g. QR code scanning) use
          | HW acceleration under the hood. Internally there is a ton of
          | ML running on the device; the most obvious examples are
          | touch-screen input and the camera. 3rd-party developers have
          | access to this via CoreML, but unless latency is critical it
          | is usually easier to develop and run ML in the cloud. For
          | camera apps using ML, this chip is going to be used either
          | explicitly or implicitly.
        
           | simonw wrote:
           | Oh the touch screen! That's fascinating, is that definitely
           | running stuff on the neural engine?
        
             | blululu wrote:
              | If you think about it, a capacitive touch sensor provides
              | a noisy grayscale image, and the goal is to detect and
              | classify blobs as touch gestures as quickly and
              | accurately as possible. Since it runs at all times and
              | latency really burns the UX, this has consequently always
              | been done on a HW accelerator.
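The pipeline described above, a noisy grayscale frame in and detected blobs out, can be sketched in a toy form. This is purely illustrative; the threshold, the tiny frame, and the flood-fill grouping are assumptions for the sketch, not Apple's actual touch pipeline:

```python
# Toy sketch of the touch pipeline described above: threshold a noisy
# grayscale "capacitance" frame, then group touched cells into blobs
# with a 4-connected flood fill. A real controller would then classify
# each blob as a finger, palm, etc.

def find_blobs(frame, threshold=0.5):
    """Return a list of blobs; each blob is a set of (row, col) cells
    whose value exceeds `threshold`, grouped by 4-connectivity."""
    rows, cols = len(frame), len(frame[0])
    seen, blobs = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or frame[r][c] < threshold:
                continue
            # Flood-fill one connected component starting at (r, c).
            blob, stack = set(), [(r, c)]
            while stack:
                y, x = stack.pop()
                if not (0 <= y < rows and 0 <= x < cols):
                    continue
                if (y, x) in seen or frame[y][x] < threshold:
                    continue
                seen.add((y, x))
                blob.add((y, x))
                stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
            blobs.append(blob)
    return blobs

frame = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.9, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.8],
]
print(len(find_blobs(frame)))  # 2 distinct "touches"
```

The point of the hardware accelerator is that this kind of scan-and-group work has to run on every frame at display refresh rates, which is exactly the always-on, latency-critical workload the comment describes.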
        
         | ManuelKiessling wrote:
         | Running Microsoft Teams. Barely.
        
         | waboremo wrote:
         | Scene analysis in photos, image captions, and machine
         | translations are also done using ANE. CoreML also utilizes it
         | when possible.
        
       | mmaunder wrote:
       | Anyone done any work on using a model for transcription on the
       | local device using the ANE? I've heard it kills the battery.
       | Having to transcribe voice in the cloud is a serious impediment
       | to end to end encryption for certain applications.
        
         | intalentive wrote:
         | This is close: https://github.com/ggerganov/whisper.cpp
        
       | thedonkeycometh wrote:
       | [dead]
        
       | ah- wrote:
       | There's also basic ANE support for Asahi now:
       | 
       | https://github.com/eiln/ane
       | 
       | https://github.com/eiln/anecc
       | 
       | https://github.com/AsahiLinux/m1n1/pull/296/files
        
         | rowanG077 wrote:
         | That's misleading. It's much more apt to say it's being worked
         | on. This is not available in any Asahi release at this time.
        
       ___________________________________________________________________
       (page generated 2023-03-25 23:00 UTC)