[HN Gopher] Intel Extension for TensorFlow
       ___________________________________________________________________
        
       Intel Extension for TensorFlow
        
       Author : hochmartinez
       Score  : 75 points
       Date   : 2022-10-28 18:29 UTC (4 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | echelon wrote:
       | I love the hardware Nvidia makes, but for the good of our field
       | and for progress, we need our ML stacks to work with all other
       | vendors. CUDA is the wrong platform to build our future on.
        
       | _aavaa_ wrote:
        | Disappointed by the lack of a benchmark comparison in the README.
        
       | minimaxir wrote:
        | For context, in other Python ML packages like scikit-learn,
        | running the Intel extension on a compatible CPU can result in a
        | _massive_ performance increase.
        | 
        | Although since TensorFlow models, unlike sklearn's, should be
        | trained on a GPU, that's less useful here, and there are better
        | tools for CPU inference (e.g. SavedModels or ONNX).
        
         | benreesman wrote:
         | I don't know much about Intel's toolchain: people say good
         | stuff about TBB and things like that.
         | 
          | Historically, doing inference on Intel gear was mostly about
          | whether to target AVX2 or AVX-512 when building Eigen or
          | whatever. A few years ago the net win was AVX2, because the
          | AVX-512 down-clock and re-clock just killed you.
         | 
         | What's the game these days? Long term I doubt inference will be
         | done on x86, but I think a lot of people still do it.
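          | 
          | A rough, Linux-only way to see which AVX flavors the host
          | CPU advertises (this just scans the standard /proc/cpuinfo
          | flags; nothing here is Intel- or TensorFlow-specific):

```python
# Sketch (Linux-only): report AVX2 / AVX-512 support from /proc/cpuinfo.
# On other OSes the file doesn't exist and we return an empty dict.
def avx_support(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return {}
    flags = set()
    for line in text.splitlines():
        if line.startswith("flags"):  # x86 kernels list ISA features here
            flags = set(line.split(":", 1)[1].split())
            break
    return {
        "avx2": "avx2" in flags,
        "avx512f": "avx512f" in flags,  # the foundational AVX-512 subset
    }

print(avx_support())
```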
        
         | make3 wrote:
          | yes, but Intel now has mainstream GPUs that are reasonably
          | performant (roughly RTX 3060 class), which is what I assume
          | this is for
        
           | Roark66 wrote:
            | Are you talking about Intel Arc? I've yet to see any ML-
            | relevant benchmarks on Intel Arc. If you are aware of any
           | please let me know.
        
             | throwaway1851 wrote:
             | Same here. At 16GB of VRAM and only $349, it could fill a
             | really nice slot for DL.
        
       | dweekly wrote:
       | For the curious, Apple has an analogous Tensorflow Metal plugin
       | to allow for Apple Silicon (and AMD GPU) acceleration using the
       | same plugin architecture.
       | 
       | https://developer.apple.com/metal/tensorflow-plugin/
        
         | muxamilian wrote:
         | Which is basically unusably buggy.
         | 
         | For example, tf.sort only sorts up to 16 values and overwrites
         | the rest with -0. Apparently not fixed for over one year:
         | https://developer.apple.com/forums/thread/689299
         | 
         | Also, tf.random always returns the same random numbers:
         | https://developer.apple.com/forums/thread/69705
         | 
         | Although I guess these bugs are not the fault of Tensorflow's
         | plugin architecture but rather Apple's implementation.
        
       | Roark66 wrote:
        | Intel did a great thing for people interested in ML and numeric
        | research by making their MKL library and compiler free and
        | cross-platform compatible. Even today, on my AMD Zen 3 Ryzen
        | machine, Intel's MKL-linked numpy and pytorch are, for some
        | operations, 10x (yes, really ten times) faster than the next-
        | best alternative (OpenBLAS, etc.). I was shocked to discover
        | how much of a difference MKL makes for CPU workloads. This is
        | mostly because it makes use of AVX2 CPU extensions, which make
        | certain matrix operations a lot faster.
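        | 
        | An easy way to check which BLAS a given numpy build is linked
        | against, plus a quick (unscientific) matmul timing, is a
        | sketch like:

```python
import time
import numpy as np

# Which BLAS is this numpy linked against? MKL builds mention "mkl" in
# the output; OpenBLAS builds mention "openblas".
np.show_config()

# A crude single-run matmul timing; real comparisons need warmup,
# repetition, and pinned thread counts.
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
t0 = time.perf_counter()
c = a @ b  # dispatched to the linked BLAS (dgemm)
dt = time.perf_counter() - t0
print(f"2000x2000 matmul: {dt:.3f}s")
```

Running the same script against an MKL-linked and an OpenBLAS-linked
environment is the quickest way to see the gap the parent describes.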
        
         | galangalalgol wrote:
          | Compiling Eigen with AVX-512 works just fine. I don't think
          | it is quite the discriminator it once was. However, I am
          | happy they made it open, and it would be great if oneAPI
          | became a real alternative to CUDA for more tasks. I tried
          | getting it to work with flux.jl a while back and ran into
          | some difficulty.
        
         | ysleepy wrote:
         | They crippled the performance on non-Intel CPUs on purpose
         | until recently.
         | 
          | Intel's anti-competitive behaviour follows them throughout
         | their history.
        
           | pantalaimon wrote:
            | They crippled their own consumer CPUs too, by retroactively
            | disabling AVX-512
        
           | Roark66 wrote:
            | Yes, this is true, but even back then (while it was
            | crippled) it was possible to make MKL believe it was
            | running on Intel hardware by overriding its CPU vendor
            | check. I haven't done this myself, but I read it was
            | possible.
        
             | ysleepy wrote:
              | I just felt your first sentence, without a qualifier,
              | gave Intel too much credit, given the context.
        
         | chrchang523 wrote:
         | That was my previous experience, but have you tried linking to
         | AMD AOCL recently? I would not expect the performance gap
         | between Intel MKL and AMD AOCL to still be as large as you
         | describe on a Zen 3.
        
           | Scene_Cast2 wrote:
           | Would you know if AOCL is supported by numpy? And are there
           | any benchmarks out there? (Especially interested in Zen4)
        
       ___________________________________________________________________
       (page generated 2022-10-28 23:00 UTC)