[HN Gopher] Grassmann.jl A\b 3x faster than Julia's StaticArrays.jl
       ___________________________________________________________________
        
       Grassmann.jl A\b 3x faster than Julia's StaticArrays.jl
        
       Author : DreamScatter
       Score  : 93 points
       Date   : 2020-06-15 16:15 UTC (6 hours ago)
        
 (HTM) web link (discourse.julialang.org)
 (TXT) w3m dump (discourse.julialang.org)
        
       | thanatropism wrote:
        | When I first heard of Julia in 2016 it was sandwiched between R
        | and Python, and I thought it had no chance whatsoever of
        | succeeding.
       | 
       | I'm increasingly impressed by Julia. It's a crowded space and
       | they remain relevant. Maybe even best of class for some
       | applications.
        
         | macawfish wrote:
         | Grassmann.jl is the library that made me take Julia seriously.
          | It's an awesome library, and the author really gets one of
          | Julia's most special and well-integrated features: multiple
          | dispatch.
         | 
         | Because the code is written in solid, idiomatic Julia, it's
         | interoperable with tons of other libraries. I actually had
         | success using this library with CuArrays.jl without any
         | modifications. So yeah, you can use it to run geometric algebra
         | computations on a GPU for algebras of dimension < 2^5 (in my
         | case, I tried it with CGA3D).
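          | 
          | As a rough sketch of what that composability looks like (not
          | the benchmark from the thread; it assumes CuArrays.jl and a
          | CUDA-capable GPU, and the function name is made up):
          | 
          |   using LinearAlgebra, CuArrays
          | 
          |   # Generic code written against the AbstractArray interface;
          |   # nothing in it is CPU- or GPU-specific.
          |   relative_residual(A, x, b) = norm(A * x - b) / norm(b)
          | 
          |   A, x, b = rand(5, 5), rand(5), rand(5)
          |   relative_residual(A, x, b)              # CPU arrays
          |   relative_residual(cu(A), cu(x), cu(b))  # same code on the GPU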
        
         | plafl wrote:
          | I had the same first impression. It's great that people with
          | more faith than me have pushed forward; I'm actually still
          | waiting to make the jump. It was very similar with Python for
          | scientific computing in the beginning. When I wrote my master's
          | thesis in Python in 2006, Python was completely dwarfed by
          | Matlab. I was more enthusiastic in those days about trying new
          | things, and I also had more time. It remains to be seen whether
          | Julia displaces Python, but it at least has a non-zero chance
          | of doing so.
        
           | peatmoss wrote:
           | I've wanted Scheme / Racket to make a showing in places where
           | R and Python have dominated the data science and numerical
           | computing spaces (for those of us not writing C/C++/Fortran).
           | 
           | But, I think Julia is the closest thing to an honest-to-
           | goodness lisp that has a prayer of mainstream adoption. I
           | quite like Julia and think the community has leveraged the
           | meta programming strengths of the language well enough to
           | convince me to relinquish my love of minimalist lisp syntax.
           | 
           | I do wonder if the biggest risk to Julia isn't that it's not
           | extreme / superlative in any dimension. If you want raw
           | performance, Rust seems to be capturing the lion's share of
           | excitement among "new" languages. If you want meta
           | programming, Julia still feels like a compromise compared to
           | Scheme/Racket/Clojure/Common Lisp. If you want libraries,
           | Python is still king. If you want stats libraries, R is still
           | king. Julia is impressively good at all of those things...
           | but it's not good enough to be the most compelling of any of
           | them.
           | 
            | Still, it makes me think that it may be time to dive in head
            | first rather than dip my toes in as I've been doing for the
            | past few years. I guess my main barrier at the moment is the
            | compiler / deployment story--it still seems hard to build a
            | deployable binary that fits in, say, a Lambda layer as far as
            | I can tell.
        
             | ddragon wrote:
              | Even if Julia never ends up being the very best at anything
              | in particular, Python has shown that being "generally the
              | second-best language for everything" has even more value
              | for most people. Leveraging 80% of the flexibility and
              | composability of Lisp, 80% of the "code that looks like
              | pseudo-code" readability and overall ease of use of Python,
              | 80% of the speed of the fastest static languages, and
              | eventually the library and tooling maturity to support all
              | use cases, it can definitely become an even better second-
              | best language for everything.
        
             | improbable22 wrote:
             | > it's not extreme / superlative in any dimension. If you
             | want raw performance ... good at all of those things... but
             | 
             | I guess the selling point is that being good at many things
             | means you don't have to keep switching between all these
              | languages, or gluing them together. Others have mentioned
             | the composability benefits of this. But it also has a
             | learning advantage, I think. You can incrementally learn to
             | speed up the crucial piece of something. You can learn just
             | enough metaprogramming to solve some problem you have.
        
         | nextos wrote:
         | It has a lot of potential. You can often write code in Julia
         | that is really performant by just following a few simple rules.
          | Most notably, maintaining type stability. Libraries are now
         | catching up. Some are great, some are unique, others are still
         | missing.
         | 
          | I have generated SIMD code that outperforms C and C++, even
          | after some hand optimizations. As a matter of fact, some
          | people are writing a drop-in BLAS replacement in pure Julia,
          | and its performance is not going to be too far off from BLAS
          | [1].
         | If you work in scientific computing, you can probably
         | appreciate how amazing that is.
         | 
         | The combination of multiple dispatch, a simple type system and
         | an LLVM backend is wonderful. In the long run, Julia will
          | probably grab lots of Python's and R's market share because
         | having a single language instead of two languages (Python +
         | Cython/Pythran/... or R + C++) is a huge advantage.
         | 
         | [1] https://discourse.julialang.org/t/we-can-write-an-
         | optimized-...
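          | 
          | As a rough illustration of the type-stability rule (toy
          | functions made up for this example; inspect them with
          | @code_warntype):
          | 
          |   # Type-unstable: `acc` starts as an Int and may become a
          |   # Float64, so the compiler cannot pin down the return type.
          |   function sumsqrt_unstable(xs)
          |       acc = 0
          |       for x in xs
          |           acc += sqrt(x)
          |       end
          |       return acc
          |   end
          | 
          |   # Type-stable: the accumulator has the right element type
          |   # from the start, so everything stays concretely typed.
          |   function sumsqrt_stable(xs)
          |       acc = zero(float(eltype(xs)))
          |       for x in xs
          |           acc += sqrt(x)
          |       end
          |       return acc
          |   end
          | 
          |   # @code_warntype sumsqrt_unstable(rand(10))  # shows a Union
          |   # @code_warntype sumsqrt_stable(rand(10))    # all Float64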
        
           | krapht wrote:
            | Maybe Julia will become more popular, but I'd bet on the
            | 800-pound gorilla, Python, getting faster through broader use
            | of Dask, Cython, and Numba. I'll never bet against the network
           | Dask, Cython, and Numba. I'll never bet against the network
           | effects of language and library development.
        
             | cbkeller wrote:
             | People often talk about speed as an advantage of Julia
             | (which is true), but the real secret advantage compared to
             | any other language I've worked with is more of a network
              | effect: composability.
             | 
             | I didn't appreciate the importance of this (or the degree
             | to which this _just works_ as a result of multiple
              | dispatch), but there's a good talk about this from
              | JuliaCon 2019 [1] by Stefan Karpinski.
             | 
             | [1] https://www.youtube.com/watch?v=kc9HwsxE1OY
        
               | KenoFischer wrote:
               | One aspect that's underappreciated is that composability
               | can actually lead to speed. If it's super easy to compose
               | your algorithmic package with speed-improvement packages
               | (parallelism, hardware accelerators, custom compilers),
                | you're much more likely to actually be able to use them.
        
       | dklend122 wrote:
        | I'd be interested to see a StaticArrays benchmark run on 1.5, as
        | it can now stack-allocate views and other types thanks to general
        | compiler improvements.
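        | 
        | Something along these lines (a rough sketch with
        | BenchmarkTools.jl) should show the difference:
        | 
        |   using BenchmarkTools
        | 
        |   A = rand(1000, 1000)
        |   colsum(A) = sum(view(A, :, 1))  # the view wraps a heap reference
        | 
        |   @btime colsum($A)
        |   # 1.4: typically 1 allocation (the SubArray is heap allocated)
        |   # 1.5: typically 0 allocations (the view can stay on the stack)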
        
         | stabbles wrote:
          | StaticArrays.jl is just linear algebra operations on tuples,
          | and those have always been stack allocated since they are
          | immutable and have a size known at compile time, so Julia 1.5
          | will not bring anything new for it.
          | 
          | The change in 1.5 is that the compiler has improved: it no
          | longer requires structs holding references to heap-allocated
          | objects to be heap allocated themselves.
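          | 
          | For reference, the kind of comparison under discussion looks
          | roughly like this (sizes chosen to match the post, measured
          | with BenchmarkTools.jl):
          | 
          |   using StaticArrays, BenchmarkTools
          | 
          |   A = @SMatrix rand(5, 5)   # the 5x5 size is part of the type
          |   b = @SVector rand(5)      # data is stored in a tuple
          | 
          |   @btime $A \ $b                       # the static solve
          |   @btime $(Matrix(A)) \ $(Vector(b))   # ordinary heap arrays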
        
           | dklend122 wrote:
           | The allocations for the StaticArrays solve must come from
            | somewhere.
        
       | snicker7 wrote:
       | The real performance win seems to be reducing the number of
       | allocations (2 vs. 30 thousand).
        
         | doublesCs wrote:
         | That's not uncommon in HPC.
         | 
          | I'd be surprised if this were inherent to StaticArrays.jl,
          | though. Probably some interop edge case, I imagine.
         | 
         | Shall we summon Chris Rackauckas?
        
         | marmaduke wrote:
         | So the real question is why a library with the word "static" in
          | its name does 30 thousand allocations...
        
           | dnautics wrote:
           | I believe the "static" refers to static, aka, compile-time
           | dimension of the array (yes, julia is a compiling language),
           | versus an array that has a run-time, mutable array
           | dimensionality.
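            | 
            | A small sketch of the distinction, using StaticArrays.jl:
            | 
            |   using StaticArrays
            | 
            |   v = SVector(1.0, 2.0, 3.0)  # length 3 is part of the type
            |   typeof(v)              # SVector{3, Float64}
            |   isbitstype(typeof(v))  # true: no heap references needed
            | 
            |   w = [1.0, 2.0, 3.0]    # ordinary Vector: length is data
            |   push!(w, 4.0)          # can grow; an SVector cannot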
        
         | spacedome wrote:
          | This has been my general experience writing numerical linear
          | algebra methods in Julia. When optimizing a specific algorithm,
          | eliminating unnecessary allocations is the first thing I do,
          | and it can give large performance gains, especially for
          | iterative methods.
         | 
          | It can be a bit annoying, and can result in much less readable
          | code (e.g. having to explicitly write things like
          | _mul!(C, A, B)_). It ends up looking a lot more like the
          | FORTRAN it is meant to replace, and if you aren't careful you
          | lose the ability to use generic types, though multiple dispatch
          | is a great solution there. The worst case I have found is
          | iteratively calling linear algebra solvers (eig, SVD, LU) that
          | do not offer any option for preallocating and reusing work
          | arrays. The only way to do this now is to call the BLAS/LAPACK
          | routines directly with ccall, which is a huge pain. There was
          | an attempt to fix this with PowerLAPACK.jl, but it seems
          | abandoned, so for the moment writing optimized methods in
          | FORTRAN/C and calling them from Julia is sometimes still
          | preferable.
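          | 
          | For anyone unfamiliar, a minimal sketch of that in-place style
          | (hypothetical function names, using LinearAlgebra's mul!):
          | 
          |   using LinearAlgebra
          | 
          |   # Allocating: builds a brand-new matrix on every call.
          |   update(A, B) = A * B
          | 
          |   # In-place: the caller allocates C once and reuses it.
          |   update!(C, A, B) = mul!(C, A, B)
          | 
          |   A, B = rand(100, 100), rand(100, 100)
          |   C = similar(A)          # work array, allocated once
          |   for _ in 1:1000
          |       update!(C, A, B)    # no per-iteration allocation
          |   end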
         | 
          | Julia does quite well though, so this only seems necessary for
          | things like core NLA libraries; for example, I would not try
          | rewriting ARPACK in Julia for a performance gain. The gain in
          | flexibility from writing these in Julia is absolutely huge
          | though, and I definitely recommend it. The Grassmann.jl library
          | has many examples of what a language like Julia makes possible,
          | as does DifferentialEquations.jl.
        
           | Xcelerate wrote:
            | I love Julia and have used it for years, but this is really
            | my only complaint with the language. I used to do HPC work
            | and ended up spending a lot of time tracking down where tiny
            | bits of memory were being allocated inside for loops. A lot
            | of my routines ended up taking "blocks" of memory to be
            | passed in as arguments, which, as you mention, moves the
            | language a bit closer to C or Fortran IMO.
        
           | etik wrote:
            | There is a recent effort [1] to provide low-level support
            | for faster operations by transforming user code to take
            | advantage of the processor's SIMD instructions, memory
            | packing, etc. This is being expanded upon to essentially
            | provide a Julia-native BLAS. Some of the benchmarks are even
            | competitive with or beat Intel MKL (calibrate that statement
            | appropriately to your level of trust in benchmarks). I
            | wouldn't count out a Julia ARPACK implementation just yet.
           | 
           | [1] LoopVectorization:
           | https://github.com/chriselrod/LoopVectorization.jl
           | Announcement post and discussion:
           | https://discourse.julialang.org/t/ann-
           | loopvectorization/3284...
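            | 
            | A small sketch of what using it looks like (the macro was
            | called @avx at the time; the dot-product kernel below is
            | along the lines of its README examples):
            | 
            |   using LoopVectorization
            | 
            |   function dot_avx(a, b)
            |       s = zero(eltype(a))
            |       @avx for i in eachindex(a, b)
            |           s += a[i] * b[i]
            |       end
            |       return s
            |   end
            | 
            |   dot_avx(rand(1000), rand(1000))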
        
             | spacedome wrote:
              | This is great; a lot of the performance in BLAS comes from
              | memory management. It does not solve my problem, though.
              | Sometimes you want to manually control memory, and Julia
              | does not make it easy to do so. In particular, when
              | interfacing with BLAS/LAPACK through Julia, memory
              | management is quite ugly, mainly due to the need for
              | allocating work arrays, and a pure Julia implementation of
              | BLAS is unlikely to fix this. I don't think writing e.g.
              | ARPACK performantly in Julia is impossible, just painful to
              | the point that writing it in FORTRAN starts to make sense
              | (to me).
        
               | stabbles wrote:
                | > I don't think writing e.g. ARPACK performantly in
                | Julia is impossible
               | 
               | Not sure what you mean, ARPACK just wraps a bunch of
               | calls to LAPACK and BLAS, that's all. It does not have
               | any low-level linalg kernels of its own. Also, its main
               | bottleneck is typically outside of the algorithm, namely
               | matrix-vector products. ArnoldiMethod.jl is pretty much a
               | pure Julia implementation of ARPACK without any
               | dependency on LAPACK (only BLAS when used with
               | Float32/Float64/ComplexF32/ComplexF64). Finally note that
                | you can easily beat ARPACK by adding GPU support, since
                | ARPACK doesn't provide it.
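                | 
                | For reference, using it looks roughly like this
                | (following the partialschur/partialeigen API from
                | its README):
                | 
                |   using ArnoldiMethod, LinearAlgebra, SparseArrays
                | 
                |   # Any operator supporting mul! will do.
                |   A = sprand(10_000, 10_000, 1e-3) + 10I
                | 
                |   decomp, history =
                |       partialschur(A; nev = 6, tol = 1e-8, which = LM())
                |   vals, vecs = partialeigen(decomp)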
        
               | spacedome wrote:
                | The person I was responding to seemed to have read my
                | comment and thought I had an issue with raw performance.
                | My point above is that iterative code that makes many
                | LAPACK calls becomes difficult to write in Julia because
                | there is no way to manage the memory in this situation
                | other than ccall-ing everything yourself, at which point
                | I would rather write FORTRAN. I work on eigenvalue
                | solvers, so it is all more or less just wrapping a bunch
                | of calls to BLAS/LAPACK. As you say, the main bottleneck
                | is in the BLAS calls anyway, but the excess allocation in
                | Julia can really slow you down. ARPACK is maybe a bad
                | example because it's all just mat-vec; when you need to
                | do something like compute an SVD every iteration, that's
                | where you run into issues.
        
             | stabbles wrote:
             | You're probably confusing ARPACK with BLAS / LAPACK.
             | 
              | Pure Julia ARPACK already exists, e.g.
             | https://github.com/haampie/ArnoldiMethod.jl/.
             | 
              | A competitive BLAS gemm is implemented here
              | https://github.com/YingboMa/MaBLAS.jl/blob/master/src/gemm.j...
              | (single-threaded).
             | 
              | A LAPACK-like library could be
              | https://github.com/JuliaLinearAlgebra/GenericLinearAlgebra.j...
        
               | etik wrote:
               | I was mostly referring to the parent comment's suggestion
               | that low level numerical libraries wouldn't benefit from
               | a pure Julia implementation, specifically to the
               | statement that it was still better to write optimized
               | C/FORTRAN and call from Julia. Indeed, MaBLAS, which you
               | linked, is built on top of LoopVectorization.jl.
               | 
               | I don't know how well ArnoldiMethod.jl compares with
               | ARPACK, but if there is a gap my suggestion is simply
               | that these recent developments might help bridge it :)
        
               | stabbles wrote:
               | FWIW, MaBLAS currently does not depend on
                | LoopVectorization.jl; the code to generate the kernels
                | is all handwritten.
        
       | ssivark wrote:
       | Could somebody motivate the comparison being made? It is not
        | a priori obvious what Grassmann numbers have to do with linear
       | solvers. Thanks.
        
         | spacedome wrote:
          | They are solving the linear system _Ax=b_, usually written in
          | Julia/Matlab as _A\b_, specifically for a fixed-size 5x5
          | matrix. The Grassmann algebras give a way of representing the
          | matrix and vectors, and a comparison is being made to the
          | representation (and solver) given by StaticArrays.jl, which
          | just uses typical matrices and arrays optimized for small
          | sizes. There is no real motivation to use Grassmann algebras
          | if your only concern is solving a linear system in general,
          | other than this method seeming to be faster, which mostly
          | (imo) points to the StaticArrays.jl method needing to be
          | optimized.
         | 
         | Edit: This post also does not mention numerical stability of
         | the Grassmann.jl method, which is a real concern in practice,
         | even for such small systems.
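          | 
          | For the stability point, a quick sanity check would be
          | something like comparing residuals on an ill-conditioned 5x5
          | system (a sketch, using a Hilbert matrix as the classic bad
          | case):
          | 
          |   using LinearAlgebra
          | 
          |   A = [1 / (i + j - 1) for i in 1:5, j in 1:5]  # Hilbert 5x5
          |   b = A * ones(5)            # exact solution is all ones
          | 
          |   x = A \ b                  # LAPACK-backed reference solve
          |   norm(x .- 1) / sqrt(5)     # relative error of the solution
          |   norm(A * x - b) / norm(b)  # relative residual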
        
           | [deleted]
        
         | deltasquared wrote:
        | This is a pretty neat trick: they implemented exterior algebra,
        | see https://en.wikipedia.org/wiki/Exterior_algebra
         | 
         | You see this kind of math in textbooks focused on computer
         | graphics.
        
       ___________________________________________________________________
       (page generated 2020-06-15 23:00 UTC)