[HN Gopher] Grassmann.jl A\b 3x faster than Julia's StaticArrays.jl
___________________________________________________________________

Grassmann.jl A\b 3x faster than Julia's StaticArrays.jl

Author : DreamScatter
Score  : 93 points
Date   : 2020-06-15 16:15 UTC (6 hours ago)

(HTM) web link (discourse.julialang.org)
(TXT) w3m dump (discourse.julialang.org)

| thanatropism wrote:
| When I first heard of Julia in 2016, it was sandwiched between R and Python, and I thought it had no chance whatsoever of succeeding.
|
| I'm increasingly impressed by Julia. It's a crowded space, and they remain relevant. Maybe even best in class for some applications.
|
| macawfish wrote:
| Grassmann.jl is the library that made me take Julia seriously. It's an awesome library, and the author really gets one of Julia's most special and well-integrated features: multiple dispatch.
|
| Because the code is written in solid, idiomatic Julia, it's interoperable with tons of other libraries. I actually had success using this library with CuArrays.jl without any modifications. So yeah, you can use it to run geometric algebra computations on a GPU for algebras of dimension < 2^5 (in my case, I tried it with CGA3D).
|
| plafl wrote:
| I had the same first impression. It's great that people with more faith than me have pushed forward; I'm actually still waiting to make the jump. It was very similar with Python for scientific computing in the beginning. When I wrote my master's thesis in Python in 2006, Python was completely dwarfed by Matlab. I was more enthusiastic in those days, trying new things, and also had more time. It remains to be seen whether Julia displaces Python, but it has at least a non-zero chance of doing so.
|
| peatmoss wrote:
| I've wanted Scheme / Racket to make a showing in places where R and Python have dominated the data science and numerical computing spaces (for those of us not writing C/C++/Fortran).
|
| But I think Julia is the closest thing to an honest-to-goodness Lisp that has a prayer of mainstream adoption. I quite like Julia, and the community has leveraged the metaprogramming strengths of the language well enough to convince me to relinquish my love of minimalist Lisp syntax.
|
| I do wonder whether the biggest risk to Julia is that it's not extreme / superlative in any dimension. If you want raw performance, Rust seems to be capturing the lion's share of excitement among "new" languages. If you want metaprogramming, Julia still feels like a compromise compared to Scheme/Racket/Clojure/Common Lisp. If you want libraries, Python is still king. If you want stats libraries, R is still king. Julia is impressively good at all of those things... but it's not good enough to be the most compelling choice for any one of them.
|
| Still, it makes me think that it may be time to dive in head first rather than dipping my toes as I've been doing for the past few years. I guess my main barrier at the moment is the compiler / deployment story: it still seems hard to build a deployable binary that fits in, say, a Lambda layer, as far as I can tell.
|
| ddragon wrote:
| Even if Julia never ends up being the very best at anything in particular, Python has shown that being "generally the second-best language for everything" has even more value for most people. By offering 80% of the flexibility and composability of Lisp, 80% of Python's "code that looks like pseudo-code" readability and overall ease of use, and 80% of the speed of the fastest static languages, plus eventually the library and tooling maturity to support all use cases, it can definitely become an even better second-best language for everything.
|
| improbable22 wrote:
| > it's not extreme / superlative in any dimension. If you want raw performance ... good at all of those things... but
|
| I guess the selling point is that being good at many things means you don't have to keep switching between all these languages, or gluing them together. Others have mentioned the composability benefits of this. But it also has a learning advantage, I think. You can incrementally learn to speed up the crucial piece of something. You can learn just enough metaprogramming to solve some problem you have.
|
| nextos wrote:
| It has a lot of potential. You can often write really performant code in Julia just by following a few simple rules, most notably maintaining type stability. Libraries are now catching up. Some are great, some are unique, others are still missing.
|
| I have generated SIMD code that outperforms C and C++, even after some hand optimizations. As a matter of fact, some people are rewriting a BLAS drop-in replacement in pure Julia, and the performance is not going to be too far off from BLAS [1]. If you work in scientific computing, you can probably appreciate how amazing that is.
|
| The combination of multiple dispatch, a simple type system and an LLVM backend is wonderful. In the long run, Julia will probably grab a lot of Python's and R's market share, because having a single language instead of two (Python + Cython/Pythran/... or R + C++) is a huge advantage.
|
| [1] https://discourse.julialang.org/t/we-can-write-an-optimized-...
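
A minimal sketch of the type-stability rule nextos mentions; the function names here are invented for illustration and are not from the thread:

    # Type-unstable: `acc` starts as an Int and becomes a Float64 on the
    # first iteration, so the compiler must track a Union of both types.
    function sum_unstable(xs::Vector{Float64})
        acc = 0                    # Int
        for x in xs
            acc += x               # promotes to Float64
        end
        return acc
    end

    # Type-stable: `acc` has a single concrete type throughout, so the
    # compiler can keep it in a register and emit tight, SIMD-friendly code.
    function sum_stable(xs::Vector{Float64})
        acc = zero(eltype(xs))     # Float64 from the start
        for x in xs
            acc += x
        end
        return acc
    end

Running @code_warntype sum_unstable(rand(10)) in the REPL flags the Union{Float64, Int64} slot; the stable version shows only concrete types.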
| krapht wrote:
| Maybe Julia will become more popular, but I'd bet on the 800-pound gorilla Python getting faster through broader use of Dask, Cython, and Numba. I'll never bet against the network effects of language and library development.
|
| cbkeller wrote:
| People often talk about speed as an advantage of Julia (which is true), but the real secret advantage compared to any other language I've worked with is more of a network effect: composability.
|
| I didn't appreciate the importance of this (or the degree to which this _just works_ as a result of multiple dispatch), but there's a good talk about it from JuliaCon 2019 [1] by Stefan Karpinski.
|
| [1] https://www.youtube.com/watch?v=kc9HwsxE1OY
|
| KenoFischer wrote:
| One aspect that's underappreciated is that composability can actually lead to speed. If it's super easy to compose your algorithmic package with speed-improvement packages (parallelism, hardware accelerators, custom compilers), you're much more likely to actually be able to use them.
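
A minimal sketch of the composability cbkeller and KenoFischer describe, with an invented toy type (a real interval package would define far more arithmetic and handle rounding carefully): a generic function written with no knowledge of the type still composes with it through multiple dispatch.

    # A generic algorithm, written once, with no type annotations:
    poly(x) = 3x^2 + 2x + 1

    poly(0.5)          # Float64
    poly(1//3)         # exact Rational arithmetic
    poly(big"1.5")     # arbitrary-precision BigFloat

    # A user type joins in by defining just the operators `poly` uses.
    struct Interval
        lo::Float64
        hi::Float64
    end
    function Base.:*(a::Interval, b::Interval)
        ps = (a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi)
        Interval(minimum(ps), maximum(ps))
    end
    Base.:*(c::Int, a::Interval)      = Interval(min(c * a.lo, c * a.hi), max(c * a.lo, c * a.hi))
    Base.:+(a::Interval, b::Interval) = Interval(a.lo + b.lo, a.hi + b.hi)
    Base.:+(a::Interval, c::Int)      = Interval(a.lo + c, a.hi + c)

    poly(Interval(0.9, 1.1))   # the untouched `poly` now does interval arithmetic

This is the same mechanism that lets Grassmann.jl run on top of CuArrays.jl unmodified, as mentioned above: each package dispatches on whatever types it receives rather than on a closed set it was compiled against.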
| dklend122 wrote:
| I'd be interested to see a StaticArrays benchmark run on 1.5, as it can now stack-allocate views and other types thanks to general compiler improvements.
|
| stabbles wrote:
| StaticArrays.jl is just linear algebra operations on tuples, and those have always been stack-allocated, as they are immutable and have a size known at compile time, so Julia 1.5 will not bring anything new for it.
|
| The change in 1.5 is that the compiler has improved: structs holding references to dynamically allocated objects no longer need to be heap-allocated themselves.
|
| dklend122 wrote:
| The allocations for the StaticArrays solve must come from somewhere.
|
| snicker7 wrote:
| The real performance win seems to be the reduction in the number of allocations (2 vs. 30 thousand).
|
| doublesCs wrote:
| That's not uncommon in HPC.
|
| I'd be surprised if this were inherent to StaticArrays.jl, though. Probably some interop edge case, I imagine.
|
| Shall we summon Chris Rackauckas?
|
| marmaduke wrote:
| So the real question is why a library with the word "static" in its name performs 30 thousand allocations.
|
| dnautics wrote:
| I believe the "static" refers to the static, i.e. compile-time, dimensions of the array (yes, Julia is a compiled language), as opposed to an array whose dimensions are mutable and only known at run time.
|
| spacedome wrote:
| This has been my general experience writing numerical linear algebra methods in Julia. When optimizing a specific algorithm, eliminating unnecessary allocations is the first thing I do, and it can give large performance gains, especially for iterative methods.
|
| It can be a bit annoying, and can result in much less readable code (e.g. having to explicitly write things like _mul!(C, A, B)_). It ends up looking a lot more like the FORTRAN it is meant to replace, and if you aren't careful you lose the ability to use generic types, though multiple dispatch is a great solution. The worst case I have found is iteratively calling linear algebra solvers (Eig, SVD, LU) which do not have any option for preallocating and reusing work arrays. The only way to do this now is to call the BLAS/LAPACK routines directly with ccall, which is a huge pain. There was an attempt to fix this with PowerLAPACK.jl, but it seems abandoned, so for the moment writing optimized methods in FORTRAN/C and calling them from Julia is sometimes still preferable.
|
| Julia does quite well, though, so this only seems necessary for things like core NLA libraries; for example, I would not try rewriting ARPACK in Julia for a performance gain. The gain in flexibility from writing these in Julia is absolutely huge, though, and I definitely recommend it. The Grassmann.jl library has many examples of what a language like Julia makes possible, as does DifferentialEquations.jl.
|
| Xcelerate wrote:
| I love Julia and have used it for years, but this is really my only complaint with the language. I used to do HPC work and ended up spending a lot of time tracking down where tiny bits of memory were being allocated in for loops. A lot of my routines ended up having "blocks" of memory to pass in as arguments, which, as you mention, moves the language a bit closer to C or Fortran IMO.
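
A minimal sketch of the preallocation pattern spacedome and Xcelerate describe; BenchmarkTools.jl is assumed for the timing macro, and power_step! is an invented example, not code from the thread:

    using LinearAlgebra, BenchmarkTools

    A, B = rand(100, 100), rand(100, 100)
    C = similar(A)              # work array, allocated once up front

    @btime $A * $B;             # allocates a fresh 100x100 result every call
    @btime mul!($C, $A, $B);    # writes into C: zero allocations in the hot path

    # In an iterative method the same pattern keeps the inner loop allocation-free:
    function power_step!(y, A, x)
        mul!(y, A, x)           # y = A*x without allocating
        y ./= norm(y)           # in-place normalization
        return y
    end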
| etik wrote:
| There is a recent effort [1] to provide low-level support for faster operations by transforming user code to take advantage of the processor's instruction set, memory packing, etc. This is being expanded upon to essentially provide a Julia-native BLAS. Some of the benchmarks are even competitive with or beat Intel MKL (calibrate that statement appropriately to your level of trust in benchmarks). I wouldn't count out a Julia ARPACK implementation just yet.
|
| [1] LoopVectorization: https://github.com/chriselrod/LoopVectorization.jl
| Announcement post and discussion: https://discourse.julialang.org/t/ann-loopvectorization/3284...
|
| spacedome wrote:
| This is great; a lot of the performance in BLAS comes from memory management. It does not solve my problem, though. Sometimes you want to manually control memory, and Julia does not make it easy to do so. In particular, when interfacing with BLAS/LAPACK through Julia, memory management is quite ugly, mainly due to the need for allocating work arrays, and a pure Julia implementation of BLAS is unlikely to fix this. I don't think writing e.g. ARPACK performantly in Julia is impossible, just painful to the point that writing it in FORTRAN starts to make sense (to me).
|
| stabbles wrote:
| > I don't think writing e.g. ARPACK performantly in Julia is impossible
|
| Not sure what you mean; ARPACK just wraps a bunch of calls to LAPACK and BLAS, that's all. It does not have any low-level linalg kernels of its own. Also, its main bottleneck is typically outside of the algorithm, namely matrix-vector products. ArnoldiMethod.jl is pretty much a pure Julia implementation of ARPACK without any dependency on LAPACK (only BLAS when used with Float32/Float64/ComplexF32/ComplexF64). Finally, note that you can easily beat ARPACK by adding GPU support, since they don't provide it.
|
| spacedome wrote:
| The person I was responding to seemed to have read my comment and thought I had an issue with raw performance. My point above is that iterative code making many LAPACK calls becomes difficult to write in Julia, because there is no way to manage the memory in this situation other than ccall-ing everything yourself, at which point I would rather write FORTRAN. I work on eigenvalue solvers, so it is all more or less just wrapping a bunch of calls to BLAS/LAPACK. As you say, the main bottleneck is in the BLAS calls anyway, but the excess allocation in Julia can really slow you down. ARPACK is maybe a bad example because it's all just mat-vec; when you need to do something like compute an SVD every iteration, that's where you run into issues.
|
| stabbles wrote:
| You're probably confusing ARPACK with BLAS / LAPACK.
|
| A pure Julia ARPACK already exists, e.g. https://github.com/haampie/ArnoldiMethod.jl/.
|
| A competitive BLAS gemm is implemented here (single-threaded): https://github.com/YingboMa/MaBLAS.jl/blob/master/src/gemm.j...
|
| A LAPACK-like library could be https://github.com/JuliaLinearAlgebra/GenericLinearAlgebra.j...
|
| etik wrote:
| I was mostly referring to the parent comment's suggestion that low-level numerical libraries wouldn't benefit from a pure Julia implementation, specifically to the statement that it was still better to write optimized C/FORTRAN and call it from Julia. Indeed, MaBLAS, which you linked, is built on top of LoopVectorization.jl.
|
| I don't know how well ArnoldiMethod.jl compares with ARPACK, but if there is a gap, my suggestion is simply that these recent developments might help bridge it :)
|
| stabbles wrote:
| FWIW, MaBLAS currently does not depend on LoopVectorization.jl; the code to generate kernels is all handwritten.
|
| ssivark wrote:
| Could somebody motivate the comparison being made? It is not a priori obvious what Grassmann numbers have to do with linear solvers. Thanks.
|
| spacedome wrote:
| They are solving the linear system _Ax=b_, usually written in Julia/Matlab as _A\b_, specifically for a fixed-size 5x5 matrix. The Grassmann algebras give a way of representing the matrix and vectors, and a comparison is being made to the representation (and solver) provided by StaticArrays.jl, which uses typical matrices and arrays optimized for small sizes. There is no real motivation to use Grassmann algebras if your only concern is solving a linear system in general, other than this method seeming to be faster, which mostly (imo) points to the StaticArrays.jl method needing to be optimized.
|
| Edit: This post also does not mention the numerical stability of the Grassmann.jl method, which is a real concern in practice, even for such small systems.
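
For concreteness, a sketch of the kind of comparison being benchmarked, using StaticArrays.jl (the exact setup in the linked post may differ; BenchmarkTools.jl is assumed for the timing macro):

    using StaticArrays, BenchmarkTools

    # Fixed-size 5x5 system: the size is part of the type, so the solve
    # can be fully unrolled and stack-allocated.
    A = @SMatrix rand(5, 5)
    b = @SVector rand(5)
    @btime $A \ $b;

    # The same solve with ordinary heap-allocated arrays, for comparison:
    Ad, bd = Matrix(A), Vector(b)
    @btime $Ad \ $bd;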
| [deleted]
|
| deltasquared wrote:
| This is a pretty neat trick: they implemented exterior algebra; see https://en.wikipedia.org/wiki/Exterior_algebra
|
| You see this kind of math in textbooks focused on computer graphics.
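
A hand-rolled sketch of the antisymmetric ("wedge") product behind exterior algebra, in plain Julia with an invented `wedge` helper; Grassmann.jl's actual API is far more general than this:

    # Wedge product of two vectors in R^3: the result is a bivector with one
    # coefficient per coordinate plane (e1∧e2, e1∧e3, e2∧e3).
    wedge(a, b) = (a[1]*b[2] - a[2]*b[1],    # e1∧e2
                   a[1]*b[3] - a[3]*b[1],    # e1∧e3
                   a[2]*b[3] - a[3]*b[2])    # e2∧e3

    a, b = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
    wedge(a, b)    # (-3.0, -6.0, -3.0)
    wedge(b, a)    # (3.0, 6.0, 3.0) -- antisymmetry: a∧b = -(b∧a)

___________________________________________________________________
(page generated 2020-06-15 23:00 UTC)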