[HN Gopher] Show HN: Prometeo - a Python-to-C transpiler for hig...
___________________________________________________________________
Show HN: Prometeo - a Python-to-C transpiler for high-performance computing

Author : zanellia
Score  : 112 points
Date   : 2021-11-17 14:01 UTC (8 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| BBC-vs-neolibs wrote:
| A brief comparison distinguishing it from Cython would be most welcome.
| dom96 wrote:
| This is really cool. Just a bit of pedantry: is Python higher-level than C? If so this isn't a transpiler but a compiler :)
| zanellia wrote:
| Fair enough - it's blurred I'd say. I see C as a lower-level, and yet still high-level, language, if compared to Python :)
| xapata wrote:
| That's a compiler. I don't understand the desire to create a new word when the old one is fine.
| zanellia wrote:
| "A program that translates between high-level languages is usually called a source-to-source compiler or transpiler" from https://en.wikipedia.org/wiki/Compiler.
| marmaduke wrote:
| Stand-alone is a very useful concept. I don't like deploying Python stacks much. Wouldn't that additionally mean you could target CL, CUDA or Sycl variants of C?
| zanellia wrote:
| I'd say that's possible in principle - definitely not there at the moment though (and not even planned).
| sergius wrote:
| How does this compare with Nim and MicroPython?
| zanellia wrote:
| I thought about using Nim as a host language for the DSL for a while, but then decided to rely on Python simply because it is more mature (and I had already partially figured out how to manipulate Python ASTs to generate C code).
| zanellia wrote:
| Hi all,
|
| prometeo is an experimental modeling tool for embedded high-performance computing.
| prometeo provides a domain specific language (DSL) based on a subset of the Python language that allows one to conveniently write scientific computing programs in a high-level language (Python itself) that can be transpiled to high-performance self-contained C code easily deployable on embedded devices.
|
| The package is still rather experimental, but I hope this concept could help make the development of software for high-performance computing (especially for embedded applications) a little easier.
|
| What do you think of it? Looking forward to receiving comments/suggestions/criticism :)
| chriswarbo wrote:
| Very interesting! What are the similarities/differences compared to RPython (as used by PyPy)?
|
| https://rpython.readthedocs.io/en/latest/rpython.html
| loeg wrote:
| Looks like RPython is a bigger language that doesn't target an embedded use case without a Python runtime. Though I may be mistaken - I am not super familiar with RPython.
| chrisseaton wrote:
| RPython programs can be compiled to a standalone executable without a Python runtime - it's what PyPy is written in, for example.
| klyrs wrote:
| Kneejerk reaction as an enthusiastic Cython developer: "bah, another crappy (subset of Python)-to-C compiler."
|
| After reading: this is _really_ cool. If I understand this, I think you should be able to beat Cython without breaking a sweat. I'm quite excited to use this.
| zanellia wrote:
| hahaha thanks!
| pella wrote:
| Nice project.
|
| small comment - related to the benchmarks:
|
| - in Julia: it has a newer Riccati solver (in package)
|
| https://github.com/andreasvarga/MatrixEquations.jl/blob/mast...
|
| https://github.com/andreasvarga/MatrixEquations.jl
| lvass wrote:
| Cython, pypy, micropython, nuitka, shedskin, ironpython, graalpython, jython, mypyc, pyjs, skuptjs, brython, activepython, stackless, transcrypt, cinder and many more I don't remember.
|
| They're all practically useless or relegated to specific tasks. At this point you'd need to present incredible evidence that an alternative compiler can be useful. Personally I find it comical how many developers are still deluded by a promise of performant python. I hope you achieve your goals, good luck.
| m_ke wrote:
| Numpy, Numba and PyTorch seem to be doing ok.
| gh02t wrote:
| Cython is also pretty successful obviously, though I don't think it quite fits in OP's list given that it's more about writing extensions than replacing your entire Python code/stack. But I do agree with OP's sentiment even as someone who writes a lot of Python.
| fwsgonzo wrote:
| Most of the ones you list require dynamic linking and so are hard to make use of in specialized environments.
|
| His project seems to be generating generic C code which is much easier to port to any weird platforms. In fact, it might be perfect for my use-case where dynamic linking is just extra attack surface.
|
| I understand that the project is still in the early stages, but I will be paying close attention to it. If at some point it will be possible to write "regular" Python in it (minus most of the standard library and imports), then it could be a candidate for an edge computing platform.
| staticautomatic wrote:
| SpaCy is pretty incredible evidence.
| zanellia wrote:
| The point of prometeo is not to obtain a "performant Python". Python is used merely as a host language for an embedded domain specific language. You could do the same thing with any other language with a mature library for AST analysis :)
| [deleted]
| lvass wrote:
| Which makes this thread's title at least confusing.
| zanellia wrote:
| fair enough - could not cram "embedded" into it :)
| throw10920 wrote:
| I'd argue it's downright misleading. "Python-to-C transpiler" means _Python_, not "a DSL based on a subset of Python".
|
| An accurate title would be "a DSL embedded in Python for high-performance scientific computing" or something similar.
| nspattak wrote:
| so much effort to match the performance of lower level languages that it would have actually been easier to use those directly :)
| Zababa wrote:
| I'm not sure, most people aren't writing ASM these days because the compilers are good enough for most cases. Compilers are great.
| zanellia wrote:
| I think most HPC people would disagree with this statement. State-of-the-art HPC code is still written in ASM (see e.g., https://github.com/xianyi/OpenBLAS) [that's what Intel is doing too]
| guenthert wrote:
| That ASM code is however not necessarily constructed manually. You'd think for high performance code with limited scope, a superoptimizer would be used.
| zanellia wrote:
| Not sure what a "superoptimizer" would look like in this context. For reference, I know for sure that this https://github.com/giaf/blasfeo (which beats Intel MKL) was coded entirely by hand.
| Zababa wrote:
| I don't think they would. I think they realize that state-of-the-art HPC code is a small fraction of all the code written. I doubt that these people write ASM instead of Python or JS or C or whatever when doing simple scripts.
| marmaduke wrote:
| ASM makes sense when the time spent in a specific routine exceeds the time it takes to write the ASM, which makes a lot of sense for BLAS, less so for other HPC yet speculative or less fundamental projects. CVODES for instance doesn't need to be written in ASM, and I think Julia makes a strong case that it could have been written in Julia.
| Tozen wrote:
| Good point. And you don't have to go that low. Maybe go use Object Pascal, Nim, or Vlang. I know... the libraries. But a lot of them are bindings of C libraries. So, you can create bindings in other languages too or use Python from those languages. There are various options.
| zanellia wrote:
| I would disagree on "easier" :) Ever spent half a day debugging a segfault?
| MR4D wrote:
| Looks like this could be pretty nice.
|
| I noticed your disclaimer at the bottom of the linked page [0], and wanted to get an idea of how far you were looking to take this. Will it go beyond maths into normal functions (string handling, etc)? Do you eventually plan on supporting most of python - for instance, do you think I could write a web server using your tool in the future?
|
| [0] - "_Disclaimer: prometeo is still at a very preliminary stage and only few linear algebra operations and Python constructs are supported for the time being._"
| zanellia wrote:
| Unfortunately, I think that writing a transpiler for general Python programs might be rather difficult without resorting to approaches used, e.g., in Cython/Nuitka. Among other things, computing the worst-case heap usage could be quite cumbersome/computationally heavy for a general program without "constraints". I'd be happy to hear what others think about the topic though.
| rich_sasha wrote:
| Soo... it takes Python syntax and produces a C program, with no links back to Python - is that right? It uses a strict subset of Python, so that Prometeo programs are valid Python, but not necessarily the opposite. Is that fair?
|
| Do you envisage this being a conduit for tight loop optimisation in Python? Or is it rather "you'd like a C program but can't write C good"?
|
| And if the former, how do you compare to Nuitka and Cython? I read your README but couldn't quite make sense of it :)
| zanellia wrote:
| > Soo... it takes Python syntax and produces a C program, with no links back to Python - is that right? It uses a strict subset of Python, so that Prometeo programs are valid Python, but not necessarily the opposite. Is that fair?
|
| yep
|
| > Do you envisage this being a conduit for tight loop optimisation in Python?
| > Or is it rather "you'd like a C program but can't write C good"?
|
| There are already plenty of options for calling high-performance libraries from Python. Now 1) interpreting Python programs that use, e.g., NumPy, can be slow. 2) Compiling these programs using, e.g., Cython or Nuitka, can speed up the code _across_ calls to high-performance libraries, but the resulting code will still rely on the Python runtime library, which can be slow/unreliable in an embedded context.
|
| Coming to the second part of the question, writing C code directly is definitely an option, but, after doing a bit of that, I realized how tedious/error prone it is to develop/maintain/extend relatively complex code bases for embedded scientific computing (e.g. this one https://github.com/acados/acados). Or, to put it as Bjarne Stroustrup once said "fiddling with machine addresses and memory is rather unpleasant and not very productive". The good news seemed to be that many of the code structures necessary to write that type of code are rather repetitive and can hopefully be generated automatically to some extent.
|
| > And if the former, how do you compare to Nuitka and Cython? I read your README but couldn't quite make sense of it :)
|
| This table (from the README) shows some computation times for Nuitka, prometeo, Python and PyPy.
|
| CPU times in [s]:
|
| Python 3.7 (CPython) : 11.787
| Nuitka               : 10.039
| PyPy                 : 1.78
| prometeo             : 0.657
|
| Other than performance, the main difference is, again, the runtime library dependency.
| BBC-vs-neolibs wrote:
| And Cython? (Not CPython)
| rich_sasha wrote:
| Right. Gotcha. So Prometeo isn't another "make Python fast again" project, but rather an orthogonal effort to write fast (C) programs, but in a high-level Python-like language. Thanks.
| zanellia wrote:
| yep, that's right.
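The approach described in this sub-thread (a restricted, fully annotated subset of Python used as a host language, analysed through its AST and emitted as C with no runtime dependency) can be sketched with the standard `ast` module. This is only a toy illustration and not prometeo's actual implementation: the type mapping `C_TYPES`, the helpers `expr_to_c` and `function_to_c`, and the sample function `axpy` are all hypothetical names chosen for the example.

```python
import ast

# Hypothetical mapping from Python annotations to C types (an assumption
# made for this sketch; it is not taken from prometeo).
C_TYPES = {"int": "int", "float": "double"}

# Binary operators this toy translator understands.
BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def expr_to_c(node: ast.expr) -> str:
    """Translate a small subset of Python expressions into C source."""
    if isinstance(node, ast.Name):
        return node.id
    # Only plain int/float literals are accepted (bool is excluded).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return repr(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left = expr_to_c(node.left)
        right = expr_to_c(node.right)
        return f"({left} {BIN_OPS[type(node.op)]} {right})"
    raise NotImplementedError(f"unsupported expression: {ast.dump(node)}")


def function_to_c(source: str) -> str:
    """Translate one fully annotated, single-return function into C."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise NotImplementedError("expected a single function definition")
    if len(func.body) != 1 or not isinstance(func.body[0], ast.Return):
        raise NotImplementedError("body must be a single return statement")
    # The C signature is built entirely from the Python type annotations.
    ret_type = C_TYPES[func.returns.id]
    params = ", ".join(
        f"{C_TYPES[a.annotation.id]} {a.arg}" for a in func.args.args
    )
    body = expr_to_c(func.body[0].value)
    return f"{ret_type} {func.name}({params}) {{\n    return {body};\n}}"


SOURCE = """
def axpy(a: float, x: float, y: float) -> float:
    return a * x + y
"""

print(function_to_c(SOURCE))
```

Anything outside the supported subset raises `NotImplementedError` instead of being translated, which mirrors the "strict subset of Python" idea discussed above: the input stays valid Python, but only a constrained part of the language is accepted.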
| 4w4s wrote:
| It seems a convenient/high-level way to use highly optimized C libraries with minimal overhead both in terms of execution time (i.e. vs standard interpreted Python) and in terms of runtime size/complexity (see Julia).
| zanellia wrote:
| That's correct. I'd say one of the fundamental differences between the two lies in the fact that the code generated by prometeo does not depend on a runtime library (which is somewhat fundamental for embedded applications, e.g., embedded optimization). From prometeo's README:
|
| Finally, although it does not use Python as source language, we should mention that Julia too is just-in-time (and partially ahead-of-time) compiled into LLVM code. The emitted LLVM code relies however on the Julia runtime library such that considerations similar to the one made for Cython and Nuitka apply.
| zcw100 wrote:
| Have you thought of targeting WebAssembly? If you're going from Python/Prometeo -> C you could always make the extra step of Python/Prometeo -> C -> WASM but I wonder if there would be an advantage of skipping the intermediate C.
| fwsgonzo wrote:
| Why WASM? It would be a pessimization compared to just transpiling to C if performance is the goal. WASM is also restricted to 128-bit vector instructions.
| zcw100 wrote:
| Because wasm doesn't support Python and it might be nice to be able to write WASM in a Python-like language.
| fault1 wrote:
| Hasn't cython been ported to wasm (iodide), or perhaps one of the "rewrite in Rust" Python impls? rustc can output wasm pretty naturally.
| zanellia wrote:
| Python to ASM would actually be really cool and would guarantee performance gains for small matrices, but it would require quite some implementation effort. Not sure about WASM.
| b20000 wrote:
| just write your code in c or c++ and be done with it. if you need math libs there are plenty out there for anything you can imagine. python will go the way java went many years ago.
| sys_64738 wrote:
| Seems like Python to C++ would translate more of the language to like-for-like concepts more easily.
| 4w4s wrote:
| But some "embedded platform" tool-chains do not support C++
| zanellia wrote:
| Right, for sure I would not need to re-invent the machinery to translate a class into a glorified C struct. The whole thing started with C in mind for portability arguments, but it might be a good idea to keep an eye on C++ as an option.
| cerved wrote:
| Looks like a cool project!
|
| I can't speak much about the code itself or the aims of the project. Personally I would recommend more informative commit messages.
|
| I do this myself, especially working on personal stuff, but writing commit messages that succinctly explain what each commit does is a good practice and gives a serious impression.
|
| I often find myself hacking away and periodically going back to flesh out messages using rebase.
| zanellia wrote:
| Thanks for the suggestion. Until now it's been a lot of discussion with friends and colleagues and much less actual collaboration on code writing - I might have drifted into bad practices.
| amkkma wrote:
| Regarding all the questions about Julia:
|
| There's ongoing work to reduce runtime dependencies of Julia (for example in 1.8, you can strip out the compiler and metadata), but then it's only approaching Go/Swift and other static languages with runtimes.
|
| Generating standalone runtime-free LLVM is another path, that is actually already pretty mature as it's what is being done for the GPU stack.
|
| Someone just has to retarget that to cpu LLVM, and there's a start here: https://github.com/tshort/StaticCompiler.jl/issues/43
| zanellia wrote:
| That's quite cool. Maybe the whole thing can be rewritten in Julia too at some point. I just know too little about Julia to judge.
| amkkma wrote:
| Well IMO it can definitely be rewritten in Julia, and to an easier degree than python since Julia allows hooking into the compiler pipeline at many areas of the stack. It's lispy and built from the ground up for codegen, with libraries like (https://github.com/JuliaSymbolics/Metatheory.jl) that provide high level pattern matching with e-graphs. The question is whether it's worth your time to learn Julia to do so.
|
| You could also do it at the LLVM level: https://github.com/JuliaComputingOSS/llvm-cbe
|
| One cool use case is in https://github.com/JuliaLinearAlgebra/Octavian.jl which relies on loopvectorization.jl to do transforms on Julia AST beyond what LLVM does. Because of that, Octavian.jl, a pure Julia linalg library, beats openblas on many benchmarks
| cossatot wrote:
| I'm curious what an example use case is for scientific computing on an embedded device. Is this for real-time analysis on a data logger or something?
|
| Many of us think of clusters as high-performance scientific computing, which are about as far from embedded as it gets.
|
| Please note that I am not being snarky, just curious!
| zanellia wrote:
| Thanks for the question! My background is in numerical optimization for optimal control. Projects like this https://github.com/acados/acados motivated the development of prometeo. It's mostly about solving optimization problems as fast as possible to make optimal decisions in real-time.
| 4w4s wrote:
| Nice job! Is this aimed at single core/thread computations or is the prometeo layer also a way to write basic parallel code in a more "user friendly" way?
| zanellia wrote:
| For the time being, it targets single core/thread applications only.
| fwsgonzo wrote:
| Do you have access to builtins and intrinsics? Are there any plans?
|
| The single threaded thing is not an issue because you can still call the same function on each CPU and use the CPU ID to target parts of the computation, like a compute kernel function.
| zanellia wrote:
| Intrinsics (or directly assembly) are used in BLASFEO (https://github.com/giaf/blasfeo), the linear algebra package used by prometeo. It would be cool to generate assembly directly for a few things, but that would require quite a bit of work!
| OulaX wrote:
| Each programming language has its purpose.
|
| C code is performant and that is a fact. Python code is not.
|
| When building mission critical systems why don't programmers just use C itself instead of coding in another programming language and having it transpiled for them? Why introduce such tools all the time?
|
| I am against this because the tools programmers use are becoming too bloated compared to 10-20 years ago.
|
| Want to build an Android App? Use Java/Kotlin.
|
| Want to build an iOS App? Use Swift.
|
| Want to build a Web App? Use a Single JS Framework (Why millions of frameworks?)
|
| Want to build a Windows Desktop App? Use C#.NET Either with WinForms or WPF.
|
| I really see tools and technologies coming up all the time to solve a problem that most of the time doesn't exist.
| Zababa wrote:
| > When building mission critical systems why don't programmers just use C itself instead of coding in another programming language and having it transpiled for them?
|
| Why C and not assembler?
|
| > Why introduce such tools all the time?
|
| C compilers are one of those tools.
| GekkePrutser wrote:
| It doesn't have to be production. Maybe it's for a research project where you just need the extra performance.
|
| Everything has a cost. This may not be ideal but learning to do C properly as an experienced Python dev will have a time cost as well. This may just be the best way to get from A to B.
|
| I remember when I did a one-off project with a PIC microcontroller.
| I only had an assembler and I spent 2 days getting nowhere.
|
| Then I found a C compiler and I had the whole thing running in 2 hours. The compiler turned out to be much more efficient in speed as well as code size than my hand-written assembler.
| zanellia wrote:
| I think many people who have at least once first prototyped a numerical algorithm in a high-level language (say Python, Julia, MATLAB?) and _then_ implemented it in C, can relate to the experience of transitioning from error messages of the type: "dimension mismatch for XYZ" to "segmentation fault". That's in my opinion a strong motivation to build tools that can automate certain parts of the development process.
|
| Writing C code directly is a good option, as long as your code is not too complex to develop, maintain and extend.
|
| And, again, Python here is intended to be the host language for an embedded domain specific language that gets compiled into C. It does not need to be efficient; it needs to be expressive and easy to analyse and transpile.
| adgjlsfhk1 wrote:
| Note that the whole point of Julia is that it saves you the rewrite. There is Julia code running on top supercomputers that gives speed competitive with C/C++/Fortran. You will have to put in some work to get Julia code to be that fast, but it is usually dramatically easier than a rewrite in a different language.
| zanellia wrote:
| It's not so easy to deploy an algorithm written in Julia on an embedded platform though, is it?
| adgjlsfhk1 wrote:
| Probably not :)
| fault1 wrote:
| yes, but "speed" in 'top supercomputers' is not "speed" in 'embedded systems'.
|
| I do think Julia potentially can crack this space, but the runtime at least historically has not been tailored for it.
|
| It does seem like Julia has become more modular lately especially being able to disconnect the JIT (or LLVM ORC).
| Hopefully you'll be able to either defatten or completely remove the runtime dependencies (ala Rust in no-std mode). Each of these is important for different use cases.
| packetlost wrote:
| You've _never_ been near a lab environment clearly. Python is a dominant language in university labs and runs a lot more real-time systems than you think. Grad students rarely have industry experience and don't necessarily have the know-how to write C code effectively, so it's a question of resources and ecosystem. Numpy, matplotlib, pandas, scikit, TensorFlow, etc. are all huge draws for the scientific and ML communities.
| klyrs wrote:
| The problem that this language solves is that it automatically sorts out the memory usage for you. That isn't a problem for me; I've been programming in C for decades. But it is a problem for most python programmers who don't have a lick of C experience, but want to get C performance. It drastically lowers the barrier of entry.
| zanellia wrote:
| For what it's worth, I have developed code for this kind of application exclusively in C for ~5 years (let's say 20% of my working time). I still think that debugging a segfault that you could have avoided is not very productive and that motivated me to look into possible alternatives.
| kevin_thibedeau wrote:
| 99% of the time a stack trace shows the culprit for a segfault straight away. No different than debugging Python.
| zanellia wrote:
| if we are arguing that implementing a numerical algorithm in C is as easy as implementing it in Python - I would disagree. But maybe I am just wrong :)
| klyrs wrote:
| FWIW, I almost always use valgrind before a debugger, when tracking down segfaults. It doesn't catch everything, but 90% of the time, it gets me to the right region of code in a single run.
| zanellia wrote:
| sure I use valgrind and gdb too - still hard to argue that a segfault is pleasant to debug though?
| klyrs wrote:
| Good good, just wanted to advocate for my favorite tool there. But, in my experience, segfaults are usually the easiest bugs to resolve. Unlike a sign error in my math, they're impossible to miss!
|
| That said, tooling to get rid of them entirely is not to be sneezed at :)
| GekkePrutser wrote:
| Yes and it will also prevent common memory management bugs that can lead to code injection.
| up6w6 wrote:
| Yes, I'm gonna talk about Julia...
|
| It's kind of sad how much effort is put into the creation of new Python compilers to make it slightly faster while the problem of compile latency that people hate about Julia is not tackled because of the lack of manpower to improve Julia's interpreter.
|
| https://youtu.be/IlFVwabDh6Q?t=2530 (tldr: The Julia interpreter is currently about 500x slower than JIT code and there is a lot of low-hanging fruit there that could easily give it a 10x speedup - this could make it more viable to switch between compiler and interpreter depending on the work)
| zanellia wrote:
| Personally, I think Julia is great - just don't know it well enough to write a package that takes Julia ASTs and generates C code from them :) There could totally be a Julia implementation of the main idea behind prometeo (Julia per se does not solve the problem that prometeo aims at solving).
| adgjlsfhk1 wrote:
| You can just use `@code_llvm` to generate LLVM code, or `@code_native` to generate assembly. Does that do what you need?
| zanellia wrote:
| hmm not sure, the compiled LLVM code would still depend on the runtime library?
| adgjlsfhk1 wrote:
| The LLVM code will only call into the runtime for allocation or dynamic dispatch, both of which are avoidable. Lots of real Julia code will never touch it.
| fault1 wrote:
| the problem with Julia in the use case of OP is really the fact that it is garbage collected (and perhaps also how its GC is tuned).
| You can work to eliminate allocations, but the memory determinism problem is more important in real time control and embedded systems. see for example, this video: https://www.youtube.com/watch?v=dmWQtI3DFFo
|
| It's kind of why C is still king in this space.
| loeg wrote:
| To get ahead of the obvious question I had and I'm sure others will, this is from the README:
|
| > Cython is a programming language whose goal is to facilitate writing C extensions for the Python language. In particular, it can translate (optionally) statically typed Python-like code into C code that relies on CPython. Similarly to the considerations made for Nuitka, this makes it a powerful tool whenever it is possible to rely on libpython (and when its overhead is negligible, i.e., when dealing with sufficiently large scale computations), but not in the context of interest here.
|
| I.e., prometeo is a python-like DSL that does not depend on the Python runtime.
|
| Thanks for sharing OP, this is pretty cool.
| zanellia wrote:
| Right, that's indeed the main reason I could not simply use Cython or Nuitka (or Julia?). The Python runtime library will do all kinds of non real-time/embedded friendly operations such as garbage collection, memory allocation/de-allocation and so on, in the background.
| cycomanic wrote:
| How does it compare to pythran? Except for the fact that it's c and not c++?
| zanellia wrote:
| Not sure how easy it would be to make the code generated by Pythran standalone, i.e., no dependency on the Python runtime library. Any Pythran expert? :)
| cycomanic wrote:
| Pythran code is standalone, i.e. no dependency on the Python runtime AFAIK.
| zanellia wrote:
| It generates a Python extension, doesn't it? Would not know how to run it outside of Python.
| cycomanic wrote:
| It can generate python extensions, but doesn't have to, here is a blog post by the author talking about using it to generate self contained c++ code: https://serge-sans-paille.github.io/pythran-stories/pythran-...
|
| BTW very cool project nevertheless, just wanted to see the differences and parallels to pythran. There might even be room for collaboration on some features.
| throw5399375930 wrote:
| Great project, but terrible name, considering how popular Prometheus is.
| zanellia wrote:
| fair enough :p I might change it in the future.
___________________________________________________________________
(page generated 2021-11-17 23:01 UTC)