[HN Gopher] Hello World
       ___________________________________________________________________
        
       Hello World
        
       Author : ddevault
       Score  : 88 points
       Date   : 2020-01-04 13:53 UTC (9 hours ago)
        
 (HTM) web link (drewdevault.com)
 (TXT) w3m dump (drewdevault.com)
        
       | alberth wrote:
       | I'm curious to know how Nim performed, given it compiles to C.
        
         | zamadatix wrote:
         | Someone commented a test shortly after you asked
         | https://news.ycombinator.com/item?id=21957476
        
           | dom96 wrote:
           | I'm actually curious why Nim didn't make the list. Crystal is
           | there and a lot of other emerging languages. Is the author
           | not familiar with it?
        
             | kick wrote:
             | Drew's familiar with nim, and he's talked with you about it
             | before on HN. Crystal was volunteered by someone working on
             | Crystal, Julia was volunteered by a dev who works using
             | Julia, etc. Haskell was also presented, I think, but
             | Haskell was a mess.
        
       | phoe-krk wrote:
       | What is this post supposed to prove? It certainly is not supposed
       | to prove that a hello world is a representative real-world
       | program, from which one could infer that writing and debugging a
       | real-world program in Julia is 835 times as complex as writing a
       | real-world program in assembly, since the former makes 835 times
       | as many syscalls as an assembly program. (You know, that number
       | seems okay for me, except it needs to be applied to these
       | languages in reverse.)
       | 
       | I agree that software bloat is a big problem, but trivializing
       | that problem to printing a "hello world" to the screen, punishing
       | all languages with runtimes by measuring the syscalls involved in
       | their startup routines, disregarding the fact that many users are
       | going to have a single system-wide runtime for e.g. C or Python
       | or Julia and therefore the total-kB number does not scale
       | linearly with the number of programs written in C or Python or
       | Julia, ignoring the massively increased software development and
       | debugging time for writing in low-level and memory-unsafe
       | languages like assembly, static Zig, or C, and directly
       | implying[0] that most problems with software complexity can be
       | solved by writing in assembly, static Zig, or C rather than in
       | Julia/Ruby/Java/all other languages from the bottom 90% of the
       | list (and that's the vibe that this post gives me) is, for me,
       | more about making a venting shitpost than creating something that
       | provides even a part of an actual solution to software bloat in
       | general.
       | 
       | The "more time your users are sitting there waiting for your
       | program" statement is especially amusing to me. Your users are
       | not going to wait any less for your program because you and your
       | team are taking another year to write, test, and debug it in
       | assembly.
       | 
       | [0] "These numbers are real. This is more complexity that someone
       | has to debug, more time your users are sitting there waiting for
       | your program, less disk space available for files which actually
       | matter to the user."
        
         | ajkjk wrote:
         | It could just be trying to be interesting, not to prove
         | anything.
        
           | phoe-krk wrote:
           | Why is it attempting to moralize if it is not meant to prove
           | any morals then?
        
           | hn_throwaway_99 wrote:
           | The language in the post, "Most languages do a whole lot of
           | other crap other than printing out "hello world", even if
           | that's all you asked for." certainly seems to imply it's
           | moralizing about something.
        
       | zamadatix wrote:
       | "Passing /dev/urandom into perl is equally likely to print "hello
       | world""
       | 
       | That gave me a good chuckle towards the end.
       | 
       | It'd be useful to break this out a little further as it'd have
       | been interesting to see how small just the output is on the
       | dynamically linked versions instead of just comparing static to
       | whole dynamic bundle.
       | 
       | It's also a bit odd that e.g. zig gets optimized for size and
       | stripped via the compiler, c gets optimized for speed and
       | stripped via strip, and Go/Crystal just gets built standard with
       | no stripping at all. I don't think it'd change the big picture;
       | it's just a bit odd.
       | 
       | .
       | 
       | Unrelated tangent/ramble, I played with Zig and Go as part of my
       | yearly "take December off and tinker" break. Zig was really fun
       | to work with but unfortunately still in a huge churn and
       | development. Go was a lot better than I expected it to be (I had
       | put off messing with Go for a few years now) and the size of the
       | stdlib is just astounding. In the end it wasn't as "fun" as zig
       | but it had very low friction and I definitely see myself using it
       | for a few personal projects over the next year... and then seeing
       | if Zig has less churn in December ;).
        
       | franciscop wrote:
       | Random thought/question, does `process.stdout.write("Hello
       | World");` in Node.js make any difference? While `console.log()`
       | is correct for this analysis since it's the more common one, it
       | does a lot of extra internal logic:
       | https://github.com/nodejs/node/blob/v13.x/lib/internal/conso...
       | 
       | Edit: I'm just curious and don't know how to even start testing
       | this, not trying to promote/demote Node.js in any way.
        
         | zamadatix wrote:
         | I would be surprised if console.log() made many more syscalls
         | than process.stdout.write. More function calls probably but
         | those aren't being counted and neither is RAM usage. "strace"
         | would let you count and find out though!
         | 
         | The size would be a few bytes larger since node is scripted and
         | that's more characters.
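
A rough sketch of the strace approach mentioned above, assuming the trace was
saved with something like `strace -f -o trace.log node hello.js` (the file
names are placeholders). It tallies the syscall names found in the log text,
which gives the "total" and "unique" figures used in this thread. The sample
log below is made up for illustration and is not taken from the article;
`strace -c` prints a similar per-syscall summary on its own.

```python
import re
from collections import Counter

# Matches the syscall name at the start of an strace line, e.g.
#   write(1, "hello world\n", 12) = 12
# An optional "[pid N] " or bare PID prefix (seen with strace -f) is skipped.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(strace_log: str) -> Counter:
    """Return a Counter mapping syscall name to number of calls."""
    counts = Counter()
    for line in strace_log.splitlines():
        match = SYSCALL_RE.match(line)
        if match:  # exit markers, signals and "resumed" lines do not match
            counts[match.group(1)] += 1
    return counts


# Hypothetical, shortened trace for illustration only.
SAMPLE_LOG = """\
execve("./hello", ["./hello"], 0x7ffd5c0 /* 20 vars */) = 0
write(1, "hello world\\n", 12) = 12
write(1, "hello world\\n", 12) = 12
exit_group(0) = ?
+++ exited with 0 +++
"""

counts = count_syscalls(SAMPLE_LOG)
total = sum(counts.values())   # every call counted
unique = len(counts)           # distinct syscall names
```

Running the same function over traces of the `console.log()` and
`process.stdout.write()` variants would show whether the two actually differ.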
        
         | np_tedious wrote:
         | If it is any better, it probably would be the more fair entry
         | (and perhaps similar for python and stdout/bytes) since Go's
         | example did basically that instead of fmt.print with a string
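
For the Python side of that suggestion, a minimal sketch of the two styles:
the usual `print()` call versus writing raw bytes straight to a file
descriptor with `os.write`. The function names are made up for illustration,
and whether the second form changes the syscall count at all is something
only an strace run would settle; most of the work likely happens during
interpreter startup either way.

```python
import os

MESSAGE = b"hello world\n"


def hello_print() -> None:
    """The common form: goes through the text layer and buffering of sys.stdout."""
    print("hello world")


def hello_raw(fd: int = 1) -> int:
    """Write the bytes directly to a file descriptor (1 is stdout by default).

    Returns the number of bytes written, as reported by os.write.
    """
    return os.write(fd, MESSAGE)
```

`hello_raw()` takes the descriptor as a parameter only so it can be pointed at
a pipe or file when experimenting.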
        
       | hn_throwaway_99 wrote:
       | What is it lately with all the posts fetishising absolute
       | performance over lots of other attributes, or even worse,
       | pretending those other attributes don't even matter?
       | 
       | What is the point of this post? Yes, I fully expect a simple
       | Hello World in assembly would be straightforward and fast. I
       | still want the advantage of things like automated memory
       | management, an interpreter or JIT compiler where warranted, a
       | standard runtime environment, etc., for anything even remotely
       | complicated.
       | 
       | I get it, over the past 30-40 years we've built layers upon
       | layers of abstraction, so it's worth it to take a look back and
       | ask "Are there some cases where we overdid it?" Still, let's not
       | throw the baby out with the bathwater, or forget why we added
       | those layers in the first place.
        
       | ChrisMarshallNY wrote:
       | That's pretty cool. It reminds me of GodBolt
       | (https://godbolt.org).
       | 
       | I'm told that the story behind it is that he was arguing with
       | someone about the efficiency of an operation, and actually wrote
       | that site to prove his point.
        
       | NilsIRL wrote:
       | It would be interesting to know why the number of syscalls for C
       | is so high.
        
         | gerikson wrote:
         | Which version of C?
        
           | zamadatix wrote:
           | All versions are interesting, zig/assembly manages to do it in
           | 2/3 so what is musl doing that needs 5? And what on Earth is
           | glibc dynamic doing that it needs 65?
        
         | BearOso wrote:
         | I'm wondering why his dynamic glibc C executable is so big.
        
       | _paulc wrote:
       | As it's not on Drew's list:
       | 
       | Nim:
       | 
       |     $ cat hello.nim
       |     stdout.write("hello, world!\n")
       | 
       | Static (musl):
       | 
       |     $ nim --gcc.exe:musl-gcc --gcc.linkerexe:musl-gcc --passL:-static c -d:release hello.nim
       |     $ ldd ./hello
       |     not a dynamic executable
       | 
       |     Execution Time: 0m0.002s (real)
       |     Total Syscalls: 16
       |     Unique Syscalls: 8
       |     Size (KiB): 95K (78K stripped)
       | 
       | Dynamic (glibc):
       | 
       |     $ nim c -d:release hello.nim
       |     $ ldd ./hello
       |     linux-vdso.so.1 =>  (0x00007ffc994b6000)
       |     libdl.so.2 => /lib64/libdl.so.2 (0x00007f7c88785000)
       |     libc.so.6 => /lib64/libc.so.6 (0x00007f7c883b8000)
       |     /lib64/ld-linux-x86-64.so.2 (0x00007f7c88989000)
       | 
       |     Execution Time: 0m0.002s (real)
       |     Total Syscalls: 42
       |     Unique Syscalls: 13
       |     Size (KiB): 91K (79K stripped)
       | 
       | Which I think is actually pretty reasonable for a high-level GC'd
       | language.
        
         | zamadatix wrote:
         | "Size (KiB): 95K (78K stripped)"
         | 
         | Seems suspicious that lines up with the 95.9 KiB the author
         | listed for C + GCC + musl static build even though the author
         | says they stripped the binary after. I think they might have
         | copied the wrong number into the table :).
         | 
         | The author was counting the size of dynamic as binary +
         | dynamically linked files. Should be about the same as the c
         | dynamic ones in the table in this case anyways but just a note
         | to anyone else running their own tests.
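
For anyone reproducing the "binary + dynamically linked files" size figure,
here is a small sketch of one way to do it: parse `ldd` output for the
library paths, then add their sizes to the size of the executable. The
function names are made up for illustration, and this is only a guess at the
method, not the author's actual script.

```python
import os
import re
from typing import List


def parse_ldd_paths(ldd_output: str) -> List[str]:
    """Extract absolute library paths from the text printed by ldd."""
    paths = []
    for line in ldd_output.splitlines():
        # Use the part after "=>" when present, otherwise the whole line.
        target = line.split("=>", 1)[-1].strip()
        # Drop the trailing load address, e.g. "(0x00007f7c883b8000)".
        path = re.sub(r"\s*\(0x[0-9a-f]+\)$", "", target)
        if path.startswith("/"):  # skips linux-vdso and "not a dynamic executable"
            paths.append(path)
    return paths


def total_size_bytes(binary: str, ldd_output: str) -> int:
    """Size of the binary plus every library ldd reports for it."""
    files = [binary] + parse_ldd_paths(ldd_output)
    return sum(os.path.getsize(f) for f in files)  # getsize follows symlinks


# ldd output as posted for the dynamic Nim build earlier in this thread.
SAMPLE_LDD = """\
linux-vdso.so.1 =>  (0x00007ffc994b6000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f7c88785000)
libc.so.6 => /lib64/libc.so.6 (0x00007f7c883b8000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7c88989000)
"""

libs = parse_ldd_paths(SAMPLE_LDD)
```

`total_size_bytes("./hello", output_of_ldd)` would then give the combined
number in bytes on the machine where the libraries actually exist.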
        
         | tyingq wrote:
         | Curious if the generated C is any different for nim's echo as
         | opposed to stdout.write().
        
       | cycloptic wrote:
       | Sorry but I have to give this a thumbs down for not being a very
       | convincing or well-written blog post. It dumps some data and then
       | immediately jumps to a statement about how "lots of syscalls =
       | bad" without actually detailing what those syscalls are doing in
       | the context of the runtime. And I'm saying this as someone who
       | already runs Alpine on my servers and doesn't need to be
       | convinced. Drew, I think you can write much better posts than
       | this.
        
         | peteradio wrote:
         | More matter with less art.
         | 
         | I thought it was a breezy read with a simple thesis. I don't
         | know why such a thing should be discouraged.
        
           | cycloptic wrote:
           | It should be no surprise to users of CPython and Ruby that
           | those languages have a lot of startup code. The details of
           | what that code is doing are already evident if you're
           | watching it happen in an strace log, but those bits were left
           | out. This isn't art, it's details, and without the details,
           | it's just preaching to the choir. No new readers are going to
           | be convinced.
        
         | nielsole wrote:
         | It's still an interesting table. I was surprised that Java is
         | 10x faster than Python. I would have expected initializing the
         | JVM would be similarly complex to initializing the Python
         | Interpreter.
        
           | cycloptic wrote:
           | My point is that it's not clear from the article why that is
           | the case.
        
           | giantrobot wrote:
           | The author doesn't state the Java version, but more recent
           | JREs (9+ IIRC) start up way faster than older versions. I'd
           | imagine a JRE's launch time is heavily influenced by disk
           | cache. A warm launch ends up way faster than a cold launch:
           | with the tens of megabytes of classes already in RAM, the
           | warm launch basically only loads the program's class file
           | from disk.
        
       | lttlrck wrote:
       | It's comparing unassembled assembler to JITed code and compilers
       | that are pulling in precompiled libraries.
       | 
       | I feel it needs some kind of normalizing. I get that it is
       | illustrating bloat but it doesn't really illustrate where that's
       | coming from.
       | 
       | Maybe only the output of the JITs should count, or the syscalls
       | required to assemble the example should be included. Are musl and
       | glibc really wasting cycles or are they doing something that the
       | example is missing?
       | 
       | Fun to think about.
        
         | tensor wrote:
         | It's not even meaningfully illustrating bloat. Hello world is
         | an unrealistic edge case. Any program that does something
         | useful will be far more complicated, and it's entirely possible
         | that a lot of the extra stuff being measured here will be
         | required anyways.
        
         | zamadatix wrote:
         | > It's comparing unassembled assembler to JITed code
         | 
         | No, he runs the assembly through NASM + GCC as documented on
         | the page.
         | 
         | I think it's a comparison of "when the user runs the program
         | what runs, how long does it take and how much disk space does
         | it need" based on the column headings. It's not a comparison of
         | the tooling prior to the user's computer as far as I can tell.
        
         | ddevault wrote:
         | It's deliberate that JITs, interpreters, and compiled languages
         | are compared on the same terms here. JITs and interpreters are
         | fundamentally less performant than compiled languages, they
         | don't get a pass on performance tests just because it's by
         | design.
        
           | joshuamorton wrote:
           | > JITs
           | 
           | This is...highly context dependent. For highly polymorphic
           | code, my understanding is that JITs can outperform
           | precompiled binaries, since they can inline
           | virtual/polymorphic calls in tight loops.
           | 
           | This also isn't a "performance" test in any real sense. It's
           | a test of startup time. Where, yes, JITs lose, but unless
           | you're writing short lived interactive command line tools, or
           | something that runs on lambda, that shouldn't be a concern.
           | For "normal" serverside or desktop apps that run for more
           | than, say, 30 seconds, the difference between 0s of startup
           | and 0.2s of startup time is literally in the noise.
        
             | ddevault wrote:
             | Startup time is the least interesting metric in this
             | article. The more interesting metric is the number of
             | syscalls. This isn't a measure of performance, it's a
             | measure of complexity and busywork. Complexity tends to
             | indirectly affect performance, but that's not the point of
             | the article.
        
               | joshuamorton wrote:
               | Complexity of _what_?
               | 
               | The resulting generated binary? Well no, a python binary
               | is smaller than the c binary. The toolchain? Well gcc is
               | pretty complex and that's unaccounted for. The build
               | process? Again, no.
               | 
               | The closest thing I can think of is the language runtime.
               | But why do I care about how complex the language runtime
               | is? Often more complex language runtimes make my life
               | easier anyway, and they're all sitting atop the Intel
               | microcode magic box anyway.
               | 
               | There's a very specific definition of complexity you're
               | using, and I'm still not sure what it is. In my world,
               | you usually add complexity to eke out extra performance
               | by breaking the less complex abstractions.
        
               | ddevault wrote:
               | The complexity of the _system_.
        
               | joshuamorton wrote:
               | I'm still confused. Why is runtime compilation a
               | component of the "system", but aot compilation is not?
               | Why are python interpreters a component of the system,
               | but microcode interpreters and aot compilation not?
               | 
               | If you haven't given a clear definition of what "the
               | system" is, I can't really use your evaluation to
               | influence my decision making.
        
               | ddevault wrote:
               | Because AOT compilation can be used to construct a system
               | which is functional _without_ the compiler, but runtime
               | compilation requires it at runtime. The difference is
               | plainly obvious.
        
       | marcosscriven wrote:
       | Curious which syscalls Rust is making, and why?
        
         | steveklabnik wrote:
         | One reason is that println! will lock stdout for you. I'm not
         | sure what percentage that makes up.
         | 
         | If you don't want that behavior, you can control this all
         | yourself with write! and friends.
         | 
         | EDIT: deeper analysis on Reddit:
         | https://www.reddit.com/r/programming/comments/ejxwlu/hello_w...
        
       | tyingq wrote:
       | Perl seems to lead the pack for interpreted languages by a wide
       | margin for this microbenchmark. I wonder if that's just for the
       | narrow case of print() / hello-world.
        
       ___________________________________________________________________
       (page generated 2020-01-04 23:00 UTC)