[HN Gopher] Challenge to scientists: does your ten-year-old code...
       ___________________________________________________________________
        
       Challenge to scientists: does your ten-year-old code still run?
        
       Author : sohkamyung
       Score  : 236 points
       Date   : 2020-08-24 13:19 UTC (9 hours ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | daly wrote:
       | Axiom is a computer algebra system written in the 1970s-80s. It
       | still runs (and is open source).
        
       | stillsut wrote:
       | It's not academia but Kaggle that's really been on the forefront
       | of building portable and reproducible computational pipelines.
       | 
        | The real key is incentives, and there are two that stand out to me:
       | 
       | - Incentive to get others to "star" and fork your code makes the
       | coder compete to not only have an accurate result, but also
       | prioritize producing code/notebooks that are digestible and
       | instructive. That includes liberal commenting/markup, idiomatic
       | syntax and patterns, diagnostic figures, and the use of modern
       | and standard libraries.
       | 
       | - There is an incentive to move _with_ the community on best
       | practices for the libraries while still allowing experimental
       | libraries. Traditionally, there is the incentive of inertia: e.g.
       | "I always do my modelling in Lisp, and I won't change because
        | then I'd be less productive". But with Kaggle, to learn from the
        | insights and advances of others, you need to be able to work
        | with the evolving common toolset.
       | 
       | In academia, if these incentives were given weight on par with
       | publication and citation then we'd see the tools and practices
       | fall into place.
        
       | hprotagonist wrote:
       | Mine does.
       | 
       | I swear 40% of the idiocy of science code is because people
       | fundamentally don't understand how file paths work. Stop
        | hardcoding paths to data and the world gets better by an order
        | of magnitude.
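        | 
        | A minimal sketch of the alternative (plain Python, every name
        | made up): take the data location as an argument instead of
        | baking an absolute path into the script.
        | 
        |     #!/usr/bin/env python3
        |     # toy analysis: the data path is an argument, not hardcoded
        |     import argparse
        |     from pathlib import Path
        | 
        |     def main() -> None:
        |         parser = argparse.ArgumentParser(description="toy analysis")
        |         parser.add_argument("data_dir", type=Path,
        |                             help="input data directory")
        |         args = parser.parse_args()
        |         # work relative to the directory the caller provided
        |         for csv_file in sorted(args.data_dir.glob("*.csv")):
        |             print("would process:", csv_file)
        | 
        |     if __name__ == "__main__":
        |         main()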
        
       | m3kw9 wrote:
        | It always runs if you use the same computer with the same
        | environment you last ran it in. So yes.
        
       | bloak wrote:
       | I had an 18-year-old Python script. But it didn't work! And I
       | couldn't make it work! Fortunately I had an even older version of
       | the code in Perl, which did work after some very minor changes.
       | 
       | This wasn't scientific code. It was some snarly private code for
       | generating the index for a book and I didn't look at it between
       | one edition and the next. I hope I don't have to fix it again in
       | another 18 years.
       | 
       | Applying some version of the "doomsday argument", Perl 5 might be
       | a good choice if you're writing something now that you want to
       | work (without a great tower of VMs) in 10 or 20 years' time. C
       | would only be a reasonable choice if you have a way of checking
       | that your program does not cause any undefined behaviour. A C
       | program that causes undefined behaviour can quite easily stop
       | working with a newer version of the compiler.
        
       | Fishysoup wrote:
       | As a scientist I've written massive amounts of shitty code that
       | turned out to be reproducible by lucky accident. Part of the
        | problem is the tools: depending on the field, scientists either
       | use Matlab, C++, Fortran or some other framework that needs to
       | die. They base their code on other ancient code that runs for
       | unknown reasons, and use packages written by other scientists
       | with the same problems.
       | 
       | As someone who's transitioning into industry, I can tell you that
       | scientists will never adopt software engineering principles to
       | any significant extent. It takes too much time to do things like
       | write tests and thorough documentation, learn Git, etc., and
       | software engineering just isn't interesting to most of them.
       | 
       | So the only alternative I see is changing the tools to stuff
       | that's still easy to hack around with but where it's harder to
       | mess up (or it's more obvious when you do so). That doesn't leave
       | a ton of options (that I can see). Some I can think of are:
       | 
       | - Make your code look more like math and less like
       | mathlib.linalg.dot(x1, x2).reshape(a,
       | b).mean().euclidean_distance((x3, x4)) + (other long expression)
       | or whatever: Use a language like Julia
       | 
       | - Your language/environment gets angry when you write massive
       | hairballs, loads of nested for-loops and variables that keep
       | getting changed: Use a language like Rust, and/or write more
       | modular code with a functional-leaning language like Rust or
       | Julia.
       | 
       | - You're forced to make your code semi-understandable to you and
       | others more than an hour after writing it: Forcing people to
       | write documentation isn't gonna work (a lot). Forcing sensible
       | variable names is slightly more realistic. More likely, you need
       | some combination of the above two things that just make your code
       | more legible.
       | 
       | How do you make that happen? No idea.
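        | 
        | To make the first point concrete, a purely illustrative sketch
        | (plain Python with made-up names, not the Julia suggested
        | above): a rough analogue of the chained expression, split into
        | named steps.
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)             # made-up toy data
        |     x1, x2 = rng.normal(size=(6, 4)), rng.normal(size=(4, 6))
        |     x3, x4 = rng.normal(size=3), rng.normal(size=3)
        | 
        |     # the chained one-liner, split into named steps
        |     projection = np.dot(x1, x2).reshape(4, 9)  # 6x6 product as 4x9
        |     mean_response = projection.mean()          # average entry
        |     pair_distance = np.linalg.norm(x3 - x4)    # Euclidean distance
        |     result = mean_response + pair_distance
        |     print(result)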
        
       | biophysboy wrote:
       | I'm a grad student in biophysics - even if I wrote perfect code,
       | it would almost certainly go obsolete in 10 years because the
       | hardware that it interfaces with would go obsolete.
        
       | dekhn wrote:
       | The longest-running code I wrote as a scientist was a sandwich
        | ordering system. I worked for a computer graphics group at UCSF
        | while taking a year off from grad school as my simulations ran
        | on a supercomputer, and we had a weekly group meeting where
        | everybody ordered sandwiches from a local deli.
       | 
       | It was 2000, so I wrote a cgi-bin in Python (2?) with a MySQL
       | backend. The menu was stored in MySQL, as were the orders. I
        | occasionally check back to see if it's still running, and it is:
        | a few code changes to port to Python 3, a data update since they
        | changed vendors, and a MySQL update or two as well.
       | 
       | It's not much but at least it was honest work.
        
       | jnxx wrote:
        | Very related to this, see also Hinsen's blog post:
       | http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stabilit...
       | 
       | I think that GNU Guix is extremely well-suited to improve this
       | situation.
       | 
        | Also, one could think this is an academic problem, in the sense
        | of an otherwise unimportant niche problem. It really isn't; as
        | with many other topics, academics are just the first to be
        | confronted with this issue. I am sure that in many medium or
        | large companies there are some Visual Basic or Excel code bases
        | which are important but could turn out extremely hard to
        | reproduce. This issue will only get more pressing with today's
        | fast-moving ecosystems where backward-compatibility is more a
        | moral ideal than an enforced requirement.
       | 
        | It is well known that ransomware can wipe out businesses if
       | critical business data is lost. But more and more businesses and
       | organizations also have critical, and non-standard, software.
        
         | neuromantik8086 wrote:
          | Guix is one of several tools that have been touted as a
          | solution. Another one that is quite popular in HPC circles is
         | Spack (https://spack.readthedocs.io/en/latest/).
         | 
         | At my institute, we actually tried out Spack for a little bit,
         | but consistently felt like it was implemented more as a
         | research project rather than something that was production-
         | level and maintainable. In large part, this was due to the
         | dependency resolver, which attempts to tackle some very
         | interesting CS problems I gather (although this is a bit above
          | me at the moment; these problems are discussed in detail at
          | https://extremecomputingtraining.anl.gov//files/2018/08/ATPE...),
         | but which produces radically different dependency graphs when
         | invoked with the same command across different versions of
         | Spack.
         | 
         | I've since come to regard Spack as the kind of package manager
         | that science deserves, with conda being the more pragmatic /
          | maintainable package manager that we get instead.
          | Spack/Guix/nix are the best solution in theory, but they come
          | with a host of other problems that make them less desirable.
        
           | jnxx wrote:
           | > Spack/Guix/nix are the best solution in theory, but they
           | come with a host of other problems that made them less
           | desirable.
           | 
            | I would be quite interested to learn more about what these
            | problems are, in your experience. I've only tried Guix (on
            | top of Debian and Arch) and while it is definitely more
            | resource-hungry (especially in terms of disk space), I don't
            | perceive it as impractical.
        
             | yjftsjthsd-h wrote:
             | As someone coming from the computing side of things, I
             | found nix to be quite difficult to grok enough to write a
             | package spec, and guix was pretty close, at least in part
             | because of the whole "packages are just side-effects of a
             | functional programming language" idea. At least nix also
             | suffers from a lot of "magic"; if you're trying to package,
             | say, an autotools package then the work's done for you -
             | and that's great, right up until you try to package
             | something that doesn't fit into the existing patterns and
             | you're in for a world of hurt.
             | 
             | Basically, the learning curve is nearly vertical.
        
               | rekado wrote:
               | > guix was pretty close, at least in part because of the
               | whole "packages are just side-effects of a functional
               | programming language" idea
               | 
               | This must be a misunderstanding. One of the big visible
               | differences of Guix compared to Nix is that packages are
               | first-class values.
        
               | yjftsjthsd-h wrote:
               | You're right; on further reading I can see guix making
               | packages the actual output of functions. I do maintain
               | that the use of a whole functional language to build
               | packages raises the barrier to entry, but my precise
               | criticism was incorrect.
        
       | akerro wrote:
       | Code written in Oak still works in Java 14. You can still write
       | `public abstract interface BlaBla{}` and it still works. If it
        | doesn't work (due to reflection safety changes in Java 9), it
        | will still surely compile with a newer compiler.
       | 
        | Another thing: are the tools used to compile still available? I
        | tried to compile my BCS Android+native OpenCV project and failed
       | quickly. Gradle removed some plugin for native code integration,
       | another plugin was no longer maintained, it had internal check
       | for gradle version and it said "I'm designed to work with gradle
       | >= 1.x < 3.x" and just refused to run under 6.x ... I would have
       | to fork that plugin, make it work with newer Gradle or find
       | replacement. I was obviously too lazy and stopped working on that
       | project before I even started.
       | 
        | I'm sure that if I had put more effort into making the build
        | process reproducible, it would work effortlessly, but I didn't
        | care at that point. I wrote it using a beta release of OpenCV
        | that's also no longer maintained, because there are better,
        | faster official alternatives available.
        
         | therealx wrote:
         | Or use the old version of Gradle? It sounds like creating a
         | vm/container/whatever with the old versions of everything is
         | the fastest path, although I understand not wanting to do it
         | after some point.
        
       | mensetmanusman wrote:
       | Yes, it is all pasted into my thesis, comments and all, like all
        | code should be.
        
       | nanddalal wrote:
        | GitHub offers a free tier for GitHub Actions with 2,000 Actions
       | minutes/month [1]. This could be useful:
       | 
       | 1. write some unit tests which don't use too much compute
       | resources (so you can stick to the free tier)
       | 
        | 2. package your code into a Docker image where the tests can be run
       | 
        | 3. wire up the Docker image with tests to GitHub Actions
       | 
        | This way you have continuous testing and can make sure your code
        | keeps running (a minimal test sketch is below).
       | 
       | References:
       | 
       | [1] https://github.com/pricing
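        | 
        | Purely illustrative, for step 1: a cheap test that fits the
        | free tier. The module and function under test are stand-ins
        | for whatever the paper's code actually exposes.
        | 
        |     # test_analysis.py -- runs in seconds on the free tier
        |     import math
        | 
        |     from analysis import estimate_mean  # hypothetical module
        | 
        |     def test_estimate_mean_on_known_input():
        |         data = [1.0, 2.0, 3.0, 4.0]
        |         # the estimator should recover the exact mean here
        |         assert math.isclose(estimate_mean(data), 2.5, rel_tol=1e-9)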
        
         | MattGaiser wrote:
         | Even if it broke, who would go back and fix it?
         | 
         | I do not see that happening, especially with complex library
         | bugs.
        
         | jnxx wrote:
         | > package your code into a docker
         | 
         | docker is not a general solution for this.
         | 
         | What is needed is a way to re-generate everything from source
         | and from scratch.
        
       | snowwrestler wrote:
        | The gold standard for a scientific finding is not whether a
       | particular experiment can be repeated, it is whether a
       | _different_ experiment can confirm the finding.
       | 
       | The idea is that you have learned something about how the
       | universe works. Which means that the details of your experiment
       | should not change what you find... assuming it's a true finding.
       | 
       | Concerns about software quality in science are primarily about
       | avoiding experimental error at the time of publication, not the
       | durability of the results. If you did the experiment correctly,
       | it doesn't matter if your code can run 10 years later. Someone
       | else can run their own experiment, write their own code, and find
       | the same thing you did.
       | 
       | And if you did the experiment incorrectly, it also doesn't matter
       | if you can run your code 10 years later; running wrong code a
       | decade later does not tell you what the right answer is. Again--
       | conducting new research to explore the same phenomenon would be
       | better.
       | 
       | When it comes to hardware, we get this. Could you pick up a PCR
       | machine that's been sitting in a basement for 10 years and get it
       | running to confirm a finding from a decade ago? The real question
       | is, why would you bother? There are plenty of new PCR machines
       | available today, that work even better.
       | 
       | And it's the same for custom hardware. We use all sorts of
       | different telescopes to look at Jupiter. Unless the telescope is
       | broken, it looks the same in all of them. Software is also a tool
       | for scientific observation and experimentation. Like a telescope,
       | the thing that really matters is whether it gives a clear view of
       | nature at the time we look through it.
        
         | nextaccountic wrote:
         | > running wrong code a decade later does not tell you what the
         | right answer is.
         | 
          | It can tell you, however, exactly where the error lies (if the
         | error is in software at all). Like a math teacher that can
         | circle where the student made a mistake in an exam.
        
         | ISL wrote:
         | Reproducibility is about understanding the result. It is the
         | modern version of "showing your work".
         | 
         | One of the unsung and wonderful properties of reproducible
          | workflows is that they can allow science to be salvaged from
          | an analysis that contains an error. When I made an error in my
          | thesis data analysis (and I did, pre-graduation), the error
          | could be corrected and the analysis re-run. This works even
         | if the authors are dead (which I am not :) ).
         | 
         | Reproducibility abstracts the analysis from data in a rigorous
         | (and hopefully in the future, sustainable) fashion.
        
       | suyjuris wrote:
       | I wrote a tool to visualise algorithms for binary decision
       | diagrams [1], also in an academic context, where the problem was
       | basically the same: Does the code still run in ten years? In
       | particular, the assumption is that I will not be around then, and
       | no one will have any amount of time to spend on maintenance.
       | 
       | In the end, I chose to write it in C++ with minimal dependencies
       | (only X11, OpenGL and stb_truetype.h), with custom GUI, and
       | packed all resources into a single executable.
       | 
       | A lot of effort, but if it causes the application to survive 5x
       | as long then it is probably worth spending twice the time.
       | 
       | [1] https://github.com/suyjuris/obst
        
       | yummypaint wrote:
        | Not to disagree with any points in the article, but I would point
       | out that the sciences also have cases of very old code being
       | maintained and used in production successfully. For example we
       | still use a kinematics code written in fortran over half a
       | century ago. In practice parts of it get reimplemented in newer
       | projects, but the original still sees use.
        
       | proverbialbunny wrote:
       | This seems like a fluff piece because:
       | 
       | 1) Prototype code scientists write tends to be written at a high
        | level, so barring imported libraries up and disappearing,
       | there is a high chance that code written by scientists will run
       | 10 years later. There is a higher chance it will run than
       | production code written at a lower level.
       | 
       | 2) The article dives into documentation but scientists code in
       | the Literate Programming Paradigm[0] where the idea is you're
       | writing a book and the code is used as examples to support what
       | you're presenting. Of course scientists write documentation.
       | Presenting findings is a primary goal.
       | 
       | 3) Comments here have mentioned unit testing. Some of you may
       | scoff at this but when prototyping, every time you run your code,
       | the output from it teaches you something, and that turns into an
       | iterative feedback loop, so every time you learn something you
       | want to change the code. Unit tests are not super helpful when
       | you're changing what the code should be doing every time you run
       | it. Unit tests are better once the model has been solidified and
       | is being productionized. Having a lack of unit testing does not
       | make 10 year old prototype code harder to run.
       | 
       | [0] https://en.wikipedia.org/wiki/Literate_programming
        
         | comicjk wrote:
         | > scientists code in the Literate Programming Paradigm
         | 
         | I wish. In my career as a computational scientist I have never
         | seen this in practice, either in academia or industry.
         | 
         | On unit testing, I half agree. Most unit tests get quickly
         | thrown out as the code changes, so it's a depressing way to
         | write research code. But tests absolutely help someone trying
         | to run old code - they show what parts still work and how to
         | use them.
        
       | noobermin wrote:
        | Does code from ten years ago ever run? Try running something
        | that runs on Python 2 on the current Python interpreter today.
        
         | jnxx wrote:
         | Python is an extremely bad example.
         | 
         | Try twenty years old Common Lisp code. Or Fortran.
        
           | noobermin wrote:
           | What are we doing then? We can choose bad scientific code but
           | use "good examples" for other types of code? Seems
           | convenient.
        
       | pvaldes wrote:
        | Yes, it still runs, of course.
        
       | tenYearOldCode wrote:
       | Yes it does!
       | 
        |     10 PRINT "HELP"
        |     20 GOTO 10
        
       | drummer wrote:
       | This is why backwards compatibility is important. Many people
       | have a problem when this is raised as a primary concern and goal
       | of the C and C++ languages, but it is a must have feature.
        
       | modeless wrote:
       | Let's not criticize people who release their code. Let's
        | criticize the people who _don't_ release their code instead. We
       | don't need more barriers to releasing code.
       | 
       | I'd much rather fix someone's broken build than reimplement a
       | whole research paper from scratch without the relevant details
       | that seem to always be accidentally omitted from the paper.
        
       | CoffeeDregs wrote:
        | Would be super-useful to have a sciencecode.com service which
       | is a long-term CI system for scientific code and its required
       | artifacts. Journals could include references to
       | sciencecode.com/xyz and sciencecode.com/abc could be derived from
       | sciencecode.com/xyz. Given Github Actions and Forks, the only
       | thing holding this back is scientists doing it (and, possibly,
       | the HN community helping).
       | 
       | And I get that it's not fun to have your code publicly critiqued
       | but it's also not fun to live lives based on (medical,
       | epidemiological) unpublished, unaudited, unverified code...
       | 
       | EDIT: hell, just post a "HELP HN: science code @
       | github.com/someone/project" and I'd be surprised if you weren't
       | overwhelmed with offers of help.
        
       | tanilama wrote:
        | If you wrap your code into Docker, I would say... probably.
        
       | vikramkr wrote:
       | In addition - does your ten year old protocol still work? Do your
        | 10 year old results replicate? This isn't isolated to just
        | programming - making robust and reproducible tools, code,
        | equipment, protocols, and results is undervalued across all
        | areas of research, leading to situations where published
        | protocols weren't robust, so a change in reagent supplier leads to
       | failure, or to protocols so dependent on weird local or
       | unreported environmental conditions or random extra steps that
       | attempting to replicate them leaves you nowhere. Robustness needs
       | to be improved in general.
        
       | ecmascript wrote:
       | I am not a scientist, but actually I think most of the code I
       | wrote 10 years ago still is in production at different companies.
        
         | jnxx wrote:
         | Companies with code in production have a short-term real
         | incentive to keep that code running.
         | 
          | This is different from code from research projects, which is
          | in many cases just run a few times, and at other times written
          | by somebody who, if he/she wants to make any kind of career in
          | the field, has to move to a new workplace and will not have
          | any time to maintain that old code.
         | 
          | There are a few long-running major science projects, say, in
         | particle physics or astronomy, which are forced to work
         | differently. And in these environments, there are actually
         | people who have knowledge on both science and software
         | engineering.
        
         | dhosek wrote:
         | If it's still in production, it's most likely still getting
         | some level of maintenance attention as well. When I was an
         | undergrad I did some coding for some of the professors at the
         | college. A lot of scientific programming is stuff that gets
         | written and run once and never run again. Try dusting off some
         | 10-year-old C++ and try compiling it with the current version
         | of your compiler.
        
         | hex1848 wrote:
          | Last month I shut down a VB6 app at the company I work for
          | that had been running since 1998. The leadership team finally
         | decided they didn't want to sell that particular feature
         | anymore. We still have a handful of apps from around that time
         | period that do various small tasks. One day it will be a
         | priority to get rid of them.
        
       | brobdingnagians wrote:
       | The Fossil documentation has this gem:
       | 
       | > "The global state of a fossil repository is kept simple so that
       | it can endure in useful form for decades or centuries. A fossil
       | repository is intended to be readable, searchable, and extensible
       | by people not yet born."
       | 
       | I always liked that they planned for the long-term. Keeping that
       | in mind helps you build systems that will work in 10 years, or in
       | 100, if it happens to last that long. When you are building a
       | foundation, like a language or database, it is nice to plan for
       | long term support since so much depends on it. C has stayed
       | mostly recognizable over the years, much more so than C++ or
       | other high level languages. When your design is simple, you can
       | have a "feature complete" end.
        
       | driverdan wrote:
       | This is a challenge with many types of code.
       | 
       | Earlier this year it took me a weekend to get a 7 year old Rails
       | project running again. It's a simple project but the packages it
       | used had old system dependencies that were no longer available.
       | 
       | I ended up having to upgrade a lot of things, including code,
       | just to get it running again.
        
         | therealx wrote:
         | I ran into this too. Rails has changed a lot in 7 years, even
         | if you don't see it. My friend wanted to learn and somehow
         | found the original demo/getting started page and was
         | frustrated.
        
       | shadowgovt wrote:
       | I'm not sure how interesting the question is, given how few
       | software engineers outside academic sciences have 10-year-old
       | code that still runs (unless they've maintained a dedicated
       | hardware platform for it without regular software updates).
        
       | noisy_boy wrote:
       | Not a scientist but 13+ year old Perl code I wrote is still
       | running (based on my catchup chats with ex-colleagues) to
       | generate MIS reports.
        
       | goalieca wrote:
       | > Today, researchers can use Docker containers (see also ref. 7)
       | and Conda virtual environments (see also ref. 8) to package
       | computational environments for reuse.
       | 
        | Docker is also flawed. You can perfectly reproduce it today, but
        | what about in 10 years? I can barely go back to our previous
        | release for some Dockerfiles.
        
         | jnxx wrote:
         | Guix is arguably better.
        
         | Sulfolobus wrote:
         | Similarly, conda envs can break in weeks due to package
         | changes.
         | 
         | Even if you remove build versioning and all transitive
         | dependencies from your env (making it less reproducible...)
         | they will break pretty damn quick.
        
       | _wldu wrote:
       | IMO, this is why ISO standard programming languages are so
       | important and will be around forever. One can always compile with
       | --std=c++11 (or whatever) and be certain it will work.
        
         | Kenji wrote:
         | Hahaha you would be surprised. Compiling complex C++ projects
         | is incredibly difficult.
        
       | ris wrote:
       | A well-written Nix package should be buildable at any point in
       | the future, producing near-identical results. This is why I
       | sometimes publish Nix packages for obscure & hard to build pieces
       | of software that I'm not likely to maintain - because it's like
       | rescuing a snapshot of them from oblivion.
        
       | bobcostas55 wrote:
       | How do you do reproducible builds in R? It seems like a huge PITA
       | to specify versions of R and especially the packages used...
        
       | magv wrote:
       | An interesting concern is that there often is no single piece of
       | code that has produced the results of a given paper.
       | 
       | Often it is a mixture of different (and evolving) versions of
       | different scripts and programs, with manual steps in between.
       | Often one starts the calculation with one version of the code,
       | identifies edge cases where it is slow or inaccurate, develops it
       | further while the calculations are running, does the next step
       | (or re-does a previous one) with the new version, possibly
       | modifying intermediate results manually to fit the structure of
        | the new code, and so on -- the process is interactive, and not
       | trivially repeatable.
       | 
       | So the set of code one has at the end is not the code the results
       | were obtained with: it is just the code with the latest edge case
       | fixed. Is it able to reproduce the parts of the results that were
       | obtained before it was written? One hopes so, but given that
       | advanced research may take months of computer time and machines
       | with high memory/disk/CPU/GPU/network speed requirements only
       | available in a given lab -- it is not at all easy to verify.
        
         | vharuck wrote:
          | >the process is interactive, and not trivially repeatable.
         | 
         | The kind of interaction you're describing should be frowned
         | upon. It requires the audience to trust the manual data edits
         | are no different than rerunning the analysis. But the
         | researcher should just rerun the analysis.
         | 
         | Also, mixing old and new results is a common problem in
         | manually updated papers. It can be avoided by using
         | reproducible research tools like R Markdown.
        
           | James_Henry wrote:
           | If it can't be trivially repeated, then you should publish
           | what you have with an explanation of how you got it. Saying
           | that "the researcher should just rerun the analysis" is not
           | taking into account the fact that this could be very
           | expensive and that you can learn a lot from observations that
           | come from messy systems. Science is about more than just
           | perfect experiments.
        
         | i-am-curious wrote:
         | And any such "research" should go in the bin. Reproducibility
          | of final results and their review is key.
        
           | James_Henry wrote:
            | No, you should publish this research and be clear about how it
           | all worked out and someone will reproduce it in their own
           | way.
           | 
           | Reproducibility isn't usually about having a button to press
           | that magically gives you the researchers' results. It's also
            | not always a set of perfect instructions. More often it is
            | documentation of what happened and what was observed, as the
            | researchers believe is important to the understanding of the
            | research questions. Sometimes we don't know what's important
           | to document so we try to document as much as possible. This
           | isn't always practical and sometimes it is obviously
           | unnecessary.
        
       | uberdru wrote:
       | Sure it does. On a 10-year old machine.
        
       | dhosek wrote:
       | Back in the 80s/90s I was heavily into TeX/LaTeX--I was
       | responsible for a major FTP archive that predated CTAN, wrote
       | ports for some of the utilities to VM/CMS and VAX/VMS and taught
       | classes in LaTeX for the TeX Users Group. I wrote most of a book
       | on LaTeX based on those classes that a few years back I thought
       | I'd resurrect. Even something as stable as LaTeX has evolved
       | enough that just getting the book to recompile with a
       | contemporary TeX distribution was a challenge. (On the other
       | hand, I've also found that a lot of what I knew from 20+ years
       | ago is still valid and I'm able to still be helpful on the TeX
       | stack exchange site).
        
       | JoeAltmaier wrote:
        | Strangely, they were running (some of) the code on old hardware.
       | That's hardly a useful case, and much easier than 'resurrecting'
       | the code for modern reuse.
        
         | therealx wrote:
         | Something with non-standard asm?
        
           | JoeAltmaier wrote:
           | That sounds like a big issue. And certainly part of getting
           | 10-year-old code resurrected.
        
       | lordnacho wrote:
       | You often run into code of the "just get it to work" variety,
       | which has the problem that when it was written, maintainability
       | was bottom of the list of priorities. Often the author has a goal
        | that isn't described in software engineering terms:
       | calculate my option model, work out the hedge amounts, etc.
       | 
       | And the people who write this kind of code tend not to think
       | about version control, documentation, dependency management,
       | deployment, and so forth. The result is you get these fragile
       | pieces holding up some very complex logic, which takes a lot of
       | effort to understand.
       | 
       | IMO there should be a sort of code literacy course that everyone
       | who writes anything needs to do. In a way it's the modern
       | equivalent of everyone who writes needing to understand not just
       | grammar but style and other writing related hygiene.
        
         | dhosek wrote:
         | Even with all the best practices, things outside your control
         | can cause issues. A lot of the code that software engineers
         | write is subject to tiny bits of continual maintenance as small
         | changes in the runtime environment take place. Imagine ten
         | years of those changes deployed all at once. Even something
         | employing all the best practices of ten years ago could be a
         | challenge. You've got a subversion repository somewhere with
         | the code which was compiled to run on Windows XP with Windows
         | Visual Studio C++ 2008 Express but you've abandoned Windows for
         | Linux. If you're lucky the code will compile with the
         | appropriate flags to support C++98 in gcc, but who knows? And
         | maybe there's a bunch of graphical stuff that isn't supported
         | at all anymore or a computational library you used which was
         | only distributed as a closed-source library for 32-bit Windows.
        
         | throwanem wrote:
         | The fundamental problem here, as you note, is that scientists
         | are rarely also engineers, and don't really share our
         | desiderata. The point is to develop and publish a result, and
         | engineering analysis code for resiliency is of secondary
         | concern at best when that code isn't likely to need to be used
         | again once the paper is finished.
         | 
         | The "Software Carpentry" movement [1] has in the past decade
         | tried to address this, as I recall. It's very much in the vein
         | of the "basic literacy" course you suggest. I can't say how far
         | they've gotten, and I'm no longer adjacent to academia, but
         | based on what I do still see of academics' code, there's a long
         | way still to go.
         | 
         | [1] https://software-carpentry.org/
        
           | detaro wrote:
           | And that scientists also are rarely supported by programmers,
           | or if they are it's an unstable and unappreciated position.
        
             | throwanem wrote:
             | Having had that exact experience - yeah, that can be a big
             | problem too.
             | 
             | Researchers and engineers _can_ work really well together,
             | because the strengths of each role complement the
             | weaknesses of the other, and I think it would be very nice
             | to see that actually happen some day.
        
               | detaro wrote:
               | It doesn't help with the issue of hard-to-reproduce work,
               | but apparently working for a company making _products_
               | aimed at scientists can be a place to see this happen (if
               | the company is good about talking to customers).
        
               | throwanem wrote:
               | Interesting, thanks! I'll keep that in mind for when I'm
               | next looking for a new client.
        
             | mnw21cam wrote:
             | Being in such a position, I can say that I am appreciated,
             | but not in a manner that results in job stability and
             | promotion. It's a massive problem in academia, and there's
             | an attempt to get the position recognised and call it
             | "Research Software Engineer", with comparable opportunities
             | for promotion and job stability as a researcher. However,
             | it's not going massively well. Academic job progression is
              | still based almost purely on the ability to get
             | first or last author papers in top journals. I have lots of
             | papers where I am a middle author, because I wrote the
             | software that did the analysis that was vital for the paper
             | to even exist, but it largely doesn't count. And I'm lucky
             | - many software engineers don't even get put in as a middle
             | author on the paper they contributed to.
        
             | non-entity wrote:
              | I've seen job listings for "scientific programmers" where
             | what they're asking for is a scientist who happens to know
             | a little programming.
        
               | detaro wrote:
               | Yeah - who then likely doesn't have that much software
               | experience, and worse, if they want to _stay_ a scientist
               | such a role is often a bad career move, because they help
               | others get ahead with their research instead of
               | publishing their own work. Even if they build some really
               | great domain-specific software tool in that role, it
                | often doesn't count as much.
               | 
               | Or it's an informal thing done by some student as a side-
               | gig. Which can be cool, but is not a stable long-term
               | thing.
               | 
               | I hope there's exceptions.
               | 
               | EDIT: weirdest example I've seen was a lab looking for
               | _sysadmins_ with PhD preferred. I wonder if they had some
                | funding source that only paid for "scientists" or what
               | was going on there...
        
               | mnw21cam wrote:
               | Simple answer for that. University pay scales tend to be
               | fairly inflexible in terms of which grades you are
               | eligible for without a PhD, if you are counted as
               | academic staff. If you're non-academic staff (like the
               | cleaner, the receptionist, and the central IT sysadmin)
               | then you can be paid a fair wage based upon your
               | experience, but if you are academic staff, then you have
               | a hard ceiling without a PhD. An individual research
               | group with a grant may only be able to hire academic
               | staff, but they want a sysadmin, so in order to be able
               | to pay them more than a pittance they would have to have
               | a PhD.
        
           | neutronicus wrote:
           | Nah.
           | 
           | The _fundamental_ problem is that scientific code is produced
           | by entry-level developers:
           | 
           | 1. Paid below-market wages
           | 
           | 2. With no way to move up in the organization
           | 
           | 3. With lots of non-software responsibilities
           | 
           | 4. With an expectation of leaving the organization in six
           | years
           | 
           | As long as the grunt work of science is done by overworked
           | junior scientists whose careers get thrown to the wolves no
           | matter what they do, you're not going to get maintainable
           | code out of it.
        
             | jnxx wrote:
             | Even more fundamental is that there is no maintenance
             | budget for important scientific libraries and tools.
              | Somebody wrote them as part of their job, and the people
              | who wrote them are now working somewhere else.
        
             | throwanem wrote:
             | I mean, senior researchers in stable roles don't really do
             | any better. Just to pick the first example off the top of
             | my head - one of the investigators I worked with, during my
             | year as a staff member of an academic institution most of a
             | decade ago, is also one of my oldest friends; he's been a
             | researcher there for what must be well past ten years by
             | now. Despite one of his undergrad degrees being actually in
             | CS, I still find ample reason whenever I see it to give him
             | a hard time about the maintainability of his code.
             | 
             | Like I said before, it's a field in which people really
             | just don't give a damn about engineering. Which is fair!
             | There's little reason why they should, as far as I've ever
             | been able to see.
        
         | justinmeiners wrote:
          | Unit testing, readability, version control, documentation, etc.
         | are all engineering practices for the purpose of making ongoing
         | development organized (especially for teams).
         | 
         | Why would a researcher need to do this, when in most cases all
         | that they use is the output, and in CS/math it's only a minimal
         | prototype demonstrating operation of their principle?
         | 
         | All of the other stuff would certainly be nice, but they don't
         | need to adopt our whole profession to write code
        
       | [deleted]
        
       | matsemann wrote:
       | Would an abandoned project I wrote 10 years ago still run? The
       | code is probably fine, but getting it to actually run by linking
       | up whatever libraries, sdks and environment correctly could be
        | troublesome. Even a small pipeline I wrote a few weeks ago I had
       | trouble re-running, because I forgot there was a manual step I
       | had to do on the input file.
       | 
        | Expecting more rigorous software practices from scientists than
        | from software engineers would be wrong. I don't think they should
        | have to tangle with this; tools should aid them somehow.
        
         | tyingq wrote:
         | It's interesting that it's often easier to get something 25+
         | years old running because I need fewer things. Not so hard to
         | find, say "DosBox" and and old version of Turbo Pascal.
        
           | lebuffon wrote:
           | Sounds like simplicity for the win.
           | 
           | The complex house of cards we currently stand on seems
           | fragile by comparison.
        
             | tyingq wrote:
             | We also benefit, for that old stuff, from enthusiasts that
             | build cool stuff. Like DosBox, Floppy Emulators, etc.
             | 
             | I doubt there are going to be folks nostalgic for the
             | complex mess we have now.
        
               | lebuffon wrote:
               | Indeed. I participate in Atariage.com and the level of
               | dedication is amazing.
               | 
               | Are there groups for Win 3.1, Win95?
        
           | jnxx wrote:
            | This. In recent years, conventional software engineering
            | has in many cases experienced an explosion in complexity
            | which will make it very, very difficult to maintain stuff in
            | the long run. This only works because over 90% of startups go
            | bust anyway, within a few years.
        
           | dhosek wrote:
           | When I was in my 20s I managed to get a contract updating
           | some control software for a contact lens company on the basis
           | of my happening to own an old copy of Borland C++ 1.0.
        
             | eythian wrote:
             | Had a similar experience getting a contract updating a mass
             | spectrometer control system because I had extensive high
             | school experience in Turbo Pascal.
        
         | zimbatm wrote:
         | If the same project had been packaged with Nix, it would
         | probably still compile. People regularly checkout older
         | versions of nixpkgs to get access to older package releases.
         | 
          | One of the key properties is that the build system enforces all
         | the build inputs to be declared. And the other one is to keep a
         | cache of all the build inputs like sources because upstream
         | repositories tend to disappear over time.
        
       | cube00 wrote:
       | The day when code used to produce a paper must also be published
        | cannot come soon enough.
        
         | dandelion_lover wrote:
         | It won't happen until researchers are forced to do it. Please
         | sign petition at https://publiccode.eu and have a look at my
         | other comment here.
        
         | goalieca wrote:
         | Arguably, data is just as important. Academics hoard their data
         | and try to milk out every paper they can from it. The reward
         | system is based on publishing as many papers as possible rather
         | than just making a meaningful contribution.
        
           | belval wrote:
            | Data is much trickier because data sources in medicine,
            | education, or even just regular businesses don't want the
            | added legal weight of making data freely available.
           | 
            | This is obviously a shame. I was working on segmentation of
            | open wounds, and most papers include a "we are currently in
            | talks with the hospital to make the data available". If you
           | contact the authors directly they will tell you that their
           | committee blocked it because the information is too
           | sensitive.
        
             | abathur wrote:
             | It seems like there can be a balance between "the results
             | are unverifiable because no one else can touch the data"
             | and "effectively open-source the dataset"?
             | 
             | Something like: "To make it easier to verify the code
             | behind this paper, we've used <accepted standard
             | project/practice> to generate a synthetic dataset with the
             | same fields as the original and included it with the source
             | code. The <data-owning institution> isn't comfortable with
             | publishing the full dataset, but they did agree to provide
             | the same data to groups working on verification studies as
             | long as they're willing to sign a data privacy agreement.
             | Send a query to <blahblahblah> ..."
        
               | belval wrote:
               | > but they did agree to provide the same data to groups
               | working on verification studies as long as they're
               | willing to sign a data privacy agreement. Send a query to
               | <blahblahblah> ..."
               | 
                | This would be administrative overhead; it will be shut
                | down 9 times out of 10. I understand why this might seem
                | easy, but it really is not: you can have multiple
                | hospitals that each have their own committee that agreed
                | to give the researcher their data. They don't have a
                | central authority that you can appeal to, much less
                | someone that can green-light your specific access.
               | 
               | As for the synthetic datasets that's basically just
               | having tests and was advocated for elsewhere in this
               | thread.
        
           | jgeada wrote:
            | The reward system also prevents dead ends from being
            | identified, the publication of approaches that did not lead
            | to the expected results or got null results, the publishing
            | of confirmations of prior papers, etc.
           | 
           | Basically, the reward system is designed to be easy to
           | measure and administer, but is not actually useful in any way
           | to the advancement of science.
        
         | WanderPanda wrote:
         | Making this mandatory might have bad downstream effects like
         | prohibiting publication of some research at all (GPT-X I am
         | looking at you)
        
           | qppo wrote:
           | Closed source research isn't publication, it's advertisement.
        
             | WanderPanda wrote:
             | So R&D is not a thing, but A&D is? That would be new to me
        
         | jhrmnn wrote:
         | In all my papers the results were produced on multiple days
         | (spanning months), with multiple versions of the code, and they
         | are computationally too expensive to reproduce with the final
         | version of the code. I'm trying to keep track of all the used
         | versions, but given that there is no automated framework for
         | this (is there?) and research involves lots of experiments,
         | it's never perfect. Given this context, any ideas how to do it
         | better?
        
           | chriswarbo wrote:
           | I tend to do the following (some or all, depending on the
           | situation):
           | 
           | - Use known, plaintext formats like LaTeX, Markdown, CSV,
           | JSON, etc. rather than undocumented binary formats like those
           | of Word, Excel, etc.
           | 
           | - Keep sources in git (just a master branch will do)
           | 
            | - Write all of the rendering steps into a shell script or
           | Makefile, so it's just one command with no options
           | 
           | - I go even further and use Nix, with all dependencies pinned
           | (this is like an extreme form of Make)
           | 
           | - Code for generating diagrams, graphs, tables, etc. is kept
            | in git alongside the LaTeX/whatever
           | 
           | - Generated diagrams/graphs/tables are _not_ included in git;
            | they're generated during rendering, as part of the shell-
           | script/Makefile/Nix-file; the latter only re-generate things
           | if their dependencies have changed
           | 
           | - All code is liberally sprinkled with assertions, causing a
           | hard crash if anything looks wrong
           | 
           | - If journals/collaborators/etc. want things a certain way
           | (e.g. a zip file containing plain LaTeX, with all diagrams as
           | separate PNGs, or whatever) then the "rendering" should take
           | care of generating that (and make assertions about the
           | result, e.g. that it renders to PDF without error, contains
           | the number of pages we're expecting, that the images have the
           | expected dimensions, etc.)
           | 
           | - I push changes from my working copies into a 'repos'
           | directory, which in turn pushes to my Web server and to
           | github (for backups and redundancy)
           | 
           | - Pushing changes also triggers a build on the continuous
           | integration server (Laminar) running on my laptop. This makes
           | a fresh copy of the repo and tries to render the document
           | (this prevents depending on uncommitted files, the absolute
           | directory path, etc.)
           | 
           | Referencing a particular git commit should be enough to
           | recreate the document (this can also be embedded in the
           | resulting document somewhere, for easy reference). Some care
           | needs to be taken to avoid implicit dependencies, etc. but
           | Nix makes this _much_ easier. Results should also be
           | deterministic; if we need pseudorandom numbers then a fixed
            | seed can be used, or (to prove there's nothing up our
           | sleeves) we can use SHA256 on something that changes on each
           | commit (e.g. the LaTeX source).
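            | 
            | (A minimal sketch of that last trick; the file name
            | main.tex and everything else here is an assumption.)
            | 
            |     # derive a deterministic RNG seed from the LaTeX source
            |     import hashlib
            |     import random
            |     from pathlib import Path
            | 
            |     source = Path("main.tex").read_bytes()  # assumed file
            |     digest = hashlib.sha256(source).digest()
            |     # the same source gives the same seed and the same draws
            |     seed = int.from_bytes(digest[:8], "big")
            |     rng = random.Random(seed)
            |     print("seed:", seed, "first draw:", rng.random())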
           | 
           | For computationally-expensive operations (with relatively
           | small outputs) I'll split this across a few git repos:
           | 
           | 1) The code for setting up and performing the
           | experiments/generating the data goes in one repo. This is
           | just like any other software project.
           | 
           | 2) The results of each experiment/run are kept in a separate
           | git repo. This may be a bad idea for large, binary files; but
           | I've found it works fine for compressed JSON weighing many
           | MBs. Results are always _appended_ to this repo as new files;
            | existing files are never altered, so we don't need to worry
           | about binary diffs. There should be metadata alongside/inside
           | each file which gives the git commit of the experiment repo
           | (i.e. step 1) that was used, alongside other relevant
           | information like machine specs (if it depends on
           | performance), etc. This could be as simple as a file naming
           | scheme. The exact details for this should be written down in
           | this repo, e.g. in a README and/or a simple script to grab
           | the relevant experiment repo, run it, and store the
           | results+metadata in the relevant place. Results should be as
           | "raw" as possible, so that they don't depend on e.g. post-
           | processing details, or choice of analysis, etc.
           | 
           | 3) I tend to put the writeup in a separate git repo from the
           | results, so that those results can be referenced by commit +
           | filename, without a load of unrelated churn from the writeup.
           | This repo will follow the same advice as above, e.g. code for
           | turning the "raw" results into graphs, tables, etc. will be
           | kept here and run as part of the rendering process. Fetching
           | the particular commit from the results repo should also be
           | one of the rendering steps (Nix makes this easy, or you could
           | use a git submodule, etc.)
           | 
           | I don't know what the best advice is w.r.t. large datasets
           | (GBs or TBs), but I've found the above to be robust for about
           | 5 years so far.
        
           | qppo wrote:
           | That's no different than normal software engineering. We use
           | version control software (VCS, like git) to deal with it. You
           | can include your results in the tracked source.
           | 
           | For what it's worth, using results from outdated source code
           | is extremely suspicious. This is a frequent problem in
           | software development where we have tests or benchmarks based
           | on stale code, and it's almost always incorrect. I would not
           | trust your results at all if they were not created with the
           | most up-to-date version of your software.
        
           | xen0 wrote:
           | My first thought: Demand the journals provide hosting for a
           | code repo that is part of your paper. For every numerical
           | result, specify the version (e.g. a git tag) used to generate
           | your result.
           | 
           | And if that means scientists need to learn about version
           | control, well... they should if they're writing code.
        
             | mnw21cam wrote:
             | For a paper I recently submitted, the journal demanded a
             | github release of the software.
        
             | chriswarbo wrote:
             | I agree, except that AFAIK "tags" in git are not fixed,
             | they can be deleted and re-created to point at a different
             | commit. Hence I prefer to use (short) commit IDs, since
             | changing them is infeasible.
        
               | xen0 wrote:
               | I'm assuming the repo, once hosted and the paper is
               | published, is "fixed" and cannot be changed by the
               | authors.
               | 
               | But commit ids work just as well.
        
       | jpeloquin wrote:
       | The point of being able to run ten-year-old code is the ability
       | to replay an analysis (exact replication). This allows an
       | analysis to be verified after the fact, which increases trust and
       | helps figure out what happened when contradictions appear between
       | experiments. However, if the original work involved physical
       | experimentation or any non-automated steps (as is the case for
       | most science) the ability to run the original code provides only
       | partial replication. Overall the ability to re-run old code is a
       | fairly low priority.
       | 
       | From the perspective of someone who primarily uses computers as a
       | tool to facilitate research, the priority list is closer to:
       | 
       | 1. Retain documentation of what was _meant_ to happen.
       | Objectives, experimental design, experimental  & analysis
       | protocols, relevant background, etc.
       | 
       | 2. Retain documentation of what actually happened, usually in
       | terms of noting deviations from the protocol. This is the purpose
       | of a lab notebook. Pen & paper excels here.
       | 
       | 3. Retain raw data files.
       | 
       | 4. Retain files produced in the course of analysis.
       | 
       | 5. Retain custom source code.
       | 
       | 6. Version control all the above.
       | 
       | 7. Make everything run in the correct order with one command
       | (i.e., full automation).
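       |
       | As a rough illustration of point 7 (the step names here are
       | invented; a Makefile or shell script works just as well), the
       | "one command" can be as small as:
       |
       |     # run_all.py -- hypothetical driver script
       |     import subprocess
       |
       |     STEPS = [
       |         ["python", "01_clean_raw_data.py"],
       |         ["python", "02_run_analysis.py"],
       |         ["python", "03_make_figures.py"],
       |     ]
       |
       |     for step in STEPS:
       |         print("running:", " ".join(step))
       |         subprocess.run(step, check=True)  # stop on first failure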
       | 
       | Only once all the above is achieved would it be worth ensuring
       | that the software used in the analysis can be re-run in 10 years.
       | Solving the "packaging problem" in a typical scientific context
       | (multiple languages, multiple OSes, commercial software, mostly
       | short scripts) is complex. When the outcome of an analysis is
       | suspect, the easiest and most robust approach is to check the
       | analysis by redoing it from scratch. This takes less time than
       | trying to ensure every analysis will run on demand even as the
       | computing ecosystem changes out from under it.
       | 
       | Most of the time spent writing analysis code is deciding _what_
       | the code should do, not actually writing the code. There is
       | generally very little code because few people were involved, and
       | they probably weren't programmers. So redoing the work from
       | scratch is generally pretty easy, especially for anyone with the
       | skill to routinely produce fully reproducible computational
       | environments.
        
       | Ericson2314 wrote:
       | Glad to see many mentions of Nix in this thread!
       | 
       | I wonder if Nix and Guix should standardize the derivation
       | format both share, to kick that off as the agreed-upon "thin
       | waist" that other projects and the academy can standardize
       | around.
        
         | rekado wrote:
         | The derivation format is little more than a compilation
         | artifact (a low-level representation of a build), and I think
         | standardizing on it would not be as useful as it may seem.
        
       | scipute68 wrote:
       | As the systems architect and infra programmer for a scientific
       | startup I'll simply chime in on the production != scientific
       | conversation. When you don't hold your modeling code to the
       | minimal production standard where it counts (documentation,
       | comments, debug) it _will_ cause your evolving team hardship.
       | When that same code goes into production for a startup (as it
       | could/should) you will be causing everyone long nights and 80
       | hour weeks.
        
       | emerged wrote:
       | Any scientist with good foresight would've implemented their code
       | in 6502 for the NES. The emulators are nearly flawless and will
       | probably be around until the end of time.
        
         | yjftsjthsd-h wrote:
         | I once had a thought, that if I wanted to write something that
         | would last forever and run anywhere, I should write it to
         | target DOS, and make sure to test it on FreeDOS in a VM and on
         | DOSBox. That way it would run on a stable ABI with loads of
         | emulators, and via DOSBox it will happily run on all modern
         | desktop OSs (and some non-desktops; IIRC there's at least an
         | Android port).
        
       | dekhn wrote:
       | I wrote a C++ implementation of the AMBER force field in 2003.
       | Still have the source code with its original modification times.
       | Let's see:
       |
       |     /usr/bin/g++ -I/home/dek/sw/rh9/gsl-1.3/include -c -o NBEnergy.o NBEnergy.cpp
       |     NBEnergy.cpp: In member function 'virtual double
       |         NBEnergy::Calculate(Coordinates&, std::vector<Force*>)':
       |     NBEnergy.cpp:20:68: error: no matching function for call to
       |         'find(std::vector<atom*>::const_iterator,
       |         std::vector<atom*>::const_iterator, const atom*&)'
       |       20 | if (std::find(at1->Excluded.begin(), at1->Excluded.end(),
       |              at2) != at1->Excluded.end()) {
       |     In file included from /usr/include/c++/9/bits/locale_facets.h:48,
       |                      from /usr/include/c++/9/bits/basic_ios.h:37,
       |                      from /usr/include/c++/9/ios:44,
       |                      from /usr/include/c++/9/ostream:38,
       |                      from GeneralParameters.h:6,
       |                      from NBEnergy.h:6,
       |                      from NBEnergy.cpp:1:
       |     /usr/include/c++/9/bits/streambuf_iterator.h:373:5: note:
       |         candidate: 'template<class _CharT2> typename
       |         __gnu_cxx::__enable_if<std::__is_char<_CharT2>::__value,
       |         std::istreambuf_iterator<_CharT> >::__type
       |         std::find(std::istreambuf_iterator<_CharT>,
       |         std::istreambuf_iterator<_CharT>, const _CharT2&)'
       |       373 |     find(istreambuf_iterator<_CharT> __first,
       |     /usr/include/c++/9/bits/streambuf_iterator.h:373:5: note:
       |         template argument deduction/substitution failed:
       |     NBEnergy.cpp:20:68: note: '__gnu_cxx::__normal_iterator<atom*
       |         const*, std::vector<atom*> >' is not derived from
       |         'std::istreambuf_iterator<_CharT>'
       |       20 | if (std::find(at1->Excluded.begin(), at1->Excluded.end(),
       |              at2) != at1->Excluded.end()) {
       |     make: *** [<builtin>: NBEnergy.o] Error 1
       | 
       | I still have a hardcoded reference to RedHat 9 apparently. But
       | the only error has to do with an iterator, so clearly, something
       | in C++ changed. Looks like a 1-2 line change.
        
         | josefx wrote:
         | You probably didn't include the algorithm header that defines
         | find directly and it stopped compiling once the standard
         | library maintainers cleaned up their own includes. The
         | iostreams headers you include define their own stream iterator
         | specific overload of find and that doesn't match.
        
           | dekhn wrote:
           | Yup, that was it.
           | 
           | After that, I had to install libpython27-dev, and add -fPIC.
           | Then my 17 year old Python module that has linked-in C++ code
           | runs just fine. I'm not surprised- I've been writing cross-
           | platform code that runs for 10+ years for 20+ years.
        
       | pruthvishetty wrote:
       | I read it as "does your ten-year-old still run code", and was
       | wondering if this was a challenge for scientists to have their
       | kids do better things than coding.
        
       | dcolkitt wrote:
       | I mean, Python 2->3 alone is gonna kill this challenge for most
       | people.
        
         | djsumdog wrote:
         | You can always run old Python2 stuff in a Docker container, so
         | long as the dependencies haven't disappeared.
        
           | jnxx wrote:
           | As long as it does not use some CUDA hardware through
           | tensorflow and Numba, which depends on a version of llvmlite
           | that does not support Python2 any more.....
           | 
           | This isn't a theoretical example.
        
             | therealx wrote:
             | Then you make a vm or whatnot and install all the old
             | versions of everything. I haven't seen an open source
             | project in a while that doesn't have old versions
             | available for easy download. Still, it's annoying if you
             | can't stand that
             | kind of stuff.
             | 
             | (people seem to be in two camps: either they hate it or
             | have almost no problem with it)
        
               | PeterisP wrote:
               | To clarify, the issue is that the old version of software
               | won't work with the new libraries, and the old libraries
               | won't work with the current GPU models, so you can't run
               | the old code without modification unless you have old
               | hardware as well, and you can't virtualize the GPUs.
        
               | jnxx wrote:
               | Well, where do you download the hardware? ;-)
        
           | closeparen wrote:
           | Most of the "requirements.txt" files I come across in the
           | real world do not actually lock down all deps to Python
           | 2.7-compatible versions. I've been able to get most of them
           | running again, but it's a long process of looking through
           | changelogs to find the last 2.7-compatible version of each
           | dependency.
        
             | hobofan wrote:
             | Yes, because the "requirements.txt" is a dependency
             | requirements file and not a lockfile. It took the Node.js
             | ecosystem an embarrassingly long time to arrive at that
             | insight, and I feel like the Python ecosystem/community
             | still isn't there yet (though finally it's easily usable
             | with Poetry).
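             |
             | As a rough sketch of the difference (a crude stand-in for
             | `pip freeze` or a proper lockfile tool; the output file
             | name is made up), pinning the exact versions installed in
             | the working environment can be as simple as:
             |
             |     from importlib import metadata
             |
             |     # write exact pins for every package installed in the
             |     # current environment
             |     with open("requirements.lock.txt", "w") as f:
             |         for dist in metadata.distributions():
             |             f.write(f"{dist.metadata['Name']}=={dist.version}\n")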
        
       | roberto wrote:
       | For my first scientific article, in 2007, I created a Subversion
       | repo with a Makefile. Running `make` would recreate the whole
       | paper: downloading data, running analyses, creating pictures
       | (color or BW, depending on an environment flag) and generating
       | the PDF.
       | 
       | I'm going to try to find the repo and see if it still works.
        
         | O_H_E wrote:
         | Wow, nice. I will be waiting :D
        
       | awkward wrote:
       | Scientific programming is a perfect storm of extremely smart
       | people, with strong abilities to do it themselves, distain for
       | the subject matter, and no direct experience with the price of
       | failing to write portable code. In some circumstances, even
       | parameterizing scripts so that they aren't re-edited with new
       | values for every experiment is an uphill fight, never mind having
       | promotion through environments.
        
       | bluetwo wrote:
       | 18-year-old code still runs. And generates revenue.
        
       | jonathanstrange wrote:
       | I think it's unfair to expect anyone to maintain code
       | forever when the code rot is completely beyond your control, let
       | alone to expect this from scientists who have better things to
       | do. Anything with a GUI is bound to self-destruct, for example,
       | and it's not the programmer's fault. Blame the OS makers and
       | framework/3rd party library suppliers.
       | 
       | The damage can be limited by choosing a programming language that
       | provides good long-term compatibility. Languages like ANSI C, Ada,
       | CommonLisp, and Fortran fit the bill. There are many more. Heck,
       | you could use Chipmunk Basic. Anything fancy and trendy will stop
       | working soon, though, sometimes even within a year.
        
         | jnxx wrote:
         | > CommonLisp
         | 
         | Common Lisp has fantastic long-term stability. I think that
         | deserves more recognition, as Common Lisp is often almost as
         | fast as C, but is (by default) not riddled with undefined
         | behavior.
         | 
         | It would be superb if Rust could take C's space in
         | computational science and libraries.
        
           | kazinator wrote:
           | Regarding that last comment, that's probably where Rust
           | brings least to the table. C hasn't even taken away the
           | entire space from Fortran.
           | 
           | Lisp is somewhat "riddled" with undefined behavior, though
           | not to the same extent as C and, more importantly, not with
           | the same nuance.
           | 
           | The ISO C standard makes very little mention of optimization.
           | It does refer to abstract semantics as a point of departure
           | for optimizing, but there is no concept of safety level
           | whereby code that is diagnosed at high safety becomes
           | undefined behavior at low safety. Whether optimized or not, C
           | is always unsafe; no undefined behavior turns into something
           | that must be diagnosed merely because optimization is disabled.
           | 
           | For instance, in theory Common Lisp doesn't define the
           | behavior of an access beyond the bounds of an array any more
           | than C. In practice, all implementations reliably diagnose it
           | at the default high safety level, which is trivially achieved
           | since all the manipulation of arrays goes through library
           | functions. Only if you compile with low safety may it turn
           | into undiagnosed behavior that is unreliable, whereby the
           | compiler emits code that directly accesses the object without
           | checks.
           | 
           | Common Lisp has separate control mechanisms for speed and
           | safety, and these apply to individual expressions in the
           | program, not at the file level, like C compiler options. They
           | are also defined by the standard, unlike C compiler options.
        
       | rougier wrote:
       | For those interested, the results of the challenge are published
       | here: https://rescience.github.io/read/ (volume 6, issue 1).
        
       | rudolph9 wrote:
       | This was on the Guix-science mailing list today
       | 
       | > Hello!
       | 
       | In an article entitled "Challenge to scientists: does your ten-
       | year-old code still run?", Nature reports on the Ten Years
       | Reproducibility Challenge organized by ReScience C, led by
       | Nicolas P. Rougier and Konrad Hinsen:
       | https://www.nature.com/articles/d41586-020-02462-7
       | 
       | It briefly mentions Guix as well as the many obstacles that
       | people encountered and solutions they found, including using
       | Software Heritage and floppy disks. :-)
       | 
       | You can read the papers (and reviews!) at:
       | https://rescience.github.io/read/#issue-1-ten-years-
       | reproducibility-challenge
       | 
       | Ludo'.
        
       | cabaalis wrote:
       | I visited my first employer recently (a local government) and
       | found that the first MySQL/PHP database I created, an internal
       | app, had been in continuous use for nearly 18 years.
        
       | djsumdog wrote:
       | This article brings up scientific code from 10 years ago, but how
       | about code from .. right now? Scientists really need to publish
       | their code artifacts, and we can no longer just say "Well they're
       | scientists or mathematicians" and allow that as an excuse for
       | terrible code with no testing specs. Take this for example:
       | 
       | https://github.com/mrc-ide/covid-sim/blob/e8f7864ad150f40022...
       | 
       | This was used by the Imperial College for COVID-19 predictions.
       | It has race conditions, seeds the model multiple times, and
       | therefore has totally non-deterministic results[0]. Also, this is
       | the cleaned up repo. The original is not available[1].
       | 
       | A lot of my homework from over 10 years ago still runs (Some
       | require the right Docker container:
       | https://github.com/sumdog/assignments/). If journals really care
       | about the reproducibility crisis, artifact reviews need to be
       | part of the editorial process. Scientific code needs to have
       | tests, a minimal amount of test coverage, and code/data used
       | really need to be published and run by volunteers/editors in the
       | same way papers are reviewed, even for non-computer science
       | journals.
       | 
       | [0] https://lockdownsceptics.org/code-review-of-fergusons-model/
       | 
       | [1] https://github.com/mrc-ide/covid-sim/issues/179
        
         | arcanus wrote:
         | > Scientists really need to publish their code artifacts, and
         | we can no longer just say "Well they're scientists or
         | mathematicians" and allow that as an excuse for terrible code
         | with no testing specs.
         | 
         | You are blaming scientists but speaking from my personal
         | experience as a computational scientist, this exists because
         | there are few structures in place that incentivize strong
         | programming practices.
         | 
         | * Funding agencies do not provide support for verification and
         | validation of scientific software (typically)
         | 
         | * Few journals assess code reproducibility, and few require
         | public code (few even require public data)
         | 
         | * There are few funded studies to reproduce major existing
         | studies
         | 
         | Until these structural challenges are addressed, scientists
         | will not have sufficient incentive to change their behavior.
         | 
         | > Scientific code needs to have tests, a minimal amount of test
         | coverage, and code/data used really need to be published and
         | run by volunteers/editors in the same way papers are reviewed,
         | even for non-computer science journals.
         | 
         | I completely agree.
        
           | geoalchimista wrote:
           | Second this. Research code is already hard, and with
           | misaligned incentives from the funding agencies and grad
           | school pipelines, it's an uphill battle. Not to mention that
           | professors with an outdated mindset might discourage graduate
           | students from committing too much time to work on scientific
           | code. "We are scientists, not programmers. Coding doesn't
           | advance your career" is often an excuse for that.
           | 
           | In my opinion, enforcing standards without addressing this
           | root cause is not gonna fix the problem. Worse, students and
           | early career researchers will bear the brunt of increased
           | workload and code compliance requirements from journals. Big,
           | well-funded labs that can afford a research engineer position
           | are gonna have an edge over small labs that cannot.
        
         | j45 wrote:
         | One of the things I come across is scientists who believe
         | they're capable of learning code quickly because they're
         | capable in another field.
         | 
         | Once they embark on solving problems, it becomes an
         | eye-opening experience, and one that quickly turns into
         | keeping things running.
         | 
         | For those who have a STEM discipline in addition to a software
         | development background >5Y, would you agree with seeing the
         | above?
         | 
         | I would have thought the scientists among us would approach
         | someone with software development expertise (something
         | abstract that requires a different set of muscles).
         | 
         | One positive development is the variety of low/no-code tooling
         | that can replace a lot of this hornet's-nest coding.
        
           | PeterisP wrote:
           | It's generally not plausible to "approach someone with
           | familiarity with software development expertise" for
           | organizational and budget reasons. Employing dedicated
           | software developers is simply not a thing that happens;
           | research labs overwhelmingly have the coding done by
           | researchers and involved students without having _any_
           | dedicated positions for software development.
           | 
           | In any case you'd need to teach them the problem domain, and
           | it's considered cheaper (and simpler from organizational
           | perspective) to get some phd students or postdocs from your
           | domain to spend half a year getting up to speed on coding
           | (and they likely had a few courses in programming and
           | statistics anyway) than to hire an experienced software
           | developer and have them learn the basics of your domain
           | (which may well take a third or half of the appropriate
           | undergraduate bachelor's program).
        
             | analog31 wrote:
             | As a grad student in physics, I not only wrote code, but
             | also designed my own (computer controlled) electronics,
             | mechanics, optics, vacuum systems, etc. I was my own
             | machinist and millwright. Today I work in a small R&D team
             | within a larger business, and still do a lot of those
             | things myself when needed.
             | 
             | There are many problems with using a dedicated programmer,
             | or any other technical specialist in a small R&D team. The
             | first is keeping them occupied. There was programming to be
             | done, but not full time. And it had to be done in an
             | extremely agile fashion, with requirements changing
             | constantly, often at the location where the problem is
             | occurring, not where their workstation happens to be set
             | up. _Many developers hate this kind of work._
             | 
             | Second is just managing software development. Entire books
             | have been written about the topic, and it's not a solved
             | problem how to keep software development from eating you
             | alive and taking ownership of your organization. Nobody
             | knows how to estimate the time and effort. You never know
             | if you're going to be able to recover your source code and
             | make sense of it, if your programmer up and quits.
             | 
             | With apologies to Clemenceau, programming is too important
             | to be left to the programmers. ;-)
        
             | marmaduke wrote:
             | > Employing dedicated software developers is simply not a
             | thing that happens
             | 
             | This is a really key point that is lost on devs outside of
             | science looking in. In our case, good devs are out of
             | budget by a factor of 2x at least (at an EU public
             | university in a lab doing lots of computational work).
             | 
             | The best we get are engineers who are expected to keep
             | the cluster running, order computers, organize seminars..
             | and eventually resolve any software or dev problems. This
             | doesn't leave much time for caring about reproducibility
             | outside the very core algorithms. The overall workflow can
             | fade away since the next post doc is going to redo it
             | anyway.
        
               | j45 wrote:
               | Are the hiring scientists also paid well below market
               | wages to that degree?
        
           | jpeloquin wrote:
           | > I would have thought the scientists among us would approach
           | someone with familiarity with software development expertise.
           | 
           | Is there a pool of skilled software architects willing to
           | provide consultations at well-below market wages? Or a Q&A
           | forum full of people interested in giving this kind of
           | advice? (StackOverflow isn't useful for this; the allowed
           | question scope is too narrow.) I guess one incentive to
           | publish one's code is to get it criticized on places like
           | Hacker News. The best way to get the right answer on the
           | internet is to post the wrong answer, after all.
        
             | UweSchmidt wrote:
             | I'll state the obvious and answer with No. There are not
             | enough skilled software architects to go around and many
             | who consider themselves skilled are not actually producing
             | good code themselves, probably including many confident
             | posters here in this forum.
             | 
             | The idiosyncrasies and tastes of many 'senior' software
             | engineers would likely make the code unreadable and
             | unmaintainable for the average scientist and possibly
             | discourage them from programming altogether.
             | 
             | Software architecture is an unsolved problem as evident in
             | the frequent fundamental discussions about even trivial
             | things, highlighted by a Cambrian explosion of frameworks
             | that try to help herd cats, and made obvious in senior
             | programmers struggling to get a handle on moderately
             | complex code.
             | 
             | I propose scientists keep their code base as simple as
             | possible, review the code along with the ideas with their
             | peers, maybe use Jupyter notebooks to show the iterations
             | and keep intermediate steps, and, as others state, show the
             | code as appropriate and try to keep it running. There is no
             | silver bullet and very few programmers could walk into your
             | lab or office and really clean things up the way you'd
             | hope.
        
             | j45 wrote:
             | Are the hiring scientists also paid well-below market
             | wages?
        
         | jonnycomputer wrote:
         | For a seasoned software developer, encountering scientific
         | code can be a jarring experience. So many code smells. Yet,
         | most of
         | those code smells are really only code smells in application
         | development. Most scientific programming code only ever runs
         | once, so most of the axioms of software engineering are
         | inapplicable or a distraction from the business at hand.
         | 
         | Scientists, not programmers, should be the ones spear-heading
         | the development of standards and rules of thumb.
         | 
         | Still, there are real problematic practices that an emphasis on
         | sharing scientific code would discourage. One classic one is
         | the use of a single script that you edit each time you want to
         | re-parameterize a model. Unless you copy the script into the
         | output, you lose the informational channel between your code
         | and its output. This can have real consequences. Several years
         | ago I started up a project with a collaborator to follow up on
         | their unpublished results from a year prior. Our first task was
         | to take that data and reproduce the results they obtained
         | before, because the person no longer had access to the exact
         | copy of the script that they ran. We eventually determined that
         | the original result was due to a software error (which we
         | ultimately identified).
         | motivation to continue the project was much diminished.
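         |
         | One cheap way to keep that channel open (a sketch only; the
         | parameter names and the stub model function are made up) is
         | to write the exact parameters and code version into the
         | output itself:
         |
         |     import json
         |     import subprocess
         |
         |     def run_model(learning_rate, n_iterations):
         |         return {"loss": 0.0}  # stand-in for the real analysis
         |
         |     params = {"learning_rate": 0.01, "n_iterations": 5000}
         |
         |     # assumes the script lives in a git checkout
         |     commit = subprocess.check_output(
         |         ["git", "rev-parse", "--short", "HEAD"],
         |         text=True).strip()
         |
         |     with open("run_0042_output.json", "w") as f:
         |         json.dump({"code_commit": commit, "params": params,
         |                    "results": run_model(**params)}, f, indent=2)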
        
         | Fiahil wrote:
         | My work position was created because scientists are not
         | engineers. I had to explain -to my disappointment- why non-
         | deterministic algorithms are bad, how to write tests, and how
         | to write SQL queries, more than once.
         | 
         | However, when working as equals scientists and engineers can
         | create truly transformative projects. Algorithms account for
         | 10% of the solution. The code, infrastructure and system design
         | account for 20% of the final result. The remaining 70% of the
         | value comes directly from its impact. A project that
         | nobody uses is a failure. Something that perfectly solves a
         | problem that nobody cares about is useless.
        
         | dandelion_lover wrote:
         | As a theoretical physicist doing computer simulations, I am
         | trying to publish all my code whenever possible. However all my
         | coauthors are against that. They say things like "Someone will
         | take this code and use it without citing us", "Someone will
         | break the code, obtain wrong results and blame us", "Someone
         | will demand support and we do not have time for that", "No one
         | is giving away their tools which make their competitive
         | advantage". This is of course all nonsense, but my arguments
         | are ignored.
         | 
         | If you want to help me (and others who agree with me), please
         | sign this petition: https://publiccode.eu. It demands that all
         | publicly funded code must be public.
         | 
         | P.S. Yes, my 10-year-old code is working.
        
           | onhn wrote:
           | As a theoretical physicist your results should be
           | reproducible based on the content of your papers, where you
           | should detail/state the methods you use. I would make the
           | argument that releasing code in your position has the
           | potential to be scientifically damaging; if another
           | researcher interested in reproducing your results reads your
           | code, then it is possible their reproduction will not be
           | independent. However they will likely still publish it as
           | such.
        
           | Vinnl wrote:
           | Interestingly each of those arguments also applies to
           | publishing an article describing your work.
        
           | pthread_t wrote:
           | > "No one is giving away their tools which make their
           | competitive advantage"
           | 
           | This hits close to home. Back in college, I developed
           | software, for a lab, for a project-based class. I put the
           | code up on GitHub under the GPL license (some code I used was
           | licensed under GPL as well), and when the people from the lab
           | found out, they lost their minds. A while later, they
           | submitted a paper and the journal ended up demanding the code
           | they used for analysis. Their solution? They copied and
           | pasted pieces of my project they used for that paper and
           | submitted it as their own work. Of course, they also
           | completely ignored the license.
        
           | bumby wrote:
           | I'm curious, are dedicated software assurance teams a thing
           | in your research area? Or is quality left up to the primary
           | researchers?
        
             | dandelion_lover wrote:
             | Most of the codes I am developing alone. No one else looks
             | at them ever. My supervisor also develops the code alone
             | and never shows it to anyone (not even members of the
             | group).
             | 
             | In other cases, a couple of other researchers may have a
             | look at my code or continue its development. I worked with
             | 4+ research teams and only saw one professional programmer
             | in one of them helping the development. Never heard about a
             | "dedicated software assurance team".
        
               | SiempreViernes wrote:
               | To clarify, nobody sees the code because they aren't
               | allowed, or nobody ever asks to see it?
        
               | dandelion_lover wrote:
               | The second case. However, I am hesitant to ask to look
               | at my supervisor's code. How would I explain why I need
               | it (if it's not needed for my research)? It's also
               | unlikely to be user-friendly, so it would take a lot of
               | time to
               | understand anything.
        
               | bumby wrote:
               | I think you touched on something important. Researchers
               | are most concerned with "getting things working".
               | 
               | One of my favorite points from the book _Clean Code_ was
               | that professional developers aren't satisfied with
               | "working code", they aim to make it maintainable. Which
               | may mean writing it in a way that is more clear and
               | concise than we are used to.
        
             | BeetleB wrote:
             | > Or is quality left up to the primary researchers?
             | 
             | Individual researchers, and in many disciplines (like
             | physics), there is almost no emphasis on quality.
             | 
             | I left academia a decade ago, but at the time all except
             | one of my colleagues protested when version control was
             | suggested to them. Some of them have code running to
             | 30-40K lines.
        
               | jack_h wrote:
               | I think this is a much wider problem than just in
               | academia/research. Really any area where software isn't
               | the primary product tends to have fairly lax software
               | standards. I work in the embedded firmware field and best
               | practices are often looked at with skepticism and even
               | derision by the electrical engineers who are often the
               | ones doing the programming^[1].
               | 
               | I think software development as a field is incredibly
               | vast and diverse. Programming is an amazing tool, but
               | it's a tool that requires a lot of knowledge in a lot of
               | different areas.
               | 
               | ^[1] This isn't universally true of course, I'm not
               | trying to be insulting here.
        
               | core-questions wrote:
               | > protested when version control was suggested
               | 
               | Academics are strange like this. The root reason is fear:
               | fear that you're complicating their process, that you're
               | going to interrupt their productivity or flow state, that
               | you're introducing complication that has no benefit. They
               | then build up a massive case in their minds for why they
               | shouldn't do this; good luck fighting it.
               | 
               | Doubly so if you're IT staff and don't have a PhD.
               | There's a fundamental lack of respect on the part of (a
               | vocal minority of) academics toward bit plumbers, until
               | of course they need us to do something laughably basic.
               | It's the seeds of elitism; in reality we should be able
               | to work together, each of us understanding our particular
               | domain and working to help the other.
        
               | gowld wrote:
               | I think this is why industry does better science than
               | academia, at least in any area where there are
               | applications. Generally, they get paid for being right,
               | not just for being published, so they put respect and
               | money into people that help get correct results.
        
               | BeetleB wrote:
               | > The root reason is fear: fear that you're complicating
               | their process, that you're going to interrupt their
               | productivity or flow state, that you're introducing
               | complication that has no benefit.
               | 
               | Yes, but how does it compare to all the complicated
               | processes that exist in academic institutions currently?
               | Almost _all_ of which originated from academics
               | themselves, mind you.
        
               | core-questions wrote:
               | It's not that complicated. No one individual process is
               | that bad. The problem is that there's so many that you
               | need to steep in it for ages to pick everything up.
               | 
               | This means it makes most sense to pick up processes that
               | are portable and have longevity. Learning Git is a pretty
               | solid example.
        
               | bumby wrote:
               | I formerly worked in research, left and am now back in a
               | quasi-research organization.
               | 
               | It's a bit disconcerting seeing how much quality is brushed
               | aside particularly in software. Researchers seem to
               | intuitively grasp how they need quality hardware to do
               | their job, yet software rarely gets the same
               | consideration. I've never been able to get many to come
               | around to the idea that software should be treated the
               | same as any other engineered product that enables their
               | research
        
               | gowld wrote:
               | "quality" is a subjective word. Let's be clear what this
               | means:
               | 
               | Individual researchers, and in many disciplines (like
               | physics), there is almost no emphasis on _correct
               | results_, merely on believable results.
        
               | bumby wrote:
               | There are a few standardized definitions. The most
               | succinct being "quality is the adherence to
               | requirements".
               | 
               | As an example, if your science has the requirement of
               | being replicable (as it should) there are a host of best
               | practices that should flow down to the software
               | development requirements. Not implementing those best
               | practices would be indicative of lower quality.
        
             | throwaway287391 wrote:
             | > I'm curious, are dedicated software assurance teams a
             | thing in your research area?
             | 
             | Are these a thing in _any_ research area? I've heard of
             | exactly one case of an academic lab (one that was easily
             | 99th+ percentile in terms of funding) hiring _one software
             | engineer_ not directly involved in leading a research
             | effort, and when I tell other academics about this they're
             | somewhat incredulous. (I admittedly have a bit of trouble
             | believing it myself -- I can't imagine the incentive to
             | work for low academic pay in an environment where you're
             | inevitably going to feel a sense of inferiority to first
             | year PhD students who think they're hot shit because
             | they're doing "research".)
        
               | bumby wrote:
               | > _Are these a thing in any research area_
               | 
               | I can say there are some that have the explicit intent
               | but it can often fall by the wayside due to cost
               | pressure. For example, government funded research from
               | large organizations (think DoD or NASA) have these
               | quality requirements but they can often be hand-waved
               | away or just plain ignored due to cost concerns
        
           | SilasX wrote:
           | >"Someone will demand support and we do not have time for
           | that",
           | 
           | Well ... that part isn't nonsense, though I agree it
           | shouldn't be a dealbreaker. And it means we should work
           | towards making such support demands minimal or non-existent
           | via easy containerization.
           | 
           | I note with frustration that even the Docker people, _whose
           | entire job is containerization_, can get this part wrong. I
           | remember when we containerized our startup's app c. 2015, to
           | the point that you should be able to run it locally just by
           | installing docker and running `docker-compose up`, and it
           | _still_ stopped working within a few weeks (which we found
           | when onboarding new employees), which required a
           | knowledgeable person to debug and re-write.
           | 
           | (They changed the spec for docker-compose so that the new
           | version you'd get when downloading Docker would interpret the
           | yaml to mean something else.)
        
         | paperwork wrote:
         | Can you describe a bit more about what is going on in the
         | project? The file you linked is over 2.5k lines of C++ code,
         | and that is just the "setup" file. As you say, this is supposed
         | to be a statistical model, so I expected it to be R, Python or
         | one of the standard statistical packages.
         | 
         | Why is there so much C++ code?
        
           | disgruntledphd2 wrote:
           | Because much of this code was written in the 80's, I suspect.
           | In general, there's a bunch of really old scientific
           | codebases in particular disciplines because people have been
           | working on these problems for a looooonnngg time.
        
           | recursivecaveat wrote:
           | It is essentially a detailed simulation of viral spread, not
           | just a programmed distribution or anything. It's all in C++
           | because it's pretty performance-critical.
        
           | fsh wrote:
           | It's a Monte-Carlo simulation, not a statistical model. These
           | are usually written in C++ for performance reasons.
        
             | dandelion_lover wrote:
             | Or Fortran.
        
               | Zenst wrote:
               | Oh gosh yes, the amount of `just works` Fortran in
               | science is one of those things akin to COBOL in business.
               | I just know some people are thinking 10 years - ha, there
               | will be instances of 40 and possibly 50 years for some.
               | Heck, the sad part is many will have computer systems
               | older than 10 years just because they link to some bit of
               | kit, and the RS232 just works fine with the DOS software,
               | while the updated version had issues when they last
               | tried. That's a common theme with specialist kit attached
               | to a computer for control - medical equipment has that as
               | well.
        
               | klyrs wrote:
               | I know two fresh PhDs from two different schools whose
               | favorite language is fortran. I think it's rather
               | different from cobol in that way -- yes, the old stuff
               | still works, but newer code cuts down on the boilerplate
               | and is much more readable. And yeah, the ability to link
               | to 50 year-old battle-tested code is quite a feature.
        
               | Mvandenbergh wrote:
               | Large chunks of this particular code were in fact
               | originally written in Fortran and then machine-translated
               | into C++.
        
           | roel_v wrote:
           | Who says anything about statistical models?
        
         | djaque wrote:
         | I am all for open science, but you understand that the links in
         | your post are the exact worry people have when it comes to
         | releasing code: people claiming that their non-software
         | engineering grade code invalidates the results of their study.
         | 
         | I'm an accelerator physicist and I wouldn't want my code to end
         | up on acceleratorskeptics.com with people that don't understand
         | the material making low effort critiques of minor technical
         | points. I'm here to turn out science, not production ready
         | code.
         | 
         | As an example, you seem to be complaining that their Monte
         | Carlo code has non-deterministic output when that is the entire
         | point of Monte Carlo methods and doesn't change their result.
         | 
         | By the way, yes I tested my ten year old code and it does still
         | work. What I'm saying is that scientific code doesn't need to
         | handle every special case or be easily usable by non-experts.
         | In fact the time spent making it that way is time that a
         | scientist spends doing software engineering instead of science,
         | which isn't very efficient.
        
           | beefee wrote:
           | I want science to be held to a very high standard. Maybe even
           | higher than "software engineering grade". Especially if it's
           | being used as a justification for public policy.
        
             | MaxBarraclough wrote:
             | Perhaps just a nitpick: software engineering runs the gamut
             | from throwing together a GUI in a few hours, all the way up
             | to avionics software where a bug could kill hundreds.
             | There's no such thing as 'software engineering grade'.
        
           | chrchang523 wrote:
           | Nit: implementations of Monte Carlo methods are _not_
           | necessarily nondeterministic. Whenever I implement one, I
           | always aim for a deterministic function of (input data, RNG
           | seed, parallelism, workspace size).
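           |
           | A toy illustration in Python (estimating pi, nothing to do
           | with the model under discussion):
           |
           |     import random
           |
           |     def estimate_pi(n_samples, seed):
           |         """Monte Carlo estimate that is a pure function of
           |         (n_samples, seed): same inputs, same output."""
           |         rng = random.Random(seed)
           |         hits = 0
           |         for _ in range(n_samples):
           |             x, y = rng.random(), rng.random()
           |             if x * x + y * y <= 1.0:
           |                 hits += 1
           |         return 4.0 * hits / n_samples
           |
           |     # estimate_pi(100_000, seed=42) is fully reproducible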
        
             | petschge wrote:
             | It really helps with debugging if your MC code is
             | deterministic for a given input seed. And then you just run
             | for a sufficient number of different seeds to sample the
             | probability space.
        
               | vngzs wrote:
               | Alternatively: seed the program randomly by default, but
               | allow the user to specify a seed as a CLI argument or
               | function argument (for tests).
               | 
               | In the common case, the software behaves as expected
               | (random output), but it is reproducible for tests. You
               | can then publish your RNG seed with the commit hash when
               | you release your code/paper, and others may see your
               | results and investigate that particular code execution.
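               |
               | A minimal sketch of that pattern (the flag name is
               | arbitrary):
               |
               |     import argparse
               |     import random
               |     import secrets
               |
               |     p = argparse.ArgumentParser()
               |     p.add_argument("--seed", type=int, default=None,
               |                    help="omit for a random run")
               |     args = p.parse_args()
               |
               |     # random by default, reproducible when --seed given
               |     seed = (args.seed if args.seed is not None
               |             else secrets.randbits(32))
               |     rng = random.Random(seed)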
        
               | petschge wrote:
               | Sure that works too. But word of advice from real life:
               | Print the random seed at the beginning of the run so you
               | can find out which seed caused it to crash or do stupid
               | things.
        
             | jnxx wrote:
             | And it seems that the people from Imperial College have
             | done that with their epidemiological simulation. What
             | critics claim is that their code produces non-deterministic
             | results even when given identical inputs and fixed random
             | seeds, i.e. that their code is seriously broken. Which
             | would be a
             | serious issue if true.
        
           | pbalau wrote:
           | > people claiming that their non-software engineering grade
           | code invalidates the results of their study.
           | 
           | But that's exactly the problem.
           | 
           | Are you familiar with that bug in early Civ games where an
           | overflow was making Gandhi nuke the crap out of everyone?
           | What if your code has a similar issue?
           | 
           | What if you have a random value right smack in the middle of
           | your calculations and you just happened to be lucky when you
           | run your code?
           | 
           | I'm not that familiar with Monte Carlo; my understanding is
           | that it is just a way to sample the data. And I won't be
           | testing your data sampling, but I will expect that given the
           | same data to your calculations part (e.g., after the sampling
           | happens), I get exactly the same results every time I run the
           | code and on any computer. And if there are differences I
           | expect you to be able to explain why they don't matter, which
           | will show you were aware of the differences in the first
           | place and you were not just lucky.
           | 
           | And then there is the matter of magic values that plaster
           | research code.
           | 
           | Researchers should understand that the rules for "software
           | engineering grade code" are not there just because we want to
           | complicate things, but because we want to make sure the code
           | is correct and does what we expect it to do.
           | 
           | /edit: The real problem is not getting good results with
           | faulty code, it's ignoring good solutions because of faulty
           | code.
        
           | ivanbakel wrote:
           | Doesn't it concern you that it would be possible for critics
           | to look at your scientific software and find mistakes (some
           | of which the OP mentioned are not "minor") so easily?
           | 
           | Given that such software forms the very foundation of the
           | results of such papers, why shouldn't it fall under scrutiny,
           | even for "minor" points? If you are unable to produce good
           | technical content, why are you qualified to declare what is
           | or isn't minor? Isn't the whole point that scrutiny is best
           | left to technical experts (and not subject experts)?
        
             | James_Henry wrote:
             | When you say OP, do you mean djsumdog? If so, what mistakes
             | does he mention that aren't minor?
        
               | gowld wrote:
               | How is it possible to know the difference between minor
               | and major, if the mistakes are kept secret?
               | 
               | If we're supposed to accept scientific results on faith,
               | why bother with science at all?
        
             | smnrchrds wrote:
             | > _Doesn't it concern you that it would be possible for
             | critics to look at your scientific software and find
             | mistakes (some of which the OP mentioned are not "minor")
             | so easily?_
             | 
             | A non-native English speaker may make grammatical mistakes
             | when communicating their research in English--it does not
             | in any way invalidate their results or hint that there is
             | anything amiss. It is simply what happens when you are a
             | non-native speaker.
             | 
             | Some (many?) code critiques by people unfamiliar with the
             | field of study the research is in will be about superficial
             | mistakes that do not invalidate the results. They are the
             | code equivalents of grammatical mistakes. That's what the
             | OP is talking about.
        
               | stult wrote:
               | Journals employ copy editors to address just those sorts
               | of mistakes, so why should we not hold software to the same
               | standard as academic language? But more importantly,
               | these software best practices aren't mere "grammatical
               | mistakes," they exist because well-organized, well-tested
               | code has fewer bugs and is easier for third parties to
               | verify. Third-parties validating that the code underlying
               | an academic paper executes as expected is no different
               | than third-parties replicating the results of a physical
               | experiment. You can be damn sure that an experimental
               | methodology error invalidates a paper, and you can be
               | damn sure that bad documentation of the methodology
               | dramatically reduces the value/reliability of the paper.
               | Code is no different. It's just been the wild west
               | because it is a relatively new and immature field, so
               | most academics have never been taught coding as a
               | discipline nor held to rigorous standards in their own
               | work. Is it annoying that they now have to learn how to
               | use these tools properly? I'm sure it is. That doesn't
               | mean it isn't a standard we should aim for, nor that we
               | shouldn't teach the relevant skills to current students
               | in sciences so that they are better prepared when they
               | become researchers themselves.
        
               | labcomputer wrote:
               | > Third-parties validating that the code underlying an
               | academic paper executes as expected is no different than
               | third-parties replicating the results of a physical
               | experiment.
               | 
               | First, it's not no different--it's completely different.
               | Third parties have always constructed their own apparatus
               | to reproduce an experiment. They don't go to the original
               | author's lab to perform the experiment!
               | 
               | Second, a lot of scientific code won't run _at all_
               | outside the environment it was developed in.
               | 
               | If it's HPC code, it's very likely that the code makes
               | assumptions about the HPC cluster that will cause it to
               | break on a different cluster. If it's experiment control
               | / data-acquisition code, you'll almost certainly need the
               | exact same peripherals for the program to do anything at
               | all sensible.
               | 
               | I see a lot of people here on HN vastly over-estimating
               | the value of bit-for-bit reproducibility of one
               | implementation, and vastly underestimating the value of
               | having a diversity of implementations to test an idea.
        
               | garden_hermit wrote:
               | I agree with your overall point, but I just want to point
               | out that many (most?) journals don't employ copy-editors,
               | or if they do, then they overlook many errors, especially
               | in the methods section of papers.
        
               | Bukhmanizer wrote:
               | I'm glad someone else feels this way. It's an expectation
               | that scientists can share their work with other scientists
               | using language. Scientists aren't always the best
               | writers, but there are standards there. Writing good code
               | is a form of communication. It baffles me that there are
               | absolutely no standards there.
        
               | ryandrake wrote:
               | On the contrary: If I'm (in industry) doing a code review
               | and see simple, obvious mistakes like infinite loops,
               | obvious null pointer exceptions, ignored compiler
               | warnings, etc., in my mind it casts a good deal of doubt
               | over the entire code. If the author is so careless with
               | these obvious errors, what else is he/she being careless
               | about?
               | 
               | Same with grammatical or spelling errors. I don't review
               | research but I do review resumes, and I've seen atrocious
               | spelling on resumes. Here's the candidate's first chance
               | to make an impression. They have all the time in the
               | world to proofread, hone, and have other eyes edit it.
               | Yet, they still miss obvious mistakes. If hired, will
               | their work product also be sloppy?
        
             | [deleted]
        
             | SiempreViernes wrote:
             | This sort of scrutiny only matters once someone else has a
             | totally different code that gives incompatible results.
             | Before that point there's no sense in looking for bugs,
             | because all you're proving is that there are no obvious
             | mistakes: you say nothing about the interesting questions,
             | since you only bother with code for things with
             | non-obvious answers.
        
           | jnxx wrote:
           | _edit:_ please read the grandchild comment before going off
           | on the idea that some random programmer on the Internet dares
           | to criticize scientific code he does not understand. What is
           | crucial in the argument here is indeed the distinction
           | between methods employing pseudo-randomness, like Monte Carlo
           | simulation, and non-determinism caused by undefined behavior.
           | 
           | > I'm an accelerator physicist and I wouldn't want my code to
           | end up on acceleratorskeptics.com with people that don't
           | understand the material making low effort critiques of minor
           | technical points.
           | 
           | The person who wrote the linked blog post claims to have
           | been a software engineer at Google. Unfortunately, that
           | claim cannot be verified, as the author chose to remain
           | anonymous.
           | 
           | > As an example, you seem to be complaining that their Monte
           | Carlo code has non-deterministic output when that is the
           | entire point of Monte Carlo methods and doesn't change their
           | result.
           | 
           | The claim is that even with the same random seed for the
           | random generator, the program produces different results,
           | and that this is because it runs non-deterministically (in
           | the sense of undefined behavior) across multiple threads.
           | The post also claims that it produces significantly
           | different results depending on which output file format is
           | chosen.
           | 
           | If this is true, the code would have race conditions, and as
           | being impacted by race conditions is a form of undefined
           | behavior, this would make any result of the program
           | questionable, as the program would not be well-defined.
           | 
           | Personally, I am very doubtful whether this is true; it
           | would be incredibly sloppy of the Imperial College
           | scientists. A more careful analysis by a recognized
           | programmer might be warranted.
           | 
           | However, it underlines the importance of the main point:
           | that scientific code should be open to analysis.
           | 
           | > What I'm saying is that scientific code doesn't need to
           | handle every special case or be easily usable by non-experts.
           | 
           | Fully agree with this. But it should try to document its
           | limitations.
        
             | aspaceman wrote:
             | > If this is true, the code would have race conditions, and
             | as being impacted by race conditions is a form of undefined
             | behavior, this would make any result of the program
             | questionable, as the program would not be well-defined.
             | 
             | That's not at all what that means. What are you talking
             | about? As long as a Monte Carlo process works towards the
             | same result it's equivalent.
             | 
             | You're speaking genuine nonsense as far as I'm concerned.
             | Randomness doesn't imply non-determinism. Non-
             | determinism in no way implies race conditions or
             | undefined behavior. We care that the random process reaches
             | the same result, not that the exact sequence of steps is
             | the same.
             | 
             | This is what scientists are talking about. A bunch of
             | (pretty stupid) nonexperts want to criticize your code, so
             | they feel smart on the internet.
        
               | jnxx wrote:
               | I am referring to this blog post:
               | 
               | https://lockdownsceptics.org/code-review-of-fergusons-
               | model/
               | 
               | It says, word for word:
               | 
               |  _> Clearly, the documentation wants us to think that,
               | given a starting seed, the model will always produce the
               | same results.
               | 
               | >
               | 
               | >Investigation reveals the truth: the code produces
               | critically different results, even for identical starting
               | seeds and parameters.
               | 
               | > I'll illustrate with a few bugs. In issue 116 a UK "red
               | team" at Edinburgh University reports that they tried to
               | use a mode that stores data tables in a more efficient
               | format for faster loading, and discovered - to their
               | surprise - that the resulting predictions varied by
               | around 80,000 deaths after 80 days: ..._
               | 
               | The bugs which the blog post implies here are of the
               | kind described by John Regehr:
               | https://blog.regehr.org/archives/213
               | 
               | Note that I do not endorse these statements in the blog -
               | I am rather skeptical whether they are true at all.
               | 
               | What the author of the blog post means is clearly
               | "undefined behaviour" in the sense of non-deterministic
               | execution of a program that is not well-formed.
               | It is clear that many non-experts could confuse that with
               | the pseudo-randomness implicit in Monte-Carlo
               | simulations, but this is a _very_ different thing. The
               | first is basically a broken, invalid, and untrustworthy
               | program. The second is the established method to produce
               | a computational result by introducing stochastic
               | behavior, which is for example how modern weather models
               | work.
               | 
               | These are wildly different things. I do not understand
               | why your comment just adds to the confusion between these
               | two things??
               | 
               | > A bunch of (pretty stupid) nonexperts want to criticize
               | your code, so they feel smart on the internet.
               | 
               | As I said, I don't endorse the critique in the blog.
               | However, critique of a software implementation, as in
               | scientific matters, should never rest on an appeal to
               | authority - it should explain logically what the problem
               | is, with concrete points. Unfortunately, the cited blog
               | post remains very vague about this, while claiming:
               | 
               |  _> My background. I have been writing software for 30
               | years. I worked at Google between 2006 and 2014, where I
               | was a senior software engineer working on Maps, Gmail and
               | account security. I spent the last five years at a US/UK
               | firm where I designed the company's database product,
               | amongst other jobs and projects. I was also an
               | independent consultant for a couple of years._
               | 
               | It would be much better if, instead of claiming that
               | there could be race conditions, it pointed to lines in
               | the code with actual race conditions, and showed how the
               | results of the simulation differ when the race
               | conditions are fixed. Otherwise, it just looks like he
               | claims that the program is buggy because he is in no
               | position to question the science and does not like the
               | result.
        
               | jnxx wrote:
               | There is something I need to add; it is a subtle but
               | important point:
               | 
               | Non-determinism can be caused by
               | 
               | a) random seeds derived from hardware entropy, such as
               | seek times in an HDD controller, which are fed into
               | pseudo-random number generation (PRNG). This is not a
               | problem. For debugging, or for comparison, it can make
               | sense to switch it off, though.
               | 
               | b) data races, which are a form of undefined behavior.
               | These not only can dramatically change the results of a
               | program run, but also invalidate the program logic, in
               | languages such as C and C++. This is what the blog post
               | on lockdownsceptics.org suggests. For this application
               | area and its consequences, that would be a major
               | nightmare.
               | 
               | c) What I had forgotten is that parallel execution (for
               | example in LAM/MPI, map/reduce or similar frameworks) is
               | inherently non-deterministic and, in combination with
               | properties of floating-point computation, can yield
               | different but valid results.
               | 
               | Here is an example:
               | 
               | A computation is carried out on five nodes and they
               | return the values 1e10, 1e10, 1e-20, -1e10, -1e10, in
               | random order. The final result is computed by summing
               | these up.
               | 
               | Now, the order of computation could be:
               | 
               | ((((1e10 + 1e10) + 1e-20) + -1e10) + -1e10)
               | 
               | or it could be:
               | 
               | (((1e10 + -1e10) + 1e-20) + (+1e10 + -1e10))
               | 
               | In the first case, the result would be zero, in the
               | second case, 1e-20, because of the finite length of
               | floating point representation.
               | 
               | _However_... if the numerical model or simulation or
               | whatever is stable, this should not lead to a dramatic
               | qualitative difference in the result (otherwise, we have
               | a stability problem with the model).
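               |
               | To make point c) concrete, here is a minimal Python
               | sketch of exactly the example above (IEEE doubles, the
               | five values as given; an illustration, not code from any
               | of the models being discussed):
               |
               |     vals = [1e10, 1e10, 1e-20, -1e10, -1e10]
               |
               |     # strict left-to-right accumulation
               |     left_to_right = ((((vals[0] + vals[1]) + vals[2])
               |                       + vals[3]) + vals[4])
               |
               |     # a different (equally valid) grouping, e.g. from
               |     # partial sums arriving in another order
               |     regrouped = (((vals[0] + vals[3]) + vals[2])
               |                  + (vals[1] + vals[4]))
               |
               |     print(left_to_right)  # 0.0  -- the 1e-20 is absorbed
               |     print(regrouped)      # 1e-20 -- large terms cancel first
               |
               | Both totals are "correct" floating-point sums; only the
               | association order differs.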
               | 
               | Finally, I want to cite one last paragraph from the post
               | on lockdownsceptics.org:
               | 
               |  _> Conclusions. All papers based on this code should be
               | retracted immediately. Imperial's modelling efforts
               | should be reset with a new team that isn't under
               | Professor Ferguson, and which has a commitment to
               | replicable results with published code from day one.
               | 
               | > On a personal level, I'd go further and suggest that
               | all academic epidemiology be defunded. This sort of work
               | is best done by the insurance sector. Insurers employ
               | modellers and data scientists, but also employ managers
               | whose job is to decide whether a model is accurate enough
               | for real world usage and professional software engineers
               | to ensure model software is properly tested,
               | understandable and so on. Academic efforts don't have
               | these people, and the results speak for themselves._
        
             | UncleMeat wrote:
             | Race conditions aren't undefined behavior in C/C++. Data
             | races are. Lots and lots of real systems contain race
             | conditions without catastrophe.
        
               | jnxx wrote:
               | > Race conditions aren't undefined behavior in C/C++.
               | Data races are.
               | 
               | You are right about the distinction; I had data races
               | in mind.
               | 
               | Race conditions can well happen in a correct C/C++ multi-
               | threaded program in the sense that the order of specific
               | computation steps is sometimes random. And for operations
               | such as floating-point addition, where order of
               | operations does matter, the exact result can be random as
               | a consequence. But the end result should not depend
               | dramatically on it (which is what the poster at
               | lockdownsceptics.org claims it does).
        
           | jnxx wrote:
           | > I'm an accelerator physicist and I wouldn't want my code to
           | end up on acceleratorskeptics.com with people that don't
           | understand the material making low effort critiques of minor
           | technical points. I'm here to turn out science, not
           | production ready code.
           | 
           | Specifically, to that point, I want to cite the saying:
           | 
           | "The dogs bark, but the caravan passes."
           | 
           | (There is a more colorful German variant which is,
           | translated: "What does it bother the mighty old oak tree if a
           | dog takes a piss...").
           | 
           | Of course, if you publish your code, you expose it to
           | critics. Some of it will be unqualified. And as we have
           | seen, e.g., in the case of climate scientists, some might
           | even be nasty. But who cares? What matters is open
           | discussion, which is a core value of science.
        
           | RandoHolmes wrote:
           | > people claiming that their non-software engineering grade
           | code invalidates the results of their study.
           | 
           | How exactly is this a bad thing?
           | 
           | > I'm an accelerator physicist and I wouldn't want my code to
           | end up on acceleratorskeptics.com with people that don't
           | understand the material making low effort critiques of minor
           | technical points. I'm here to turn out science, not
           | production ready code.
           | 
           | But it should be noted that what you didn't say is that
           | you're here to turn out _accurate_ science.
           | 
           | This is the software version of statistics. Imagine if
           | someone took a random sampling of people at a Trump rally and
           | then claimed that "98% of Americans are voting for Trump".
           | And now imagine someone else points out that the sample is
           | biased and therefore the conclusion is flawed, and the
           | response was "Hey, I'm just here to do statistics".
           | 
           | ---
           | 
           | Do you see the problem now? The poster above you pointed out
           | that the conclusions of the software can't be trusted, not
           | that the coding style was ugly. Most developers would be more
           | than willing to say "the code is ugly, but it's accurate".
           | What we don't want to hear is "the conclusions can't be
           | trusted and 100 people have spent 10+ years working from
           | those unreliable conclusions".
        
             | auntienomen wrote:
             | Oh, he didn't say 'accurate science', nice gotcha!
             | 
             | This is exactly the sort of pedantic cluelessness that
             | scientists are seeking to avoid by not publishing their
             | code.
        
               | RandoHolmes wrote:
               | I don't consider accuracy in science to be pedantic, and
               | I suspect most others don't either.
               | 
               | To paraphrase what the other developer said: "I don't
               | want my work to be checked, I'm not here for accuracy,
               | just the act of doing science".
               | 
               | When I was young, the ability to invalidate was the core
               | aspect of science, but apparently that's changed over the
               | years.
        
           | booleandilemma wrote:
           | _What I'm saying is that scientific code doesn't need to
           | handle every special case or be easily usable by non-
           | experts._
           | 
           | Sounds like I should just become a scientist then.
           | 
           | Do you guys write unit tests or is that beneath you too?
        
           | sitkack wrote:
           | > exact worry people have when it comes to releasing code:
           | people claiming that their non-software engineering grade
           | code invalidates the results of their study.
           | 
           | If code is what is substantiating a scientific claim, then
           | code needs to stand up to scientific scrutiny. This is how
           | science is done.
           | 
           | I came from physics, but systems and computer engineering
           | were always an interest of mine, even before physics. I
           | always thought it was kooky-dooks that CS people can release
           | papers w/o code -- fine if the paper contains all the
           | proofs, but otherwise it shouldn't even be looked at. PoS
           | (proof-of-science) or GTFO.
           | 
           | We are at the point in human and scientific civilization
           | where knowledge needs to prove itself correct. Papers
           | should be self-contained execution environments that
           | generate PDFs and
           | resulting datasets. The code doesn't need to be pretty, or
           | robust, but it needs to be sealed inside of a container so
           | that it can be re-run, re-validated and someone else can
           | confirm the result X years from now. And it isn't about
           | trusting or not trusting the researcher; we need to
           | fundamentally trust the results.
        
             | matthewdgreen wrote:
             | All of my 2010 scientific code runs on the then-current
             | edition of Docker. /s
        
               | sitkack wrote:
               | I made no mention of Docker, VMs or any virtualization
               | system. Those would be an implementation detail and would
               | obviously change over time.
               | 
               | A container can be a .tar.gz, a zip or a disk image of
               | artifacts, code, data and downstream deps. The generic
               | word has been co-opted to mean a specific thing which is
               | very unfortunate.
        
               | matthewdgreen wrote:
               | My point, which I guess I did not make clearly enough, is
               | that container systems don't necessarily exist or remain
               | supported over the ten-year period being discussed. The
               | idea of ironing out long-term compatibility issues using
               | a container environment seems like a great one! (For the
               | record, .tgz -- the "standard" format for scientific code
               | releases in 2010 -- does not solve these problems _at
               | all_.)
               | 
               | But the "implementation detail" of which container format
               | you use, and whether it will still be supported in 10
               | years, is not an implementation detail at all -- since
               | this will determine whether containerization actually
               | solves the problem of helping your code run a decade
               | later. This gets worse as the number and complexity of
               | container formats expand.
               | 
               | Of course if what you mean is that researchers should
               | provide perpetual maintenance for their older code
               | packages, moving them from one obsolete platform to a
               | more recent one, then you're making a totally different
               | and very expensive suggestion.
        
             | snowwrestler wrote:
             | The history of physics is full of complex, one-off custom
             | hardware. Reviewers have not been expected to take the full
             | technical specs and actually build and run the exact same
             | hardware, just to verify correctness for publication.
             | 
             | I doubt any physicist believes we need to get the Tevatron
             | running again just to check decade-old measurements of the
             | top quark. I don't understand why decade-old scientific
             | software code must meet that bar.
        
           | [deleted]
        
           | woah wrote:
           | I'm very puzzled by this attitude. As an accelerator
           | physicist, would you want your accelerator to be held together
           | by duct tape, and producing inconsistent results? Would you
           | complain that you're not a professional machinist when
           | somebody pointed it out? Why is software any different than
           | hardware in this respect?
        
           | kordlessagain wrote:
           | > people that don't understand the material making low effort
           | critiques of minor technical points
           | 
           | GPT-3 FTW!
        
           | solatic wrote:
           | Let's be clear - scientific-grade code is a lower standard
           | than production-grade code. _But it is still a real
           | standard_.
           | 
           | Does scientific-grade code need to handle a large number of
           | users running it at the same time? Probably not a genuine
           | concern, since those users will run their own copies of the
           | code on their own hardware, and it's not necessary or
           | relevant for users to see the same networked results from the
           | same instance of the program running on a central machine.
           | 
           | Does scientific-grade code need to publish telemetry? Eh,
           | usually no. Set up alerting so that on-call engineers can be
           | paged when (not if) it falls over? Nope.
           | 
           | Does scientific-grade code need to handle the authorization
           | and authentication of users? Nope.
           | 
           | Does scientific-grade code need to be reproducible? _Yes_.
           | Fundamentally yes. The reproducibility of results is core to
           | the scientific method. Yes, that includes Monte Carlo code:
           | there is no such thing as truly random number generation
           | on contemporary computers, only pseudorandom number
           | generation, and what matters for cryptographic purposes is
           | that the seed numbers for the pseudorandom generation are
           | sufficiently hidden  / unknown. For scientific purposes, the
           | seed numbers should be published _on purpose_ , so that a)
           | the exact results you found, sufficiently random as they are
           | for the purpose of your experiment, can still be
           | independently verified by a peer reviewer, b) a peer reviewer
           | can intentionally decide to pick a different seed value,
           | which will lead to different results but should _still lead
           | to the same conclusion_ if your decision to reject  / refuse
           | to reject the null hypothesis was correct.
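           |
           | As a minimal sketch of what publishing the seed buys a
           | reviewer (Python/NumPy, a hypothetical pi estimate rather
           | than anyone's actual model):
           |
           |     import numpy as np
           |
           |     def estimate_pi(seed, n=1_000_000):
           |         rng = np.random.default_rng(seed)   # published seed
           |         xy = rng.random((n, 2))
           |         return 4 * np.mean((xy ** 2).sum(axis=1) < 1.0)
           |
           |     print(estimate_pi(42))  # exactly repeatable by a reviewer
           |     print(estimate_pi(43))  # new seed: different digits, but
           |                             # the same conclusion (close to pi)
           |
           | The first call checks the published numbers; the second is
           | the "different seed, same conclusion" test described in b).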
        
             | dekhn wrote:
             | As an ex-scientist who used to run lots of simulations, I
             | really fail to see a compelling reason why most numerical
             | work (for publication purposes) truly needs to publish
             | (and support) deterministic seeding.
             | 
             | We've certainly done a lot, scientifically speaking (in
             | terms of post-validated studies), without that level of
             | reproducibility.
        
               | jnxx wrote:
               | If nothing else, it helps debugging code which tries to
               | reproduce your findings.
        
               | dekhn wrote:
               | The code I work with is not debuggable in that way under
               | most circumstances. It's a complex distributed system.
               | You don't attempt to debug it by being deterministic- you
               | debug it by sampling its properties.
        
             | throwaway287391 wrote:
             | Controlling randomness can be extremely difficult to get
             | right, especially when there's anything asynchronous about
             | the code (e.g. multiple worker threads populating a queue
             | to load data). In machine learning, some of the most
             | popular frameworks (e.g. TensorFlow [0]) don't offer this
             | as a feature, and in other frameworks that do (PyTorch [1])
             | it will cripple the speed you get as a result, since GPU
             | accelerators rely on non-deterministic accumulation for
             | reasonable speed.
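             |
             | A toy illustration of the problem (plain Python, nothing
             | to do with TF/PyTorch internals; the effect is only in the
             | low-order bits and may take a few runs to show up):
             |
             |     import random
             |     from concurrent.futures import (ThreadPoolExecutor,
             |                                     as_completed)
             |
             |     random.seed(0)  # fixed seed: the inputs are identical
             |     terms = [random.uniform(-1e10, 1e10)
             |              for _ in range(100_000)]
             |     chunks = [terms[i::8] for i in range(8)]
             |
             |     def partial_sum(chunk):
             |         s = 0.0
             |         for x in chunk:
             |             s += x
             |         return s
             |
             |     with ThreadPoolExecutor(max_workers=8) as ex:
             |         futures = [ex.submit(partial_sum, c) for c in chunks]
             |         total = 0.0
             |         # completion order varies from run to run, so the
             |         # floating-point accumulation order does too
             |         for f in as_completed(futures):
             |             total += f.result()
             |
             |     print(total)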
             | 
             | Scientific reproducibility does not mean, and has never
             | meant, you rerun the code and the output perfectly matches
             | bit-for-bit every time. If you can achieve that, great --
             | it's certainly a useful property to have for debugging. But
             | a much stronger and more relevant form of reproducibility
             | for actually advancing science is running the same study
             | e.g. on different groups of participants (or in computer
             | science / applied math/stats / etc., with different
             | codebases, with different model variants/hyperparameters,
             | on different datasets) and the overall conclusions hold.
             | 
             | To paraphrase a comment I saw from another thread on HN:
             | "Plenty of good science got done before modern devops came
             | to be."
             | 
             | [0] https://github.com/tensorflow/tensorflow/issues/12871
             | https://github.com/tensorflow/tensorflow/issues/18096
             | 
             | [1] https://pytorch.org/docs/stable/notes/randomness.html
             | 
             | ==========
             | 
             | EDIT to reply to solatic's replies below (I'm being rate-
             | limited):
             | 
             | The social science arguments are probably fair (or at least
             | I'll leave it to someone more knowledgeable to defend them
             | if they want) -- perhaps I shouldn't have led with the
             | example of "different groups of participants".
             | 
             | > If you can achieve that, for the area of study in which
             | you conduct your experiment, it should be required.
             | Deciding to forego formal reproducibility should be
             | justified with a clear explanation as to why
             | reproducibility is infeasible for your experiment, and
             | peer-review should reject studies that could have been
             | reproducible but weren't in practice.
             | 
             | This _might_ be a reasonable thing to enforce if everyone
             | in the field were using the same computing platform. Given
             | that they're not (and that telling everyone that all
             | published results have to be done using AWS with this
             | particular machine configuration is not a tenable solution)
             | I don't see how this could ever be a realistic requirement.
             | Or if you don't want to enforce that the results remain
             | identical across different platforms, what's the point of
             | the requirement in the first place? How would it be
             | enforced if nobody else has the exact combination of
             | hardware/software to do so? And then even if someone does,
             | almost inevitably there'll be some detail of the setup that
             | the researcher didn't think to report and results will
             | differ slightly anyway.
             | 
             | Besides, if you're allowing for exemptions, just about
             | every paper in machine learning studying datasets larger
             | than MNIST (where asynchronous prefetching of data is
             | pretty much required to achieve decent speeds) would have a
             | good reason to be exempt. It's possible that there are
             | other fields where this sort of requirement would be both
             | useful and feasible for a large amount of the research in
             | that field, but I don't know what they are.
             | 
             | > Also, reading through the issues you linked points to:
             | https://github.com/NVIDIA/framework-determinism which is a
             | relatively recent attempt by nVidia to support
             | deterministic computation for TensorFlow. Not perfect yet,
             | but the effort is going there.
             | 
             | (From your other comment.) Yes, there exists a $300B
             | company with an ongoing-but-incomplete funded effort of so
             | far >6 months' work (and that's just the part they've done
             | in public) to make one of its own APIs optionally
             | deterministic when it's being used through a single
             | downstream client framework. If this isn't a perfect
             | illustration that it's not realistic to expect exact
             | determinism from software written by individual grad
             | students studying chemistry, I'm not sure what to say.
        
               | solatic wrote:
               | Also, reading through the issues you linked points to:
               | https://github.com/NVIDIA/framework-determinism which is
               | a relatively recent attempt by nVidia to support
               | deterministic computation for TensorFlow. Not perfect
               | yet, but the effort is going there.
        
               | throwawaygh wrote:
               | _> or in computer science  / applied math/stats / etc.,
               | with different codebases, with different model variants,
               | on different datasets) and the overall conclusions hold_
               | 
               | A lot of open sourced CS research is not reproducible.
               | 
               | "the code still runs and gives the same output" is _not_
               | the same as reproducibility.
        
               | throwaway287391 wrote:
               | > A lot of open sourced CS research is not reproducible.
               | 
               | I'm not sure if this was meant to be a counter-argument
               | to me, but I completely agree!
               | 
               | > "the code still runs and gives the same output" is not
               | the same as reproducibility.
               | 
               | Yes, bit-for-bit identical results are neither necessary
               | nor sufficient for reproducibility in the usual
               | scientific sense.
        
               | throwawaygh wrote:
               | _> I'm not sure if this was meant to be a counter-
               | argument to me_
               | 
               | It wasn't :)
        
               | dnautics wrote:
               | the correct way to control randomness in scientific code
               | is to have the RNG seeded via a flag and to check the
               | result against a snapshot value. Almost no one does
               | this, but that doesn't mean it shouldn't be done.
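               |
               | A minimal sketch of that idea (Python/NumPy; the model
               | function and the snapshot number are placeholders, not
               | anyone's real pipeline):
               |
               |     import numpy as np
               |
               |     def simulate(seed):
               |         rng = np.random.default_rng(seed)
               |         return rng.normal(size=1000).mean()  # stand-in model
               |
               |     # recorded once from a trusted run with --seed 1234;
               |     # the number here is a made-up placeholder
               |     EXPECTED = 0.012345
               |
               |     def test_snapshot():
               |         assert abs(simulate(1234) - EXPECTED) < 1e-12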
        
               | throwaway287391 wrote:
               | Did you read my post? I know what a seed is. Setting one
               | is typically not enough to ensure bit-for-bit identical
               | results in high-performance code. I gave two examples of
               | this: CUDA GPUs (which do non-deterministic accumulation)
               | and asynchronous threads (which won't always run
               | operations in the same order).
        
               | dnautics wrote:
               | Most scientific runs are scaled out, with multiple
               | replicates. And not all scientific runs are high-
               | performance in the HPC sense. Even if your code is HPC
               | in that sense, and requires CUDA and 40,000 cores, you
               | should consider creating a release flag where an end
               | user can do at least a single "slow" run on a CPU on a
               | reduced dataset, in single-threaded mode, to sanity-
               | check the results and at least verify that the
               | computational and algorithmic pipeline is sound at the
               | most basic level.
               | 
               | I used to be a scientist. I get it, getting scientists to
               | do this is like pulling teeth, but it's the least you
               | could do to give other people confidence in your results.
        
               | throwaway287391 wrote:
               | > consider creating a release flag where an end user can
               | do at least single "slow" run on a CPU on a reduced
               | dataset, in single threaded mode, to sanity check the
               | results and at least verify that the computational and
               | algorithmic pipeline is sound at the most basic level.
               | 
               | Ok, that's a reasonable ask :) But yeah as you implied,
               | good luck getting the average scientist, who in the best
               | case begrudgingly uses version control, to care enough to
               | do this.
        
               | ska wrote:
               | This is not correct on several levels. Reproducibility is
               | not achievable in many real-world scenarios, and worse,
               | it's not even very informative.
               | 
               | Contra your assertion, many people do some sort of
               | regression testing like this, but it isn't terribly
               | useful for verification _or_ validation - but it is good
               | at catching bad patches.
        
               | SilasX wrote:
               | You're right about bit-for-bit reproducibility possibly
               | being overkill, but I don't think that invalidates the
               | parent's point that Monte Carlo randomization doesn't
               | obviate reproducibility concerns. It just means that e.g.
               | your results shouldn't be hypersensitive to the details
               | of the randomization. That is, reviewers should be able
               | to take your code, feed it different random data from a
               | similar distribution to what you claimed to use (perhaps
               | by choosing a different seed), and get substantively
               | similar results.
        
               | jbay808 wrote:
               | It does seem like a valid response to OP's objection to
               | the Imperial College COVID model, though. Doesn't it?
        
               | SilasX wrote:
               | Reviewing the original comment, I think so (that the
               | original comment is overcritical). For purpose of
               | reproducibility, it's enough that you can validate that
               | you can run the model with different random data and see
               | that their results aren't due to pathological choices of
               | initial conditions. If the race conditions and non-
               | determinism just transform the random data into another
               | set of valid random data, that doesn't compromise
               | reproducibility.
        
               | throwaway287391 wrote:
               | That brings up a separate issue that I didn't comment on
               | above: the expectation that the code runs in a completely
               | different development/execution environment (e.g. the one
               | the reviewer is using vs. the one that the researcher
               | used). That means making it run regardless of the OS
               | (Windows/OSX/Linux/...) and hardware (CPU/GPU/TPU, and
               | even within those, which one) the reviewer is using. This
               | would be an extremely difficult if not impossible thing
               | for even a professional software engineer to achieve. It
               | could easily be a full time job. There are daily issues
               | on even the most well-funded projects in machine learning
               | by huge companies (ex: TF, PyTorch) that the latest
               | update doesn't work on GPU X or CUDA version Y or OS Z.
               | It's not a realistic expectation for a researcher even in
               | computer science, let alone researchers in other fields,
               | most of whom are already at the top of the game
               | programming-wise if they would even think to reach for a
               | "script" to automate repetitive data entry tasks etc.
               | 
               | ==========
               | 
               | EDIT to reply to BadInformatics' reply below (I'm being
               | rate-limited): I fully agree that a lot of ML code
               | releases could be better about this, and it's even
               | reasonable to expect them to do some of these more basic
               | things like you mention. I don't agree that bit-for-bit
               | reproducibility is a realistic standard that will get us
               | there.
        
               | BadInformatics wrote:
               | I don't think that removes the need to provide enough
               | detail to replicate the original environment though. We
               | write one-off scripts with no expectation that they will
               | see outside usage, whereas research publications are
               | meant for just that! The bar isn't terribly high either:
               | for ML, a requirements.txt + OS version + CUDA version
               | would go a long way, no need to learn docker just for
               | this.
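               |
               | Even a few lines of Python run alongside the experiment
               | capture most of that (assuming pip is available; the
               | CUDA/driver version would still need e.g. nvidia-smi):
               |
               |     import platform, subprocess, sys
               |
               |     # record the interpreter, OS and installed packages
               |     print("python:", sys.version.split()[0])
               |     print("os    :", platform.platform())
               |     with open("requirements.txt", "w") as f:
               |         subprocess.run([sys.executable, "-m", "pip",
               |                         "freeze"], stdout=f, check=True)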
        
               | solatic wrote:
               | > But a much stronger and more relevant form of
               | reproducibility for actually advancing science is running
               | the same study e.g. on different groups of participants
               | (or in computer science / applied math/stats / etc., with
               | different codebases, with different model
               | variants/hyperparameters, on different datasets) and the
               | overall conclusions hold
               | 
               | > Plenty of good science got done before modern devops
               | came to be
               | 
               | This isn't as strong of an argument as you think. This is
               | more-or-less the underlying foundation behind the social
               | sciences, which argues that no social sampling can ever
               | be entirely reproduced since no two people are alike, and
               | even the same person cannot be reliably sampled twice as
               | people change with time.
               | 
               | Has there been "good science" done in the social
               | sciences? Sure. I don't think that you're going to find
               | anybody arguing that the state of the social sciences
               | today is about the same as it was in the Dark Ages.
               | 
               | With that said, one of the reasons why so many laypeople
               | look at the social sciences as a kind of joke is because
               | so many contradictory studies come out of these peer-
               | reviewed journals that their trustworthiness is quite
               | low. One of the reasons why there's so much confusion
               | surrounding what constitutes a healthy diet and how
               | people should best attempt to lose weight is precisely
               | because diet-and-exercise studies are more-or-less
               | impossible to reproduce.
               | 
               | > If you can achieve that, great -- it's certainly a
               | useful property to have for debugging
               | 
               | If you can achieve that, for the area of study in which
               | you conduct your experiment, it should be _required_.
               | Deciding to forego formal reproducibility should be
               | justified with a clear explanation as to why
               | reproducibility is infeasible for your experiment, and
               | peer-review should reject studies that could have been
               | reproducible but weren't in practice.
        
               | jbay808 wrote:
               | Plenty of good physics got done before modern devops came
               | to be, too! Maybe the pace of advancement was slower when
               | the best practice was to publish a cryptographic hash of
               | your discoveries in the form of a poetic Latin anagram
               | rather than just straight-up saying it, but it's not like
               | Hooke's law is considered unreproducible today because
               | you can't deterministically re-instantiate his
               | experimental setup with a centuries-old piece of brass
               | and get the same result to n significant figures.
        
               | mnl wrote:
             | And physicists have been writing code for a while simply
             | because the number of software engineers who have a
             | working knowledge of physics (as in, ready for research),
             | have been trained in numerical analysis (as in, able to
             | read applied mathematics), and are willing to help you
             | with your paper for peanuts is about zero.
               | 
             | I don't understand why it is so hard to see that for this
             | line of work you need either a pretty big collaboration,
             | where somebody else has isolated the specifications so you
             | don't really need to know anything about the problem your
             | code solves, or to become a physics graduate student
             | yourself.
        
             | a_zaydak wrote:
             | I do agree with you on publishing seeds for Monte Carlo
             | simulations; however, the argument against it is also
             | very strong. Usually when you run a Monte Carlo
             | simulation you are quoting the results in terms of
             | statistics. I think it would be sufficient to say that
             | you can 'reproduce' the results as long as your
             | statistics (over many simulations with different seeds)
             | are consistent with the published results. If you run a
             | single simulation with a particular seed you _should_ get
             | the same results; however, this might be cherry-picking a
             | particular simulation result. That is good for code
             | testing but probably not for scientific results. I think
             | running the code with new seeds is a better way to test
             | the science.
        
             | kag0 wrote:
             | > there is no such thing as truly random number generation
             | on contemporary computers
             | 
             | well that's just not true. there's no shortage of noise we
             | can sample to get true random numbers. we just often
             | stretch the random numbers for performance purposes.
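             |
             | For example (Python; `os.urandom` draws on the kernel's
             | entropy pool of collected noise, while `random` stretches
             | a single seed):
             |
             |     import os, random
             |
             |     print(os.urandom(8))        # bytes from OS-collected noise
             |
             |     random.seed(12345)          # fast, reproducible PRNG
             |     print(random.getrandbits(64))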
        
             | dllthomas wrote:
             | > Does scientific-grade code need to be reproducible? Yes.
             | Fundamentally yes.
             | 
             | I agree that this is a good property for scientific code to
             | have, but I think we need to be careful not to treat re-
             | running of existing code the same way we treat genuinely
             | independent replication.
             | 
             | Traditionally, people freshly constructed any necessary
             | apparatus, and people walked through the steps of the
             | procedures. This is an interaction between experiment and
             | human brain meats that's missing when code is simply reused
             | (whether we consider it apparatus or procedure).
             | 
             | Once we have multiple implementations, _if_ there is a
             | meaningful difference between them, _at that point_
             | replayability is of tremendous value in identifying why
             | they differ.
             | 
             | But it is not reproducibility, as we want that term to be
             | used in science.
        
               | hobofan wrote:
               | But "rerunning reproducability" is mostly a neccessary
               | requirement for independent reproducability. If you can't
               | even run the original calculations against the original
               | data again how can you be sure that you are not comparing
               | apples to oranges?
        
               | dllthomas wrote:
               | Very interesting. I was thinking of software as most
               | similar to apparatus, and secondarily to procedure. You
               | raise a third possible comparison: calculations, which
               | IIUC would be expected to be included in the paper.
               | 
               | There are some kinds of code (a script that controls a
               | sensor or an actuator) where I think that doesn't match
               | up well at all. There are plenty of kinds of code where
               | they _are_, in fact, simply crunching numbers produced
               | earlier. For the latter, I'm honestly not sure the best
               | way to treat it, except to say that we should be sure
               | that _enough_ information is included in some form that
               | replication should be possible, and that we keep in mind
               | the idea that replication should involve human
               | interaction.
        
               | jabirali wrote:
               | In some simulations, each rerun produces different
               | results as you're simulating random events (like
               | lightning formation) or using a non-deterministic
               | algorithm (like Monte Carlo sampling). Just "saving the
               | random seed" might not be sufficient to make it
               | deterministic either, as if you do parallelized or
               | concurrent actions in your code (common in scientific
               | code) the same pseudorandom numbers may be used in
               | different orders each time you run it.
               | 
               | But repeating the simulation a large number of times,
               | with different random seeds, should produce statistically
               | similar output if the code is rigorous. So even if each
               | simulation is not reproducible, as long as the
               | statistical distribution of outputs is reproducible, that
               | should be sufficient.
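               |
               | A small sketch of that check (Python/NumPy, with a toy
               | stand-in for the simulation):
               |
               |     import numpy as np
               |
               |     def run_once(seed):
               |         rng = np.random.default_rng(seed)
               |         # stand-in for one full simulation run
               |         return rng.exponential(size=10_000).mean()
               |
               |     reported = [run_once(s) for s in range(100)]
               |     rerun    = [run_once(s) for s in range(100, 200)]
               |
               |     # individual runs differ, but the distributions of
               |     # outputs should agree within sampling error
               |     print(np.mean(reported), np.std(reported))
               |     print(np.mean(rerun), np.std(rerun))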
        
               | kkylin wrote:
               | This. I absolutely agree there needs to be more
               | transparency, and scientific code should be as open as
               | possible. But this should not replace replication.
        
               | BadInformatics wrote:
               | Conversely though, it is often impossible to obtain the
               | original code to replay and identify differences once
               | that step is reached _without_ some sort of strong
               | incentive or mandate for researchers to publish it. When
               | the only copy is lost in the now-inaccessible home folder
               | of some former grad student's old lab machine, there is
               | a strong disincentive to try replicating at all because
               | one has little to consult on whether/how close the
               | replicated methods are to the original ones.
        
               | dllthomas wrote:
               | And so we find ourselves in the same situation as the
               | rest of the scientific process, throughout history. When
               | I try to replicate your published paper and I fail, it's
               | completely unclear whether it's "your fault" or "my
               | fault" or pure happenstance, and there's a lot of picking
               | apart that needs to be done with usually no access to the
               | original experimental apparatus and sometimes no access
               | to the original experimenters.
               | 
               | The fact that we _can_ have that option is an amazing
               | opportunity that a confluence of attributes of software
               | (specificity, replayability, ease of copying) afford us.
               | Where we are not exploiting this like we could be, it is
               | a failure of our institutions! But it is different-in-
               | kind from traditional reproducibility.
        
               | BadInformatics wrote:
               | Of course, but the flip side is that same confluence of
               | attributes has also exacerbated issues of
               | reproducibility. Just as science and the methods/mediums
               | by which we conduct/disseminate it have changed, so too
               | should the standard of what is considered acceptable to
               | reproduce. This is especially relevant given how much
               | broader the societal and policy implications have become.
               | 
               | More concretely, it is 100% fair (and I might argue
               | necessary) to demand more of our institutions _and_ work
               | to improve their failures. I'm sure many researchers
               | have encountered publications of the form "we applied
               | <proprietary model (TM)> (not explained) to <proprietary
               | data> (partially explained) after <two sentence
               | description of preprocessing> and obtained SOTA results!"
               | in a reputable venue. Sure, this might have been even
               | less reproducible 200 years ago than now, but the authors
               | would also be less likely to be competing with you for
               | limited funding! Debating about the traditional
               | definition of reproducibility has its place, but we
               | should _also_ be doing as much as possible to give
               | reviewers and replicators a leg up. This often flies
               | in the face of many incentives the research community
               | faces, but shifting blame to institutions by default (not
               | saying you're doing this, but I've seen many who do) is
               | taking the easy road out and does little to help the
               | imbalanced ratio of discussion:progress.
        
               | ajford wrote:
               | This! I struggled with this topic in university. I was
               | studying pulsar astronomy, and there were only one or
               | two common tools used at the lower levels of data
               | processing, and they had been the same tools for a
               | couple of decades.
               | 
               | The software was "reproducible" in that the same starting
               | conditions produced the same output, but that didn't mean
               | the _science_ was reproducible, as every study used the
               | same software.
               | 
               | I repeatedly brought it up, but I wasn't advanced enough
               | in my studies to be able to do anything about it. By the
               | time I felt comfortable with that, I was on my way out of
               | the field and into a non-academic career.
               | 
               | I have kept up with the field to a certain extent, and
               | there is now a project in progress to create a fully
               | independent replacement for that original code that
               | should help shed some light (in progress for a few years
               | now, and still going strong).
        
               | allenofthehills wrote:
               | > The software was "reproducible" in that the same
               | starting conditions produced the same output, but that
               | didn't mean the _science_ was reproducible, as every
               | study used the same software.
               | 
               | This is the difference between reproducibility and
               | replicability [1]. Reproducibility is the ability to run
               | the same software on the same input data to get the same
               | output; replication would be analyzing the same input
               | data (or new, replicated data following the original
               | collection protocol) with new software and getting the
               | same result.
               | 
               | I've experienced the same lack of interest with
               | established researchers in my field, but I can at least
               | ensure that all my studies are both reproducible and
               | replicable by sharing my code _and_ data.
               | 
               | [1] Plesser HE. Reproducibility vs. Replicability: A
               | Brief History of a Confused Terminology. Front
               | Neuroinform. 2018;11:76.
        
               | improbable22 wrote:
               | This is almost an argument for _not_ publishing code. If
               | you publish all the equations, then everybody has to
               | write their own implementation from that.
               | 
               | Something like this is the norm in some more mathematical
               | fields, where only the polished final version is
               | published, as if done by pure thought. To build on that,
               | first you have to reproduce it, invariably by building
               | your own code -- perhaps equally awful, but independent.
        
               | 7thaccount wrote:
               | Should this be surprising? I'm not saying it is correct,
               | but it is similar to the response many managers give
               | concerning a badly needed rewrite of business software.
               | Doing so is very risky and the benefits aren't always
               | easy to quantify. Also, nobody wants to pay you to do
               | that. Research is highly competitive, so no researcher is
               | going to want to spend valuable time remaking a tool that
               | already exists, even if it's needed, when no other
               | researchers are doing the same.
        
             | andrewprock wrote:
             | > Does scientific-grade code need to be reproducible? Yes.
             | Fundamentally yes
             | 
             | This is definitely not correct. The experiment as a whole
             | needs to be reproducible independently. This is very
               | different from, and more robust than, requiring that a
               | particular portion of a previous version of the experiment
               | be reproducible in isolation.
        
           | enriquto wrote:
           | > I wouldn't want my code to end up on
           | acceleratorskeptics.com with people that don't understand the
           | material making low effort critiques of minor technical
           | points. I'm here to turn out science, not production ready
           | code.
           | 
           | In what way do idiots making idiotic comments about your
           | correct code invalidate your scientific production? You can
           | still turn out science and let people read and comment freely
           | on it.
           | 
           | > As an example, you seem to be complaining that their Monte
           | Carlo code has non-deterministic output when that is the
           | entire point of Monte Carlo methods and doesn't change their
           | result.
           | 
           | I guess you would not need to engage personally with the
           | idiots at "acceleratorskeptics.com", but likely most of their
           | critique would be easily shut down by a simple sentence such
           | as this one. Since most of your readers would not be idiots,
           | they could scrutinize your code and even provide that reply
           | on your behalf. This is called the scientific method.
           | 
           | I agree that you produce science, not merely code. Yet, the
           | code is part of the science and you are not really publishing
           | anything if you hide that part. Criticizing scientific code
           | because it is bad software engineering is like criticizing it
           | because it uses bad typography. You should not feel attacked
           | by that.
        
             | spamizbad wrote:
             | > In what way do idiots making idiotic comments about your
             | correct code invalidate your scientific production? You can
             | still turn out science and let people read and comment
             | freely on it.
             | 
             | How would a layperson identify a faulty critique? It would
             | be picked up by the media who would do their usual "both
             | sides" thing.
        
               | enriquto wrote:
               | Not that they abstain from doing that shit today, when
               | code is not often published.
               | 
               | An educated and motivated layperson at least would have
               | the _chance_ to learn whether the critique is faulty.
               | Today, with secret code, it is impossible to verify for
               | almost everybody.
        
           | halfdan wrote:
           | I have done research on Evolutionary Algorithms and numerical
           | optimization. It was nigh impossible to reproduce poorly
           | described algorithms from state of the art research at the
           | time and researchers would very often not bother to reply to
           | inquiries for their code. Even if you did get the code it
           | would be some arcane C only compatible with a GCC from 1996.
           | 
           | Code belongs with the paper. Otherwise we can just continue
           | to make up numbers and pretend we found something
           | significant.
        
           | shirakawasuna wrote:
           | Race conditions and certain forms of non-determinism could
           | invalidate the results of a given study. Code is essentially
           | a better-specified methods section: it just says what they
           | did. Scientists are expected to include a methods section for
           | exactly this reason, and any scientist who balked at
           | including a methods section in their paper would rightly be
           | rejected.
           | 
           | However, a methods section is always under-specified. Code
           | provides the unique opportunity to actually see the full
           | methods on display and properly review their work. It should
           | be mandated by all reputable journals and worked into the
           | peer review process.
        
           | Jabbles wrote:
           | I am interested to know the distinction between "production-
           | ready" and "science-ready" code.
           | 
           | I do not think "non-experts" should be able to use your code,
           | but I do think an expert who was not involved in writing it
           | should be.
        
             | petschge wrote:
             | One example: My code used to crash for a long time if you
             | set the thermal speed to something greater than the speed
             | of light. Should the code crash? No. And by now I have
             | found the time to write extra code to catch the error and
             | mildly insult the user (it says "Faster than light? Please
             | share that trick with me!"). Does it matter? No. It crashed
             | rather than running and giving plausible-but-wrong results.
             | So that is code
             | that I would call "science-ready" but I wouldn't want it
             | criticized by people outside my domain.
        
               | jnxx wrote:
               | I don't think that would be any problem (why should it?).
               | 
               | Code exhibiting undefined behavior is a different kettle
               | of fish...
        
               | petschge wrote:
               | Which is why I run valgrind on my code (with a parameter
               | file containing physically valid inputs) to get rid of
               | all undefined behavior. But I gave up on running afl-
               | fuzz, because all it found was crashes following from
               | physically invalid inputs. I fixed the obvious ones to
               | make the code nicer for new users, but once afl started
               | to find only very creative corner cases I stopped.
        
               | jnxx wrote:
               | Well done!
        
               | gowld wrote:
               | Then you publish your work and critics publish theirs and
               | the community decides which claims have proven their
               | merit. This is the fundamental structure of the
               | scientific community.
               | 
               | How is "your code has error and I rebuke you" a more
               | painful critique than "you are hiding your methodology
               | and so I rebuke you"?
        
               | petschge wrote:
               | Nothing limits the field of critics to people who have
               | written their own code and know what they are doing.
        
             | arethuza wrote:
             | I would regard (from experience) "science ready" code as
             | something that _you_ run just often enough to get the
             | results to create publications.
             | 
             | Any effort to get code working for other people, or
             | documented in any way would probably be seen as wasted
             | effort that could be used to write more papers or create
             | more results to create new papers.
             | 
             | This kind of reasoning was one of the many reasons I left
             | academic research - I personally didn't value publications
             | as deliverables.
        
               | chriswarbo wrote:
               | My experience has been similar.
               | 
               | Still, there's plenty of room to encourage good(/better)
               | practices which cost essentially nothing, e.g. using $PWD
               | rather than /home/bob/foo
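               | 
               | For example, a tiny sketch in Python (the DATA_DIR
               | variable and the file name are just placeholders):
               | 
               |     import os
               |     from pathlib import Path
               | 
               |     # Take the data directory from the environment, falling
               |     # back to the current working directory, instead of
               |     # hardcoding /home/bob/foo.
               |     data_dir = Path(os.environ.get("DATA_DIR", "."))
               |     data_file = data_dir / "measurements.csv"
               |     print(f"Reading input from {data_file}")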
        
               | gowld wrote:
               | If your experiment is not repeatable, it's an anecdote
               | not data.
               | 
               | Any effort to write a paper readable for other people, or
               | document the experiment in any way would probably be seen
               | as wasted effort that could be used to create more
               | results.
               | 
               | The "don't show your work" argument only makes sense if
               | you are doing PR, not science.
        
               | neutronicus wrote:
               | If it's repeatable _by you_ then it's a trade secret,
               | not an anecdote
        
             | qppo wrote:
             | Disclaimer, I'm a professional engineer and not a
             | researcher.
             | 
             | The kind of code I'll ship for production will include unit
             | testing designed around edge or degenerate cases that arose
             | from case analysis, usually some kind of end-to-end
             | integration test, aggressive linting and crashing on
             | warnings, and enforcement of style guidelines with auto-
             | formatting tools. The last one is more important than
             | people give it credit for.
             | 
             | For research it would probably be sufficient to test that
             | the code compiles and, given a set of known valid input,
             | the program terminates successfully.
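             | 
             | A smoke test along those lines can be tiny. A sketch with
             | pytest, where the script and input names are hypothetical:
             | 
             |     import subprocess
             | 
             |     def test_smoke():
             |         # Just check the analysis runs to completion on a
             |         # known-good input; no assertions about the numbers.
             |         cmd = ["python", "analysis.py", "tests/sample.csv"]
             |         result = subprocess.run(cmd, capture_output=True)
             |         assert result.returncode == 0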
        
             | dmlorenzetti wrote:
             | Hard-coded file paths for input data. File paths hard-coded
             | to use somebody's Google Drive so that it only runs if you
             | know their password. Passwords hard-coded to get around the
             | above problem.
             | 
             | In-code selection statements like `if( True ) {...}`, where
             | you have no idea what is being selected or why.
             | 
             | Code that only runs in the particular workspace image that
             | contains some function that was hacked out to make things
             | work during a debugging session 5 years ago.
             | 
             | Distributed projects where one person wrote the
             | preprocessor, another wrote the simulation software, and a
             | third wrote the analysis scripts, and they all share
             | undocumented assumptions worked out between the three
             | researchers over the course of two years.
             | 
             | Depending on implementation-defined behavior (like zeroing
             | out of data structures).
             | 
             | Function and variable names, like `doit()` and `hold`,
             | which make it hard to understand the intention.
             | 
             | Files that contain thousands of lines of imperative
             | instructions with documentation like "Per researcher X"
             | every 100 lines or so.
             | 
             | Code that runs fine for 6 hours, then stops because some
             | command-line input had the wrong value.
             | 
             | I've seen all of these over the years. Even as a domain
             | expert who has spoken directly with authors and project
             | leads, this kind of stuff makes it very hard to tease out
             | what the code actually does, and how the code corresponds
             | to the papers written about the results.
        
               | mroche wrote:
               | You're giving me flashbacks! I spent a year as an admin
               | on an HPC cluster at my university building
               | tools/software and helping researchers get their projects
               | running and re-lead the implementation of container
               | usage. The amount of scientific code/projects that
               | required libraries/files to be in specific locations, or
               | assumed that everything was being run from a home
               | directory, or sourced shell scripts at run time (that
               | would break in containers) was staggering. A lot of stuff
               | had the clear "this worked on my system so..." vibe about
               | it.
               | 
               | As an admin it was quite frustrating, but I understand it
               | sometimes when you know the person/project isn't tested
               | in a distributed environment. But when it's the projects
               | that do know how they're used and still do those
               | things...
        
             | searine wrote:
             | >I am interested to know the distinction between
             | "production-ready" and "science-ready" code.
             | 
             | In general, scientists don't care how long it takes or how
             | many resources the code uses. It is not a big deal to run a
             | script for an extra hour, or use up a node of a
             | supercomputer. Extravagant solutions or added packages to
             | make the code run smoother or faster are just a waste of
             | time. Speed and elegance only really matter when you know
             | the code is going to be distributed to the community.
             | 
             | Basically, scientists only care if the result is true: if
             | the result it outputs is sensible, defensible, reliable,
             | reproducible. It would be considered a dick move to
             | criticize someone's code if the code was proven to produce
             | the correct result.
        
               | Jabbles wrote:
               | Do you know how you could get to the state that "the code
               | was proven to produce the correct result"?
               | 
               | If not by unit tests, code review or formal logic, then
               | what?
        
               | searine wrote:
               | >If not by unit tests, code review or formal logic, then
               | what?
               | 
               | Cross referencing independent experiments and external
               | datasets.
               | 
               | Science doesn't work like software. The code can be
               | perfect and still not give results that reflect reality.
               | The code can be logical and not reflect reality. Most
               | scientists I know go in with the expectation that "the
               | code is wrong" and its results must be validated by at
               | least one other source.
        
               | jabirali wrote:
               | Not all scientific code is amenable to unit testing. From
               | my own experience from a PhD in condensed matter physics,
               | the main issue was that how important equations and
               | quantities "should" behave by themselves was often
               | unknown or undocumented, so very often each such
               | component could only be tested as part of a system with
               | known properties.
               | 
               | You can then use unit testing for low-level
               | infrastructure (e.g. checking that your ODE solver works
               | as expected), but do the high-level testing via
               | scientific validation. The first line of defense is to
               | check that you don't break any laws of physics, e.g. that
               | energy and electric charge are conserved in your end
               | results. Even small implementation mistakes can violate
               | these.
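               | 
               | As a toy sketch of such a check (a velocity-Verlet
               | harmonic oscillator standing in for a real model, not my
               | actual research code):
               | 
               |     def total_energy(x, v, k=1.0, m=1.0):
               |         return 0.5 * m * v**2 + 0.5 * k * x**2
               | 
               |     def step(x, v, dt, k=1.0, m=1.0):
               |         # One velocity-Verlet step of a 1-D oscillator.
               |         a = -k * x / m
               |         x = x + v * dt + 0.5 * a * dt**2
               |         v = v + 0.5 * (a - k * x / m) * dt
               |         return x, v
               | 
               |     x, v = 1.0, 0.0
               |     e0 = total_energy(x, v)
               |     for _ in range(10_000):
               |         x, v = step(x, v, dt=1e-3)
               |     # Energy should be conserved up to integrator error.
               |     assert abs(total_energy(x, v) - e0) / e0 < 1e-5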
               | 
               | Then you search for related existing publications of a
               | theoretical or numerical nature, trying to reproduce
               | their results; the more existing research your code can
               | reproduce, the more certain you can be that it is at
               | least consistent with known science. If this fails, you
               | have something to guide your debugging; or if you're very
               | lucky, something interesting to write a paper about :).
               | 
               | The final validation step is of course to validate
               | against experiments. This is not suited for debugging
               | though, since you can't easily say whether a mismatch is
               | due to a software bug, experimental noise, neglected
               | effects in the mathematical model, etc.
        
               | jnxx wrote:
               | > It would be considered a dick move to criticism
               | someones code, if the code was proven to produce the
               | correct result.
               | 
               | Formal proof is much much harder than making code
               | understandable and reviewable. It can be done but it is
               | not easy, and can yield surprising results:
               | 
               | https://en.wikipedia.org/wiki/CompCert
               | 
               | http://envisage-project.eu/proving-android-java-and-
               | python-s...
        
             | lemmsjid wrote:
             | There's a ton of overlap, because science code might be a
             | long running, multi-engineer distributed system and
             | production code might be a script that supports a temporary
             | business process. But let's assume production ready is a
             | multi customer application and science ready is
             | computations to reproduce results in a paper.
             | 
             | Here's a quick pass, I'm sure I'm missing stuff, but I've
             | needed to code review a lot of science and production
             | output and below is how I tend to think of it, especially
             | taking efficiency of engineer/scientist time into account.
             | 
             | Production Ready?
             | 
             | * code well factored for extensibility, feature change, and
             | multi-engineer contribution
             | 
             | * robust against hostile user input
             | 
             | * unit and integration tested
             | 
             | Science Ready?
             | 
             | * code well factored for readability and reproducibility
             | (e.g. random numbers seeded, time calcs not set against
             | 'now')
             | 
             | * robust against expected user input
             | 
             | * input data available? testing optional but desired, esp
             | unit tests of algorithmic functions
             | 
             | * input data not available? a schema-correct facsimile of
             | input data available in a unit test context to verify
             | algorithms correct
             | 
             | Both?
             | 
             | * security needs assessed and met (science code might be
             | dealing with highly secure data, as might production code)
             | 
             | * performance and stability needs met (production code more
             | often requires long term stability, science sometimes needs
             | performance within expected Big O to save compute time if
             | it's a big calculation)
        
               | PeterisP wrote:
               | Your requirements seem to push 'Science ready' far into
               | what I'd consider "worthless waste of time", coming from
               | the perspective of code that's used for data analysis for
               | a particular paper.
               | 
               | The key aspect of that code is that it's going to be run
               | once or twice, ever, and it's only ever going to be run
               | on a particular known set of input data. It's a tool
               | (though complex) that we used (once) to get from A to B.
               | It does not need to get refactored, because the
               | expectation is that it's only ever going to be used as-is
               | (as it was used once, and will be used only for
               | reproducing results), it's not intended to be built upon
               | or maintained. It's not the basis of the research, it's
               | not the point of research, it's not a deliverable in that
               | research, it's just a scaffold that was temporarily
               | necessary to do some task - one which might have been
               | done manually earlier through great effort, but that's
               | automated now. It's expected that the vast majority of
               | the readers of that paper won't ever need to touch that
               | code, they care only about the results and a few key
               | aspects of the methodology, which are (or should be) all
               | mentioned in the paper.
               | 
               | It should be reproducible to ensure that we (or someone
               | else) can obtain the same B from A in future, but that's
               | it, it does not need to be robust to input that's not in
               | the input datafile - no one in the world has another set
               | of real data that could/should be processed with that
               | code. If after a few years we or someone else will obtain
               | another dataset, _then_ (after those few years, _if_ that
               | dataset happens) there would be a need to ensure that it
               | works on that dataset before writing a paper about that
               | dataset, but it's overwhelmingly likely that you'd want
               | to modify that code anyway both because that new dataset
               | would not be 'compatible' (because the code will be
               | tightly coupled to all the assumptions in the methodology
               | you used to get that data, and because it's likely to be
               | richer in ways you can't predict right now) and you'd
               | want to extend the analysis in some way.
               | 
               | It _should_ have a 'toy example' - what you call 'a
               | schema-correct facsimile of input data' that's used for
               | testing and validation before you run it on the actual
               | dataset, and it should have test scenarios and/or unit
               | tests that are preferably manually verifiable for
               | correctness.
               | 
               | But the key thing here is that no matter what you do,
               | that's still in most cases going to be "write once, run
               | once, read never" code, as long as we're talking about
               | the auxiliary code that supports some experimental
               | conclusions, not the "here's a slightly better method for
               | doing the same thing" CS papers. We are striving for
               | _reproducible_ code, but actual _reproductions_ are quite
               | rare, the incentives are just not there. We publish the
               | code as a matter of principle, knowing full well that most
               | likely no one will download and read it. The community
               | needs the possibility for reproduction for the cases
               | where the results are suspect (which is the main scenario
               | where someone is likely to attempt reproducing that
               | code), it's there to ensure that if we later suspect
               | that the code is flawed in a way where the flaws affect
               | the conclusions then we can go back to the code and
               | review it - which is plausible, but not that likely.
               | Also, if someone does not trust our code, they can (and
               | possibly should) simply ignore it and perform a 'from
               | scratch' analysis of the data based on what's said in the
               | paper. With a reimplementation, some nuances in the
               | results might be slightly different, but all the
               | conclusions in the paper should still be valid, if the
               | paper is actually meaningful - if a reimplementation
               | breaks the conclusions, _that_ would be a successful,
               | valuable non-reproduction of the results.
               | 
               | This is a big change from industry practice where you
               | have mantras like "a line of code is written once but
               | read ten times", in a scientific environment that ratio
               | is the other way around, so the tradeoffs are different -
               | it's not worth investing refactoring time to improve
               | readability, if it's expected that most likely no one will
               | ever read that code; it makes sense to spend that effort
               | only if and when you need it.
        
               | lemmsjid wrote:
               | Yep! I don't disagree with anything you're saying when I
               | think from a particular context. It's really hard to
               | generalize about the needs of 'science code', and my stab
               | at doing so was certain to be off the mark for a lot of
               | cases.
        
               | PeterisP wrote:
               | Yes, there are huge differences between the needs of
               | various fields. For example, some fields have a lot of
               | papers where the authors are presenting a superior method
               | for doing something, and if code is a key part of that
               | new "method and apparatus", then it's a key deliverable
               | of that paper and its accessibility and (re-)usability is
               | very important; and if a core claim of their paper is
               | that "we coded A and B, and experimentally demonstrated
               | that A is better than B" then any flaws in that code may
               | invalidate the whole experiment.
               | 
               | But I seem to get the vibe that this original Nature
               | article is mostly about the auxiliary data analysis code
               | for "non-simulated" experiments, while Hacker News seems
               | biased towards fields like computer science, machine
               | learning, etc.
        
             | analog31 wrote:
             | I'm a scientist in a group that also includes a software
             | production team. For me, the standard of scientific
             | reproducibility is that a result can be replicated by a
             | reasonably skilled person, who might even need to fill in
             | some minor details themselves.
             | 
             | Part of our process involves cleaning up code to a higher
             | state of refinement as it gets closer to entering the
             | production pipeline.
             | 
             | I've tested 30-year-old code, and it still runs, though I
             | had to dig up a copy of Turbo Pascal, and much of it no
             | longer exists in computer readable form but would have to
             | be re-entered by hand. Life was actually simpler back then
             | -- with the exception of the built-ins of Turbo Pascal, it
             | has no dependencies.
             | 
             | My code was in fact adopted by two other research groups
             | with only minor changes needed to suit slightly different
             | experimental conditions. It contained many cross-checks,
             | though we were unaware of modern software testing concepts
             | at the time.
             | 
             | For a result to have broader or lasting impact, replication
             | is not enough. The result has to fit into a broader web of
             | results that reinforce one another and are extended or
             | turned into something useful. That's the point where
             | precise replication of minor supporting results becomes
             | less important. The quality of any specific experiment done
             | in support of modern electromagnetic theory would probably
             | give you the heebie jeebies, but the overall theory is
             | profoundly robust.
             | 
             | The same thing has to happen when going from prototype to
             | production. Also, production requires what I call push-
             | button replication. It has to replicate itself at the click
             | of a mouse, because the production team doesn't have domain
             | experts who can even critique the entirety of their own
             | code, and maintaining their code would be nearly impossible
             | if it didn't adhere to standards that make it maintainable
             | by multiple people at once.
        
               | Jabbles wrote:
               | This sounds great. In your opinion, do you think your
               | team is unusual in those aspects? Do you have any
               | knowledge of the quality of code in other branches of
               | physics or other sciences?
        
               | analog31 wrote:
               | Well, I know the quality of my own code before I got some
               | advice. And I've watched colleagues doing this as well.
               | 
               | My own code was quite clean in the 1980s, when the
               | limitations on the machines themselves tended to keep
               | things fairly compact with minimal dependencies. And I
               | learned a decent "structured programming" discipline.
               | 
               | As I moved into more modern languages, my code kind of
               | degenerated into a giant hairball of dependencies and
               | abstractions. "Just because you can do that, doesn't mean
               | you should." I've kind of learned that the commercial
               | programmers limit themselves to a few familiar patterns,
               | and if you try to create a new pattern for every problem,
               | your code will be hard to hand off.
               | 
               | Scientists would benefit from receiving some training in
               | good programming hygiene.
        
             | dandelion_lover wrote:
             | > the distinction between "production-ready" and "science-
             | ready" code
             | 
             | In the first case, you must take into account all
             | (un)imaginable corner cases and never allow the code to
             | fail or hang up. In the second case it needs to produce a
             | reproducible result at least for the published case. And do
             | not expect it to be user-friendly at all.
        
           | throwaway7281 wrote:
           | That's not how the game is played. If you cannot release
           | the code because the code is too ugly or untested or has
           | bugs, how do you expect anyone with the right expertise to
           | assess your findings?
           | 
           | It reminds me of Kerckhoffs's principle in cryptography,
           | which states: A cryptosystem should be secure even if
           | everything about the system, except the key, is public
           | knowledge.
        
             | jnxx wrote:
             | > If you cannot release the code because the code is
             | too ugly or untested or has bugs, how do you expect anyone
             | with the right expertise to assess your findings?
             | 
             | Yes, it should be that way.
             | 
             | Also, in all cases where some company research team goes
             | to a scientific conference and presents a nifty solution
             | for problem X without telling how it was purportedly done,
             | it should be absolutely required that they publish the
             | code and data.
             | 
             | (And that's also something which is broken about software
             | patents - patents are about open knowledge, software which
             | uses such patents is not open - this combination should not
             | be allowed at all.)
        
               | jnxx wrote:
               | With the caveat that while in some cases, like
               | computational science, numerical analysis, machine
               | learning algorithms, computer-assisted proofs, and so on,
               | details of the code could be crucial, in other cases,
               | they should not matter that much. I too have the
               | impression that the HN public tends to over-value the
               | importance of code in the cases where it is mostly a
               | tool for evaluating a scientific result.
        
             | sjburt wrote:
             | The findings really should be independent of the code.
             | Reproduction should occur by taking the methodology and re-
             | implementing the software and running new experiments.
        
               | martingab wrote:
               | That's exactly the philosophy we follow e.g. in particle
               | physics and it's a common excuse to dismiss all guidelines
               | made in the article. However, this kind of
               | validation/falsification is often done between different
               | research groups (maybe using different but formally
               | equivalent approaches) while people within the same group
               | have to deal with the 10-year-old code base.
               | 
               | I myself had very bad experience with extending the
               | undocumented Fortran 77 code (lots of gotos and common
               | blocks) of my supervisor. Finally, I decided to rewrite
               | the whole thing including my new results instead of just
               | somehow embedding my results into the old code for two
               | reasons: (1) I'm presumably faster in rewriting the whole
               | thing including my new research rather than struggling
               | with the old code and (2) I simply would not trust the
               | numerical results/phenomenology produced by the code.
               | After all, I'm wasting 2 months of my PhD for the
               | marriage of my own results with known results which -in
               | principle- could have been done within one day if the
               | code base allowed for it.
               | 
               | So yes, if it's a one-man show I would not put too much
               | weight on code quality (though unit tests and git can save
               | quite a lot of time during development), but if there is a
               | chance that someone else is going to touch the code in the
               | near future, it will save your colleagues time and improve
               | the overall (scientific) productivity.
               | 
               | PS: quite excited about my first post here
        
               | jnxx wrote:
               | > After all, I'm wasting 2 months of my PhD for the
               | marriage of my own results with known results which -in
               | principle- could have been done within one day if the
               | code base would allow for it.
               | 
               | Sounds like it is quite good science to do that, because
               | it puts the computation on a pair of independent feet.
               | 
               | Otherwise, it could just be that the code you are using
               | has a bug and nobody notices until it is too late.
        
               | MaxBarraclough wrote:
               | > If it's a one-man show I would not put too much weight
               | on code quality
               | 
               | This makes me a little uneasy, as _I'm not too worried
               | about code quality_ can easily translate into _Yes I know
               | my code is full of undefined behaviour, and I don't
               | care_.
               | 
               | > PS: quite excited about my first post here
               | 
               | Welcome to HN! reddit has more cats, Slashdot has more
               | jokes about sharks and laserbeams, but somehow we get by.
        
             | labcomputer wrote:
             | In GIS, there's a saying "the map is not the terrain". It
             | seems like HN is in a little SWE bubble, and needs to
             | understand "the code is not the science".
             | 
             | In science, code is not an end in and of itself. It is a
             | _tool_ for simulation, data reduction, calculation, etc. It
             | is a way to test scientific ideas.
             | 
             | > how do you expect anyone with the right expertise to
             | assess your findings
             | 
             | I would expect other experts in the field to write their
             | own implementation of the scientific ideas expressed in a
             | paper. If the idea has any merit, their implementations
             | should produce similar results. Which is exactly what they
             | would do if it were a physical experiment.
        
               | yjftsjthsd-h wrote:
               | > In GIS, there's a saying "the map is not the terrain".
               | It seems like HN is in a little SWE bubble, and needs to
               | understand "the code is not the science".
               | 
               | And if you're a map maker, it's a bit rich to start
               | claiming that the accuracy of your maps is unimportant.
               | If code is "a way to test scientific ideas", then it
               | kinda needs to work if you want meaningful results. Would
               | you run an experiment with thermometers that were
               | accurate to +-30deg and reactants from a source known for
               | contamination?
        
               | jnxx wrote:
               | In many parts of scientific research, researchers are, to
               | stay in your metaphor, more travelers _using_ a map than
               | map makers.
               | 
               | Of course, there is a difference between running a
               | clinical study on drugs and using a pocket calculator to
               | compute a mean, and doing research in numerical analysis,
               | or presenting a paper on how to use Coq to more
               | efficiently prove the four-color theorem or Fermat's last
               | theorem.
               | 
               | In short, much of science is not computer science, and
               | for it, computation is just a tool.
        
               | RandoHolmes wrote:
               | No one is saying that code is the science.
               | 
               | If I'm given bad information and I act on that
               | information, then problems can occur.
               | 
               | Similarly, if the software is giving the scientist bad
               | information, problems can occur.
               | 
               | How many more stories do we have to read about some
               | research getting published in a journal only to have to
               | retract it down the road because they had a bug in the
               | software before we start asking if maybe there needs to
               | be more rigor in the software portion of the research as
               | well?
               | 
               | There was a story on HN a while back about a professor
               | who had written software, had come to some conclusions,
               | and even had a Ph.D. student working on research based on
               | that work. Only to find out that a software flaw meant
               | the conclusions weren't useful to anyone and that student
               | ended up wasting years of their life.
               | 
               | ---
               | 
               | This stuff matters. This isn't a model of reality, it's
               | an exploration of reality. It would be like telling a
               | hiker that terrain doesn't matter. They would,
               | rightfully, disagree with you.
        
               | kalenx wrote:
               | > How many more stories do we have to read about some
               | research getting published in a journal only to have to
               | retract it down the road because they had a bug in the
               | software before we start asking if maybe there needs to
               | be more rigor in the software
               | 
               | We will always hear stories like that, as we will always
               | hear stories about major bugs in stable software
               | releases. Asking a scientist to do better than whole
               | teams of software engineers makes little sense to me.
               | 
               | Of course, a bug that was introduced or kept with the
               | conscious intention of fooling the reviewers and the
               | readers is another story.
        
               | RandoHolmes wrote:
               | > Asking a scientist to do better than whole teams of
               | software engineers makes little sense to me.
               | 
               | This is not what is being asked, shame on you for the
               | strawman.
               | 
               | Your entire post can be summed up with the following
               | sentence: "if we can't be perfect then we may as well not
               | try to be better".
        
               | ufmace wrote:
               | I don't entirely disagree, but haven't there also been
               | cases of experimental results being invalidated due to
               | subtle mechanical, electrical, chemical, etc
               | complications with the test equipment, when none of the
               | people involved in the experiment were experts in those
               | fields?
               | 
               | I think that, while we could use a bit more training in
               | software engineering best-practices in the sciences, the
               | thesis is still that science is hard and we need real
               | replication of everything before reaching important
               | conclusions, and over-focusing on one specific type of
               | error isn't all that helpful.
        
               | RandoHolmes wrote:
               | If they're setting up experiments whose correct results
               | require electrical expertise, then yes, they should
               | either get better training or bring in someone who has
               | it.
               | 
               | It's not clear to me why you think I would argue that
               | inaccuracies should be avoided in software but accept
               | that they're ok for electrical systems.
        
               | booleandilemma wrote:
               | If you're saying you produced certain results with code,
               | then the code is indeed the science. Not being able to
               | vouch for the code is like believing a mathematical
               | theorem without seeing the proof.
        
           | MaxBarraclough wrote:
           | At the risk of just mirroring points which have already been
           | made:
           | 
           | > you understand that the links in your post are the exact
           | worry people have when it comes to releasing code: people
           | claiming that their non-software engineering grade code
           | invalidates the results of their study.
           | 
           | It's profoundly unscientific to suggest that researchers
           | should be given the choice to withhold details of their
           | experiments that they fear will not withstand peer review.
           | That's much of the point of scientific publication.
           | 
           | Researchers who are too ashamed of their code to submit it
           | for publication should be denied the opportunity to publish.
           | If that's the state of their code, their results aren't
           | publishable. Unpublishable garbage in, unpublishable garbage
           | out. Simple enough. Journals just shouldn't permit that kind
           | of sloppiness. Neither should scientists be permitted to take
           | steps to artificially make it difficult to reproduce (in some
           | weak sense) an experiment. (Independently re-running code
           | whose correctness is suspect, obviously isn't as good as
           | comparing against a fully independent reimplementation, but
           | it still counts for something.)
           | 
           | If a mathematician tried to publish the conclusion of a proof
           | but refused to show the derivation, they'd be laughed out of
           | the room. Why should we hold software-based experiments to
           | such a pitifully low standard by comparison?
           | 
           | It's not as if this is a minor problem. Software bugs really
           | can result in incorrect figures being published. In the case
           | of C and C++ code in particular, a seemingly minor issue can
           | result in undefined behaviour, meaning the output of the
           | program is _entirely_ unconstrained, with no assurance that
           | the output will resemble what the programmer expects. This
           | isn't just theoretical. Bizarre behaviour really can happen
           | on modern systems, when undefined behaviour is present.
           | 
           | A computer scientist once told me a story of some students he
           | was supervising. The students had built some kind of physics
           | simulation engine. They seemed pretty confident in its
           | correctness, but in truth it hadn't been given any kind of
           | proper testing, it merely looked about right to them. The
           | supervisor had a suggestion: _Rotate the simulated world by
           | 19 degrees about the Y axis, run the simulation again, and
           | compare the results._ They did so. Their program showed
           | totally different results. Oh dear.
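           | 
           | The check itself is only a few lines. A sketch, with a
           | hypothetical simulate() standing in for their engine:
           | 
           |     import numpy as np
           | 
           |     def simulate(pos, vel, t, g=np.array([0.0, -9.81, 0.0])):
           |         # Stand-in for the real engine: ballistic flight.
           |         return pos + vel * t + 0.5 * g * t**2
           | 
           |     def rot_y(deg):
           |         a = np.radians(deg)
           |         return np.array([[np.cos(a), 0.0, np.sin(a)],
           |                          [0.0, 1.0, 0.0],
           |                          [-np.sin(a), 0.0, np.cos(a)]])
           | 
           |     pos = np.array([1.0, 2.0, 3.0])
           |     vel = np.array([0.5, 0.0, -0.2])
           |     R = rot_y(19.0)
           | 
           |     direct = simulate(pos, vel, t=2.0)
           |     # Rotate the world, simulate, rotate back: should match.
           |     rotated = R.T @ simulate(R @ pos, R @ vel, t=2.0)
           |     assert np.allclose(direct, rotated)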
           | 
           | Needless to say, not all scientific code can so easily be
           | shown to be incorrect. All the more reason to subject it to
           | peer review.
           | 
           | > I'm an accelerator physicist and I wouldn't want my code to
           | end up on acceleratorskeptics.com with people that don't
           | understand the material making low effort critiques of minor
           | technical points.
           | 
           | Why would you care? Science is about advancing the frontier
           | of knowledge, not about avoiding invalid criticism from
           | online communities of unqualified fools.
           | 
           | I sincerely hope vaccine researchers don't make publication
           | decisions based on this sort of fear.
        
           | mmmBacon wrote:
           | Monte-Carlo can and should be deterministic and repeatable.
           | It's a matter of correctly initializing your random number
           | generators and providing a known/same random seed from run to
           | run. If you aren't doing that, you aren't running your Monte-
           | Carlo correctly. That's a huge red flag.
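           | 
           | A minimal sketch with numpy (estimating pi is just a
           | stand-in for a real Monte-Carlo computation):
           | 
           |     import numpy as np
           | 
           |     def estimate_pi(n_samples, seed):
           |         # A fixed seed makes every draw, and hence the
           |         # estimate, bit-for-bit repeatable from run to run.
           |         rng = np.random.default_rng(seed)
           |         xy = rng.random((n_samples, 2))
           |         return 4.0 * np.mean(np.sum(xy**2, axis=1) < 1.0)
           | 
           |     print(estimate_pi(1_000_000, seed=42))  # identical every run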
           | 
           | Scientists need to get over this fear about their code. They
           | need to produce better code and need to actually start
           | educating their students on how to write and produce code.
           | For too long many in the physics community have trivialized
           | programming and seen it as assumed knowledge.
           | 
           | Having open code will allow you to become better and you'll
           | produce better results.
           | 
           | Side note: 25 years ago I worked in accelerator science too.
        
             | neutronicus wrote:
             | Then you need to re-imagine the system in such a way that
             | junior scientific programmers (i.e. _Grad Students_) can
             | at least _imagine_ having enough job security for code
             | maintainability to matter, and for PIs to invest in their
             | students' knowledge with a horizon longer than a couple
             | person-years.
        
             | djaque wrote:
             | Hello fellow accelerator physicist!
             | 
             | Yes I understand how seeding PRNGs works and I personally do
             | that for my own code for debugging purposes. My point was
             | that not using a fixed seed doesn't invalidate their
             | result. It's just a cheap shot and, to me, demonstrates
             | that the lockdownskeptics author doesn't have a real
             | understanding of the methods being used.
             | 
             | Also, to be clear, I support open science and have some of
             | my own open-source projects out in the wild (which is not
             | the norm in my own field yet). I'm not arguing against
             | releasing code, I'm arguing against OP arguing against this
             | particular piece of code.
        
               | SiempreViernes wrote:
               | Indeed it was a cheap shot, the code does give
               | reproducible results:
               | https://www.nature.com/articles/d41586-020-01685-y
               | 
               | The main issue is whether it used sensible inputs, but that's
               | entirely different from code quality and requires subject
               | matter expertise, so programmers don't bother with such
               | details -_-
        
             | jnxx wrote:
             | > Monte-Carlo can and should be deterministic and
             | repeatable.
             | 
             | That's a nitpick, but if the computation is executed in
             | parallel threads (e.g. on multicore, or on a
             | multicomputer), and individual terms are, for example,
             | summed in a random order, caused by the non-determinism
             | introduced by the parallel computation, then the result is
             | not strictly deterministic. This is a property of floating-
             | point computation, more specifically, the finite accuracy
             | of real floating-point implementations.
             | 
             | So, it is not deterministic, but that _should_ not cause
             | large qualitative differences.
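             | 
             | The effect is easy to see even without any parallelism,
             | because floating-point addition is not associative:
             | 
             |     a, b, c = 0.1, 0.2, 0.3
             |     # The grouping, i.e. the order in which partial sums
             |     # are combined, changes the last bits of the result.
             |     print((a + b) + c == a + (b + c))  # False
             |     print((a + b) + c, a + (b + c))    # 0.6000000000000001 0.6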
        
             | improbable22 wrote:
             | > Monte-Carlo can and should be deterministic and
             | repeatable
             | 
             | I guess it can be made so, but not necessarily easy / fast
             | (if it's parallel, and sensitive to floating point
             | rounding). And sounds like the kind of engineering effort
             | GP is saying isn't worth it. Re-running exactly the same
             | monte-carlo chain does tell you something, but is perhaps
             | the wrong level to be checking. Re-running from a different
             | seed, and getting results that are within error, might be
             | much more useful.
        
               | jbay808 wrote:
               | I guess the best thing would be that it uses a different
               | random seed every time it's run (so that, when re-running
               | the code you'll see _similar_ results which verifies that
               | the result is not sensitive to the seed), but the
               | particular seed that produced the particular results
               | published in a paper is noted.
               | 
               | But still, for code running on different machines,
               | especially for numeric-heavy code that might be running
               | on a particular GPU setup, distributed big data source
               | (where you pull the first available data rather than read
               | in a fixed order), or even on some special supercomputer,
               | it's hard to ask that it be totally reproducible down to
               | the smallest rounding error.
        
             | tgvaughan wrote:
             | I write M-H samplers for a living. While I agree that being
             | able to rerun a chain using the same seed as before is
             | crucial for debugging, and while I'm very strongly in
             | favour of publishing the code used for a production
             | analysis, I'm generally opposed to publishing the
             | corresponding RNG seeds. If you need the seeds to reproduce
             | my results, then the results aren't worth the PDF they're
             | printed on. [edit: typo]
        
             | jack_h wrote:
             | Since I have a bit of experience in this area, quasi-Monte
             | Carlo methods also work quite well and ensure deterministic
             | results. They're not applicable for all situations though.
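             | 
             | For the curious, a small sketch with SciPy's scrambled
             | Sobol' sampler (estimating pi is only for illustration):
             | 
             |     import numpy as np
             |     from scipy.stats import qmc
             | 
             |     # Low-discrepancy points with a fixed seed: fully
             |     # deterministic, and they cover the unit square more
             |     # evenly than pseudo-random draws.
             |     sampler = qmc.Sobol(d=2, scramble=True, seed=7)
             |     points = sampler.random(2**14)
             |     print(4.0 * np.mean(np.sum(points**2, axis=1) < 1.0))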
        
           | ativzzz wrote:
           | While you're running experiments, it doesn't matter, but
           | code behind any published result, or code reused as part of
           | other publishable work, IS production code, and you should
           | treat it as such.
        
           | oliver101 wrote:
           | Why is "doing software engineering" not "doing science"?
           | 
           | Anybody who has conducted experimental research will say they
           | spent 80% of the time using a hammer or a spanner. Repairing
           | faulty lasers or power supplies. This process of reliable and
           | repeatable experimentation is the basis of science itself.
           | 
           | Computational experiments must be held to the same standards
           | as physical experiments. They must be reproducible and they
           | should be publicly available (if publicly funded).
        
         | OminousWeapons wrote:
         | I am in 100% agreement and would like to point out that many
         | papers based on code don't even come with code bases, and if
         | they do those code bases are not going to contain or be
         | accompanied by any documentation whatsoever. This is frequently
         | by design as many labs consider code to be IP and they don't
         | want to share it because it gives them a leg up on producing
         | more papers and the shared code won't yield an authorship.
        
           | acutesoftware wrote:
           | If published research is based on a code base, then surely
           | the documentation and working code are just as important as
           | the carefully written paper.
        
             | OminousWeapons wrote:
             | I completely agree, the problem is the journal editors and
             | reviewers largely don't.
        
             | freeone3000 wrote:
             | No, the paper is what matters. The code is a means to
             | generate the paper.
        
               | bumby wrote:
               | I agree, but that's similar to saying the data is what
               | matters, not the methodology.
               | 
               | In the research germane to this conversation, software is
               | the means by which the scientific data is generated. If
               | the software is flawed, it undermines the confidence in
               | the data and thus the conclusions.
        
               | freeone3000 wrote:
               | Most researchers would agree with the first statement
               | without significant qualification. Methods are at the end
               | for a reason.
        
               | bumby wrote:
               | Not disagreeing with your assertion about the opinion of
               | "most researchers", but you'll often find quite a few
               | people advocating for judging publication worthiness on
               | the methodology alone, sans data, to avoid the perverse
               | incentives for novel or meaningful data.
               | 
               | I think it's too easy to game the data (whether knowingly
               | or not) with poor methodology. I advocate process before
               | product, in other words.
        
         | WhompingWindows wrote:
         | It's hard for me to publish my code in healthcare services
         | research because most of it is under lock-and-key due to HIPAA
         | concerns. I can't release the data, and so 90% of the work of
         | munging and validating the data is un-releasable. So, should I
         | release my last 10% of code where I do basic descriptive stats,
         | make tables, make visualizations, or do some regression
         | modeling? Certainly, I can make that available in de-identified
         | ways, but without data, how can anyone ever verify its
         | usefulness? And does anyone want to see how I calculated the
         | mean, median, SD, or IQR? It's done with base R or the
         | tidyverse; that's not exactly revolutionary code.
        
         | rscho wrote:
         | > If journals really care about the reproducibility crisis
         | 
         | All is well and good then, because journals absolutely don't
         | care about science. They care about money and prestige. From
         | personal experience, I'd say this intersects with the interests
         | of most high-ranking academics. So the only unhappy people are
         | idealistic youngsters and science "users".
         | 
         | Let's get back to non-profit journals.
        
         | SiempreViernes wrote:
         | In the event, the code actually _is_ reproducible:
         | https://www.nature.com/articles/d41586-020-01685-y
        
         | prionassembly wrote:
         | Institutions need to provide scientists and mathematicians with
         | coders. It's a bit insane to expect them to be software
         | engineers as well.
        
           | izacus wrote:
           | No one expects them to be software engineers, but we do expect
           | them to be _scientists_ - to publish results that are
           | reproducible and verifiable. And that has to hold for code as
           | well.
        
           | neuromantik8086 wrote:
           | There are some efforts in this vein within academia, but they
           | are very weak in the United States. The U.S. Research
           | Software Engineer Association (https://us-rse.org/)
           | represents one such attempt at increasing awareness about the
           | need for dedicated software engineers in scientific research
           | and advocates for a formal recognition that software
           | engineers are essential to the scientific process.
           | 
           | In terms of tangible results, Princeton at least has created
           | a dedicated team of software engineers as part of their
           | research computing unit
           | (https://researchcomputing.princeton.edu/software-
           | engineering).
           | 
           | Realistically though even if the necessity of research
           | software engineering were acknowledged at the institutional
           | level at the bulk of universities, there would still be the
           | problem of universities paying way below market rate for
           | software engineering talent...
           | 
           | To some degree, universities alone cannot effect the change
           | needed to establish a professional class of software
           | engineers that collaborate with researchers. Funding agencies
           | such as the NIH and NSF are also responsible, and need to
           | lead in this regard.
        
             | geebee wrote:
             | Thank you for the link to the Princeton group. That is
             | encouraging. Aside from that, I share your lack of optimism
             | about the prospects for this niche.
             | 
             | Most research programmers, in my experience, work in a lab
             | for a PI. Over time, these programmers have become more
             | valued by their team. However, they often still face a hard
             | cap on career advancement. They generally are paid
             | considerably less than they'd earn in the private sector,
             | with far less opportunity for career growth. I think they
             | often make creative contributions to research that would be
             | "co-author" level worthy if they came from someone in an
             | academic track, but they are frequently left off
             | publications. They don't get the benefits that come with
             | academic careers, such as sabbaticals, and they often work
             | to assignment, with relatively little autonomy. The right
             | career path and degree to build the skills required for
             | this kind of programming is often a mismatch for the
             | research-oriented degrees that are essential to advancement
             | in an academic environment (including leadership roles that
             | aren't research roles).
             | 
             | In short, I think there is a deep need for the emerging
             | "research software engineer" you mention, but at this
             | point, I can't recommend these jobs to someone with the
             | talent to do them. There are a few edge cases (lifestyle,
             | trailing spouse in academia, visa restrictions), but
             | overall, these jobs are not competitive with the pay,
             | career growth, autonomy, and even job security elsewhere
             | (university jobs have a reputation for job security, but
             | many research programmers are paid purely through a grant,
             | so often these are 1-2 year appointments that can be
             | extended only if the grant is renewed).
             | 
             | The Princeton group you linked to is encouraging - working
             | for a unit of software developers who engage with
             | researchers could be an improvement. Academia is still a
             | long, long way away from building the career path that
             | would be necessary to attract and keep talent in this
             | field, though.
        
         | noelsusman wrote:
         | The criticisms of the code from Imperial College are strange to
         | me. Non-deterministic code is the least of your problems when
         | it comes to modeling the spread of a brand new disease.
         | Whatever error is introduced by race conditions or multiple
         | seeds is completely dwarfed by the error in the input
         | parameters. Like, it's hard to overstate how irrelevant that is
         | to the practical conclusions drawn from the results.
         | 
         | Skeptics could have a field day tearing apart the estimates for
         | the large number of input parameters to models like that, but
         | they choose not to? I don't get it.
        
         | marmaduke wrote:
         | This is an easy argument to make because it was already made
         | for you in popular press months ago.
         | 
         | Show me the grant announcements that identify reproducible long
         | term code as a key deliverable, and I'll show you 19 out of 20
         | scientists who start worrying about it.
        
         | amelius wrote:
         | You can blame all the scientists, but shouldn't we blame the CS
         | folks for not coming up with suitable languages and software
         | engineering methods that will prevent software from rotting in
         | the first place?
         | 
         | Why isn't there a common language that all other languages
         | compile to, and that will be supported on all possible
         | platforms, for the rest of time?
         | 
         | (Perhaps WASM could be such a language, but the point is that
         | this would be just coincidental and not a planned effort to
         | preserve software)
         | 
         | And why aren't package managers structured such that packages
         | will live forever (e.g. in IPFS) regardless of whether the
         | package management system is online? Why is Github still a
         | single point of failure in many cases?
        
         | klyrs wrote:
         | I do research for a private company, and open-source as much of
         | my work as I can. It's _always_ a fight. So I'll take their
         | side for the moment.
         | 
         | Many years ago, a paper on the PageRank algorithm was written,
         | and the code behind that paper was monetized to unprecedented
         | levels. Should computer science journals also require working
         | proof of concept code, even if that discourages companies from
         | sharing their results; even if it prevents students from
         | monetizing the fruits of their research?
        
         | bartvbl wrote:
         | The graphics community has started an interesting initiative
         | to this end: http://www.replicabilitystamp.org/
         | 
         | After a paper has been accepted, authors can submit a
         | repository containing a script which automatically replicates
         | results shown in the paper. After a reviewer confirms that the
         | results were indeed replicable, the paper gets a small badge
         | next to its title.
         | 
         | While there could certainly be improvements, I think it's a
         | step in the right direction.
        
           | dandelion_lover wrote:
           | But does this badge influence the scientific profile / resume
           | of the researcher in any way?
        
             | jpeloquin wrote:
             | You can always put "certified by the Graphics Replicability
             | Stamp Initiative" next to each paper on your CV. It might
             | influence people a little, even if it isn't part of the
             | formal review for employment / promotion. Although
             | "Graphics Replicability Stamp Initiative" does not sound
             | very impressive. And Federal grant applications have rules
             | about what can be included in your profile.
             | 
             | Informal reputation does matter though. If you want to get
             | things done and not just get promoted, you need the
             | cooperation of people with a similar mindset, and
             | collaboration is entirely voluntary.
        
         | ranaexmachina wrote:
         | In computer science a lot of researchers already publish
         | their code (at least in the domain of software engineering),
         | but my biggest problem is not the absence of tests but the
         | absence of any documentation on how to run it. In the best
         | case you can open it in an IDE and it will figure out how to
         | run it, but I rarely see any indication of what the
         | dependencies are. So once you figure out how to run the code,
         | you run it until you hit the first import exception, install
         | that dependency, run it again until the next import
         | exception, and so on. I spent way too much time on that
         | instead of doing real research.
        
         | justin66 wrote:
         | John Carmack, who did some small amount of work on the code,
         | had a short rebuttal of the "Lockdown Skeptics" attack on the
         | Imperial College code that probably mirrors the feelings of
         | some of us here:
         | 
         | https://mobile.twitter.com/id_aa_carmack/status/125819213475...
        
         | onhn wrote:
         | There is a fundamental reason not to publish scientific code.
         | 
         | If someone is trying to reproduce someone else's results, the
         | data and methods are the only ingredients they need. If you add
         | code into this mix, all you do is introduce new sources of
         | bias.
         | 
         | (Ideally the results would be blinded too.)
        
       | alexeiz wrote:
       | Pfff. Does my 3 month old code still run? Uh, nope. And I don't
       | remember what it was supposed to do!
        
       | hpcjoe wrote:
       | Short answer: Yes, my 30 year old Fortran code runs (with a few
       | minor edits between f77 and modern fortran), as did my ancient
       | Perl codes.
       | 
       | Watching the density functional theory based molecular
       | dynamics zip along at ~2 seconds per time step on my
       | 2-year-old laptop, versus roughly 6k seconds per time step on
       | an old Sun machine back in 1991, is quite something. I
       | remember the same code getting down to 60 seconds per time
       | step on my desktop R8k machine in the late 90s.
       | 
       | What's been really awesome about that has been the fact that
       | I wrote some binary data files on big-endian machines in the
       | early 90s, and re-read them on the laptop (little endian) by
       | adding a single compiler switch.
       | 
       | Perl code that worked with big XML file input in the mid 2000s
       | continues to work, though I've largely abandoned using XML for
       | data interchange.
       | 
       | C code I wrote in the mid 90s compiled, albeit with errors that
       | needed to be corrected. C++ code was less forgiving.
       | 
       | Over the past 4 months, I had to forward port a code from Boost
       | 1.41 to Boost 1.65. Enough changes over 9 years (code was from
       | 2011) that it presented a problem. So I had to follow the changes
       | in the API and fix it.
       | 
       | I am quite thankful I've avoided the various fads in platforms
       | and languages over the years. Keep inputs in simple textual
       | format that can be trivially parsed.
        
         | atrettel wrote:
         | > What's been really awesome about that has been the fact
         | that I wrote some binary data files on big-endian machines in
         | the early 90s, and re-read them on the laptop (little endian)
         | by adding a single compiler switch.
         | 
         | I want to second the idea of just dumping your floating point
         | data as binary. It's basically the CSV of HPC data. It doesn't
         | require any libraries, which could break or change, and even if
         | the endianness changes you can still read it decades later.
         | I've been writing a computational fluid dynamics code recently
         | and decided to only write binary output for those reasons. I'm
         | not convinced of the long-term stability of other formats. I've
         | seen colleagues struggle to read data in proprietary formats
         | even a few years after creating it. Binary is just simple and
         | avoids all of that. Anybody can read it if needed.
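         | 
         | (A minimal illustration of this raw-binary approach, written
         | as a NumPy sketch rather than anyone's actual solver code;
         | the file name, dtype and array shape are placeholders.)
         | 
         |     import numpy as np
         | 
         |     # Write: make byte order and precision explicit in the dtype,
         |     # so the file stays readable even if the machine changes.
         |     field = np.random.default_rng(0).random((64, 64, 64))
         |     field.astype("<f8").tofile("velocity_x.bin")  # little-endian float64
         | 
         |     # Read back later: only the dtype and shape need to be known,
         |     # which is exactly what the accompanying metadata should record.
         |     restored = np.fromfile("velocity_x.bin", dtype="<f8")
         |     restored = restored.reshape(64, 64, 64)
         |     assert np.array_equal(field, restored)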
        
           | petschge wrote:
           | Counter argument: binary dumps are horrible, because the
           | documentation that allows you to read the data is usually
           | missing. Using a self-documenting format such as HDF5 is far
           | superior. It will tell you whether the bits are floating
           | point numbers in single or double precision, which
           | endianness they use, and what the layout of the 3D array
           | was. (No surprise that HDF was invented for the Voyager
           | mission, where they had to ensure readability of the data
           | for half a century.)
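           | 
           | (A minimal sketch of that self-documenting idea, using the
           | h5py Python bindings; the file name, dataset name and
           | attributes are illustrative, not from the original comment.)
           | 
           |     import numpy as np
           |     import h5py
           | 
           |     data = np.random.default_rng(1).random((64, 64, 64))
           | 
           |     # HDF5 records dtype, shape and endianness itself;
           |     # attributes carry the metadata a raw dump would lose.
           |     with h5py.File("snapshot_0001.h5", "w") as f:
           |         dset = f.create_dataset("density", data=data)
           |         dset.attrs["units"] = "kg m^-3"
           |         dset.attrs["timestep"] = 1
           |         f.attrs["writer"] = "illustrative-code-0.1"
           | 
           |     # Anyone can inspect it later without extra documentation.
           |     with h5py.File("snapshot_0001.h5", "r") as f:
           |         d = f["density"]
           |         print(d.shape, d.dtype, dict(d.attrs))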
        
             | iagovar wrote:
               | Why not dump into SQLite? It makes everything easy, and
               | we will be able to use sqlite3 for a long time IMO.
        
               | petschge wrote:
               | Because parallel IO from a lot of different MPI ranks is
               | not supported. And filesystems tend to look unhappy when
               | 100k processes try to open a new file at the same time.
        
             | atrettel wrote:
             | Your argument raises a lot of good points. I actually agree
             | that binary does lose all of the metadata and documentation
             | that goes with it. That is a big problem. That is why I
             | think it is also important to include some sort of
             | documentation like an Xdmf file [1]. That is what I use to
             | tie everything together in my particular project. HDF5 is
             | fine. In fact, I would have strongly preferred my
             | colleagues using HDF5 over the proprietary format that they
             | did end up using. But HDF5 requires an additional library.
             | I did not want to use any external libraries in my
             | particular project (other than MPI), so I tried to look for
             | a solution that achieves close to what HDF5 can achieve but
             | without requiring something as "heavy" as HDF5. I have to
             | admit that perhaps my design choice does not work for more
             | complex situations, but I think it is something people
             | should consider before tying themselves down too much.
             | 
             | [1] http://www.xdmf.org/index.php/Main_Page
        
               | petschge wrote:
               | Having an Xdmf file alongside is nice, but the breaking
               | changes between v2 and v3 are very annoying. And I
               | understand the desire to have few external dependencies,
               | but at least HDF5 is straightforward to compile and
               | available as a pre-compiled module on all supercomputers
               | that I have ever seen.
        
             | hpcjoe wrote:
             | I got into the habit of documenting each file with a
             | file.meta that I could view later on.
             | 
             | I did binary dumps in the past because ascii dumps
             | (remember, 90s) were far more time/space expensive. HDF
             | wasn't quite an option then, neither HDF4 nor HDF5.
             | 
             | These days I would probably look at something like that,
             | though, to be honest, there is always a danger of choosing
             | something that may not be supported over the long term.
             | This is why I generally prefer open and simple formats for
             | everything. HDF5 is nice and open.
             | 
             | One needs to look carefully at the total risk of using a
             | proprietary format/system for any part of their storage.
             | Chances are you will not be able to even read older data
             | within a small number of decades if any of the
             | format/system-dependent technologies go away.
             | 
             | I've got old word processor files from the mid 80s, that I
             | can't read. What I've written there (mostly college papers)
             | is lost (which may be a net positive for humanity).
             | 
             | My tarballs, and zip files though, are readable 30+ years
             | later. That is pretty amazing.
             | 
             | Simple, documented, and open formats. Picture a time when
             | you can't read/open your pptx/xlsx/docx files any more.
             | Same with data. Simple binary formats are like CSV files,
             | but you do need to maintain metadata on their contents, and
             | document it extensively in the code as to what you are
             | reading/writing, why you are doing this, and how you are
             | doing this.
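             | 
             | (A minimal sketch of such a file.meta sidecar, written
             | here as JSON from Python; the fields are only an example
             | of what one might record, not a fixed schema.)
             | 
             |     import json
             | 
             |     meta = {
             |         "file": "velocity_x.bin",
             |         "layout": "raw binary dump, no header",
             |         "dtype": "little-endian float64",
             |         "shape": [64, 64, 64],
             |         "units": "m/s",
             |         "written_by": "solver rev. abc123 (illustrative)",
             |     }
             |     with open("velocity_x.bin.meta", "w") as f:
             |         json.dump(meta, f, indent=2)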
             | 
             | I think this will get more important over time as we start
             | asking questions on how to maintain open artefact
             | repositories for data and code. The fewer dependencies the
             | better.
             | 
             | And unlike the recent gene renaming snafu in biology[1],
             | you really never want your tool to get in the way of the
             | science, either in terms of formats or interpretation of
             | data.
             | 
             | [1] https://www.theverge.com/2020/8/6/21355674/human-genes-
             | renam...
        
         | Rochus wrote:
         | Yes, I know a couple of Fortran 77 apps and libraries which
         | were developed more than 25 years ago and which are still in
         | use today.
         | 
         | My C++ Qt GUI application for NMR spectrum analysis
         | (https://github.com/rochus-keller/CARA) has been running for
         | 20 years now, with continuing high download and citation
         | rates.
         | 
         | So obviously C++/Qt and Fortran 77 are very well suited to
         | standing the test of time.
        
           | O_H_E wrote:
           | Nice. Interesting to know that GitHub stars aren't always a
           | representative metric.
        
             | Rochus wrote:
             | Yes, many of my apps and libs were more than ten years old
             | when I pushed them to github. Some projects started before
             | git was invented.
        
       | lumost wrote:
       | I'm continuously surprised that Code Review isn't a part of the
       | review process for journal acceptance. The majority of academic
       | code for a given paper isn't particularly large - and the
       | benefits are significant.
        
       | myself248 wrote:
       | Plenty of actual professional programmers can't manage this, how
       | is it a fair standard to hold scientists to, when the code is
       | just one of the many tools they're trying to use to get their
       | real job done?
       | 
       | I think moving away from the cesspool of imported remote
       | libraries that update at random times and can vanish off the
       | internet without warning would help a lot in both cases.
        
         | minkzilla wrote:
         | I think we have to hold scientists to higher standards for
         | code quality, because it has a direct impact on their
         | findings. How many off-by-one or other subtle errors, found
         | only later in testing, has the average software engineer
         | written over their career? Is it fine to just say, eh,
         | scientific results can be off by one because the standards
         | should be lower?
        
         | proverbialbunny wrote:
         | >Plenty of actual professional programmers can't manage this,
         | how is it a fair standard to hold scientists to
         | 
         | That's a good point. On a tangential note, prototype code tends
         | to be at a higher level than production code, so there is a
         | higher chance 10 year old code will continue to run on the
         | scientist side, as long as the libraries imported haven't
         | vanished.
        
         | rudolph9 wrote:
         | Professional programmers should adopt package managers that
         | focus on reproducibility, like Guix and Nix, and make them
         | accessible enough for non-programmers to use.
         | 
         | Neither of these are perfect but in my experience they are
         | worlds better than apk, Dockerfiles, and many other commonly
         | used solutions.
         | 
         | http://guix.gnu.org/
         | 
         | https://nixos.org/
        
       | userbinator wrote:
       | I still use Windows binaries daily which I wrote and last
       | modified over 20 years ago. I don't expect that to change in the
       | next ten years either.
        
         | dr-detroit wrote:
         | sounds neat please link the open source repo so we can check it
         | out
        
       | xipho wrote:
       | Yes. 110% attributed to learning about unit-tests and gems/CPAN
       | in grad school.
       | 
       | IMO there is a big fallacy in the "just get it to work"
       | approach. Most serious scientific code, i.e. code supporting
       | months to years of research, is used and modified _a lot_.
       | It's also not really one-off; it's a core part of a
       | dissertation or research program, and if it fails, you do.
       | I'd argue (and I found) that using unit tests, a deployment
       | strategy, etc. ultimately allowed me to do more, and better,
       | science, because in the long run I didn't spend as much time
       | figuring out why my code didn't run when I tweaked stuff.
       | This is really liberating. I suspect this is all obvious to
       | those who have gone down that path.
       | 
       | Frankly, every reasonably tricky problem benefits from unit-tests
       | as well for another reason. Don't know how to code it, but know
       | the answer? Assert lots of stuff, not just one at a time red-
       | green style. Then code, and see what happens. So powerful for
       | scientific approaches.
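       | 
       | (A minimal sketch of that assert-the-known-answers style,
       | using pytest; the function under test and the expected values
       | are purely illustrative.)
       | 
       |     import pytest
       | 
       |     def binding_energy(n_residues):
       |         # Deliberately unfinished: write the asserts first,
       |         # then implement until they all pass.
       |         raise NotImplementedError
       | 
       |     def test_known_answers():
       |         # Everything you already know the answer to, asserted
       |         # at once rather than one at a time, red-green style.
       |         assert binding_energy(0) == 0.0
       |         assert binding_energy(1) == pytest.approx(-1.2)
       |         assert binding_energy(10) < binding_energy(1)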
        
         | xorfish wrote:
         | And bugs can have quite big implications:
         | 
         | https://smw.ch/article/doi/smw.2020.20336
        
       | wdwvt1 wrote:
       | An excellent article full of good suggestions. I appreciated that
       | it's less certain of the Best Practices TM than many comments on
       | this subject. I am curious how the goals/techniques for
       | reproducibility change with the percentage of
       | software/computational work that a scientific project contains.
       | It feels like as the percentage of a paper's ultimate conclusions
       | that are computationally derived increases, the importance of
       | strict "the tests pass and the numerical results are identical"
       | reproducibility also increases. Most of my projects are mixed
       | wet-lab/dry-lab - a fair amount of custom code is required, but
       | it's usually less than 50% of the work. When I'm relying on other
       | papers that have a similar mix of things, I'm often not
       | interested if the continuous integration tests of their code
       | pass. I am more interested in understanding well the specific
       | steps they take computationally and in a sensitivity analysis of
       | their computational portion (if you slightly alter your binning
       | threshold do you still get that fantastic clustering?). I believe
       | this is because in my field (microbiology), computational tools
       | can guide, but physical reality and demonstrated biology are the
       | only robust evidence of a phenomenon/mechanism/etc. For most
       | research I do not demand tests of all the analytical pieces they
       | are relying on (was their incubator actually set to 37C? was the
       | pH of the media +- 0.2? etc) - I trust they've done good science.
       | Why would I demand their code meet a higher standard?
        
       | majewsky wrote:
       |     CMake Error at /usr/share/cmake-3.18/Modules/FindQt4.cmake:1314 (message):
       |       Found unsuitable Qt version "5.15.0" from /usr/bin/qmake,
       |       this code requires Qt 4.x
       | 
       | Well fuck.
        
       | Gatsky wrote:
       | I think code is remarkably persistent in the scheme of things.
       | Try reproducing a wet lab experimental technique from 5 years
       | ago.
        
       | closeparen wrote:
       | Almost certainly not, because it would have been written in
       | Python 2.
        
       | fizzled wrote:
       | Yep. I wrote a netlist analyzer in Perl that provides
       | statistics... in 1997. It is still part of a regression suite
       | because it is very small, very fast, and callable from the
       | command line without loading hundreds of megabytes of
       | libraries (unlike foundation tools). I reconnected with a
       | peer on LinkedIn who still works at the company and joked
       | that he still sees my silly script's name in verification
       | flows. The only change I made to it in 20+ years was moving
       | to Perl 5.6.1 so that I could parse files >1GB, but it has
       | been maintained and kept to standard practices.
        
       | fourseventy wrote:
       | Forget 10 year old code. Try to get your 2 year old javascript +
       | webpack + react set up running...
        
       | slhck wrote:
       | The two main problems in academia are that a) few researchers
       | have formal training in best practices of software engineering,
       | and that b) time pressure leads to "whatever worked two minutes
       | before submission deadline" becoming what is kept for
       | posterity.
       | 
       | When I started working as a full-time researcher, I had come from
       | working two years in a software shop, only to find people at the
       | research lab having never used VCS, object-oriented programming,
       | etc. Everyone just put together a few text files and Python or
       | MATLAB scripts that output some numbers that went into Excel or
       | gnuplot scripts that got copy-pasted into LaTeX documents with
       | suffixes like "v2_final_modified.tex", shared over Dropbox.
       | 
       | Took a long time to establish some coding standards, but even
       | then it took me a while to figure out that that alone didn't
       | help: you need a proper way to lock dependencies, which, at the
       | time, was mostly unknown (think requirements.txt, packrat for R,
       | ...).
        
         | justinmeiners wrote:
         | Don't you think docker, dependencies, unit test frameworks, etc
         | actually increase the need for ongoing maintenance as opposed
         | to spitting out some C files or python scripts which last
         | "forever"?
        
           | tanilama wrote:
           | No.
           | 
           | Python/C files don't work in a vacuum. They need
           | dependencies; that is the point of Docker, after all.
           | 
           | Capture all necessary dependencies into a single image.
        
             | justinmeiners wrote:
             | > Python/C files don't work in a vacuum
             | 
             | They do if you use the standard library (which for Python
             | is quite extensive) and copy any dependencies into your
             | own source, as if they were your own. By "in a vacuum" we
             | can mean: if Python is installed, it will work.
             | 
             | > Capture all necessary dependencies
             | 
             | Docker doesn't capture any dependencies. They still exist
             | on the internet. It just captures a list of which ones to
             | download when you build the image.
             | 
             | Do you think software we write now has more longevity than
             | older software that uses make or a shell script?
        
           | slhck wrote:
           | I don't think so. The source code is the same but there's now
           | metadata that helps in setting up the same environment again,
           | even years later. You still have the original code in case,
           | e.g. Docker is no longer available.
           | 
           | For instance, if you just have a Python script importing a
           | statistical library, what version are you going to use? Scipy
           | had a pretty nasty change in one of its statistical
           | functions, changing the outcome of significance tests in our
           | project. Depending on which version you happened to have
           | installed it'd give you a positive or negative result.
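           | 
           | (A minimal sketch of one way to guard against exactly that:
           | record the library versions that produced a result right
           | next to the result. The file name and the test are
           | illustrative.)
           | 
           |     import json
           |     import sys
           | 
           |     import numpy as np
           |     import scipy
           |     from scipy import stats
           | 
           |     rng = np.random.default_rng(42)  # fixed seed, for good measure
           |     a = rng.normal(0.0, 1.0, size=30)
           |     b = rng.normal(0.3, 1.0, size=30)
           |     t, p = stats.ttest_ind(a, b)
           | 
           |     with open("results.json", "w") as f:
           |         json.dump({
           |             "t_statistic": float(t),
           |             "p_value": float(p),
           |             "python": sys.version,
           |             "numpy": np.__version__,
           |             "scipy": scipy.__version__,  # version behind this p-value
           |         }, f, indent=2)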
        
             | justinmeiners wrote:
             | It makes sense that having more information is better than
             | less.
             | 
             | I would argue that they should use no dependencies to avoid
             | this problem entirely, or download them and include them as
             | source in the project, or at least include a note of which
             | version of a major library they used in a README or
             | comment. I think this is what is often done in practice
             | currently.
             | 
             | Perhaps as you are saying, docker is just a stable way to
             | document this stuff formally. But it is a large moving part
             | that assumes a lot of stuff is still on the internet. What
             | if the docker hub image is removed or dramatically changed?
             | What if that OS package manager no longer exists? It just
             | doesn't seem like our software is getting more longevity,
             | but less. I don't know why we would bring that extra
             | complexity to academic research if the goal is longevity.
        
         | hobofan wrote:
         | requirements.txt is not a lockfile
        
       | neuromantik8086 wrote:
       | Just as a quick bit of context here, Konrad Hinsen has a specific
       | agenda that he is trying to push with this challenge. It's not
       | clear from this summary article, but if you look at the original
       | abstract soliciting entries for the challenge
       | (https://www.nature.com/articles/d41586-019-03296-8), it's a bit
       | clearer that Hinsen is using this to challenge the technical
       | merits of Common Workflow Language (https://www.commonwl.org/;
       | currently used in bioinformatics by the Broad Institute via the
       | Cromwell workflow manager).
       | 
       | Hinsen has created his own DSL, Leibniz
       | (https://github.com/khinsen/leibniz ; http://dirac.cnrs-
       | orleans.fr/~hinsen/leibniz-20161124.pdf), which he believes is a
       | better alternative to Common Workflow Language. This
       | reproducibility challenge is in support of this agenda in
       | particular, which is worth keeping in mind; it is not an unbiased
       | thought experiment.
        
         | jnxx wrote:
         | Konrad Hinsen is an expert in molecular bioinformatics, has
         | contributed significantly to Numerical Python, for example,
         | and has published extensively on the topic of reproducible
         | science and algorithms - see his blog.
         | 
         | The fact that he might favor different solutions from you does
         | not mean that he is pushing some kind of hidden agenda.
         | 
         | If you think that Common Workflow Language is a better
         | solution, you are free to explain in a blog why you think this.
         | 
         | Are you saying that the reproducibility challenge poses a
         | difficulty for Common Workflow Language? If so, would that
         | not rather support Hinsen's point - without implying that
         | what he suggests is already a perfect solution?
        
           | neuromantik8086 wrote:
           | I never said that Konrad Hinsen's agenda was hidden; in fact,
           | it's not at all hidden (which is why I linked the abstract).
           | It's just that this context isn't at all clear in the Nature
           | write-up, and it's relevant to take into account.
           | 
           | I haven't taken the time to seriously contemplate the merits
           | of CWL vs Leibniz, although my gut instinct is that we don't
           | really need another domain-specific language for science
           | given the profusion of such languages that already exist
           | (Mathematica, Maple, R, MATLAB, etc). That's the extent of my
           | bias, but again, it's a gut instinct and not a comprehensive
           | well-reasoned argument against Leibniz.
        
       | rkagerer wrote:
       | _" Visual Basic," Maggi writes in his report, "is a dead language
       | and long since has been replaced..."_
       | 
       | In fact I still have the VB6 IDE installed on my primary
       | workstation and use it for quick and dirty projects from time to
       | time.
        
       | garden_hermit wrote:
       | I favor open code, but like everything, there are issues.
       | For example, the EPA years ago required that research can
       | only inform policy when its data is open; open data, however,
       | takes a lot of effort to document and provide. Companies with
       | a vested interest in EPA policy, on the other hand, can
       | easily produce open (and often very biased) data.
       | 
       | Requirements for open code can lead to similar issues--what
       | happens when a government agency rejects the outcome of a
       | supercomputer simulation because the code wasn't documented well
       | enough? What happens when those with vested interests are the
       | ones best able to produce scientific code?
       | 
       | Scientists already wear many hats. Any shift in policy and norms
       | needs to consider that they have limited time, a fact that can
       | have far-reaching consequences.
        
       | adornedCupcake wrote:
       | "Python 2.7 puts "at our disposal an advanced programming
       | language that is guaranteed not to evolve anymore", Rougier
       | writes1." Oh no. That's not at all what was intended. Regarding
       | my own research: I'm doing theoretical biophysics. Often I do
       | simulations. If conda stays stable enough, my code should be
       | reproducible. There are, however, some external binaries
       | (like lammps) that I have not turned into a conda package
       | yet. There's no official conda package that fits my use case,
       | since compilation is fine-tuned to each user's needs.
        
         | rekado wrote:
         | I added different variants of lammps to a Guix channel we
         | maintain at our institute:
         | 
         | https://github.com/BIMSBbioinfo/guix-bimsb/blob/master/bimsb...
         | 
         | Thankfully, Guix makes it easy to take an existing package
         | definition and create an altered variant of it.
        
       | wenc wrote:
       | The easiest way to preserve code for posterity is to wrap up
       | the runtime environment in a VM. I can boot up a VM from 15
       | years ago (when I was in grad school) and it will run.
       | 
       | When you're writing code for science, preserving code for
       | posterity is rarely a priority. Your priority is to iterate
       | quickly because the goal is scientific results, not code.
       | 
       | (This is, in fact, the correct prioritization. Under most
       | circumstances, though not all, grad students who try to write
       | pristine code find themselves progressing more slowly than
       | those who don't.)
        
       | Sebb767 wrote:
       | As someone who has worked with bits of scientific code:
       | whether the code you write _right now_ works on another
       | machine might be the more appropriate challenge. I've seen a
       | lot of hardcoded paths, unmentioned dependencies and
       | monkey-patched libraries downloaded from somewhere; just
       | getting the new code to work is hard enough. And let's not
       | even begin to talk about versioning or magic numbers.
       | 
       | Similar to other comments I don't mean to fault scientists for
       | that - their job is not coding and some of the dependencies come
       | from earlier papers or proprietary cluster setups and are
       | therefore hard to avoid - but the situation is not good.
        
         | TheJoeMan wrote:
         | I emailed an author of a 5-year-old paper and they said they
         | had lost their original MATLAB code, which certainly calls
         | their paper into question.
        
           | James_Henry wrote:
           | Definitely makes you question it more. Does the paper not
           | explain the contents of the MATLAB code? That's all that is
           | usually needed for reproducibility. You should be able to get
           | the same results no matter who writes the code to do what is
           | explained in their methods.
           | 
           | Of course, I have no idea about the paper you're talking
           | about and just want to say that reproducibility isn't
           | dependent on releasing code. There could even be a case
           | where it's better if someone reproduces a result without
           | having been biased by someone else's code.
        
         | dunefox wrote:
         | If a scientist needs to write code then it's part of their job.
         | It's as easy as that.
        
           | magv wrote:
           | I think the idea that scientific code should be judged by the
           | same standards as production code is a bit unfair. The point
           | when the code works the first time is when an industry
           | programmer starts to refactor it -- because he expects to use
           | and work on it in the future. The point when the code works
           | the first time is when a scientist abandons it -- because it
           | has fulfilled its purpose. This is why the quality is lower:
           | lots of scientific code is the first iteration that never got
           | a second.
           | 
           | (Of course, not all scientific code is discardable, large
           | quantities of reusable code are reused every day; we have many
           | frameworks, and the code quality of those is completely
           | different).
        
             | dunefox wrote:
             | That's not the point, though. If you obtain your results by
             | writing and executing code then code quality matters - to
             | reproduce and validate them.
        
         | abdullahkhalids wrote:
         | Lots of people are saying it is the scientist's job to
         | produce reproducible code. It is, and the benefits of
         | reproducible code are many. I have been a big proponent of
         | it in my own work.
         | 
         | But not with the current mess of software frameworks. If I am
         | to produce reproducible scientific code, I need an idiot-proof
         | method of doing it. Yes, I can put in the 50-100 hours to learn
         | how to do it [1], but guess what, in about 3-5 years a lot of
         | that knowledge will be outdated. People compare it with
         | math, but the math proofs I produce will still be readable
         | and understandable a century from now.
         | 
         | Regularly used scientific computing frameworks like matlab,
         | R, the Python ecosystem and mathematica need a dumb, guided
         | method of producing releasable and reproducible code. I want
         | to go through a bunch of 'next' buttons that help me fix the
         | problems you indicate, and finally release a final version
         | that has all the information necessary for someone else to
         | reproduce the results.
         | 
         | [1] I have. I would put myself in the 90th percentile of
         | physicists familiar with best practices for coding. I speak for
         | the 50th percentile.
        
           | zelphirkalt wrote:
           | The dumb guide is the following:
           | 
           | (1) Use a package manager which stores hash sums in a lock
           | file.
           | 
           | (2) Install your dependencies from that lock file as the
           | spec.
           | 
           | (3) Do not trust version numbers. Trust hash sums. Do not
           | believe in "But I set the version number!".
           | 
           | (4) Do not rely on downloads. Again, trust hash sums, not
           | URLs.
           | 
           | (5) Hash sums!!!
           | 
           | (6) Wherever there is randomness, as in random number
           | generators, use a seed. If the interface does not allow you
           | to specify the seed, throw the trash away and use another
           | generator. Be careful when concurrency is involved; it can
           | destroy reproducibility. For example, this was the case
           | with Tensorflow. Not sure whether it still is. (See the
           | sketch of (5) and (6) below.)
           | 
           | (7) Use a version control system.
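           | 
           | (A minimal sketch of points (5) and (6), assuming
           | Python/NumPy; the file name and the expected digest are
           | placeholders.)
           | 
           |     import hashlib
           | 
           |     import numpy as np
           | 
           |     EXPECTED = "put the known sha256 hex digest here"
           | 
           |     def sha256_of(path, chunk_size=1 << 20):
           |         """Return the SHA-256 hex digest of a file, read in chunks."""
           |         h = hashlib.sha256()
           |         with open(path, "rb") as f:
           |             for chunk in iter(lambda: f.read(chunk_size), b""):
           |                 h.update(chunk)
           |         return h.hexdigest()
           | 
           |     # (5) Trust hash sums, not URLs or version numbers.
           |     assert sha256_of("input_data.bin") == EXPECTED, "input changed!"
           | 
           |     # (6) Seed every source of randomness explicitly.
           |     rng = np.random.default_rng(20200824)
           |     sample = rng.normal(size=1000)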
        
             | hobofan wrote:
             | > in about 3-5 years a lot of that knowledge will be
             | outdated
             | 
             | Yup, and most of the points you mentioned will probably not
             | be outdated for quite a while. Every package manager I'm
             | aware of with lock files that old can still consume them
             | today.
        
         | hobofan wrote:
         | > their job is not coding
         | 
         | But it often is. For most non-CS papers (mostly biosciences)
         | I've read, there are specific authors whose contribution to a
         | large degree was mainly "coding".
        
         | BeetleB wrote:
         | > their job is not coding
         | 
         | To me, that's like a theoretical physicist saying "My job is
         | not to do mathematics" when asked for a derivation of a formula
         | he put in the paper.
         | 
         | Or an experimental physicist saying "My job is not mechanical
         | engineering" when asked for details of their lab equipment
         | (almost all of which is typically custom built for the
         | experiment).
        
           | Sebb767 wrote:
           | On one hand, yes. But on the other hand, reusable code,
           | dependency management, linting, portability etc. are not
           | _that_ easy, and they are problems junior developers tend
           | to struggle with (and it's not like those problems never
           | pop up for seniors, either). I really can't fault
           | non-compsci scientists for not handling them well. Of
           | course, part of it (like publishing the relevant code) is
           | far easier and should be done, but some aspects are really
           | hard.
           | 
           | IMO the incentive problem in science (basically number of
           | papers and new results is what counts) also plays into this,
           | as investing tons of time in your code gives you hardly any
           | reward.
        
             | dunefox wrote:
             | There are tons of tutorials on using conda for dependency
             | management, it's not rocket science. And using a linter is
             | difficult? If a scientist needs to read and write code as
             | part of their job then they should learn the basics of
             | programming - that includes tools and 'best practices'.
        
             | BeetleB wrote:
             | > But on the other hand, reuseable code, dependency
             | management, linting, portability etc are not that easy
             | problems and something junior developers tend to struggle
             | with
             | 
             | On the original hand, these are easier problems than all
             | the years of math education they have. Once you're relying
             | on simulations to get results to explain natural phenomena,
             | it needs to be put on the same pedestal as mathematics.
        
           | djaque wrote:
           | The point is that as a scientist your code is a tool to get
           | the job done and not the product. I can't spend 48 hours
           | writing unit tests for my library (even though I want to) if
           | it's not going to give me results. It's literally not my job
           | and is not an efficient use of my time
        
             | TimothyBJacobs wrote:
             | This is the same as any other argument against testing.
             | Unless you are actually selling a library, code is not the
             | product. Customers are buying results, not your code base.
             | Yet, we've discovered the importance of testing to make
             | sure customers get the right results without issues.
             | 
             | If you want your results to be usable by others, the
             | quality of the code matters. If all you care about is
             | publishing a paper, then I guess, sure, it doesn't matter
             | whether anyone else can build off your work.
        
               | PeterisP wrote:
               | But the results _are_ usable by others, in most fields of
               | science the code is not part of these results and is not
               | needed to enjoy, use and build upon the research results.
               | 
               | The only case where the code would be used (which is a
               | valid reason why it should be available _somehow_) is to
               | assert that your particular results are flawed or
               | fraudulent; otherwise the quality of the code (or its
               | availability, or even existence - perhaps you could have
               | had a bunch of people do all of it on paper without any
               | code) is simply irrelevant if you want your results to be
               | usable by others.
        
               | BeetleB wrote:
               | > The only case where the code would be used (which is a
               | valid reason why it should be available somehow) is to
               | assert that your particular results are flawed or
               | fraudulent;
               | 
               | Not true. Code is often used and reused to churn out a
               | lot more results than the initial paper. A flaw in the
               | code doesn't just show one paper/result as problematic.
               | It can show a large chunk of a researcher's work in his
               | area of expertise to be problematic.
        
             | [deleted]
        
             | RandoHolmes wrote:
             | > I can't spend 48 hours writing unit tests for my library
             | 
             | No one is insisting on top quality code, but there has to
             | be an acceptance that code can be flawed and that needs to
             | be tested for.
        
             | dunefox wrote:
             | If the code you base your work on is horrible it definitely
             | makes me question your results. That's why it's called the
             | _reproducibility_ crisis.
             | 
             | Writing some tests, using a linter, commenting your code,
             | and learning about best programming practices doesn't take
             | long and pays off - even for yourself when writing the code
             | or you need to touch the code again. "48 hours writing unit
             | tests" is a ridiculous comparison.
        
             | BeetleB wrote:
             | > The point is that as a scientist your code is a tool to
             | get the job done and not the product.
             | 
             | Everything you say is as true for experimental equipment
             | and mathematical tools. Physicists are fantastic at
             | mathematics, yet are one of the most anti-math people I
             | know - in the sense of "Mathematics is just a tool to get
             | results that explain nature! Doing mathematics for its own
             | sake is a waste of time!"
             | 
             | The equation is not the product - the explanation of
             | physical phenomena is. If the attitude of "I don't need to
             | show how I got this equation" is unacceptable, the same
             | should go for code.
        
             | Jabbles wrote:
             | How do you know it won't give you results? Maybe it will
             | find a bug that would have resulted in an embarrassing
             | retraction.
             | 
             | Maybe it wouldn't find any bugs, but give confidence to and
             | encourage other users and increasing your citations and
             | "impact".
             | 
             | Maybe it will just save you 48h later on when you need to
             | adapt the code.
             | 
             | Software engineering has generally accepted that unit
             | testing is a good practice and well worth the time taken.
             | Why do you think science is different?
        
               | dunefox wrote:
               | > Why do you think science is different?
               | 
               | It's really not, I guess his focus lies on cranking out
               | irreproducible papers.
        
       | westurner wrote:
       | "Ten Simple Rules for Reproducible Computational Research"
       | http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fj... :
       | 
       | > _Rule 1: For Every Result, Keep Track of How It Was Produced_
       | 
       | > _Rule 2: Avoid Manual Data Manipulation Steps_
       | 
       | > _Rule 3: Archive the Exact Versions of All External Programs
       | Used_
       | 
       | > _Rule 4: Version Control All Custom Scripts_
       | 
       | > _Rule 5: Record All Intermediate Results, When Possible in
       | Standardized Formats_
       | 
       | > _Rule 6: For Analyses That Include Randomness, Note Underlying
       | Random Seeds_
       | 
       | > _Rule 7: Always Store Raw Data behind Plots_
       | 
       | > _Rule 8: Generate Hierarchical Analysis Output, Allowing Layers
       | of Increasing Detail to Be Inspected_
       | 
       | > _Rule 9: Connect Textual Statements to Underlying Results_
       | 
       | > _Rule 10: Provide Public Access to Scripts, Runs, and Results_
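       | 
       | (A minimal illustration of Rules 6 and 7 above, not from the
       | cited paper: note the seed behind generated data and store
       | the raw data behind a figure next to the figure itself. File
       | names are illustrative.)
       | 
       |     import numpy as np
       |     import matplotlib
       |     matplotlib.use("Agg")  # headless backend, runs anywhere
       |     import matplotlib.pyplot as plt
       | 
       |     SEED = 12345  # Rule 6: note the random seed behind the data
       |     rng = np.random.default_rng(SEED)
       |     x = np.linspace(0, 10, 200)
       |     y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
       | 
       |     # Rule 7: store the raw data behind the plot, not just the image.
       |     np.savetxt("figure1_data.csv", np.column_stack([x, y]),
       |                delimiter=",", header="x,y (seed=%d)" % SEED)
       | 
       |     plt.plot(x, y)
       |     plt.xlabel("x")
       |     plt.ylabel("y")
       |     plt.savefig("figure1.png", dpi=150)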
       | 
       | ... You can get a free DOI for a tagged Git repo and archive
       | it with FigShare or Zenodo.
       | 
       | ... re: [Conda and] Docker container images
       | https://news.ycombinator.com/item?id=24226604 :
       | 
       | > _- repo2docker (and thus BinderHub) can build an up-to-date
       | container from requirements.txt, environment.yml, install.R,
       | postBuild and any of the other dependency specification formats
       | supported by REES: Reproducible Execution Environment Standard;
       | which may be helpful as Docker Hub images will soon be deleted if
       | they 're not retrieved at least once every 6 months (possibly
       | with a GitHub Actions cron task)_
       | 
       | BinderHub builds a container with the specified versions of
       | software and installs a current version of Jupyter Notebook with
       | repo2docker, and then launches an instance of that container in a
       | cloud.
       | 
       | "Ten Simple Rules for Creating a Good Data Management Plan"
       | http://journals.plos.org/ploscompbiol/article?id=10.1371/jou... :
       | 
       | > _Rule 6: Present a Sound Data Storage and Preservation
       | Strategy_
       | 
       | > _Rule 8: Describe How the Data Will Be Disseminated_
       | 
       | ... DVC: https://github.com/iterative/dvc
       | 
       | > _Data Version Control or DVC is an open-source tool for data
       | science and machine learning projects. Key features:_
       | 
       | > _- Simple command line Git-like experience. Does not require
       | installing and maintaining any databases. Does not depend on any
       | proprietary online services. Management and versioning of
       | datasets and machine learning models. Data is saved in S3, Google
       | cloud, Azure, Alibaba cloud, SSH server, HDFS, or even local HDD
       | RAID._
       | 
       | > _- Makes projects reproducible and shareable; helping to answer
       | questions about how a model was built._
       | 
       | There are a number of great solutions for storing and sharing
       | datasets.
       | 
       | ... "#LinkedReproducibility"
        
         | jnxx wrote:
         | Open textual formats for data, and open source application
         | and system software (more precisely, FLOSS), are just as
         | important.
         | 
         | Imagine that x86 - and with it, the PC platform - gets replaced
         | by ARM within a decade. For binary software, this would be a
         | kind of geological extinction event.
        
           | westurner wrote:
           | The likelihood of there being a [security] bug discovered in
           | a given software project over any significant period of time
           | is near 100%.
           | 
           | It's definitely a good idea to archive source and binaries,
           | and later confirm that the output hasn't changed with and
           | without upgrading the kernel, build userspace, execution
           | userspace, and the PUT/SUT (Package/Software Under Test).
           | 
           | - Specify which versions of which constituent software
           | libraries are utilized. (And hope that a package repository
           | continues to serve those versions of those packages
           | indefinitely). Examples: Software dependency specification
           | formats like requirements.txt, environment.yml, install.R
           | 
           | - Mirror and archive _all_ dependencies and sign the
           | collection. Examples: {z3c.pypimirror, eggbasket,
           | bandersnatch, devpi as a transparent proxy cache}, apt-
           | cacher-ng, pulp, squid as a transparent proxy cache
           | 
           | - Produce a signed archive which includes all requisite
           | software. (And host that download on a server such that data
           | integrity can be verified with cryptographic checksums and/or
           | signatures.) Examples: Docker image, statically-linked
           | binaries, GPG-signed tarball of a virtualenv (which can be
           | made into a proper package with e.g. fpm), ZIP + GPG
           | signature of a directory which includes all dependencies
           | 
           | - Archive (1) the data, (2) the source code of all libraries,
           | and (3) the compiled binary packages, and (4) the compiler
           | and build userspace, and (5) the execution userspace, and (6)
           | the kernel. Examples: Docker can solve for 1-5, but not 6. A
           | VM (virtual machine) can solve for 1-5. OVF (Open
           | Virtualization Format) is an open spec for virtual machine
           | images, which can be built with a tool like Vagrant or Packer
           | (optionally in conjunction with a configuration management
           | tool like Puppet, Salt, Ansible).
           | 
           | When the application requires (7) a multi-node distributed
           | system configuration, something like docker-
           | compose/vagrant/terraform and/or a configuration management
           | tool are pretty much necessary to ensure that it will be
           | possible to reproducibly confirm the experiment output at a
           | different point in spacetime.
        
       ___________________________________________________________________
       (page generated 2020-08-24 23:01 UTC)