[HN Gopher] Challenge to scientists: does your ten-year-old code... ___________________________________________________________________ Challenge to scientists: does your ten-year-old code still run? Author : sohkamyung Score : 236 points Date : 2020-08-24 13:19 UTC (9 hours ago) (HTM) web link (www.nature.com) (TXT) w3m dump (www.nature.com) | daly wrote: | Axiom is a computer algebra system written in the 1970s-80s. It | still runs (and is open source). | stillsut wrote: | It's not academia but Kaggle that's really been at the forefront | of building portable and reproducible computational pipelines. | | The real key is incentives, and there are two that stand out to me: | | - The incentive to get others to "star" and fork your code makes the | coder compete not only to have an accurate result, but also to | prioritize producing code/notebooks that are digestible and | instructive. That includes liberal commenting/markup, idiomatic | syntax and patterns, diagnostic figures, and the use of modern | and standard libraries. | | - There is an incentive to move _with_ the community on best | practices for the libraries while still allowing experimental | libraries. Traditionally, there is the incentive of inertia: e.g. | "I always do my modelling in Lisp, and I won't change because | then I'd be less productive". But with Kaggle, to learn from the | insights and advances of others, you need to be able to | work with the developing common toolset. | | In academia, if these incentives were given weight on par with | publication and citation, then we'd see the tools and practices | fall into place. | hprotagonist wrote: | Mine does. | | I swear 40% of the idiocy of science code is because people | fundamentally don't understand how file paths work. Stop | hardcoding paths to data and the world gets better by an order | of magnitude.
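| | For instance, instead of open("/home/me/lab/run3/data.csv"), take | the path as a command-line argument. A minimal sketch (the script | name and file layout are made up):
|
|         import argparse
|         import pathlib
|
|         parser = argparse.ArgumentParser(description="analyse one run")
|         parser.add_argument("data", type=pathlib.Path, help="input CSV file")
|         args = parser.parse_args()
|
|         # the path now travels with the invocation, not the source code
|         rows = args.data.read_text().splitlines()
|         print(len(rows), "rows in", args.data)
|
| | Run it as "python analyse.py runs/run3/data.csv" and the same | script works unchanged on anyone's machine.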
| m3kw9 wrote: | It always runs if you use the same computer with the same | environment you ran it in last time. So yes. | bloak wrote: | I had an 18-year-old Python script. But it didn't work! And I | couldn't make it work! Fortunately I had an even older version of | the code in Perl, which did work after some very minor changes. | | This wasn't scientific code. It was some snarly private code for | generating the index for a book and I didn't look at it between | one edition and the next. I hope I don't have to fix it again in | another 18 years. | | Applying some version of the "doomsday argument", Perl 5 might be | a good choice if you're writing something now that you want to | work (without a great tower of VMs) in 10 or 20 years' time. C | would only be a reasonable choice if you have a way of checking | that your program does not cause any undefined behaviour. A C | program that causes undefined behaviour can quite easily stop | working with a newer version of the compiler. | Fishysoup wrote: | As a scientist I've written massive amounts of shitty code that | turned out to be reproducible by lucky accident. Part of the | problem is the tools: depending on the field, scientists either | use Matlab, C++, Fortran or some other framework that needs to | die. They base their code on other ancient code that runs for | unknown reasons, and use packages written by other scientists | with the same problems. | | As someone who's transitioning into industry, I can tell you that | scientists will never adopt software engineering principles to | any significant extent. It takes too much time to do things like | write tests and thorough documentation, learn Git, etc., and | software engineering just isn't interesting to most of them. | | So the only alternative I see is changing the tools to stuff | that's still easy to hack around with but where it's harder to | mess up (or it's more obvious when you do so). That doesn't leave | a ton of options (that I can see). Some I can think of are: | | - Make your code look more like math and less like | mathlib.linalg.dot(x1, x2).reshape(a, | b).mean().euclidean_distance((x3, x4)) + (other long expression) | or whatever: use a language like Julia. | | - Your language/environment gets angry when you write massive | hairballs, loads of nested for-loops and variables that keep | getting changed: use a language like Rust, and/or write more | modular code with a functional-leaning language like Rust or | Julia. | | - You're forced to make your code semi-understandable to you and | others more than an hour after writing it: forcing people to | write documentation isn't gonna work (most of the time). Forcing | sensible variable names is slightly more realistic. More likely, | you need some combination of the above two things that just make | your code more legible. | | How do you make that happen? No idea. | biophysboy wrote: | I'm a grad student in biophysics - even if I wrote perfect code, | it would almost certainly go obsolete in 10 years because the | hardware that it interfaces with would go obsolete. | dekhn wrote: | The longest-running code I wrote as a scientist was a sandwich | ordering system. I worked for a computer graphics group at UCSF | during a year off from grad school, while my simulations ran on a | supercomputer, and we had a weekly group meeting where everybody | ordered sandwiches from a local deli. | | It was 2000, so I wrote a cgi-bin in Python (2?) with a MySQL | backend. The menu was stored in MySQL, as were the orders. I | occasionally check back to see if it's still running, and it is - | a few code changes to port to Python 3, a data update since they | changed vendors, and a MySQL update or two as well. | | It's not much but at least it was honest work. | jnxx wrote: | Very related to this, see also Hinsen's blog post: | http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stabilit... | | I think that GNU Guix is extremely well-suited to improve this | situation. | | Also, one could think this is an academic problem, in the sense | of an otherwise unimportant niche problem. It really isn't; as | with many other topics, academics are just confronted with this | issue first. I am sure that in many medium or large companies | there are some Visual Basic or Excel code bases which are | important but could turn out extremely hard to reproduce. This | issue will only get more pressing with today's fast-moving | ecosystems, where backward compatibility is more a moral ideal | than an enforced requirement. | | It is well known that ransomware can wipe out businesses if | critical business data is lost. But more and more businesses and | organizations also have critical, and non-standard, software. | neuromantik8086 wrote: | Guix is one of several systems that have been touted as a | solution. Another one that is quite popular in HPC circles is | Spack (https://spack.readthedocs.io/en/latest/).
| | At my institute, we actually tried out Spack for a little bit, | but consistently felt like it was implemented more as a research | project than as something that was production-level and | maintainable. In large part, this was due to the dependency | resolver, which attempts to tackle some very interesting CS | problems, I gather (although this is a bit above me at the | moment; these problems are discussed in detail at | https://extremecomputingtraining.anl.gov//files/2018/08/ATPE...), | but which produces radically different dependency graphs when | invoked with the same command across different versions of | Spack. | | I've since come to regard Spack as the kind of package manager | that science deserves, with conda being the more pragmatic / | maintainable package manager that we get instead. | Spack/Guix/nix are the best solution in theory, but they come | with a host of other problems that made them less desirable. | jnxx wrote: | > Spack/Guix/nix are the best solution in theory, but they | come with a host of other problems that made them less | desirable. | | I would be quite interested to learn more about what these | problems are, in your experience. I've only tried Guix (on top of | Debian and Arch) and while it is definitely more resource-hungry | (especially in terms of disk space), I don't perceive it as | impractical. | yjftsjthsd-h wrote: | As someone coming from the computing side of things, I | found nix to be quite difficult to grok enough to write a | package spec, and guix was pretty close, at least in part | because of the whole "packages are just side-effects of a | functional programming language" idea. Nix, at least, also | suffers from a lot of "magic"; if you're trying to package, | say, an autotools package then the work's done for you - | and that's great, right up until you try to package | something that doesn't fit into the existing patterns and | you're in for a world of hurt. | | Basically, the learning curve is nearly vertical. | rekado wrote: | > guix was pretty close, at least in part because of the | whole "packages are just side-effects of a functional | programming language" idea | | This must be a misunderstanding. One of the big visible | differences of Guix compared to Nix is that packages are | first-class values. | yjftsjthsd-h wrote: | You're right; on further reading I can see guix making | packages the actual output of functions. I do maintain | that the use of a whole functional language to build | packages raises the barrier to entry, but my precise | criticism was incorrect. | akerro wrote: | Code written in Oak still works in Java 14. You can still write | `public abstract interface BlaBla{}` and it still works. If it | doesn't work (due to reflection safety changes in Java 9), it | will still compile with a newer compiler. | | Another thing: are the tools used to compile it still available? | I tried to compile my BSc Android+native OpenCV project and | failed quickly. Gradle removed some plugin for native code | integration, another plugin was no longer maintained - it had an | internal check for the Gradle version, said "I'm designed to work | with gradle >= 1.x < 3.x" and just refused to run under 6.x ... I | would have to fork that plugin, make it work with newer Gradle, | or find a replacement. I was obviously too lazy and stopped | working on that project before I even started. | | I'm sure that if I had put more effort into making the build | process reproducible, it would work effortlessly, but I didn't | care at that point.
I wrote it using a beta release of OpenCV that's also no | longer maintained, because there are better, faster official | alternatives available. | therealx wrote: | Or use the old version of Gradle? It sounds like creating a | vm/container/whatever with the old versions of everything is | the fastest path, although I understand not wanting to do it | after some point. | mensetmanusman wrote: | Yes, it is all pasted into my thesis, comments and all, as all | code should be. | nanddalal wrote: | GitHub offers a free tier for GitHub Actions with 2,000 Actions | minutes/month [1]. This could be useful: | | 1. write some unit tests which don't use too much compute | resources (so you can stick to the free tier) | | 2. package your code into a Docker image where the tests can be | run | | 3. wire up the Docker image with its tests to GitHub Actions | | This way you have continuous testing and can make sure your code | keeps running. | | References: | | [1] https://github.com/pricing
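| | For step 1, even a single smoke test against a hand-checked case | helps. A sketch (the module, function, and expected value are all | hypothetical):
|
|         # test_model.py -- run with "pytest"
|         from mymodel import simulate   # hypothetical analysis module
|
|         def test_known_case():
|             # tiny input with a hand-checked expected output
|             result = simulate(steps=10, seed=42)
|             assert abs(result - 3.14159) < 1e-6
|
| | A few lines of CI config on top of that, and a future student | finds out the day the code stops running instead of ten years | later.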
| MattGaiser wrote: | Even if it broke, who would go back and fix it? | | I do not see that happening, especially with complex library | bugs. | jnxx wrote: | > package your code into a Docker image | | Docker is not a general solution for this. | | What is needed is a way to re-generate everything from source | and from scratch. | snowwrestler wrote: | The gold standard for a scientific finding is not whether a | particular experiment can be repeated, it is whether a | _different_ experiment can confirm the finding. | | The idea is that you have learned something about how the | universe works. Which means that the details of your experiment | should not change what you find... assuming it's a true finding. | | Concerns about software quality in science are primarily about | avoiding experimental error at the time of publication, not the | durability of the results. If you did the experiment correctly, | it doesn't matter if your code can run 10 years later. Someone | else can run their own experiment, write their own code, and find | the same thing you did. | | And if you did the experiment incorrectly, it also doesn't matter | if you can run your code 10 years later; running wrong code a | decade later does not tell you what the right answer is. Again -- | conducting new research to explore the same phenomenon would be | better. | | When it comes to hardware, we get this. Could you pick up a PCR | machine that's been sitting in a basement for 10 years and get it | running to confirm a finding from a decade ago? The real question | is, why would you bother? There are plenty of new PCR machines | available today that work even better. | | And it's the same for custom hardware. We use all sorts of | different telescopes to look at Jupiter. Unless the telescope is | broken, it looks the same in all of them. Software is also a tool | for scientific observation and experimentation. Like a telescope, | the thing that really matters is whether it gives a clear view of | nature at the time we look through it. | nextaccountic wrote: | > running wrong code a decade later does not tell you what the | right answer is. | | It can tell you, however, exactly where the error lies (if the | error is in software at all). Like a math teacher who can circle | where the student made a mistake in an exam. | ISL wrote: | Reproducibility is about understanding the result. It is the | modern version of "showing your work". | | One of the unsung and wonderful properties of reproducible | workflows is that they can allow science to be salvaged from an | analysis that contains an error. If I had made an error in my | thesis data analysis (and I did, pre-graduation), the error can | be corrected and the analysis re-run. This works even if the | authors are dead (which I am not :) ). | | Reproducibility abstracts the analysis from data in a rigorous | (and hopefully, in the future, sustainable) fashion. | suyjuris wrote: | I wrote a tool to visualise algorithms for binary decision | diagrams [1], also in an academic context, where the problem was | basically the same: does the code still run in ten years? In | particular, the assumption is that I will not be around then, and | no one will have any amount of time to spend on maintenance. | | In the end, I chose to write it in C++ with minimal dependencies | (only X11, OpenGL and stb_truetype.h), with a custom GUI, and | packed all resources into a single executable. | | A lot of effort, but if it causes the application to survive 5x | as long then it is probably worth spending twice the time. | | [1] https://github.com/suyjuris/obst | yummypaint wrote: | Not to disagree with any points in the article, but I would point | out that the sciences also have cases of very old code being | maintained and used in production successfully. For example, we | still use a kinematics code written in Fortran over half a | century ago. In practice parts of it get reimplemented in newer | projects, but the original still sees use. | proverbialbunny wrote: | This seems like a fluff piece because: | | 1) Prototype code scientists write tends to be written at a high | level, so barring imported libraries up and disappearing, there | is a high chance that code written by scientists will run 10 | years later. There is a higher chance it will run than production | code written at a lower level. | | 2) The article dives into documentation, but scientists code in | the Literate Programming Paradigm[0], where the idea is you're | writing a book and the code is used as examples to support what | you're presenting. Of course scientists write documentation. | Presenting findings is a primary goal. | | 3) Comments here have mentioned unit testing. Some of you may | scoff at this, but when prototyping, every time you run your | code, the output from it teaches you something, and that turns | into an iterative feedback loop, so every time you learn | something you want to change the code. Unit tests are not super | helpful when you're changing what the code should be doing every | time you run it. Unit tests are better once the model has been | solidified and is being productionized. Having a lack of unit | testing does not make 10-year-old prototype code harder to run. | | [0] https://en.wikipedia.org/wiki/Literate_programming | comicjk wrote: | > scientists code in the Literate Programming Paradigm | | I wish. In my career as a computational scientist I have never | seen this in practice, either in academia or industry. | | On unit testing, I half agree. Most unit tests get quickly | thrown out as the code changes, so it's a depressing way to | write research code. But tests absolutely help someone trying | to run old code - they show what parts still work and how to | use them. | noobermin wrote: | Does code from ten years ago run ever? Try running something | that runs on Python 2 on the current Python interpreter today. | jnxx wrote: | Python is an extremely bad example.
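| | A two-line illustration (deliberately trivial):
|
|         # mean.py -- valid Python 2
|         data = [1, 2, 3, 4]
|         print "mean:", sum(data) / len(data)
|
| | Under Python 3 the print statement is a SyntaxError, and once | that is fixed, the result silently changes from 2 to 2.5 because | "/" became true division. The second failure mode is the nasty | one for science.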
| | Try twenty-year-old Common Lisp code. Or Fortran. | noobermin wrote: | What are we doing then? We can choose bad scientific code but | use "good examples" for other types of code? Seems convenient. | pvaldes wrote: | Yes, it still runs, of course. | tenYearOldCode wrote: | Yes it does!
|
|         10 PRINT "HELP"
|         20 GOTO 10
|
| drummer wrote: | This is why backwards compatibility is important. Many people | have a problem when this is raised as a primary concern and goal | of the C and C++ languages, but it is a must-have feature. | modeless wrote: | Let's not criticize people who release their code. Let's | criticize the people who _don't_ release their code instead. We | don't need more barriers to releasing code. | | I'd much rather fix someone's broken build than reimplement a | whole research paper from scratch without the relevant details | that seem to always be accidentally omitted from the paper. | CoffeeDregs wrote: | It would be super-useful to have a sciencecode.com service which | is a long-term CI system for scientific code and its required | artifacts. Journals could include references to | sciencecode.com/xyz, and sciencecode.com/abc could be derived | from sciencecode.com/xyz. Given GitHub Actions and forks, the | only thing holding this back is scientists doing it (and, | possibly, the HN community helping). | | And I get that it's not fun to have your code publicly critiqued, | but it's also not fun to live lives based on (medical, | epidemiological) unpublished, unaudited, unverified code... | | EDIT: hell, just post a "HELP HN: science code @ | github.com/someone/project" and I'd be surprised if you weren't | overwhelmed with offers of help. | tanilama wrote: | If you wrap your code into Docker, I would say... probably. | vikramkr wrote: | In addition - does your ten-year-old protocol still work? Do your | ten-year-old results replicate? This isn't isolated to just | programming - making robust and reproducible tools, code, | equipment, protocols, and results is undervalued across all areas | of research, leading to situations where published protocols | weren't robust, so a change in reagent supplier leads to failure, | or to protocols so dependent on weird local or unreported | environmental conditions or random extra steps that attempting to | replicate them leaves you nowhere. Robustness needs to be | improved in general. | ecmascript wrote: | I am not a scientist, but actually I think most of the code I | wrote 10 years ago is still in production at different companies. | jnxx wrote: | Companies with code in production have a short-term, real | incentive to keep that code running. | | This is different from code from research projects, which is in | many cases just run a few times, and often written by somebody | who, to make any kind of career in the field, has to move to a | new workplace and will not have any time to maintain that old | code. | | There are a few long-running major science projects, say, in | particle physics or astronomy, which are forced to work | differently. And in these environments, there are actually people | who have knowledge of both science and software engineering. | dhosek wrote: | If it's still in production, it's most likely still getting some | level of maintenance attention as well. When I was an undergrad I | did some coding for some of the professors at the college. A lot | of scientific programming is stuff that gets written and run once | and never run again.
Try dusting off some | 10-year-old C++ and compiling it with the current version of your | compiler. | hex1848 wrote: | Last month I shut down a VB6 app that had been running since 1998 | at the company I work for. The leadership team finally decided | they didn't want to sell that particular feature anymore. We | still have a handful of apps from around that time period that do | various small tasks. One day it will be a priority to get rid of | them. | brobdingnagians wrote: | The Fossil documentation has this gem: | | > "The global state of a fossil repository is kept simple so that | it can endure in useful form for decades or centuries. A fossil | repository is intended to be readable, searchable, and extensible | by people not yet born." | | I always liked that they planned for the long term. Keeping that | in mind helps you build systems that will work in 10 years, or in | 100, if it happens to last that long. When you are building a | foundation, like a language or database, it is nice to plan for | long-term support since so much depends on it. C has stayed | mostly recognizable over the years, much more so than C++ or | other high-level languages. When your design is simple, you can | have a "feature complete" end. | driverdan wrote: | This is a challenge with many types of code. | | Earlier this year it took me a weekend to get a 7-year-old Rails | project running again. It's a simple project but the packages it | used had old system dependencies that were no longer available. | | I ended up having to upgrade a lot of things, including code, | just to get it running again. | therealx wrote: | I ran into this too. Rails has changed a lot in 7 years, even | if you don't see it. My friend wanted to learn and somehow | found the original demo/getting-started page and was | frustrated. | shadowgovt wrote: | I'm not sure how interesting the question is, given how few | software engineers outside the academic sciences have 10-year-old | code that still runs (unless they've maintained a dedicated | hardware platform for it without regular software updates). | noisy_boy wrote: | Not a scientist, but Perl code I wrote 13+ years ago is still | running (based on my catch-up chats with ex-colleagues) to | generate MIS reports. | goalieca wrote: | > Today, researchers can use Docker containers (see also ref. 7) | and Conda virtual environments (see also ref. 8) to package | computational environments for reuse. | | Docker is also flawed. You can perfectly reproduce it today, but | what about in 10 years? I can barely go back to our previous | release for some dockerfiles. | jnxx wrote: | Guix is arguably better. | Sulfolobus wrote: | Similarly, conda envs can break in weeks due to package | changes. | | Even if you remove build versioning and all transitive | dependencies from your env (making it less reproducible...) | they will break pretty damn quick. | _wldu wrote: | IMO, this is why ISO standard programming languages are so | important and will be around forever. One can always compile with | --std=c++11 (or whatever) and be certain it will work. | Kenji wrote: | Hahaha, you would be surprised. Compiling complex C++ projects | is incredibly difficult. | ris wrote: | A well-written Nix package should be buildable at any point in | the future, producing near-identical results. This is why I | sometimes publish Nix packages for obscure & hard-to-build pieces | of software that I'm not likely to maintain - because it's like | rescuing a snapshot of them from oblivion.
| bobcostas55 wrote: | How do you do reproducible builds in R? It seems like a huge PITA | to specify versions of R and especially the packages used... | magv wrote: | An interesting concern is that there often is no single piece of | code that has produced the results of a given paper. | | Often it is a mixture of different (and evolving) versions of | different scripts and programs, with manual steps in between. | Often one starts the calculation with one version of the code, | identifies edge cases where it is slow or inaccurate, develops it | further while the calculations are running, does the next step | (or re-does a previous one) with the new version, possibly | modifying intermediate results manually to fit the structure of | the new code, and so on -- the process is interactive, and not | trivially repeatable. | | So the set of code one has at the end is not the code the results | were obtained with: it is just the code with the latest edge case | fixed. Is it able to reproduce the parts of the results that were | obtained before it was written? One hopes so, but given that | advanced research may take months of computer time and machines | with high memory/disk/CPU/GPU/network speed requirements only | available in a given lab -- it is not at all easy to verify. | vharuck wrote: | >the process is interactive, and not trivially repeatable. | | The kind of interaction you're describing should be frowned | upon. It requires the audience to trust that the manual data | edits are no different than rerunning the analysis. But the | researcher should just rerun the analysis. | | Also, mixing old and new results is a common problem in manually | updated papers. It can be avoided by using reproducible research | tools like R Markdown. | James_Henry wrote: | If it can't be trivially repeated, then you should publish what | you have with an explanation of how you got it. Saying that "the | researcher should just rerun the analysis" is not taking into | account the fact that this could be very expensive and that you | can learn a lot from observations that come from messy systems. | Science is about more than just perfect experiments. | i-am-curious wrote: | And any such "research" should go in the bin. Reproducibility of | final results and their review is key. | James_Henry wrote: | No, you should publish this research and be clear about how it | all worked out, and someone will reproduce it in their own way. | | Reproducibility isn't usually about having a button to press that | magically gives you the researchers' results. It's also not | always a set of perfect instructions. More often it is a | documentation of what happened and what was observed, as the | researchers believe is important to the understanding of the | research questions. Sometimes we don't know what's important to | document, so we try to document as much as possible. This isn't | always practical and sometimes it is obviously unnecessary. | uberdru wrote: | Sure it does. On a 10-year-old machine. | dhosek wrote: | Back in the 80s/90s I was heavily into TeX/LaTeX -- I was | responsible for a major FTP archive that predated CTAN, wrote | ports for some of the utilities to VM/CMS and VAX/VMS, and taught | classes in LaTeX for the TeX Users Group. I wrote most of a book | on LaTeX based on those classes, which a few years back I thought | I'd resurrect. Even something as stable as LaTeX has evolved | enough that just getting the book to recompile with a | contemporary TeX distribution was a challenge.
(On the other | hand, I've also found that a lot of what I knew from 20+ years | ago is still valid and I'm still able to be helpful on the TeX | Stack Exchange site.) | JoeAltmaier wrote: | Strangely, they were running (some of) the code on old hardware. | That's hardly a useful case, and much easier than 'resurrecting' | the code for modern reuse. | therealx wrote: | Something with non-standard asm? | JoeAltmaier wrote: | That sounds like a big issue. And certainly part of getting | 10-year-old code resurrected. | lordnacho wrote: | You often run into code of the "just get it to work" variety, | which has the problem that when it was written, maintainability | was at the bottom of the list of priorities. Often the author has | a goal that isn't described in software engineering terms: | calculate my option model, work out the hedge amounts, etc. | | And the people who write this kind of code tend not to think | about version control, documentation, dependency management, | deployment, and so forth. The result is you get these fragile | pieces holding up some very complex logic, which takes a lot of | effort to understand. | | IMO there should be a sort of code literacy course that everyone | who writes anything needs to do. In a way it's the modern | equivalent of everyone who writes needing to understand not just | grammar but style and other writing-related hygiene. | dhosek wrote: | Even with all the best practices, things outside your control can | cause issues. A lot of the code that software engineers write is | subject to tiny bits of continual maintenance as small changes in | the runtime environment take place. Imagine ten years of those | changes deployed all at once. Even something employing all the | best practices of ten years ago could be a challenge. You've got | a Subversion repository somewhere with code which was compiled to | run on Windows XP with Visual C++ 2008 Express, but you've | abandoned Windows for Linux. If you're lucky the code will | compile with the appropriate flags to support C++98 in gcc, but | who knows? And maybe there's a bunch of graphical stuff that | isn't supported at all anymore, or a computational library you | used which was only distributed as a closed-source library for | 32-bit Windows. | throwanem wrote: | The fundamental problem here, as you note, is that scientists are | rarely also engineers, and don't really share our desiderata. The | point is to develop and publish a result, and engineering | analysis code for resiliency is of secondary concern at best when | that code isn't likely to need to be used again once the paper is | finished. | | The "Software Carpentry" movement [1] has in the past decade | tried to address this, as I recall. It's very much in the vein of | the "basic literacy" course you suggest. I can't say how far | they've gotten, and I'm no longer adjacent to academia, but based | on what I do still see of academics' code, there's a long way | still to go. | | [1] https://software-carpentry.org/ | detaro wrote: | And that scientists also are rarely supported by programmers, | or if they are, it's an unstable and unappreciated position. | throwanem wrote: | Having had that exact experience - yeah, that can be a big | problem too. | | Researchers and engineers _can_ work really well together, | because the strengths of each role complement the weaknesses of | the other, and I think it would be very nice to see that actually | happen some day.
| detaro wrote: | It doesn't help with the issue of hard-to-reproduce work, but | apparently working for a company making _products_ aimed at | scientists can be a place to see this happen (if the company is | good about talking to customers). | throwanem wrote: | Interesting, thanks! I'll keep that in mind for when I'm next | looking for a new client. | mnw21cam wrote: | Being in such a position, I can say that I am appreciated, but | not in a manner that results in job stability and promotion. It's | a massive problem in academia, and there's an attempt to get the | position recognised and call it "Research Software Engineer", | with opportunities for promotion and job stability comparable to | a researcher's. However, it's not going massively well. Academic | job progression is still almost purely based on the ability to | get first- or last-author papers in top journals. I have lots of | papers where I am a middle author, because I wrote the software | that did the analysis that was vital for the paper to even exist, | but it largely doesn't count. And I'm lucky - many software | engineers don't even get put in as a middle author on the paper | they contributed to. | non-entity wrote: | I've seen job listings for "scientific programmers" where what | they're asking for is a scientist who happens to know a little | programming. | detaro wrote: | Yeah - who then likely doesn't have that much software | experience, and worse, if they want to _stay_ a scientist, such a | role is often a bad career move, because they help others get | ahead with their research instead of publishing their own work. | Even if they build some really great domain-specific software | tool in that role, it often doesn't count as much. | | Or it's an informal thing done by some student as a side-gig. | Which can be cool, but is not a stable long-term thing. | | I hope there are exceptions. | | EDIT: the weirdest example I've seen was a lab looking for | _sysadmins_ with a PhD preferred. I wonder if they had some | funding source that only paid for "scientists" or what was going | on there... | mnw21cam wrote: | Simple answer for that. University pay scales tend to be fairly | inflexible in terms of which grades you are eligible for without | a PhD, if you are counted as academic staff. If you're | non-academic staff (like the cleaner, the receptionist, and the | central IT sysadmin) then you can be paid a fair wage based upon | your experience, but if you are academic staff, then you have a | hard ceiling without a PhD. An individual research group with a | grant may only be able to hire academic staff, but they want a | sysadmin, so in order to be able to pay them more than a pittance | they would have to have a PhD. | neutronicus wrote: | Nah. | | The _fundamental_ problem is that scientific code is produced by | entry-level developers: | | 1. Paid below-market wages | | 2. With no way to move up in the organization | | 3. With lots of non-software responsibilities | | 4. With an expectation of leaving the organization in six years | | As long as the grunt work of science is done by overworked junior | scientists whose careers get thrown to the wolves no matter what | they do, you're not going to get maintainable code out of it. | jnxx wrote: | Even more fundamental is that there is no maintenance budget for | important scientific libraries and tools. Somebody wrote them as | part of their job, and the person who wrote them is now working | somewhere else.
| throwanem wrote: | I mean, senior researchers in stable roles don't really do any | better. Just to pick the first example off the top of my head - | one of the investigators I worked with, during my year as a staff | member of an academic institution most of a decade ago, is also | one of my oldest friends; he's been a researcher there for what | must be well past ten years by now. Despite one of his undergrad | degrees actually being in CS, I still find ample reason whenever | I see it to give him a hard time about the maintainability of his | code. | | Like I said before, it's a field in which people really just | don't give a damn about engineering. Which is fair! There's | little reason why they should, as far as I've ever been able to | see. | justinmeiners wrote: | Unit testing, readability, version control, documentation, etc. | are all engineering practices for the purpose of keeping ongoing | development organized (especially for teams). | | Why would a researcher need to do this, when in most cases all | that they use is the output, and in CS/math it's only a minimal | prototype demonstrating the operation of their principle? | | All of the other stuff would certainly be nice, but they don't | need to adopt our whole profession to write code. | [deleted] | matsemann wrote: | Would an abandoned project I wrote 10 years ago still run? The | code is probably fine, but getting it to actually run by linking | up whatever libraries, SDKs and environment correctly could be | troublesome. Even a small pipeline I wrote a few weeks ago I had | trouble re-running, because I forgot there was a manual step I | had to do on the input file. | | Expecting more rigorous software practices from scientists than | from software engineers would be wrong. I don't think they should | have to tangle with this; tools should aid them somehow. | tyingq wrote: | It's interesting that it's often easier to get something 25+ | years old running because I need fewer things. Not so hard to | find, say, DOSBox and an old version of Turbo Pascal. | lebuffon wrote: | Sounds like simplicity for the win. | | The complex house of cards we currently stand on seems fragile by | comparison. | tyingq wrote: | We also benefit, for that old stuff, from enthusiasts that build | cool stuff. Like DOSBox, floppy emulators, etc. | | I doubt there are going to be folks nostalgic for the complex | mess we have now. | lebuffon wrote: | Indeed. I participate in Atariage.com and the level of dedication | is amazing. | | Are there groups for Win 3.1, Win95? | jnxx wrote: | This. In recent years, conventional software engineering has in | many cases experienced an explosion in complexity which will make | it very, very difficult to maintain things in the long run. This | only works because over 90% of startups go bust anyway, within a | few years. | dhosek wrote: | When I was in my 20s I managed to get a contract updating some | control software for a contact lens company on the basis of my | happening to own an old copy of Borland C++ 1.0. | eythian wrote: | Had a similar experience getting a contract updating a mass | spectrometer control system because I had extensive high school | experience in Turbo Pascal. | zimbatm wrote: | If the same project had been packaged with Nix, it would probably | still compile. People regularly check out older versions of | nixpkgs to get access to older package releases. | | One of the key properties is that the build system enforces all | the build inputs to be declared.
And the other one is to keep a | cache of all the build inputs, like sources, because upstream | repositories tend to disappear over time. | cube00 wrote: | The day when the code used to produce a paper must also be | published cannot come soon enough. | dandelion_lover wrote: | It won't happen until researchers are forced to do it. Please | sign the petition at https://publiccode.eu and have a look at my | other comment here. | goalieca wrote: | Arguably, data is just as important. Academics hoard their data | and try to milk out every paper they can from it. The reward | system is based on publishing as many papers as possible rather | than just making a meaningful contribution. | belval wrote: | Data is much trickier, because data sources - medical, | educational, or even just regular businesses - don't want the | added legal weight of making data freely available. | | This is obviously a shame. I was working on segmentation of open | wounds, and most papers include a "we are currently in talks with | the hospital to make the data available". If you contact the | authors directly, they will tell you that their committee blocked | it because the information is too sensitive. | abathur wrote: | It seems like there can be a balance between "the results are | unverifiable because no one else can touch the data" and | "effectively open-source the dataset"? | | Something like: "To make it easier to verify the code behind this | paper, we've used <accepted standard project/practice> to | generate a synthetic dataset with the same fields as the original | and included it with the source code. The <data-owning | institution> isn't comfortable with publishing the full dataset, | but they did agree to provide the same data to groups working on | verification studies as long as they're willing to sign a data | privacy agreement. Send a query to <blahblahblah> ..." | belval wrote: | > but they did agree to provide the same data to groups working | on verification studies as long as they're willing to sign a data | privacy agreement. Send a query to <blahblahblah> ..." | | This would be administrative overhead; it will be shut down 9 | times out of 10. I understand why this might seem easy, but it | really is not: you can have multiple hospitals that each have | their own committee that agreed to give the researcher their | data. They don't have a central authority that you can appeal to, | much less someone that can green-light your specific access. | | As for the synthetic datasets, that's basically just having tests | and was advocated for elsewhere in this thread. | jgeada wrote: | The reward system also prevents dead ends from being identified, | publication of approaches that did not lead to the expected | results or got null results, publishing confirmations of prior | papers, etc. | | Basically, the reward system is designed to be easy to measure | and administer, but is not actually useful in any way to the | advancement of science. | WanderPanda wrote: | Making this mandatory might have bad downstream effects, like | prohibiting publication of some research at all (GPT-X, I am | looking at you). | qppo wrote: | Closed source research isn't publication, it's advertisement. | WanderPanda wrote: | So R&D is not a thing, but A&D is? That would be new to me. | jhrmnn wrote: | In all my papers the results were produced on multiple days | (spanning months), with multiple versions of the code, and they | are computationally too expensive to reproduce with the final | version of the code.
I'm trying to keep track of all the used | versions, but given that there is no automated framework for this | (is there?) and research involves lots of experiments, it's never | perfect. Given this context, any ideas on how to do it better? | chriswarbo wrote: | I tend to do the following (some or all, depending on the | situation): | | - Use known, plaintext formats like LaTeX, Markdown, CSV, JSON, | etc. rather than undocumented binary formats like those of Word, | Excel, etc. | | - Keep sources in git (just a master branch will do) | | - Write all of the rendering steps into a shell script or | Makefile, so it's just one command with no options | | - I go even further and use Nix, with all dependencies pinned | (this is like an extreme form of Make) | | - Code for generating diagrams, graphs, tables, etc. is kept in | git alongside the LaTeX/whatever | | - Generated diagrams/graphs/tables are _not_ included in git; | they're generated during rendering, as part of the shell- | script/Makefile/Nix-file; the latter only re-generates things if | their dependencies have changed | | - All code is liberally sprinkled with assertions, causing a hard | crash if anything looks wrong | | - If journals/collaborators/etc. want things a certain way (e.g. | a zip file containing plain LaTeX, with all diagrams as separate | PNGs, or whatever) then the "rendering" should take care of | generating that (and make assertions about the result, e.g. that | it renders to PDF without error, contains the number of pages | we're expecting, that the images have the expected dimensions, | etc.) | | - I push changes from my working copies into a 'repos' directory, | which in turn pushes to my Web server and to GitHub (for backups | and redundancy) | | - Pushing changes also triggers a build on the continuous | integration server (Laminar) running on my laptop. This makes a | fresh copy of the repo and tries to render the document (this | prevents depending on uncommitted files, the absolute directory | path, etc.) | | Referencing a particular git commit should be enough to recreate | the document (this can also be embedded in the resulting document | somewhere, for easy reference). Some care needs to be taken to | avoid implicit dependencies, etc. but Nix makes this _much_ | easier. Results should also be deterministic; if we need | pseudorandom numbers then a fixed seed can be used, or (to prove | there's nothing up our sleeves) we can use SHA256 on something | that changes on each commit (e.g. the LaTeX source). | | For computationally-expensive operations (with relatively small | outputs) I'll split this across a few git repos: | | 1) The code for setting up and performing the | experiments/generating the data goes in one repo. This is just | like any other software project. | | 2) The results of each experiment/run are kept in a separate git | repo. This may be a bad idea for large, binary files; but I've | found it works fine for compressed JSON weighing many MBs. | Results are always _appended_ to this repo as new files; existing | files are never altered, so we don't need to worry about binary | diffs. There should be metadata alongside/inside each file which | gives the git commit of the experiment repo (i.e. step 1) that | was used, alongside other relevant information like machine specs | (if it depends on performance), etc. This could be as simple as a | file naming scheme. The exact details for this should be written | down in this repo, e.g. in a README and/or a simple script to | grab the relevant experiment repo, run it, and store the | results+metadata in the relevant place. Results should be as | "raw" as possible, so that they don't depend on e.g. | post-processing details, or choice of analysis, etc. | | 3) I tend to put the writeup in a separate git repo from the | results, so that those results can be referenced by commit + | filename, without a load of unrelated churn from the writeup. | This repo will follow the same advice as above, e.g. code for | turning the "raw" results into graphs, tables, etc. will be kept | here and run as part of the rendering process. Fetching the | particular commit from the results repo should also be one of the | rendering steps (Nix makes this easy, or you could use a git | submodule, etc.) | | I don't know what the best advice is w.r.t. large datasets (GBs | or TBs), but I've found the above to be robust for about 5 years | so far.
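| | As a concrete illustration of the metadata idea in step 2 (purely | a sketch - the file layout and field names are just one possible | convention):
|
|         # save_result.py -- store provenance next to each result file
|         # assumes it runs inside a git checkout with a results/ directory
|         import json, platform, subprocess, time
|
|         commit = subprocess.check_output(
|             ["git", "rev-parse", "HEAD"], text=True).strip()
|
|         meta = {
|             "experiment_commit": commit,
|             "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
|             "machine": platform.platform(),
|         }
|         with open("results/run-%s.meta.json" % commit[:12], "w") as f:
|             json.dump(meta, f, indent=2)
|
| | Anything that answers "which code, on which machine, when?" | without relying on memory is enough.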
| qppo wrote: | That's no different than normal software engineering. We use | version control software (VCS, like git) to deal with it. You can | include your results in the tracked source. | | For what it's worth, using results from outdated source code is | extremely suspicious. This is a frequent problem in software | development, where we have tests or benchmarks based on stale | code, and it's almost always incorrect. I would not trust your | results if they are not created with the most up-to-date version | of your software. | xen0 wrote: | My first thought: demand the journals provide hosting for a code | repo that is part of your paper. For every numerical result, | specify the version (e.g. a git tag) used to generate your | result. | | And if that means scientists need to learn about version control, | well... they should if they're writing code. | mnw21cam wrote: | For a paper I recently submitted, the journal demanded a GitHub | release of the software. | chriswarbo wrote: | I agree, except that AFAIK "tags" in git are not fixed; they can | be deleted and re-created to point at a different commit. Hence I | prefer to use (short) commit IDs, since changing them is | infeasible. | xen0 wrote: | I'm assuming the repo, once hosted and the paper is published, is | "fixed" and cannot be changed by the authors. | | But commit IDs work just as well. | jpeloquin wrote: | The point of being able to run ten-year-old code is the ability | to replay an analysis (exact replication). This allows an | analysis to be verified after the fact, which increases trust and | helps figure out what happened when contradictions appear between | experiments. However, if the original work involved physical | experimentation or any non-automated steps (as is the case for | most science), the ability to run the original code provides only | partial replication. Overall, the ability to re-run old code is a | fairly low priority. | | From the perspective of someone who primarily uses computers as a | tool to facilitate research, the priority list is closer to: | | 1. Retain documentation of what was _meant_ to happen. | Objectives, experimental design, experimental & analysis | protocols, relevant background, etc. | | 2. Retain documentation of what actually happened, usually in | terms of noting deviations from the protocol. This is the purpose | of a lab notebook. Pen & paper excels here. | | 3. Retain raw data files. | | 4. Retain files produced in the course of analysis. | | 5. Retain custom source code. | | 6. Version control all the above.
| | 7. Make everything run in the correct order with one command | (i.e., full automation). | | Only once all the above is achieved would it be worth ensuring | that the software used in the analysis can be re-run in 10 years. | Solving the "packaging problem" in a typical scientific context | (multiple languages, multiple OSes, commercial software, mostly | short scripts) is complex. When the outcome of an analysis is | suspect, the easiest and most robust approach is to check the | analysis by redoing it from scratch. This takes less time than | trying to ensure every analysis will run on demand even as the | computing ecosystem changes out from under it. | | Most of the time spent writing analysis code is deciding _what_ | the code should do, not actually writing the code. There is | generally very little code because few people were involved, and | they probably weren't programmers. So redoing the work from | scratch is generally pretty easy, especially for anyone with the | skill to routinely produce fully reproducible computational | environments. | Ericson2314 wrote: | Glad to see many mentions of Nix in this thread! | | I wonder if Nix and Guix should standardize the derivation format | both share, to kick that off as the agreed-upon "thin waist" | other projects and the academy can standardize around. | rekado wrote: | The derivation format is little more than a compilation artifact | (a low-level representation of a build), and I think | standardizing on it would not be as useful as it may seem. | scipute68 wrote: | As the systems architect and infra programmer for a scientific | startup, I'll simply chime in on the production != scientific | conversation. When you don't hold your modeling code to the | minimal production standard where it counts (documentation, | comments, debug), it _will_ cause your evolving team hardship. | When that same code goes into production for a startup (as it | could/should), you will be causing everyone long nights and | 80-hour weeks. | emerged wrote: | Any scientist with good foresight would've implemented their code | in 6502 assembly for the NES. The emulators are nearly flawless | and will probably be around until the end of time. | yjftsjthsd-h wrote: | I once had a thought: if I wanted to write something that would | last forever and run anywhere, I should write it to target DOS, | and make sure to test it on FreeDOS in a VM and on DOSBox. That | way it would run on a stable ABI with loads of emulators, and via | DOSBox it will happily run on all modern desktop OSs (and some | non-desktops; IIRC there's at least an Android port). | dekhn wrote: | I wrote a C++ implementation of the AMBER force field in 2003. | Still have the source code with its original modification times.
| Let's see:
|
|     /usr/bin/g++ -I/home/dek/sw/rh9/gsl-1.3/include -c -o NBEnergy.o NBEnergy.cpp
|     NBEnergy.cpp: In member function 'virtual double
|     NBEnergy::Calculate(Coordinates&, std::vector<Force*>)':
|     NBEnergy.cpp:20:68: error: no matching function for call to
|     'find(std::vector<atom*>::const_iterator,
|     std::vector<atom*>::const_iterator, const atom*&)'
|        20 |   if (std::find(at1->Excluded.begin(), at1->Excluded.end(), at2) !=
|           |       at1->Excluded.end()) {
|           |                                                                   ^
|     In file included from /usr/include/c++/9/bits/locale_facets.h:48,
|                      from /usr/include/c++/9/bits/basic_ios.h:37,
|                      from /usr/include/c++/9/ios:44,
|                      from /usr/include/c++/9/ostream:38,
|                      from GeneralParameters.h:6,
|                      from NBEnergy.h:6,
|                      from NBEnergy.cpp:1:
|     /usr/include/c++/9/bits/streambuf_iterator.h:373:5: note: candidate:
|     'template<class _CharT2> typename
|     __gnu_cxx::__enable_if<std::__is_char<_CharT2>::__value,
|     std::istreambuf_iterator<_CharT> >::__type
|     std::find(std::istreambuf_iterator<_CharT>,
|     std::istreambuf_iterator<_CharT>, const _CharT2&)'
|       373 |     find(istreambuf_iterator<_CharT> __first,
|           |     ^~~~
|     /usr/include/c++/9/bits/streambuf_iterator.h:373:5: note: template
|     argument deduction/substitution failed:
|     NBEnergy.cpp:20:68: note: '__gnu_cxx::__normal_iterator<atom* const*,
|     std::vector<atom*> >' is not derived from
|     'std::istreambuf_iterator<_CharT>'
|        20 |   if (std::find(at1->Excluded.begin(), at1->Excluded.end(), at2) !=
|           |       at1->Excluded.end()) {
|           |                                                                   ^
|     make: *** [<builtin>: NBEnergy.o] Error 1
|
| I still have a hardcoded reference to Red Hat 9, apparently. But | the only error has to do with an iterator, so clearly something | in C++ changed. Looks like a 1-2 line change. | josefx wrote: | You probably didn't include the <algorithm> header that defines | find directly, and it stopped compiling once the standard library | maintainers cleaned up their own includes. The iostreams headers | you include define their own stream-iterator-specific overload of | find, and that doesn't match. | dekhn wrote: | Yup, that was it. | | After that, I had to install libpython27-dev and add -fPIC. Then | my 17-year-old Python module that has linked-in C++ code runs | just fine. I'm not surprised - I've been writing cross-platform | code that runs for 10+ years for 20+ years. | pruthvishetty wrote: | I read it as "does your ten-year-old still run code", and was | wondering whether this was a challenge for scientists to have | their kids do better things than coding. | dcolkitt wrote: | I mean, Python 2->3 alone is gonna kill this challenge for most | people. | djsumdog wrote: | You can always run old Python 2 stuff in a Docker container, so | long as the dependencies haven't disappeared. | jnxx wrote: | As long as it does not use some CUDA hardware which is using | TensorFlow and Numba, which is using a version of llvmlite which | does not support Python 2 anymore... | | This isn't a theoretical example. | therealx wrote: | Then you make a VM or whatnot and install all the old versions of | everything. I haven't seen an open source project in a while that | doesn't have old versions easily available for download. Still, | annoying if you can't stand that kind of stuff.
| | (people seem to be in two camps: either they hate it or have | almost no problem with it) | PeterisP wrote: | To clarify, the issue is that the old version of the software | won't work with the new libraries, and the old libraries won't | work with the current GPU models, so you can't run the old code | without modification unless you have old hardware as well, and | you can't virtualize the GPUs. | jnxx wrote: | Well, where do you download the hardware? ;-) | closeparen wrote: | Most of the "requirements.txt" I come across in the real world do | not actually lock down all deps to Python 2.7-compatible | versions. I've been able to get most of them running again, but | it's a long process looking through changelogs to find the last | 2.7-compatible version of each dependency. | hobofan wrote: | Yes, because the "requirements.txt" is a dependency requirements | file and not a lockfile. It took the Node.js ecosystem an | embarrassingly long time to arrive at that insight, and I feel | like the Python ecosystem/community still isn't there yet (though | finally it's easily usable with Poetry). | roberto wrote: | For my first scientific article, in 2007, I created a Subversion | repo with a Makefile. Running `make` would recreate the whole | paper: downloading data, running analyses, creating pictures | (color or BW, depending on an environment flag) and generating | the PDF. | | I'm going to try to find the repo and see if it still works. | O_H_E wrote: | Wow, nice. I will be waiting :D | awkward wrote: | Scientific programming is a perfect storm of extremely smart | people, with strong abilities to do it themselves, disdain for | the subject matter, and no direct experience with the price of | failing to write portable code. In some circumstances, even | parameterizing scripts so that they aren't re-edited with new | values for every experiment is an uphill fight, never mind having | promotion through environments. | bluetwo wrote: | 18-year-old code still runs. And generates revenue. | jonathanstrange wrote: | I think it's unfair to expect anyone to maintain code forever | when the code rot is completely beyond your control, let alone to | expect this from scientists who have better things to do. | Anything with a GUI is bound to self-destruct, for example, and | it's not the programmer's fault. Blame the OS makers and | framework/3rd-party library suppliers. | | The damage can be limited by choosing a programming language that | provides good long-term compatibility. Languages like ANSI C, | Ada, Common Lisp, and Fortran fit the bill. There are many more. | Heck, you could use Chipmunk Basic. Anything fancy and trendy | will stop working soon, though, sometimes even within a year. | jnxx wrote: | > Common Lisp | | Common Lisp has fantastic long-term stability. I think that | deserves more recognition, as Common Lisp is often almost as fast | as C, but is (by default) not riddled with undefined behavior. | | It would be superb if Rust could take C's space in computational | science and libraries. | kazinator wrote: | Regarding that last comment, that's probably where Rust brings | least to the table. C hasn't even taken away the entire space | from Fortran. | | Lisp is somewhat "riddled" with undefined behavior, but not to | the same extent as C and, more importantly, not with the same | nuance. | | The ISO C standard makes very little mention of optimization.
| It does refer to abstract semantics as a point of departure for optimizing, but there is no concept of a safety level whereby code that is diagnosed at high safety becomes undefined behavior at low safety. Whether optimized or not, C is always unsafe. Turning optimization off does not turn any undefined behavior into something that must be diagnosed.
|
| For instance, in theory Common Lisp doesn't define the behavior of an access beyond the bounds of an array any more than C does. In practice, all implementations reliably diagnose it at the default high safety level, which is trivially achieved since all manipulation of arrays goes through library functions. Only if you compile with low safety may it turn into undiagnosed behavior that is unreliable, whereby the compiler emits code that directly accesses the object without checks.
|
| Common Lisp has separate control mechanisms for speed and safety, and these apply to individual expressions in the program, not at the file level like C compiler options. They are also defined by the standard, unlike C compiler options.
| rougier wrote:
| For those interested, the results of the challenge are published here: https://rescience.github.io/read/ (volume 6, issue 1).
| rudolph9 wrote:
| This was in the Guix-science mail list today
|
| > Hello!
|
| In an article entitled "Challenge to scientists: does your ten-year-old code still run?", Nature reports on the Ten Years Reproducibility Challenge organized by ReScience C, led by Nicolas P. Rougier and Konrad Hinsen: https://www.nature.com/articles/d41586-020-02462-7
|
| It briefly mentions Guix as well as the many obstacles that people encountered and solutions they found, including using Software Heritage and floppy disks. :-)
|
| You can read the papers (and reviews!) at: https://rescience.github.io/read/#issue-1-ten-years-reproducibility-challenge
|
| Ludo'.
| cabaalis wrote:
| I visited my first employer recently (a local government) and found that the first MySQL/PHP database I created, an internal app, had been in continuous use for nearly 18 years.
| djsumdog wrote:
| This article brings up scientific code from 10 years ago, but how about code from... right now? Scientists really need to publish their code artifacts, and we can no longer just say "Well, they're scientists or mathematicians" and allow that as an excuse for terrible code with no testing specs. Take this for example:
|
| https://github.com/mrc-ide/covid-sim/blob/e8f7864ad150f40022...
|
| This was used by the Imperial College for COVID-19 predictions. It has race conditions, seeds the model multiple times, and therefore has totally non-deterministic results[0]. Also, this is the cleaned-up repo. The original is not available[1].
|
| A lot of my homework from over 10 years ago still runs (some require the right Docker container: https://github.com/sumdog/assignments/). If journals really care about the reproducibility crisis, artifact reviews need to be part of the editorial process. Scientific code needs to have tests and a minimal amount of test coverage, and the code/data used really need to be published and run by volunteers/editors in the same way papers are reviewed, even for non-computer-science journals.
|
| [0] https://lockdownsceptics.org/code-review-of-fergusons-model/
|
| [1] https://github.com/mrc-ide/covid-sim/issues/179
| arcanus wrote:
| > Scientists really need to publish their code artifacts, and we can no longer just say "Well, they're scientists or mathematicians" and allow that as an excuse for terrible code with no testing specs.
|
| You are blaming scientists, but speaking from my personal experience as a computational scientist, this exists because there are few structures in place that incentivize strong programming practices.
|
| * Funding agencies do not (typically) provide support for verification and validation of scientific software
|
| * Few journals assess code reproducibility and few require public code (few even require public data)
|
| * There are few funded studies to reproduce major existing studies
|
| Until these structural challenges are addressed, scientists will not have sufficient incentive to change their behavior.
|
| > Scientific code needs to have tests and a minimal amount of test coverage, and the code/data used really need to be published and run by volunteers/editors in the same way papers are reviewed, even for non-computer-science journals.
|
| I completely agree.
| geoalchimista wrote:
| Second this. Research code is already hard, and with misaligned incentives from the funding agencies and grad school pipelines, it's an uphill battle. Not to mention that professors with an outdated mindset might discourage graduate students from committing too much time to scientific code. "We are scientists, not programmers. Coding doesn't advance your career" is often an excuse for that.
|
| In my opinion, enforcing standards without addressing this root cause is not gonna fix the problem. Worse, students and early-career researchers will bear the brunt of increased workload and code compliance requirements from journals. Big, well-funded labs that can afford a research engineer position are gonna have an edge over small labs that cannot.
| j45 wrote:
| One of the things I come across is scientists who believe they're capable of learning to code quickly because they're capable in another field.
|
| After they embark on solving problems, it becomes an eye-opening experience, and one that soon turns into keeping things running.
|
| For those who have a STEM discipline in addition to a 5+ year software development background, would you agree with the above?
|
| I would have thought the scientists among us would approach someone with software development expertise (something abstract and requiring a different set of muscles).
|
| One positive that is emerging is the variety of low/no-code tooling that can replace a lot of this hornets'-nest coding.
| PeterisP wrote:
| It's generally not plausible to "approach someone with software development expertise" for organizational and budget reasons. Employing dedicated software developers is simply not a thing that happens; research labs overwhelmingly have the coding done by researchers and involved students without having _any_ dedicated positions for software development.
|
| In any case you'd need to teach them the problem domain, and it's considered cheaper (and simpler from an organizational perspective) to get some PhD students or postdocs from your domain to spend half a year getting up to speed on coding (and they likely had a few courses in programming and statistics anyway) than to hire an experienced software developer and have them learn the basics of your domain (which may well take a third or half of the appropriate undergraduate bachelor's program).
| analog31 wrote:
| As a grad student in physics, I not only wrote code, but also designed my own (computer-controlled) electronics, mechanics, optics, vacuum systems, etc. I was my own machinist and millwright. Today I work in a small R&D team within a larger business, and still do a lot of those things myself when needed.
|
| There are many problems with using a dedicated programmer, or any other technical specialist, in a small R&D team. The first is keeping them occupied. There was programming to be done, but not full time. And it had to be done in an extremely agile fashion, with requirements changing constantly, often at the location where the problem is occurring, not where their workstation happens to be set up. _Many developers hate this kind of work._
|
| Second is just managing software development. Entire books have been written about the topic, and it's not a solved problem how to keep software development from eating you alive and taking ownership of your organization. Nobody knows how to estimate the time and effort. You never know if you're going to be able to recover your source code and make sense of it if your programmer up and quits.
|
| With apologies to Clemenceau, programming is too important to be left to the programmers. ;-)
| marmaduke wrote:
| > Employing dedicated software developers is simply not a thing that happens
|
| This is a really key point that is lost on devs outside of science looking in. In our case, good devs are out of budget by a factor of 2x at least (at an EU public university in a lab doing lots of computational work).
|
| The best we get are engineers who are expected to keep the cluster running, order computers, organize seminars... and eventually resolve any software or dev problems. This doesn't leave much time for caring about reproducibility outside the very core algorithms. The overall workflow can fade away, since the next postdoc is going to redo it anyway.
| j45 wrote:
| Are the hiring scientists also paid well below market wages to the same degree?
| jpeloquin wrote:
| > I would have thought the scientists among us would approach someone with software development expertise.
|
| Is there a pool of skilled software architects willing to provide consultations at well below market wages? Or a Q&A forum full of people interested in giving this kind of advice? (StackOverflow isn't useful for this; the allowed question scope is too narrow.) I guess one incentive to publish one's code is to get it criticized on places like Hacker News. The best way to get the right answer on the internet is to post the wrong answer, after all.
| UweSchmidt wrote:
| I'll state the obvious and answer with no. There are not enough skilled software architects to go around, and many who consider themselves skilled are not actually producing good code themselves, probably including many confident posters here in this forum.
|
| The idiosyncrasies and tastes of many "senior" software engineers would likely make the code unreadable and unmaintainable for the average scientist, and possibly discourage them from programming altogether.
|
| Software architecture is an unsolved problem, as is evident from the frequent fundamental discussions about even trivial things, highlighted by a Cambrian explosion of frameworks that try to help herd cats, and made obvious by senior programmers struggling to get a handle on moderately complex code.
|
| I propose scientists keep their code base as simple as possible, review the code along with the ideas with their peers, maybe use Jupyter notebooks to show the iterations and keep intermediate steps, and, as others state, show the code as appropriate and try to keep it running. There is no silver bullet, and very few programmers could walk into your lab or office and really clean things up the way you'd hope.
| j45 wrote:
| Are the hiring scientists also paid well below market wages?
| jonnycomputer wrote:
| For a seasoned software developer, encountering scientific code can be a jarring experience. So many code smells. Yet most of those code smells are really only code smells in application development. Most scientific programming code only ever runs once, so most of the axioms of software engineering are inapplicable or a distraction from the business at hand.
|
| Scientists, not programmers, should be the ones spearheading the development of standards and rules of thumb.
|
| Still, there are real problematic practices that an emphasis on sharing scientific code would discourage. One classic one is the use of a single script that you edit each time you want to re-parameterize a model. Unless you copy the script into the output, you lose the informational channel between your code and its output. This can have real consequences. Several years ago I started a project with a collaborator to follow up on their unpublished results from a year prior. Our first task was to take that data and reproduce the results they had obtained before, because the person no longer had access to the exact copy of the script that they ran. We eventually determined that the original result was due to a software error (which we later pinned down). My colleague took it well, but the motivation to continue the project was much diminished.
| Fiahil wrote:
| My work position was created because scientists are not engineers. I had to explain, to my disappointment, why non-deterministic algorithms are bad, how to write tests, and how to write SQL queries, more than once.
|
| However, when working as equals, scientists and engineers can create truly transformative projects. Algorithms account for 10% of the solution. The code, infrastructure and system design account for 20% of the final result. The remaining 70% of the value comes directly from its impact. A project that nobody uses is a failure. Something that perfectly solves a problem that nobody cares about is useless.
| dandelion_lover wrote:
| As a theoretical physicist doing computer simulations, I am trying to publish all my code whenever possible. However, all my coauthors are against that.
| They say things like "Someone will take this code and use it without citing us", "Someone will break the code, obtain wrong results and blame us", "Someone will demand support and we do not have time for that", "No one is giving away their tools which make their competitive advantage". This is of course all nonsense, but my arguments are ignored.
|
| If you want to help me (and others who agree with me), please sign this petition: https://publiccode.eu. It demands that all publicly funded code must be public.
|
| P.S. Yes, my 10-year-old code is working.
| onhn wrote:
| As a theoretical physicist, your results should be reproducible based on the content of your papers, where you should detail/state the methods you use. I would make the argument that releasing code in your position has the potential to be scientifically damaging; if another researcher interested in reproducing your results reads your code, then it is possible their reproduction will not be independent. However, they will likely still publish it as such.
| Vinnl wrote:
| Interestingly, each of those arguments also applies to publishing an article describing your work.
| pthread_t wrote:
| > "No one is giving away their tools which make their competitive advantage"
|
| This hits close to home. Back in college, I developed software for a lab, for a project-based class. I put the code up on GitHub under the GPL license (some code I used was licensed under the GPL as well), and when the people from the lab found out, they lost their minds. A while later, they submitted a paper and the journal ended up demanding the code they used for analysis. Their solution? They copied and pasted the pieces of my project they used for that paper and submitted it as their own work. Of course, they also completely ignored the license.
| bumby wrote:
| I'm curious, are dedicated software assurance teams a thing in your research area? Or is quality left up to the primary researchers?
| dandelion_lover wrote:
| Most of my codes I develop alone. No one else ever looks at them. My supervisor also develops code alone and never shows it to anyone (not even members of the group).
|
| In other cases, a couple of other researchers may have a look at my code or continue its development. I have worked with 4+ research teams and only saw one professional programmer, in one of them, helping with the development. I never heard about a "dedicated software assurance team".
| SiempreViernes wrote:
| To clarify, does nobody see the code because they aren't allowed, or because nobody ever asks to see it?
| dandelion_lover wrote:
| The second case. However, I hesitate to ask to look at my supervisor's code. How would I explain why I need it (if it's not needed for my research)? It's also unlikely to be user-friendly, so it would take a lot of time to understand anything.
| bumby wrote:
| I think you touched on something important. Researchers are most concerned with "getting things working".
|
| One of my favorite points from the book _Clean Code_ was that professional developers aren't satisfied with "working code"; they aim to make it maintainable, which may mean writing it in a way that is clearer and more concise than we are used to.
| BeetleB wrote:
| > Or is quality left up to the primary researchers?
|
| It's left to individual researchers, and in many disciplines (like physics) there is almost no emphasis on quality.
|
| I left academia a decade ago, but at the time all except one of my colleagues protested when version control was suggested to them. Some of them have codebases in the 30-40K line range.
| jack_h wrote:
| I think this is a much wider problem than just academia/research. Really, any area where software isn't the primary product tends to have fairly lax software standards. I work in the embedded firmware field, and best practices are often looked at with skepticism and even derision by the electrical engineers who are often the ones doing the programming^[1].
|
| I think software development as a field is incredibly vast and diverse. Programming is an amazing tool, but it's a tool that requires a lot of knowledge in a lot of different areas.
|
| ^[1] This isn't universally true of course; I'm not trying to be insulting here.
| core-questions wrote:
| > protested when version control was suggested
|
| Academics are strange like this. The root reason is fear: fear that you're complicating their process, that you're going to interrupt their productivity or flow state, that you're introducing complication that has no benefit. They then build up a massive case in their minds for why they shouldn't do this; good luck fighting it.
|
| Doubly so if you're IT staff and don't have a PhD. There's a fundamental lack of respect from (a vocal minority of) academics toward bit plumbers, until of course they need us to do something laughably basic. It's the seeds of elitism; in reality we should be able to work together, each of us understanding our particular domain and working to help the other.
| gowld wrote:
| I think this is why industry does better science than academia, at least in any area where there are applications. Generally, they get paid for being right, not just for being published, so they put respect and money into people who help get correct results.
| BeetleB wrote:
| > The root reason is fear: fear that you're complicating their process, that you're going to interrupt their productivity or flow state, that you're introducing complication that has no benefit.
|
| Yes, but how does it compare to all the complicated processes that exist in academic institutions currently? Almost _all_ of them originated from academics themselves, mind you.
| core-questions wrote:
| It's not that complicated. No one individual process is that bad. The problem is that there are so many that you need to steep in them for ages to pick everything up.
|
| This means it makes most sense to pick up processes that are portable and have longevity. Learning Git is a pretty solid example.
| bumby wrote:
| I formerly worked in research, left, and am now back in a quasi-research organization.
|
| It's a bit disconcerting seeing how much quality is brushed aside, particularly in software. Researchers seem to intuitively grasp that they need quality hardware to do their job, yet software rarely gets the same consideration. I've never been able to get many to come around to the idea that software should be treated the same as any other engineered product that enables their research.
| gowld wrote:
| "Quality" is a subjective word. Let's be clear what this means:
|
| For individual researchers, and in many disciplines (like physics), there is almost no emphasis on _correct results_, merely on believable results.
| bumby wrote:
| There are a few standardized definitions.
| The most succinct being "quality is adherence to requirements".
|
| As an example, if your science has the requirement of being replicable (as it should), there are a host of best practices that should flow down to the software development requirements. Not implementing those best practices would be indicative of lower quality.
| throwaway287391 wrote:
| > I'm curious, are dedicated software assurance teams a thing in your research area?
|
| Are these a thing in _any_ research area? I've heard of exactly one case of an academic lab (one that was easily 99th+ percentile in terms of funding) hiring _one software engineer_ not directly involved in leading a research effort, and when I tell other academics about this they're somewhat incredulous. (I admittedly have a bit of trouble believing it myself -- I can't imagine the incentive to work for low academic pay in an environment where you're inevitably going to feel a sense of inferiority to first-year PhD students who think they're hot shit because they're doing "research".)
| bumby wrote:
| > _Are these a thing in any research area_
|
| I can say there are some that have the explicit intent, but it can often fall by the wayside due to cost pressure. For example, government-funded research from large organizations (think DoD or NASA) has these quality requirements, but they can often be hand-waved away or just plain ignored due to cost concerns.
| SilasX wrote:
| > "Someone will demand support and we do not have time for that",
|
| Well... that part isn't nonsense, though I agree it shouldn't be a dealbreaker. And it means we should work towards making such support demands minimal or non-existent via easy containerization.
|
| I note with frustration that even the Docker people, _whose entire job is containerization_, can get this part wrong. I remember when we containerized our startup's app c. 2015, to the point that you should have been able to run it locally just by installing Docker and running `docker-compose up`, and it _still_ stopped working within a few weeks (which we found when onboarding new employees), which required a knowledgeable person to debug and rewrite.
|
| (They changed the spec for docker-compose so that the new version you'd get when downloading Docker would interpret the YAML to mean something else.)
| paperwork wrote:
| Can you describe a bit more about what is going on in the project? The file you linked is over 2.5k lines of C++ code, and that is just the "setup" file. As you say, this is supposed to be a statistical model; I expected this to be R, Python, or one of the standard statistical packages.
|
| Why is there so much C++ code?
| disgruntledphd2 wrote:
| Because much of this code was written in the '80s, I suspect. In general, there are a bunch of really old scientific codebases in particular disciplines, because people have been working on these problems for a looooonnngg time.
| recursivecaveat wrote:
| It is essentially a detailed simulation of viral spread, not just a programmed distribution or anything. It's all in C++ because it's pretty performance-critical.
| fsh wrote:
| It's a Monte Carlo simulation, not a statistical model. These are usually written in C++ for performance reasons.
| dandelion_lover wrote:
| Or Fortran.
| Zenst wrote:
| Oh gosh yes, the amount of `just works` Fortran in science is one of those things akin to COBOL in business.
| I just know some people are thinking 10 years - ha, there will be instances of 40 and possibly 50 years for some. Heck, the sad part is many will have computer systems older than 10 years just because they link to some bit of kit: the RS232 connection works fine with the DOS software, and the updated version had issues when they last tried it. That's a common theme with specialist kit attached to a computer for control - medicine has that as well.
| klyrs wrote:
| I know two fresh PhDs from two different schools whose favorite language is Fortran. I think it's rather different from COBOL in that way -- yes, the old stuff still works, but newer code cuts down on the boilerplate and is much more readable. And yeah, the ability to link to 50-year-old battle-tested code is quite a feature.
| Mvandenbergh wrote:
| Large chunks of this particular code were in fact originally written in Fortran and then machine-translated into C++.
| roel_v wrote:
| Who says anything about statistical models?
| djaque wrote:
| I am all for open science, but you understand that the links in your post are the exact worry people have when it comes to releasing code: people claiming that their non-software-engineering-grade code invalidates the results of their study.
|
| I'm an accelerator physicist, and I wouldn't want my code to end up on acceleratorskeptics.com with people who don't understand the material making low-effort critiques of minor technical points. I'm here to turn out science, not production-ready code.
|
| As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
|
| By the way, yes, I tested my ten-year-old code and it does still work. What I'm saying is that scientific code doesn't need to handle every special case or be easily usable by non-experts. In fact, the time spent making it that way is time that a scientist spends doing software engineering instead of science, which isn't very efficient.
| beefee wrote:
| I want science to be held to a very high standard. Maybe even higher than "software engineering grade". Especially if it's being used as a justification for public policy.
| MaxBarraclough wrote:
| Perhaps just a nitpick: software engineering runs the gamut from throwing together a GUI in a few hours all the way up to avionics software where a bug could kill hundreds. There's no such thing as "software engineering grade".
| chrchang523 wrote:
| Nit: implementations of Monte Carlo methods are _not_ necessarily nondeterministic. Whenever I implement one, I always aim for a deterministic function of (input data, RNG seed, parallelism, workspace size).
| petschge wrote:
| It really helps with debugging if your MC code is deterministic for a given input seed. And then you just run for a sufficient number of different seeds to sample the probability space.
| vngzs wrote:
| Alternatively: seed the program randomly by default, but allow the user to specify a seed as a CLI argument or function argument (for tests).
|
| In the common case, the software behaves as expected (random output), but it is reproducible for tests. You can then publish your RNG seed with the commit hash when you release your code/paper, and others may see your results and investigate that particular code execution.
| petschge wrote:
| Sure, that works too.
| But a word of advice from real life: print the random seed at the beginning of the run so you can find out which seed caused it to crash or do stupid things.
| jnxx wrote:
| And it seems that the people from Imperial College have done that with their epidemiological simulation. What critics claim is that their code produces non-deterministic results even when given deterministic inputs and fixed random seeds, i.e. that their code is seriously broken - which would be a serious issue if true.
| pbalau wrote:
| > people claiming that their non-software-engineering-grade code invalidates the results of their study.
|
| But that's exactly the problem.
|
| Are you familiar with that bug in early Civ games where an overflow was making Gandhi nuke the crap out of everyone? What if your code has a similar issue?
|
| What if you have a random value right smack in the middle of your calculations and you just happened to be lucky when you ran your code?
|
| I'm not that familiar with Monte Carlo; my understanding is that it is just a way to sample the data. And I won't be testing your data sampling, but I will expect that, given the same data to your calculation part (e.g., after the sampling happens), I get exactly the same results every time I run the code, and on any computer. And if there are differences, I expect you to be able to explain why they don't matter, which will show you were aware of the differences in the first place and were not just lucky.
|
| And then there is the matter of the magic values that plaster research code.
|
| Researchers should understand that the rules for "software-engineering-grade code" are not there just because we want to complicate things, but because we want to make sure the code is correct and does what we expect it to do.
|
| /edit: The real problem is not getting good results from faulty code; it is ignoring good solutions because of faulty code.
| ivanbakel wrote:
| Doesn't it concern you that it would be possible for critics to look at your scientific software and find mistakes (some of which the OP mentioned are not "minor") so easily?
|
| Given that such software forms the very foundation of the results of such papers, why shouldn't it fall under scrutiny, even for "minor" points? If you are unable to produce good technical content, why are you qualified to declare what is or isn't minor? Isn't the whole point that scrutiny is best left to technical experts (and not subject experts)?
| James_Henry wrote:
| When you say OP, do you mean djsumdog? If so, what mistakes does he mention that aren't minor?
| gowld wrote:
| How is it possible to know the difference between minor and major, if the mistakes are kept secret?
|
| If we're supposed to accept scientific results on faith, why bother with science at all?
| smnrchrds wrote:
| > _Doesn't it concern you that it would be possible for critics to look at your scientific software and find mistakes (some of which the OP mentioned are not "minor") so easily?_
|
| A non-native English speaker may make grammatical mistakes when communicating their research in English--it does not in any way invalidate their results or hint that there is anything amiss. It is simply what happens when you are a non-native speaker.
|
| Some (many?) code critiques by people unfamiliar with the field of study will be about superficial mistakes that do not invalidate the results. They are the code equivalents of grammatical mistakes.
| That's what the OP is talking about.
| stult wrote:
| Journals employ copy editors to address just those sorts of mistakes; why should we not hold software to the same standard as academic language? But more importantly, these software best practices aren't mere "grammatical mistakes"; they exist because well-organized, well-tested code has fewer bugs and is easier for third parties to verify. Third parties validating that the code underlying an academic paper executes as expected is no different than third parties replicating the results of a physical experiment. You can be damn sure that an experimental methodology error invalidates a paper, and you can be damn sure that bad documentation of the methodology dramatically reduces the value/reliability of the paper. Code is no different. It's just been the wild west because it is a relatively new and immature field, so most academics have never been taught coding as a discipline nor held to rigorous standards in their own work. Is it annoying that they now have to learn how to use these tools properly? I'm sure it is. That doesn't mean it isn't a standard we should aim for, nor that we shouldn't teach the relevant skills to current students in the sciences so that they are better prepared when they become researchers themselves.
| labcomputer wrote:
| > Third parties validating that the code underlying an academic paper executes as expected is no different than third parties replicating the results of a physical experiment.
|
| First, it's not no different--it's completely different. Third parties have always constructed their own apparatus to reproduce an experiment. They don't go to the original author's lab to perform the experiment!
|
| Second, a lot of scientific code won't run _at all_ outside the environment it was developed in.
|
| If it's HPC code, it's very likely that the code makes assumptions about the HPC cluster that will cause it to break on a different cluster. If it's experiment-control / data-acquisition code, you'll almost certainly need the exact same peripherals for the program to do anything at all sensible.
|
| I see a lot of people here on HN vastly overestimating the value of bit-for-bit reproducibility of one implementation, and vastly underestimating the value of having a diversity of implementations to test an idea.
| garden_hermit wrote:
| I agree with your overall point, but I just want to point out that many (most?) journals don't employ copy editors, or if they do, they overlook many errors, especially in the methods sections of papers.
| Bukhmanizer wrote:
| I'm glad someone else feels this way. It's an expectation that scientists can share their work with other scientists using language. Scientists aren't always the best writers, but there are standards there. Writing good code is a form of communication. It baffles me that there are absolutely no standards there.
| ryandrake wrote:
| On the contrary: if I'm (in industry) doing a code review and see simple, obvious mistakes like infinite loops, obvious null-pointer exceptions, ignored compiler warnings, etc., in my mind it casts a good deal of doubt over the entire code. If the author is so careless with these obvious errors, what else is he/she being careless about?
|
| Same with grammatical or spelling errors. I don't review research, but I do review resumes, and I've seen atrocious spelling on resumes.
| Here's the candidate's first chance to make an impression. They have all the time in the world to proofread, hone, and have other eyes edit it. Yet they still miss obvious mistakes. If hired, will their work product also be sloppy?
| [deleted]
| SiempreViernes wrote:
| This sort of scrutiny only matters once someone else has a totally different code that gives incompatible results. Before that point there's no sense in looking for bugs, because all you're proving is that there are no obvious mistakes: you say nothing about the interesting questions, since you only bother with codes for things with non-obvious answers.
| jnxx wrote:
| _edit:_ please read the grandchild comment before going off on the idea that some random programmer on the Internet dares to criticize scientific code he does not understand. What is crucial in the argument here is indeed the distinction between methods employing pseudo-randomness, like Monte Carlo simulation, and non-determinism caused by undefined behavior.
|
| > I'm an accelerator physicist, and I wouldn't want my code to end up on acceleratorskeptics.com with people who don't understand the material making low-effort critiques of minor technical points.
|
| The person who wrote the linked blog post claims to have been a software engineer at Google. Unfortunately, that claim is not verifiable, as the person decided to remain anonymous.
|
| > As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
|
| The claim is that even with the same seed for the random generator, the program produces different results, and this is explained by the allegation that it runs non-deterministically (in the sense of undefined behavior) in multiple threads. The post also claims that the program produces significantly different results depending on which output file format is chosen.
|
| If this is true, the code would have race conditions, and as being impacted by race conditions is a form of undefined behavior, this would make any result of the program questionable, as the program would not be well-defined.
|
| Personally, I am very doubtful whether this is true; it would be incredibly sloppy of the Imperial College scientists. Some more careful analysis by a recognized programmer might be warranted.
|
| However, it underlines well the importance of the main topic: that scientific code should be open to analysis.
|
| > What I'm saying is that scientific code doesn't need to handle every special case or be easily usable by non-experts.
|
| Fully agree with this. But it should try to document its limitations.
| aspaceman wrote:
| > If this is true, the code would have race conditions, and as being impacted by race conditions is a form of undefined behavior, this would make any result of the program questionable, as the program would not be well-defined.
|
| That's not at all what that means. What are you talking about? As long as a Monte Carlo process works towards the same result, it's equivalent.
|
| You're speaking genuine nonsense as far as I'm concerned. Randomness doesn't imply non-determinism. Non-determinism in no way implies race conditions or undefined behavior. We care that the random process reaches the same result, not that the exact sequence of steps is the same.
|
| This is what scientists are talking about.
| A bunch of (pretty stupid) non-experts want to criticize your code so they feel smart on the internet.
| jnxx wrote:
| I am referring to this blog post:
|
| https://lockdownsceptics.org/code-review-of-fergusons-model/
|
| It says, word for word:
|
| _> Clearly, the documentation wants us to think that, given a starting seed, the model will always produce the same results._
|
| _> Investigation reveals the truth: the code produces critically different results, even for identical starting seeds and parameters._
|
| _> I'll illustrate with a few bugs. In issue 116 a UK "red team" at Edinburgh University reports that they tried to use a mode that stores data tables in a more efficient format for faster loading, and discovered - to their surprise - that the resulting predictions varied by around 80,000 deaths after 80 days: ..._
|
| The bugs which the blog post implies here are the kind described by John Regehr: https://blog.regehr.org/archives/213
|
| Note that I do not endorse these statements in the blog - I am rather skeptical whether they are true at all.
|
| What the authors of the blog post mean is clearly "undefined behaviour" in the sense of non-deterministic execution of a program that is not well-formed. It is clear that many non-experts could confuse that with the pseudo-randomness implicit in Monte Carlo simulations, but this is a _very_ different thing. The first is basically a broken, invalid, and untrustworthy program. The second is the established method of producing a computational result by introducing stochastic behavior, which is for example how modern weather models work.
|
| These are wildly different things. I do not understand why your comment just adds to the confusion between these two things?
|
| > A bunch of (pretty stupid) non-experts want to criticize your code so they feel smart on the internet.
|
| As said, I don't endorse the critique in the blog. However, critique of a software implementation, as in scientific matters, should never rest on an appeal to authority - it should logically explain what the problem is, with concrete points. Unfortunately, the cited blog post remains very vague about this, while claiming:
|
| _> My background. I have been writing software for 30 years. I worked at Google between 2006 and 2014, where I was a senior software engineer working on Maps, Gmail and account security. I spent the last five years at a US/UK firm where I designed the company's database product, amongst other jobs and projects. I was also an independent consultant for a couple of years._
|
| It would be much better if, instead of claiming that there could be race conditions, it pointed to lines in the code with actual race conditions and showed how the results of the simulation differ when the race conditions are fixed. Otherwise, it just looks like he claims that the program is buggy because he is in no position to question the science and does not like the result.
| jnxx wrote:
| There is something I need to add; it is a subtle but important point:
|
| Non-determinism can be caused by:
|
| a) random seeds derived from hardware, such as seek times in an HDD controller, which are fed into pseudo-random number (PRNG) generation. This is not a problem. For debugging, or comparison, it can make sense to switch it off, though.
|
| b) data race conditions, which are a form of undefined behavior.
| This not only can dramatically change the results of a program run, but also invalidates the program logic, in languages such as C and C++. This is what the blog post on lockdownsceptics.org suggests. For the application area and its consequences, this would be a major nightmare.
|
| c) What I had forgotten is that parallel execution (for example in LAM/MPI, map/reduce or similar frameworks) is inherently non-deterministic and, in combination with the properties of floating-point computation, can yield different but valid results.
|
| Here is an example:
|
| A computation is carried out on five nodes and they return the values 1e10, 1e10, 1e-20, -1e10, -1e10, in random order. The final result is computed by summing these up.
|
| Now, the order of computation could be:
|
| ((((1e10 + 1e10) + 1e-20) + -1e10) + -1e10)
|
| or it could be:
|
| (((1e10 + -1e10) + 1e-20) + (+1e10 + -1e10))
|
| In the first case, the result would be zero; in the second case, 1e-20, because of the finite length of the floating-point representation.
|
| _However_... if the numerical model or simulation or whatever is stable, this should not lead to a dramatic qualitative difference in the result (otherwise, we have a stability problem with the model).
|
| Finally, I want to cite one last paragraph from the post on lockdownsceptics.org:
|
| _> Conclusions. All papers based on this code should be retracted immediately. Imperial's modelling efforts should be reset with a new team that isn't under Professor Ferguson, and which has a commitment to replicable results with published code from day one._
|
| _> On a personal level, I'd go further and suggest that all academic epidemiology be defunded. This sort of work is best done by the insurance sector. Insurers employ modellers and data scientists, but also employ managers whose job is to decide whether a model is accurate enough for real world usage and professional software engineers to ensure model software is properly tested, understandable and so on. Academic efforts don't have these people, and the results speak for themselves._
| UncleMeat wrote:
| Race conditions aren't undefined behavior in C/C++. Data races are. Lots and lots of real systems contain race conditions without catastrophe.
| jnxx wrote:
| > Race conditions aren't undefined behavior in C/C++. Data races are.
|
| You are right about the distinction; I had data races in mind.
|
| Race conditions can well happen in a correct C/C++ multi-threaded program, in the sense that the order of specific computation steps is sometimes random. And for operations such as floating-point addition, where the order of operations does matter, the exact result can be random as a consequence. But the end result should not depend dramatically on it (which is what the poster at lockdownsceptics.org claims).
| jnxx wrote:
| > I'm an accelerator physicist, and I wouldn't want my code to end up on acceleratorskeptics.com with people who don't understand the material making low-effort critiques of minor technical points. I'm here to turn out science, not production-ready code.
|
| Specifically, to that point, I want to cite the saying:
|
| "The dogs bark, but the caravan passes."
|
| (There is a more colorful German variant which is, translated: "What does it bother the mighty old oak tree if a dog takes a piss...")
|
| Of course, if you publish your code, you expose it to critics. Some of this will be unqualified.
| And as we have seen in the case of, e.g., climate scientists, some might even be nasty. But who cares? What matters is open discussion, which is a core value of science.
| RandoHolmes wrote:
| > people claiming that their non-software-engineering-grade code invalidates the results of their study.
|
| How exactly is this a bad thing?
|
| > I'm an accelerator physicist, and I wouldn't want my code to end up on acceleratorskeptics.com with people who don't understand the material making low-effort critiques of minor technical points. I'm here to turn out science, not production-ready code.
|
| But it should be noted that what you didn't say is that you're here to turn out _accurate_ science.
|
| This is the software version of statistics. Imagine if someone took a random sampling of people at a Trump rally and then claimed that "98% of Americans are voting for Trump". And now imagine someone else points out that the sample is biased and therefore the conclusion is flawed, and the response was "Hey, I'm just here to do statistics".
|
| ---
|
| Do you see the problem now? The poster above you pointed out that the conclusions of the software can't be trusted, not that the coding style was ugly. Most developers would be more than willing to say "the code is ugly, but it's accurate". What we don't want to hear is "the conclusions can't be trusted, and 100 people have spent 10+ years working from those unreliable conclusions".
| auntienomen wrote:
| Oh, he didn't say 'accurate science', nice gotcha!
|
| This is exactly the sort of pedantic cluelessness that scientists are seeking to avoid by not publishing their code.
| RandoHolmes wrote:
| I don't consider accuracy in science to be pedantic, and I suspect most others don't either.
|
| To paraphrase what the other developer said: "I don't want my work to be checked; I'm not here for accuracy, just the act of doing science".
|
| When I was young, the ability to invalidate was the core aspect of science, but apparently that's changed over the years.
| booleandilemma wrote:
| _What I'm saying is that scientific code doesn't need to handle every special case or be easily usable by non-experts._
|
| Sounds like I should just become a scientist then.
|
| Do you guys write unit tests or is that beneath you too?
| sitkack wrote:
| > exact worry people have when it comes to releasing code: people claiming that their non-software-engineering-grade code invalidates the results of their study.
|
| If code is what is substantiating a scientific claim, then the code needs to stand up to scientific scrutiny. This is how science is done.
|
| I came from physics, but systems and computer engineering were always an interest of mine, even before physics. I thought it was kooky-dooks that CS people can release papers w/o code - fine if the paper contains all the proofs, but otherwise it shouldn't even be looked at. PoS (proof-of-science) or GTFO.
|
| We are at the point in human and scientific civilization where knowledge needs to prove itself correct. Papers should be self-contained execution environments that generate PDFs and resulting datasets. The code doesn't need to be pretty or robust, but it needs to be sealed inside of a container so that it can be re-run, re-validated, and someone else can confirm the result X years from now. And it isn't about trusting or not trusting the researcher; we need to fundamentally trust the results.
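| (A minimal sketch of the kind of self-contained, re-runnable setup sitkack describes, in the spirit of roberto's 2007 Makefile upthread; recipe lines are tab-indented. The file names, the dataset URL, and the `analysis` program are hypothetical placeholders, not anyone's actual pipeline.)
|
|     # Makefile: `make` rebuilds everything from raw data to PDF
|     all: paper.pdf
|
|     data/raw.csv:
|             mkdir -p data
|             # pin an exact dataset version so reruns fetch the same bytes
|             curl -L -o $@ https://example.org/dataset-v1.csv
|
|     results/out.csv: data/raw.csv analysis
|             mkdir -p results
|             # fixed, published seed: the run is replayable
|             ./analysis --seed 42 data/raw.csv > $@
|
|     paper.pdf: results/out.csv paper.tex
|             pdflatex paper.tex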
| matthewdgreen wrote:
| All of my 2010 scientific code runs on the then-current edition of Docker. /s
| sitkack wrote:
| I made no mention of Docker, VMs or any virtualization system. Those would be an implementation detail and would obviously change over time.
|
| A container can be a .tar.gz, a zip, or a disk image of artifacts, code, data and downstream deps. The generic word has been co-opted to mean a specific thing, which is very unfortunate.
| matthewdgreen wrote:
| My point, which I guess I did not make clearly enough, is that container systems don't necessarily exist or remain supported over the ten-year period being discussed. The idea of ironing over long-term compatibility issues using a container environment seems like a great one! (For the record, .tgz -- the "standard" format for scientific code releases in 2010 -- does not solve these problems _at all_.)
|
| But the "implementation detail" of which container format you use, and whether it will still be supported in 10 years, is not an implementation detail at all -- since this will determine whether containerization actually solves the problem of helping your code run a decade later. This gets worse as the number and complexity of container formats expands.
|
| Of course, if what you mean is that researchers should provide perpetual maintenance for their older code packages, moving them from one obsolete platform to a more recent one, then you're making a totally different and very expensive suggestion.
| snowwrestler wrote:
| The history of physics is full of complex, one-off custom hardware. Reviewers have not been expected to take the full technical specs and actually build and run the exact same hardware just to verify correctness for publication.
|
| I doubt any physicist believes we need to get the Tevatron running again just to check decade-old measurements of the top quark. I don't understand why decade-old scientific software must meet that bar.
| [deleted]
| woah wrote:
| I'm very puzzled by this attitude. As an accelerator physicist, would you want your accelerator to be held together by duct tape and producing inconsistent results? Would you complain that you're not a professional machinist when somebody pointed it out? Why is software any different than hardware in this respect?
| kordlessagain wrote:
| > people who don't understand the material making low-effort critiques of minor technical points
|
| GPT-3 FTW!
| solatic wrote:
| Let's be clear - scientific-grade code is a lower standard than production-grade code. _But it is still a real standard_.
|
| Does scientific-grade code need to handle a large number of users running it at the same time? Probably not a genuine concern, since those users will run their own copies of the code on their own hardware, and it's not necessary or relevant for users to see the same networked results from the same instance of the program running on a central machine.
|
| Does scientific-grade code need to publish telemetry? Eh, usually no. Set up alerting so that on-call engineers can be paged when (not if) it falls over? Nope.
|
| Does scientific-grade code need to handle the authorization and authentication of users? Nope.
|
| Does scientific-grade code need to be reproducible? _Yes_. Fundamentally, yes. The reproducibility of results is core to the scientific method.
| Yes, that includes Monte Carlo code: there is no such thing as truly random number generation on contemporary computers, only pseudorandom number generation, and what matters for cryptographic purposes is that the seed numbers for the pseudorandom generation are sufficiently hidden/unknown. For scientific purposes, the seed numbers should be published _on purpose_, so that a) the exact results you found, sufficiently random as they are for the purpose of your experiment, can still be independently verified by a peer reviewer, and b) a peer reviewer can intentionally decide to pick a different seed value, which will lead to different results but should _still lead to the same conclusion_ if your decision to reject / refuse to reject the null hypothesis was correct.
| dekhn wrote:
| As an ex-scientist who used to run lots of simulations, I really fail to see a truly compelling reason why most numerical results (for publication purposes) need to come with published (and supported) deterministic seeding.
|
| We've certainly done a lot, scientifically speaking (in terms of post-validated studies), without that level of reproducibility.
| jnxx wrote:
| If nothing else, it helps with debugging code that tries to reproduce your findings.
| dekhn wrote:
| The code I work with is not debuggable in that way under most circumstances. It's a complex distributed system. You don't attempt to debug it by being deterministic - you debug it by sampling its properties.
| throwaway287391 wrote:
| Controlling randomness can be extremely difficult to get right, especially when there's anything asynchronous about the code (e.g. multiple worker threads populating a queue to load data). In machine learning, some of the most popular frameworks (e.g. TensorFlow [0]) don't offer this as a feature, and in other frameworks that do (PyTorch [1]) it will cripple the speed you get, as GPU accelerators rely on non-deterministic accumulation for reasonable speed.
|
| Scientific reproducibility does not mean, and has never meant, that you rerun the code and the output perfectly matches bit-for-bit every time. If you can achieve that, great -- it's certainly a useful property to have for debugging. But a much stronger and more relevant form of reproducibility for actually advancing science is running the same study e.g. on different groups of participants (or in computer science / applied math/stats / etc., with different codebases, with different model variants/hyperparameters, on different datasets) and the overall conclusions hold.
|
| To paraphrase a comment I saw in another thread on HN: "Plenty of good science got done before modern devops came to be."
|
| [0] https://github.com/tensorflow/tensorflow/issues/12871 https://github.com/tensorflow/tensorflow/issues/18096
|
| [1] https://pytorch.org/docs/stable/notes/randomness.html
|
| ==========
|
| EDIT to reply to solatic's replies below (I'm being rate-limited):
|
| The social science arguments are probably fair (or at least I'll leave it to someone more knowledgeable to defend them if they want) -- perhaps I shouldn't have led with the example of "different groups of participants".
|
| > If you can achieve that, for the area of study in which you conduct your experiment, it should be required.
| > Deciding to forego formal reproducibility should be justified with a clear explanation as to why reproducibility is infeasible for your experiment, and peer review should reject studies that could have been reproducible but weren't in practice.
|
| This _might_ be a reasonable thing to enforce if everyone in the field were using the same computing platform. Given that they're not (and that telling everyone that all published results have to be done using AWS with this particular machine configuration is not a tenable solution), I don't see how this could ever be a realistic requirement. Or, if you don't want to enforce that the results remain identical across different platforms, what's the point of the requirement in the first place? How would it be enforced if nobody else has the exact combination of hardware/software to do so? And then, even if someone does, almost inevitably there'll be some detail of the setup that the researcher didn't think to report, and results will differ slightly anyway.
|
| Besides, if you're allowing for exemptions, just about every paper in machine learning studying datasets larger than MNIST (where asynchronous prefetching of data is pretty much required to achieve decent speeds) would have a good reason to be exempt. It's possible that there are other fields where this sort of requirement would be both useful and feasible for a large amount of the research in that field, but I don't know what they are.
|
| > Also, reading through the issues you linked points to https://github.com/NVIDIA/framework-determinism, a relatively recent attempt by NVIDIA to support deterministic computation for TensorFlow. Not perfect yet, but the effort is going there.
|
| (From your other comment.) Yes, there exists a $300B company with an ongoing-but-incomplete funded effort of so far >6 months' work (and that's just the part they've done in public) to make one of its own APIs optionally deterministic when it's being used through a single downstream client framework. If this isn't a perfect illustration that it's not realistic to expect exact determinism from software written by individual grad students studying chemistry, I'm not sure what to say.
| solatic wrote:
| Also, reading through the issues you linked points to https://github.com/NVIDIA/framework-determinism, a relatively recent attempt by NVIDIA to support deterministic computation for TensorFlow. Not perfect yet, but the effort is going there.
| throwawaygh wrote:
| _> or in computer science / applied math/stats / etc., with different codebases, with different model variants, on different datasets) and the overall conclusions hold_
|
| A lot of open-sourced CS research is not reproducible.
|
| "The code still runs and gives the same output" is _not_ the same as reproducibility.
| throwaway287391 wrote:
| > A lot of open-sourced CS research is not reproducible.
|
| I'm not sure if this was meant to be a counter-argument to me, but I completely agree!
|
| > "The code still runs and gives the same output" is not the same as reproducibility.
|
| Yes, bit-for-bit identical results are neither necessary nor sufficient for reproducibility in the usual scientific sense.
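| (jnxx's five-node summation example upthread is easy to see in code; a minimal C++ sketch with the same values:)
|
|     #include <cstdio>
|
|     int main() {
|         // one reduction order absorbs the tiny term...
|         double left  = ((((1e10 + 1e10) + 1e-20) + -1e10) + -1e10);
|         // ...another, equally valid order preserves it
|         double other = (((1e10 + -1e10) + 1e-20) + (1e10 + -1e10));
|         std::printf("%g vs %g\n", left, other);  // prints: 0 vs 1e-20
|     }
|
| (Both answers are "valid"; this benign run-to-run variation from summation order is the kind of thing non-deterministic GPU accumulation produces, and it is unrelated to the undefined behavior of a data race.)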
| throwawaygh wrote:
| _> I'm not sure if this was meant to be a counter-argument to me_
|
| It wasn't :)
| dnautics wrote:
| the correct way to control randomness in scientific code is to have
| the RNG be seeded with a flag and have the result check out with a
| snapshot value. Almost no one does this, but that doesn't mean it
| shouldn't be done.
| throwaway287391 wrote:
| Did you read my post? I know what a seed is. Setting one is typically
| not enough to ensure bit-for-bit identical results in
| high-performance code. I gave two examples of this: CUDA GPUs (which
| do non-deterministic accumulation) and asynchronous threads (which
| won't always run operations in the same order).
| dnautics wrote:
| Most scientific runs are scaled out, so that you run multiple
| replicates. And not all scientific runs are high-performance in the
| HPC sense. Even if your code is HPC in the HPC sense, and requires
| CUDA, and 40,000 cores, you should consider creating a release flag
| where an end user can do at least a single "slow" run on a CPU on a
| reduced dataset, in single-threaded mode, to sanity check the results
| and at least verify that the computational and algorithmic pipeline
| is sound at the most basic level.
|
| I used to be a scientist. I get it, getting scientists to do this is
| like pulling teeth, but it's the least you could do to give other
| people confidence in your results.
| throwaway287391 wrote:
| > consider creating a release flag where an end user can do at least
| a single "slow" run on a CPU on a reduced dataset, in single-threaded
| mode, to sanity check the results and at least verify that the
| computational and algorithmic pipeline is sound at the most basic
| level.
|
| Ok, that's a reasonable ask :) But yeah, as you implied, good luck
| getting the average scientist, who in the best case begrudgingly uses
| version control, to care enough to do this.
| ska wrote:
| This is not correct on several levels. Reproducibility is not
| achievable in many real world scenarios, but worse, it's not even
| very informative.
|
| Contra your assertion, many people do some sort of regression testing
| like this, but it isn't terribly useful for verification _or_
| validation - but it is good at catching bad patches.
| SilasX wrote:
| You're right about bit-for-bit reproducibility possibly being
| overkill, but I don't think that invalidates the parent's point that
| Monte Carlo randomization doesn't obviate reproducibility concerns.
| It just means that e.g. your results shouldn't be hypersensitive to
| the details of the randomization. That is, reviewers should be able
| to take your code, feed it different random data from a similar
| distribution to what you claimed to use (perhaps by choosing a
| different seed), and get substantively similar results.
| jbay808 wrote:
| It does seem like a valid response to OP's objection to the Imperial
| College COVID model, though. Doesn't it?
| SilasX wrote:
| Reviewing the original comment, I think so (that the original comment
| is overcritical). For purposes of reproducibility, it's enough that
| you can validate that you can run the model with different random
| data and see that their results aren't due to pathological choices of
| initial conditions. If the race conditions and non-determinism just
| transform the random data into another set of valid random data, that
| doesn't compromise reproducibility.
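|
| Concretely, something like this toy sketch (the example and
| tolerances are invented): estimate pi by Monte Carlo under a handful
| of seeds and check that the estimates agree to within sampling error,
| rather than demanding bit-identical output:
|
|     import random
|
|     def estimate_pi(seed, n=100_000):
|         # Fraction of random points inside the unit quarter-circle,
|         # times 4.
|         rng = random.Random(seed)
|         hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
|                    for _ in range(n))
|         return 4.0 * hits / n
|
|     # Publish one seed for exact replay, but the conclusion should
|     # survive any seed: all estimates agree within sampling error.
|     estimates = [estimate_pi(seed) for seed in range(10)]
|     assert max(estimates) - min(estimates) < 0.05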
| throwaway287391 wrote:
| That brings up a separate issue that I didn't comment on above: the
| expectation that the code runs in a completely different
| development/execution environment (e.g. the one the reviewer is using
| vs. the one that the researcher used). That means making it run
| regardless of the OS (Windows/OSX/Linux/...) and hardware
| (CPU/GPU/TPU, and even within those, which one) the reviewer is
| using. This would be an extremely difficult if not impossible thing
| for even a professional software engineer to achieve. It could easily
| be a full-time job. There are daily issues filed on even the most
| well-funded machine learning projects from huge companies (ex: TF,
| PyTorch) reporting that the latest update doesn't work on GPU X or
| CUDA version Y or OS Z. It's not a realistic expectation for a
| researcher even in computer science, let alone researchers in other
| fields, most of whom are already at the top of their field's game,
| programming-wise, if they would even think to reach for a "script" to
| automate repetitive data-entry tasks etc.
|
| ==========
|
| EDIT to reply to BadInformatics' reply below (I'm being
| rate-limited): I fully agree that a lot of ML code releases could be
| better about this, and it's even reasonable to expect them to do some
| of the more basic things you mention. I don't agree that bit-for-bit
| reproducibility is a realistic standard that will get us there.
| BadInformatics wrote:
| I don't think that removes the need to provide enough detail to
| replicate the original environment though. We write one-off scripts
| with no expectation that they will see outside usage, whereas
| research publications are meant for just that! The bar isn't terribly
| high either: for ML, a requirements.txt + OS version + CUDA version
| would go a long way, no need to learn docker just for this.
| solatic wrote:
| > But a much stronger and more relevant form of reproducibility for
| actually advancing science is running the same study e.g. on
| different groups of participants (or in computer science / applied
| math/stats / etc., with different codebases, with different model
| variants/hyperparameters, on different datasets) and the overall
| conclusions hold
|
| > Plenty of good science got done before modern devops came to be
|
| This isn't as strong of an argument as you think. This is
| more-or-less the underlying foundation behind the social sciences,
| which argues that no social sampling can ever be entirely reproduced,
| since no two people are alike, and even the same person cannot be
| reliably sampled twice as people change with time.
|
| Has there been "good science" done in the social sciences? Sure. I
| don't think that you're going to find anybody arguing that the state
| of the social sciences today is about the same as it was in the Dark
| Ages.
|
| With that said, one of the reasons why so many laypeople look at the
| social sciences as a kind of joke is that so many contradictory
| studies come out of these peer-reviewed journals that their
| trustworthiness is quite low. One of the reasons why there's so much
| confusion surrounding what constitutes a healthy diet and how people
| should best attempt to lose weight is precisely because
| diet-and-exercise studies are more-or-less impossible to reproduce.
|
| > If you can achieve that, great -- it's certainly a useful property
| to have for debugging
|
| If you can achieve that, for the area of study in which you conduct
| your experiment, it should be _required_.
| Deciding to forego formal reproducibility should be justified with a
| clear explanation as to why reproducibility is infeasible for your
| experiment, and peer-review should reject studies that could have
| been reproducible but weren't in practice.
| jbay808 wrote:
| Plenty of good physics got done before modern devops came to be, too!
| Maybe the pace of advancement was slower when the best practice was
| to publish a cryptographic hash of your discoveries in the form of a
| poetic Latin anagram rather than just straight-up saying it, but it's
| not like Hooke's law is considered unreproducible today because you
| can't deterministically re-instantiate his experimental setup with a
| centuries-old piece of brass and get the same result to n significant
| figures.
| mnl wrote:
| And physicists have been writing code for a while simply because the
| number of software engineers who have a working knowledge of physics
| (as in, ready for research), are trained in numerical analysis (as
| in, able to read applied mathematics), and are then willing to help
| you with your paper for peanuts is about zero.
|
| I don't understand why it is so hard to see that you need either a
| pretty big collaboration, where somebody else has isolated the
| specifications so you don't really need to know anything about the
| problem your code solves, or to become a physics graduate student
| yourself for this line of work.
| a_zaydak wrote:
| I do agree with you on publishing seeds for Monte Carlo simulations;
| however, the argument against it is also very strong. Usually when
| you run a Monte Carlo simulation you are quoting the results in terms
| of statistics. I think it would be sufficient to say that you can
| "reproduce" the results as long as your statistics (over many
| simulations with different seeds) are consistent with the published
| results. If you run a single simulation with a particular seed you
| _should_ get the same results; however, this might be cherry-picking
| a particular simulation result. This is good for code testing but
| probably not for scientific results. I think running the code with
| new seeds is a better way to test the science.
| kag0 wrote:
| > there is no such thing as truly random number generation on
| contemporary computers
|
| well that's just not true. there's no shortage of noise we can sample
| to get true random numbers. we just often stretch the random numbers
| for performance purposes.
| dllthomas wrote:
| > Does scientific-grade code need to be reproducible? Yes.
| Fundamentally yes.
|
| I agree that this is a good property for scientific code to have, but
| I think we need to be careful not to treat re-running of existing
| code the same way we treat genuinely independent replication.
|
| Traditionally, people freshly constructed any necessary apparatus,
| and people walked through the steps of the procedures. This is an
| interaction between experiment and human brain meats that's missing
| when code is simply reused (whether we consider it apparatus or
| procedure).
|
| Once we have multiple implementations, _if_ there is a meaningful
| difference between them, _at that point_ replayability is of
| tremendous value in identifying why they differ.
|
| But it is not reproducibility, as we want that term to be used in
| science.
| hobofan wrote:
| But "rerunning reproducibility" is mostly a necessary requirement for
| independent reproducibility.
| If you can't even run the original calculations against the original
| data again, how can you be sure that you are not comparing apples to
| oranges?
| dllthomas wrote:
| Very interesting. I was thinking of software as most similar to
| apparatus, and secondarily to procedure. You raise a third possible
| comparison: calculations, which IIUC would be expected to be included
| in the paper.
|
| There are some kinds of code (a script that controls a sensor or an
| actuator) where I think that doesn't match up well at all. There are
| plenty of kinds of code where they _are_, in fact, simply crunching
| numbers produced earlier. For the latter, I'm honestly not sure of
| the best way to treat it, except to say that we should be sure that
| _enough_ information is included in some form that replication should
| be possible, and that we keep in mind the idea that replication
| should involve human interaction.
| jabirali wrote:
| In some simulations, each rerun produces different results, as you're
| simulating random events (like lightning formation) or using a
| non-deterministic algorithm (like Monte Carlo sampling). Just "saving
| the random seed" might not be sufficient to make it deterministic
| either, because if you do parallelized or concurrent work in your
| code (common in scientific code), the same pseudorandom numbers may
| be used in different orders each time you run it.
|
| But repeating the simulation a large number of times, with different
| random seeds, should produce statistically similar output if the code
| is rigorous. So even if each simulation is not reproducible, as long
| as the statistical distribution of outputs is reproducible, that
| should be sufficient.
| kkylin wrote:
| This. I absolutely agree there needs to be more transparency, and
| scientific code should be as open as possible. But this should not
| replace replication.
| BadInformatics wrote:
| Conversely though, it is often impossible to obtain the original code
| to replay and identify differences once that step is reached
| _without_ some sort of strong incentive or mandate for researchers to
| publish it. When the only copy is lost in the now-inaccessible home
| folder of some former grad student's old lab machine, there is a
| strong disincentive to try replicating at all, because one has little
| to consult on whether/how close the replicated methods are to the
| original ones.
| dllthomas wrote:
| And so we find ourselves in the same situation as the rest of the
| scientific process, throughout history. When I try to replicate your
| published paper and I fail, it's completely unclear whether it's
| "your fault" or "my fault" or pure happenstance, and there's a lot of
| picking apart that needs to be done, with usually no access to the
| original experimental apparatus and sometimes no access to the
| original experimenters.
|
| The fact that we _can_ have that option is an amazing opportunity
| that a confluence of attributes of software (specificity,
| replayability, ease of copying) affords us. Where we are not
| exploiting this like we could be, it is a failure of our
| institutions! But it is different-in-kind from traditional
| reproducibility.
| BadInformatics wrote:
| Of course, but the flip side is that the same confluence of
| attributes has also exacerbated issues of reproducibility. Just as
| science and the methods/mediums by which we conduct/disseminate it
| have changed, so too should the standard of what is considered
| acceptable to reproduce.
| This is especially relevant given how much broader the societal and
| policy implications have become.
|
| More concretely, it is 100% fair (and I might argue necessary) to
| demand more of our institutions _and_ work to improve their failures.
| I'm sure many researchers have encountered publications of the form
| "we applied <proprietary model (TM)> (not explained) to <proprietary
| data> (partially explained) after <two sentence description of
| preprocessing> and obtained SOTA results!" in a reputable venue.
| Sure, this might have been even less reproducible 200 years ago than
| now, but the authors would also have been less likely to be competing
| with you for limited funding! Debating the traditional definition of
| reproducibility has its place, but we should _also_ be doing as much
| as possible to give reviewers and replicators a leg up. This often
| flies in the face of many incentives the research community faces,
| but shifting blame to institutions by default (not saying you're
| doing this, but I've seen many who do) is taking the easy road out
| and does little to help the imbalanced ratio of discussion:progress.
| ajford wrote:
| This! I struggled with this topic in university. I was studying
| pulsar astronomy, and there were only one or two common tools used at
| the lower levels of data processing, and they had been the same tools
| for a couple of decades.
|
| The software was "reproducible" in that the same starting conditions
| produced the same output, but that didn't mean the _science_ was
| reproducible, as every study used the same software.
|
| I repeatedly brought it up, but I wasn't advanced enough in my
| studies to be able to do anything about it. By the time I felt
| comfortable with that, I was on my way out of the field and into a
| non-academic career.
|
| I have kept up with the field to a certain extent, and there is now a
| project in progress to create a fully independent replacement for
| that original code that should help shed some light (in progress for
| a few years now, and still going strong).
| allenofthehills wrote:
| > The software was "reproducible" in that the same starting
| conditions produced the same output, but that didn't mean the
| _science_ was reproducible, as every study used the same software.
|
| This is the difference between reproducibility and replicability [1].
| Reproducibility is the ability to run the same software on the same
| input data to get the same output; replication would be analyzing the
| same input data (or new, replicated data following the original
| collection protocol) with new software and getting the same result.
|
| I've experienced the same lack of interest from established
| researchers in my field, but I can at least ensure that all my
| studies are both reproducible and replicable by sharing my code _and_
| data.
|
| [1] Plesser HE. Reproducibility vs. Replicability: A Brief History of
| a Confused Terminology. Front Neuroinform. 2018;11:76.
| improbable22 wrote:
| This is almost an argument for _not_ publishing code. If you publish
| all the equations, then everybody has to write their own
| implementation from that.
|
| Something like this is the norm in some more mathematical fields,
| where only the polished final version is published, as if done by
| pure thought. To build on that, first you have to reproduce it,
| invariably by building your own code -- perhaps equally awful, but
| independent.
| 7thaccount wrote:
| Should this be surprising?
| I'm not saying it is correct, but it is similar to the response many
| managers give concerning a badly needed rewrite of business software.
| Doing so is very risky and the benefits aren't always easy to
| quantify. Also, nobody wants to pay you to do that. Research is
| highly competitive, so no researcher is going to want to spend
| valuable time rebuilding a tool that already exists, even if a
| rebuild is needed, when no other researchers are doing that.
| andrewprock wrote:
| > Does scientific-grade code need to be reproducible? Yes.
| Fundamentally yes.
|
| This is definitely not correct. The experiment as a whole needs to be
| reproducible independently. This is very different from, and more
| robust than, requiring that a particular portion of a previous
| version of the experiment be reproducible in isolation.
| enriquto wrote:
| > I wouldn't want my code to end up on acceleratorskeptics.com with
| people that don't understand the material making low-effort critiques
| of minor technical points. I'm here to turn out science, not
| production ready code.
|
| In what way do idiots making idiotic comments about your correct code
| invalidate your scientific production? You can still turn out science
| and let people read and comment freely on it.
|
| > As an example, you seem to be complaining that their Monte Carlo
| code has non-deterministic output when that is the entire point of
| Monte Carlo methods and doesn't change their result.
|
| I guess you would not need to engage personally with the idiots at
| "acceleratorskeptics.com", but likely most of their critique would be
| easily shut down by a simple sentence such as this one. Since most of
| your readers would not be idiots, they could scrutinize your code and
| even provide that reply on your behalf. This is called the scientific
| method.
|
| I agree that you produce science, not merely code. Yet the code is
| part of the science, and you are not really publishing anything if
| you hide that part. Criticizing scientific code because it is bad
| software engineering is like criticizing it because it uses bad
| typography. You should not feel attacked by that.
| spamizbad wrote:
| > In what way do idiots making idiotic comments about your correct
| code invalidate your scientific production? You can still turn out
| science and let people read and comment freely on it.
|
| How would a layperson identify a faulty critique? It would be picked
| up by the media, who would do their usual "both sides" thing.
| enriquto wrote:
| Not that they abstain from doing that shit today, when code is not
| often published.
|
| An educated and motivated layperson at least would have the _chance_
| to learn whether the critique is faulty. Today, with secret code, it
| is impossible to verify for almost everybody.
| halfdan wrote:
| I have done research on Evolutionary Algorithms and numerical
| optimization. It was nigh impossible to reproduce poorly described
| algorithms from state-of-the-art research at the time, and
| researchers would very often not bother to reply to inquiries for
| their code. Even if you did get the code, it would be some arcane C
| only compatible with a GCC from 1996.
|
| Code belongs with the paper. Otherwise we can just continue to make
| up numbers and pretend we found something significant.
| shirakawasuna wrote:
| Race conditions and certain forms of non-determinism could invalidate
| the results of a given study. Code is essentially a better-specified
| methods section; it just says what they did.
| Scientists are expected to include a methods section for exactly this
| reason, and any scientist refusing to include a methods section in
| their paper would be rightly rejected.
|
| However, a methods section is always under-specified. Code provides
| the unique opportunity to actually see the full methods on display
| and properly review their work. It should be mandated by all
| reputable journals and worked into the peer review process.
| Jabbles wrote:
| I am interested to know the distinction between "production-ready"
| and "science-ready" code.
|
| I do not think "non-experts" should be able to use your code, but I
| do think an expert who was not involved in writing it should be.
| petschge wrote:
| One example: My code used to crash for a long time if you set the
| thermal speed to something greater than the speed of light. Should
| the code crash? No. And by now I have found the time to write extra
| code to catch the error and mildly insult the user (it says "Faster
| than light? Please share that trick with me!"). Does it matter? No -
| it crashed rather than running and giving plausible-but-wrong
| results. So that is code that I would call "science-ready", but I
| wouldn't want it criticized by people outside my domain.
| jnxx wrote:
| I don't think that would be any problem (why should it?).
|
| Code exhibiting undefined behavior is a different kettle of fish...
| petschge wrote:
| Which is why I run valgrind on my code (with a parameter file
| containing physically valid inputs) to get rid of all undefined
| behavior. But I gave up on running afl-fuzz, because all it found
| were crashes following from physically invalid inputs. I fixed the
| obvious ones to make the code nicer for new users, but once afl
| started to find only very creative corner cases I stopped.
| jnxx wrote:
| Well done!
| gowld wrote:
| Then you publish your work and critics publish theirs, and the
| community decides which claims have proven their merit. This is the
| fundamental structure of the scientific community.
|
| How is "your code has errors and I rebuke you" a more painful
| critique than "you are hiding your methodology and so I rebuke you"?
| petschge wrote:
| Nothing limits the field of critics to people who have written their
| own code and know what they are doing.
| arethuza wrote:
| I would regard (from experience) "science ready" code as something
| that _you_ run just often enough to get the results to create
| publications.
|
| Any effort to get code working for other people, or documented in any
| way, would probably be seen as wasted effort that could be used to
| write more papers or create more results to create new papers.
|
| This kind of reasoning was one of the many reasons I left academic
| research - I personally didn't value publications as deliverables.
| chriswarbo wrote:
| My experience has been similar.
|
| Still, there's plenty of room to encourage good(/better) practices
| which cost essentially nothing, e.g. using $PWD rather than
| /home/bob/foo (a sketch follows below).
| gowld wrote:
| If your experiment is not repeatable, it's an anecdote, not data.
|
| Any effort to write a paper readable by other people, or document the
| experiment in any way, would probably be seen as wasted effort that
| could be used to create more results.
|
| The "don't show your work" argument only makes sense if you are doing
| PR, not science.
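|
| A sketch of the $PWD point above -- resolve data files relative to
| the script (or an environment variable) instead of hardcoding
| someone's home directory; the file names here are invented:
|
|     import os
|     from pathlib import Path
|
|     # Look next to this script by default; allow an override.
|     BASE = Path(os.environ.get("DATA_DIR",
|                                Path(__file__).resolve().parent))
|     data_file = BASE / "measurements.csv"  # not /home/bob/foo.csv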
| neutronicus wrote:
| If it's repeatable _by you_ then it's a trade secret, not an anecdote
| qppo wrote:
| Disclaimer, I'm a professional engineer and not a researcher.
|
| The kind of code I'll ship for production will include unit testing
| designed around edge or degenerate cases that arose from case
| analysis, usually some kind of end-to-end integration test,
| aggressive linting and crashing on warnings, and enforcement of style
| guidelines with auto-formatting tools. The last one is more important
| than people give it credit for.
|
| For research it would probably be sufficient to test that the code
| compiles and that, given a set of known valid input, the program
| terminates successfully.
| dmlorenzetti wrote:
| Hard-coded file paths for input data. File paths hard-coded to use
| somebody's Google Drive, so that it only runs if you know their
| password. Passwords hard-coded to get around the above problem.
|
| In-code selection statements like `if( True ) {...}`, where you have
| no idea what is being selected or why.
|
| Code that only runs in the particular workspace image that contains
| some function that was hacked out to make things work during a
| debugging session 5 years ago.
|
| Distributed projects where one person wrote the preprocessor, another
| wrote the simulation software, and a third wrote the analysis
| scripts, and they all share undocumented assumptions worked out
| between the three researchers over the course of two years.
|
| Depending on implementation-defined behavior (like zeroing out of
| data structures).
|
| Function and variable names, like `doit()` and `hold`, which make it
| hard to understand the intention.
|
| Files that contain thousands of lines of imperative instructions with
| documentation like "Per researcher X" every 100 lines or so.
|
| Code that runs fine for 6 hours, then stops because some command-line
| input had the wrong value.
|
| I've seen all of these over the years. Even as a domain expert who
| has spoken directly with authors and project leads, this kind of
| stuff makes it very hard to tease out what the code actually does,
| and how the code corresponds to the papers written about the results.
| mroche wrote:
| You're giving me flashbacks! I spent a year as an admin on an HPC
| cluster at my university, building tools/software, helping
| researchers get their projects running, and re-leading the
| implementation of container usage. The amount of scientific
| code/projects that required libraries/files to be in specific
| locations, or assumed that everything was being run from a home
| directory, or sourced shell scripts at run time (that would break in
| containers), was staggering. A lot of stuff had the clear "this
| worked on my system so..." vibe about it.
|
| As an admin it was quite frustrating, but I understand it sometimes
| when you know the person/project isn't tested in a distributed
| environment. But when it's the projects that do know how they're used
| and still do those things...
| searine wrote:
| > I am interested to know the distinction between "production-ready"
| and "science-ready" code.
|
| In general, scientists don't care how long the code takes or how many
| resources it uses. It is not a big deal to run a script for an extra
| hour, or use up a node of a supercomputer. Extravagant solutions or
| added packages to make the code run smoother or faster only waste
| time. Speed/elegance only really matters when you know the code is
| going to be distributed to the community.
|
| Basically, scientists only care if the result is true: if the result
| the code outputs is sensible, defensible, reliable, reproducible. It
| would be considered a dick move to criticize someone's code if the
| code was proven to produce the correct result.
| Jabbles wrote:
| Do you know how you could get to the state where "the code was proven
| to produce the correct result"?
|
| If not by unit tests, code review or formal logic, then what?
| searine wrote:
| > If not by unit tests, code review or formal logic, then what?
|
| Cross-referencing independent experiments and external datasets.
|
| Science doesn't work like software. The code can be perfect and still
| not give results that reflect reality. The code can be logical and
| not reflect reality. Most scientists I know go in with the
| expectation that "the code is wrong" and its results must be
| validated by at least one other source.
| jabirali wrote:
| Not all scientific code is amenable to unit testing. From my own
| experience of a PhD in condensed matter physics, the main issue was
| that how important equations and quantities "should" behave by
| themselves was often unknown or undocumented, so very often each such
| component could only be tested as part of a system with known
| properties.
|
| You can then use unit testing for low-level infrastructure (e.g.
| checking that your ODE solver works as expected), but do the
| high-level testing via scientific validation. The first line of
| defense is to check that you don't break any laws of physics, e.g.
| that energy and electric charge are conserved in your end results.
| Even small implementation mistakes can violate these.
|
| Then you search for related existing publications of a theoretical or
| numerical nature, trying to reproduce their results; the more
| existing research your code can reproduce, the more certain you can
| be that it is at least consistent with known science. If this fails,
| you have something to guide your debugging; or if you're very lucky,
| something interesting to write a paper about :).
|
| The final validation step is of course to validate against
| experiments. This is not suited for debugging though, since you can't
| easily say whether a mismatch is due to a software bug, experimental
| noise, neglected effects in the mathematical model, etc.
| jnxx wrote:
| > It would be considered a dick move to criticize someone's code if
| the code was proven to produce the correct result.
|
| Formal proof is much, much harder than making code understandable and
| reviewable. It can be done, but it is not easy, and can yield
| surprising results:
|
| https://en.wikipedia.org/wiki/CompCert
|
| http://envisage-project.eu/proving-android-java-and-python-s...
| lemmsjid wrote:
| There's a ton of overlap, because science code might be a
| long-running, multi-engineer distributed system and production code
| might be a script that supports a temporary business process. But
| let's assume production-ready is a multi-customer application and
| science-ready is computations to reproduce results in a paper.
|
| Here's a quick pass; I'm sure I'm missing stuff, but I've needed to
| code review a lot of science and production output, and below is how
| I tend to think of it, especially taking efficiency of
| engineer/scientist time into account.
|
| Production Ready?
|
| * code well factored for extensibility, feature change, and
| multi-engineer contribution
|
| * robust against hostile user input
|
| * unit and integration tested
|
| Science Ready?
|
| * code well factored for readability and reproducibility (e.g. random
| numbers seeded, time calcs not set against 'now')
|
| * robust against expected user input
|
| * input data available? testing optional but desired, esp unit tests
| of algorithmic functions
|
| * input data not available? a schema-correct facsimile of input data
| available in a unit-test context to verify the algorithms are correct
|
| Both?
|
| * security needs assessed and met (science code might be dealing with
| highly secure data, as might production code)
|
| * performance and stability needs met (production code more often
| requires long-term stability; science sometimes needs performance
| within expected Big O to save compute time if it's a big calculation)
| PeterisP wrote:
| Your requirements seem to push "science ready" far into what I'd
| consider a worthless waste of time, coming from the perspective of
| code that's used for data analysis for a particular paper.
|
| The key aspect of that code is that it's going to be run once or
| twice, ever, and it's only ever going to be run on a particular known
| set of input data. It's a tool (though a complex one) that we used
| (once) to get from A to B. It does not need to get refactored,
| because the expectation is that it's only ever going to be used as-is
| (as it was used once, and will be used only for reproducing results);
| it's not intended to be built upon or maintained. It's not the basis
| of the research, it's not the point of the research, it's not a
| deliverable of that research; it's just a scaffold that was
| temporarily necessary to do some task - one which might have been
| done manually earlier through great effort, but that's automated now.
| It's expected that the vast majority of the readers of that paper
| won't ever need to touch that code; they care only about the results
| and a few key aspects of the methodology, which are (or should be)
| all mentioned in the paper.
|
| It should be reproducible to ensure that we (or someone else) can
| obtain the same B from A in future, but that's it; it does not need
| to be robust to input that's not in the input datafile - no one in
| the world has another set of real data that could/should be processed
| with that code. If after a few years we or someone else obtain
| another dataset, _then_ (after those few years, _if_ that dataset
| happens) there would be a need to ensure that it works on that
| dataset before writing a paper about that dataset, but it's
| overwhelmingly likely that you'd want to modify the code anyway, both
| because that new dataset would not be "compatible" (the code will be
| tightly coupled to all the assumptions in the methodology you used to
| get that data, and the new data is likely to be richer in ways you
| can't predict right now) and because you'd want to extend the
| analysis in some way.
|
| It _should_ have a "toy example" - what you call "a schema-correct
| facsimile of input data" - that's used for testing and validation
| before you run it on the actual dataset, and it should have test
| scenarios and/or unit tests that are preferably manually verifiable
| for correctness.
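|
| Something like this, say -- a pytest-style sketch, where the module
| and function names are invented:
|
|     # test_analysis.py -- run with pytest
|     from analysis import detrend  # the algorithmic function under test
|
|     def test_detrend_removes_linear_trend():
|         # Schema-correct facsimile of the real input: tiny,
|         # hand-checkable, and committed alongside the code.
|         samples = [(t, 2.0 * t + 5.0) for t in range(10)]
|         residuals = detrend(samples)
|         assert all(abs(r) < 1e-9 for r in residuals)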
|
| But the key thing here is that no matter what you do, that's still in
| most cases going to be "write once, run once, read never" code, as
| long as we're talking about the auxiliary code that supports some
| experimental conclusions, not the "here's a slightly better method
| for doing the same thing" CS papers. We are striving for
| _reproducible_ code, but actual _reproductions_ are quite rare; the
| incentives are just not there. We publish the code as a matter of
| principle, knowing full well that most likely no one will download
| and read it. The community needs the possibility of reproduction for
| the cases where the results are suspect (which is the main scenario
| where someone is likely to attempt reproducing that code); it's there
| to ensure that if we later suspect that the code is flawed in a way
| where the flaws affect the conclusions, then we can go back to the
| code and review it - which is plausible, but not that likely. Also,
| if someone does not trust our code, they can (and possibly should)
| simply ignore it and perform a "from scratch" analysis of the data
| based on what's said in the paper. With a reimplementation, some
| nuances in the results might be slightly different, but all the
| conclusions in the paper should still be valid if the paper is
| actually meaningful - if a reimplementation breaks the conclusions,
| _that_ would be a successful, valuable non-reproduction of the
| results.
|
| This is a big change from industry practice, where you have mantras
| like "a line of code is written once but read ten times"; in a
| scientific environment that ratio is the other way around, so the
| tradeoffs are different - it's not worth investing refactoring time
| to improve readability if it's expected that most likely no one will
| ever read that code; it makes sense to spend that effort only if and
| when you need it.
| lemmsjid wrote:
| Yep! I don't disagree with anything you're saying when I think from a
| particular context. It's really hard to generalize about the needs of
| "science code", and my stab at doing so was certain to be off the
| mark for a lot of cases.
| PeterisP wrote:
| Yes, there are huge differences between the needs of various fields.
| For example, some fields have a lot of papers where the authors are
| presenting a superior method for doing something, and if code is a
| key part of that new "method and apparatus", then it's a key
| deliverable of that paper and its accessibility and (re-)usability
| are very important; and if a core claim of the paper is that "we
| coded A and B, and experimentally demonstrated that A is better than
| B", then any flaws in that code may invalidate the whole experiment.
|
| But I seem to get the vibe that this original Nature article is
| mostly about the auxiliary data-analysis code for "non-simulated"
| experiments, while Hacker News seems biased towards fields like
| computer science, machine learning, etc.
| analog31 wrote:
| I'm a scientist in a group that also includes a software production
| team. For me, the standard of scientific reproducibility is that a
| result can be replicated by a reasonably skilled person, who might
| even need to fill in some minor details themselves.
|
| Part of our process involves cleaning up code to a higher state of
| refinement as it gets closer to entering the production pipeline.
|
| I've tested 30-year-old code, and it still runs, though I had to dig
| up a copy of Turbo Pascal, and much of it no longer exists in
| computer-readable form but would have to be re-entered by hand. Life
| was actually simpler back then -- with the exception of the built-ins
| of Turbo Pascal, it has no dependencies.
|
| My code was in fact adopted by two other research groups, with only
| minor changes needed to suit slightly different experimental
| conditions. It contained many cross-checks, though we were unaware of
| modern software testing concepts at the time.
|
| For a result to have broader or lasting impact, replication is not
| enough. The result has to fit into a broader web of results that
| reinforce one another and are extended or turned into something
| useful. That's the point where precise replication of minor
| supporting results becomes less important. The quality of any
| specific experiment done in support of modern electromagnetic theory
| would probably give you the heebie-jeebies, but the overall theory is
| profoundly robust.
|
| The same thing has to happen when going from prototype to production.
| Also, production requires what I call push-button replication. It has
| to replicate itself at the click of a mouse, because the production
| team doesn't have domain experts who can even critique the entirety
| of their own code, and maintaining their code would be nearly
| impossible if it didn't adhere to standards that make it maintainable
| by multiple people at once.
| Jabbles wrote:
| This sounds great. In your opinion, do you think your team is unusual
| in those aspects? Do you have any knowledge of the quality of code in
| other branches of physics or other sciences?
| analog31 wrote:
| Well, I know the quality of my own code before I got some advice. And
| I've watched colleagues doing this as well.
|
| My own code was quite clean in the 1980s, when the limitations of the
| machines themselves tended to keep things fairly compact, with
| minimal dependencies. And I learned a decent "structured programming"
| discipline.
|
| As I moved into more modern languages, my code kind of degenerated
| into a giant hairball of dependencies and abstractions. "Just because
| you can do that, doesn't mean you should." I've kind of learned that
| commercial programmers limit themselves to a few familiar patterns,
| and if you try to create a new pattern for every problem, your code
| will be hard to hand off.
|
| Scientists would benefit from receiving some training in good
| programming hygiene.
| dandelion_lover wrote:
| > the distinction between "production-ready" and "science-ready" code
|
| In the first case, you must take into account all (un)imaginable
| corner cases and never allow the code to fail or hang. In the second
| case, it needs to produce a reproducible result, at least for the
| published case. And do not expect it to be user-friendly at all.
| throwaway7281 wrote:
| That's not how the game is played. If you cannot release the code
| because the code is too ugly or untested or has bugs, how do you
| expect anyone with the right expertise to assess your findings?
|
| It reminds me of Kerckhoffs's principle in cryptography, which
| states: a cryptosystem should be secure even if everything about the
| system, except the key, is public knowledge.
| jnxx wrote:
| > If you cannot release the code because the code is too ugly or
| untested or has bugs, how do you expect anyone with the right
| expertise to assess your findings?
|
| Yes, it should be this way.
|
| Also, in all cases where some company research team goes to a
| scientific conference and presents a nifty solution for problem X
| without telling how it was purportedly done, publishing the code and
| data should be absolutely required.
|
| (And that's also something which is broken about software patents -
| patents are about open knowledge, software which uses such patents is
| not open - this combination should not be allowed at all.)
| jnxx wrote:
| With the caveat that while in some cases, like computational science,
| numerical analysis, machine learning algorithms, computer-assisted
| proofs, and so on, details of the code can be crucial, in other cases
| they should not matter that much. I too have the impression that the
| HN public tends to over-value the importance of code in the cases
| where it is mostly a tool for evaluating a scientific result.
| sjburt wrote:
| The findings really should be independent of the code. Reproduction
| should occur by taking the methodology, re-implementing the software,
| and running new experiments.
| martingab wrote:
| That's exactly the philosophy we follow e.g. in particle physics, and
| it's a common excuse to dismiss all the guidelines made in the
| article. However, this kind of validation/falsification is often done
| between different research groups (maybe using different but formally
| equivalent approaches), while people within the same group have to
| deal with the 10-year-old code base.
|
| I myself had a very bad experience with extending the undocumented
| Fortran 77 code (lots of gotos and common blocks) of my supervisor.
| Finally, I decided to rewrite the whole thing including my new
| results, instead of just somehow embedding my results into the old
| code, for two reasons: (1) I'm presumably faster rewriting the whole
| thing including my new research than struggling with the old code,
| and (2) I simply would not trust the numerical
| results/phenomenology produced by the code. After all, I'm wasting 2
| months of my PhD on the marriage of my own results with known
| results, which - in principle - could have been done within one day
| if the code base allowed for it.
|
| So yes, if it's a one-man show I would not put too much weight on
| code quality (though unit tests and git can save quite a lot of time
| during development), but if there is a chance that someone else is
| going to touch the code in the near future, it will save time for
| your colleagues and improve the overall (scientific) productivity.
|
| PS: quite excited about my first post here
| jnxx wrote:
| > After all, I'm wasting 2 months of my PhD on the marriage of my own
| results with known results, which - in principle - could have been
| done within one day if the code base allowed for it.
|
| Sounds like it is quite good science to do that, because it puts the
| computation on a pair of independent feet.
|
| Otherwise, it could just be that the code you are using has a bug and
| nobody notices until it is too late.
| MaxBarraclough wrote:
| > If it's a one-man show I would not put too much weight on code
| quality
|
| This makes me a little uneasy, as _I'm not too worried about code
| quality_ can easily translate into _Yes I know my code is full of
| undefined behaviour, and I don't care_.
|
| > PS: quite excited about my first post here
|
| Welcome to HN! reddit has more cats, Slashdot has more jokes about
| sharks and laserbeams, but somehow we get by.
| labcomputer wrote:
| In GIS, there's a saying: "the map is not the terrain". It seems like
| HN is in a little SWE bubble, and needs to understand "the code is
| not the science".
|
| In science, code is not an end in-and-of-itself. It is a _tool_ for
| simulation, data reduction, calculation, etc. It is a way to test
| scientific ideas.
|
| > how do you expect anyone with the right expertise to assess your
| findings
|
| I would expect other experts in the field to write their own
| implementation of the scientific ideas expressed in a paper. If the
| idea has any merit, their implementations should produce similar
| results. Which is exactly what they would do if it were a physical
| experiment.
| yjftsjthsd-h wrote:
| > In GIS, there's a saying "the map is not the terrain". It seems
| like HN is in a little SWE bubble, and needs to understand "the code
| is not the science".
|
| And if you're a map maker, it's a bit rich to start claiming that the
| accuracy of your maps is unimportant. If code is "a way to test
| scientific ideas", then it kinda needs to work if you want meaningful
| results. Would you run an experiment with thermometers that were
| accurate to +-30deg and reactants from a source known for
| contamination?
| jnxx wrote:
| In many parts of scientific research, researchers are, to stay in
| your metaphor, more travelers _using_ a map than map makers.
|
| Of course, it makes a difference whether you run a clinical study on
| drugs and use a pocket calculator to compute a mean, or do research
| in numerical analysis, or are presenting a paper on how to use Coq to
| more efficiently prove the four-color theorem or Fermat's last
| theorem.
|
| In short, much of science is not computer science, and for it,
| computation is just a tool.
| RandoHolmes wrote:
| No one is saying that code is the science.
|
| If I'm given bad information and I act on that information, then
| problems can occur.
|
| Similarly, if the software is giving the scientist bad information,
| problems can occur.
|
| How many more stories do we have to read about some research getting
| published in a journal, only to be retracted down the road because of
| a bug in the software, before we start asking whether there needs to
| be more rigor in the software portion of the research as well?
|
| There was a story on HN a while back about a professor who had
| written software, had come to some conclusions, and even had a Ph.D.
| student working on research based on that work. Only to find out that
| a software flaw meant the conclusions weren't useful to anyone, and
| that student ended up wasting years of their life.
|
| ---
|
| This stuff matters. This isn't a model of reality, it's an
| exploration of reality. It would be like telling a hiker that terrain
| doesn't matter. They would, rightfully, disagree with you.
| kalenx wrote:
| > How many more stories do we have to read about some research
| getting published in a journal only to have to retract it down the
| road because they had a bug in the software before we start asking if
| maybe there needs to be more rigor in the software
|
| We will always hear stories like that, just as we will always hear
| stories about major bugs in stable software releases. Asking a
| scientist to do better than whole teams of software engineers makes
| little sense to me.
|
| Of course, a bug that was introduced or kept with the conscious
| intention of fooling the reviewers and the readers is another story.
| RandoHolmes wrote:
| > Asking a scientist to do better than whole teams of software
| engineers makes little sense to me.
|
| This is not what is being asked; shame on you for the strawman.
|
| Your entire post can be summed up with the following sentence: "if we
| can't be perfect then we may as well not try to be better".
| ufmace wrote:
| I don't entirely disagree, but haven't there also been cases of
| experimental results being invalidated due to subtle mechanical,
| electrical, chemical, etc. complications with the test equipment,
| when none of the people involved in the experiment were experts in
| those fields?
|
| I think that, while we could use a bit more training in software
| engineering best practices in the sciences, the thesis is still that
| science is hard and we need real replication of everything before
| reaching important conclusions, and over-focusing on one specific
| type of error isn't all that helpful.
| RandoHolmes wrote:
| If they're setting up experiments whose correct results require
| electrical expertise, then yes, they should either get better
| training or bring in someone who has it.
|
| It's not clear to me why you think I would argue that inaccuracies
| should be avoided in software but accept that they're OK for
| electrical systems.
| booleandilemma wrote:
| If you're saying you produced certain results with code, then the
| code is indeed the science. Not being able to vouch for the code is
| like believing a mathematical theorem without seeing the proof.
| MaxBarraclough wrote:
| At the risk of just mirroring points which have already been made:
|
| > you understand that the links in your post are the exact worry
| people have when it comes to releasing code: people claiming that
| their non-software-engineering-grade code invalidates the results of
| their study.
|
| It's profoundly unscientific to suggest that researchers should be
| given the choice to withhold details of their experiments that they
| fear will not withstand peer review. That's much of the point of
| scientific publication.
|
| Researchers who are too ashamed of their code to submit it for
| publication should be denied the opportunity to publish. If that's
| the state of their code, their results aren't publishable.
| Unpublishable garbage in, unpublishable garbage out. Simple enough.
| Journals just shouldn't permit that kind of sloppiness. Neither
| should scientists be permitted to take steps to artificially make it
| difficult to reproduce (in some weak sense) an experiment.
| (Independently re-running code whose correctness is suspect obviously
| isn't as good as comparing against a fully independent
| reimplementation, but it still counts for something.)
|
| If a mathematician tried to publish the conclusion of a proof but
| refused to show the derivation, they'd be laughed out of the room.
| Why should we hold software-based experiments to such a pitifully low
| standard by comparison?
|
| It's not as if this is a minor problem. Software bugs really can
| result in incorrect figures being published. In the case of C and C++
| code in particular, a seemingly minor issue can result in undefined
| behaviour, meaning the output of the program is _entirely_
| unconstrained, with no assurance that the output will resemble what
| the programmer expects. This isn't just theoretical.
| Bizarre behaviour really can happen on modern systems when undefined
| behaviour is present.
|
| A computer scientist once told me a story of some students he was
| supervising. The students had built some kind of physics simulation
| engine. They seemed pretty confident in its correctness, but in truth
| it hadn't been given any kind of proper testing; it merely looked
| about right to them. The supervisor had a suggestion: _Rotate the
| simulated world by 19 degrees about the Y axis, run the simulation
| again, and compare the results._ They did so. Their program showed
| totally different results. Oh dear.
|
| Needless to say, not all scientific code can so easily be shown to be
| incorrect. All the more reason to subject it to peer review.
|
| > I'm an accelerator physicist and I wouldn't want my code to end up
| on acceleratorskeptics.com with people that don't understand the
| material making low-effort critiques of minor technical points.
|
| Why would you care? Science is about advancing the frontier of
| knowledge, not about avoiding invalid criticism from online
| communities of unqualified fools.
|
| I sincerely hope vaccine researchers don't make publication decisions
| based on this sort of fear.
| mmmBacon wrote:
| Monte Carlo can and should be deterministic and repeatable. It's a
| matter of correctly initializing your random number generators and
| providing a known/same random seed from run to run. If you aren't
| doing that, you aren't running your Monte Carlo correctly. That's a
| huge red flag.
|
| Scientists need to get over this fear about their code. They need to
| produce better code, and need to actually start educating their
| students on how to write and produce code. For too long, many in the
| physics community have trivialized programming and seen it as assumed
| knowledge.
|
| Having open code will allow you to become better, and you'll produce
| better results.
|
| Side note: 25 years ago I worked in accelerator science too.
| neutronicus wrote:
| Then you need to re-imagine the system in such a way that junior
| scientific programmers (i.e. _grad students_) can at least _imagine_
| having enough job security for code maintainability to matter, and
| for PIs to invest in their students' knowledge with a horizon longer
| than a couple of person-years.
| djaque wrote:
| Hello fellow accelerator physicist!
|
| Yes, I understand how seeding PRNGs works, and I personally do that
| in my own code for debugging purposes. My point was that not using a
| fixed seed doesn't invalidate their result. It's just a cheap shot
| and, to me, demonstrates that the lockdownskeptics author doesn't
| have a real understanding of the methods being used.
|
| Also, to be clear, I support open science and have some of my own
| open-source projects out in the wild (which is not the norm in my own
| field yet). I'm not arguing against releasing code, I'm arguing
| against OP arguing against this particular piece of code.
| SiempreViernes wrote:
| Indeed it was a cheap shot; the code does give reproducible results:
| https://www.nature.com/articles/d41586-020-01685-y
|
| The main issue is whether it used sensible inputs, but that's
| entirely different from code quality and requires subject-matter
| expertise, so programmers don't bother with such details -_-
| jnxx wrote:
| > Monte Carlo can and should be deterministic and repeatable.
|
| That's a nitpick, but if the computation is executed in parallel
| threads (e.g.
| on multicore, or on a multicomputer), and individual terms are, for
| example, summed in a random order, caused by the non-determinism
| introduced by the parallel computation, then the result is not
| strictly deterministic. This is a property of floating-point
| computation - more specifically, of the finite accuracy of real
| floating-point implementations.
|
| So, it is not deterministic, but that _should_ not cause large
| qualitative differences.
| improbable22 wrote:
| > Monte Carlo can and should be deterministic and repeatable
|
| I guess it can be made so, but not necessarily easily / fast (if it's
| parallel, and sensitive to floating-point rounding). And it sounds
| like the kind of engineering effort GP is saying isn't worth it.
| Re-running exactly the same Monte Carlo chain does tell you
| something, but is perhaps the wrong level to be checking. Re-running
| from a different seed, and getting results that are within error,
| might be much more useful.
| jbay808 wrote:
| I guess the best thing would be for it to use a different random seed
| every time it's run (so that, when re-running the code, you'll see
| _similar_ results, which verifies that the result is not sensitive to
| the seed), but with the particular seed that produced the particular
| results published in a paper noted.
|
| But still, for code running on different machines, especially
| numeric-heavy code that might be running on a particular GPU setup, a
| distributed big-data source (where you pull the first available data
| rather than read in a fixed order), or even on some special
| supercomputer, it's hard to ask that it be totally reproducible down
| to the smallest rounding error.
| tgvaughan wrote:
| I write M-H samplers for a living. While I agree that being able to
| rerun a chain using the same seed as before is crucial for debugging,
| and while I'm very strongly in favour of publishing the code used for
| a production analysis, I'm generally opposed to publishing the
| corresponding RNG seeds. If you need the seeds to reproduce my
| results, then the results aren't worth the PDF they're printed on.
| [edit: typo]
| jack_h wrote:
| Since I have a bit of experience in this area: quasi-Monte Carlo
| methods also work quite well and ensure deterministic results.
| They're not applicable in all situations though.
| ativzzz wrote:
| While you're running experiments it doesn't matter, but any code
| behind a published result, or code reused in other publishable work,
| IS production code, and you should treat it as such.
| oliver101 wrote:
| Why is "doing software engineering" not "doing science"?
|
| Anybody who has conducted experimental research will say they spent
| 80% of the time using a hammer or a spanner, repairing faulty lasers
| or power supplies. This process of reliable and repeatable
| experimentation is the basis of science itself.
|
| Computational experiments must be held to the same standards as
| physical experiments. They must be reproducible, and they should be
| publicly available (if publicly funded).
| OminousWeapons wrote:
| I am in 100% agreement, and would like to point out that many papers
| based on code don't even come with code bases, and if they do, those
| code bases are not going to contain or be accompanied by any
| documentation whatsoever. This is frequently by design, as many labs
| consider code to be IP and they don't want to share it because it
| gives them a leg up on producing more papers, and the shared code
| won't yield an authorship.
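|
| The summation-order point a few comments up takes only a few lines to
| demonstrate -- a minimal sketch:
|
|     import random
|
|     xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
|     total_a = sum(xs)
|     random.shuffle(xs)             # same terms, different order
|     total_b = sum(xs)
|     print(total_a == total_b)      # frequently False
|     print(abs(total_a - total_b))  # tiny, but usually not zero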
| improbable22 wrote:
| > Monte-Carlo can and should be deterministic and repeatable
|
| I guess it can be made so, but not necessarily easily or cheaply
| (if it's parallel, and sensitive to floating-point rounding). And
| it sounds like the kind of engineering effort GP is saying isn't
| worth it. Re-running exactly the same Monte-Carlo chain does tell
| you something, but is perhaps the wrong level to be checking.
| Re-running from a different seed, and getting results that are
| within error, might be much more useful.
| jbay808 wrote:
| I guess the best approach would be to use a different random seed
| every time it's run (so that, when re-running the code, you'll see
| _similar_ results, which verifies that the result is not sensitive
| to the seed), while noting the particular seed that produced the
| particular results published in a paper.
|
| But still, for code running on different machines, especially for
| numeric-heavy code that might be running on a particular GPU
| setup, a distributed big-data source (where you pull the first
| available data rather than read in a fixed order), or even on some
| special supercomputer, it's hard to ask that it be totally
| reproducible down to the smallest rounding error.
| tgvaughan wrote:
| I write M-H samplers for a living. While I agree that being able
| to rerun a chain using the same seed as before is crucial for
| debugging, and while I'm very strongly in favour of publishing the
| code used for a production analysis, I'm generally opposed to
| publishing the corresponding RNG seeds. If you need the seeds to
| reproduce my results, then the results aren't worth the PDF
| they're printed on. [edit: typo]
| jack_h wrote:
| Since I have a bit of experience in this area, quasi-Monte Carlo
| methods also work quite well and ensure deterministic results.
| They're not applicable for all situations though.
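| A rough sketch of the idea for the unfamiliar (Python/NumPy; this
| is a textbook Halton construction, not any particular library's):
| estimating pi with pseudo-random points versus deterministic
| low-discrepancy points:
|
|     import numpy as np
|
|     def halton(n, base):
|         """First n terms of the van der Corput sequence in `base`."""
|         seq = np.zeros(n)
|         for i in range(n):
|             f, k = 1.0, i + 1
|             while k > 0:
|                 f /= base
|                 seq[i] += f * (k % base)
|                 k //= base
|         return seq
|
|     def pi_estimate(pts):
|         return 4 * ((pts ** 2).sum(axis=1) < 1.0).mean()
|
|     n = 4096
|     mc_pts = np.random.default_rng(0).random((n, 2))
|     qmc_pts = np.column_stack([halton(n, 2), halton(n, 3)])
|
|     print(pi_estimate(mc_pts))   # plain Monte-Carlo estimate
|     print(pi_estimate(qmc_pts))  # QMC: deterministic by construction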
| ativzzz wrote:
| While you're running experiments, it doesn't matter, but code
| behind any published result, or code reused in other publishable
| work, IS production code, and you should treat it as such.
| oliver101 wrote:
| Why is "doing software engineering" not "doing science"?
|
| Anybody who has conducted experimental research will say they
| spent 80% of the time using a hammer or a spanner. Repairing
| faulty lasers or power supplies. This process of reliable and
| repeatable experimentation is the basis of science itself.
|
| Computational experiments must be held to the same standards as
| physical experiments. They must be reproducible and they should be
| publicly available (if publicly funded).
| OminousWeapons wrote:
| I am in 100% agreement and would like to point out that many
| papers based on code don't even come with code bases, and when
| they do, those code bases rarely contain or come with any
| documentation whatsoever. This is frequently by design: many labs
| consider code to be IP and don't want to share it, because it
| gives them a leg up on producing more papers and the shared code
| won't yield an authorship.
| acutesoftware wrote:
| If published research is based on a code base, then surely the
| documentation and working code are just as important as the
| carefully written paper.
| OminousWeapons wrote:
| I completely agree; the problem is the journal editors and
| reviewers largely don't.
| freeone3000 wrote:
| No, the paper is what matters. The code is a means to generate the
| paper.
| bumby wrote:
| I agree, but that's similar to saying the data is what matters,
| not the methodology.
|
| In the research germane to this conversation, software is the
| means by which the scientific data is generated. If the software
| is flawed, it undermines the confidence in the data and thus the
| conclusions.
| freeone3000 wrote:
| Most researchers would agree with the first statement without
| significant qualification. Methods are at the end for a reason.
| bumby wrote:
| Not disagreeing with your assertion about the opinion of "most
| researchers", but you'll often find quite a few people advocating
| for judging publication worthiness on the methodology sans data,
| to try and avoid the perverse incentives for novel or meaningful
| data.
|
| I think it's too easy to game the data (whether knowingly or not)
| with poor methodology. I advocate process before product, in other
| words.
| WhompingWindows wrote:
| It's hard for me to publish my code in healthcare services
| research because most of it is under lock and key due to HIPAA
| concerns. I can't release the data, and so 90% of the work of
| munging and validating the data is un-releasable. So, should I
| release my last 10% of code where I do basic descriptive stats,
| make tables, make visualizations, or do some regression modeling?
| Certainly, I can make that available in de-identified ways, but
| without data, how can anyone ever verify its usefulness? And does
| anyone want to see how I calculated the mean, median, SD, IQR?
| It's done with base R or tidyverse; that's not exactly
| revolutionary code.
| rscho wrote:
| > If journals really care about the reproducibility crisis
|
| All is well and good then, because journals absolutely don't care
| about science. They care about money and prestige. From personal
| experience, I'd say this intersects with the interests of most
| high-ranking academics. So the only unhappy people are idealistic
| youngsters and science "users".
|
| Let's get back to non-profit journals.
| SiempreViernes wrote:
| In the event, the code actually _is_ reproducible:
| https://www.nature.com/articles/d41586-020-01685-y
| prionassembly wrote:
| Institutions need to provide scientists and mathematicians with
| coders. It's a bit insane to expect them to be software engineers
| as well.
| izacus wrote:
| No one expects them to be software engineers, but we do expect
| them to be _scientists_ - to publish results that are reproducible
| and verifiable. And that has to hold for code as well.
| neuromantik8086 wrote:
| There are some efforts in this vein within academia, but they are
| very weak in the United States. The U.S. Research Software
| Engineer Association (https://us-rse.org/) represents one such
| attempt at increasing awareness about the need for dedicated
| software engineers in scientific research, and advocates for
| formal recognition that software engineers are essential to the
| scientific process.
|
| In terms of tangible results, Princeton at least has created a
| dedicated team of software engineers as part of their research
| computing unit
| (https://researchcomputing.princeton.edu/software-engineering).
|
| Realistically, though, even if the necessity of research software
| engineering were acknowledged at the institutional level at the
| bulk of universities, there would still be the problem of
| universities paying way below market rate for software engineering
| talent...
|
| To some degree, universities alone cannot effect the change needed
| to establish a professional class of software engineers that
| collaborate with researchers. Funding agencies such as the NIH and
| NSF are also responsible, and need to lead in this regard.
| geebee wrote:
| Thank you for the link to the Princeton group. That is
| encouraging. Aside from that, I share your lack of optimism about
| the prospects for this niche.
|
| Most research programmers, in my experience, work in a lab for a
| PI. Over time, these programmers have become more valued by their
| team. However, they often still face a hard cap on career
| advancement. They generally are paid considerably less than they'd
| earn in the private sector, with far less opportunity for career
| growth. I think they often make creative contributions to research
| that would be "co-author"-worthy if they came from someone on an
| academic track, but they are frequently left off publications.
| They don't get the benefits that come with academic careers, such
| as sabbaticals, and they often work to assignment, with relatively
| little autonomy. The right career path and degree to build the
| skills required for this kind of programming is often a mismatch
| for the research-oriented degrees that are essential to
| advancement in an academic environment (including leadership roles
| that aren't research roles).
|
| In short, I think there is a deep need for the emerging "research
| software engineer" you mention, but at this point, I can't
| recommend these jobs to someone with the talent to do them. There
| are a few edge cases (lifestyle, a trailing spouse in academia,
| visa restrictions), but overall, these jobs are not competitive
| with the pay, career growth, autonomy, and even job security
| elsewhere (university jobs have a reputation for job security, but
| many research programmers are paid purely through a grant, so
| often these are 1-2 year appointments that can be extended only if
| the grant is renewed).
|
| The Princeton group you linked to is encouraging - working for a
| unit of software developers who engage with researchers could be
| an improvement. Academia is still a long, long way away from
| building the career path that would be necessary to attract and
| keep talent in this field, though.
| noelsusman wrote:
| The criticisms of the code from Imperial College are strange to
| me. Non-deterministic code is the least of your problems when it
| comes to modeling the spread of a brand-new disease. Whatever
| error is introduced by race conditions or multiple seeds is
| completely dwarfed by the error in the input parameters. Like,
| it's hard to overstate how irrelevant that is to the practical
| conclusions drawn from the results.
|
| Skeptics could have a field day tearing apart the estimates for
| the large number of input parameters to models like that, but they
| choose not to? I don't get it.
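| A toy illustration of that scale difference - a crude branching
| process in Python/NumPy, nothing like the actual Imperial model:
|
|     import numpy as np
|
|     def toy_epidemic(r0, seed, generations=20):
|         """Crude stochastic branching process; total infections."""
|         rng = np.random.default_rng(seed)
|         infected, total = 10, 10
|         for _ in range(generations):
|             infected = rng.poisson(0.8 * r0 * infected)
|             total += infected
|         return total
|
|     # Seed-to-seed scatter at a fixed R0...
|     runs = [toy_epidemic(1.5, s) for s in range(30)]
|     print(min(runs), max(runs))
|
|     # ...versus nudging the R0 estimate a little
|     print(toy_epidemic(1.4, 0), toy_epidemic(1.6, 0))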
| marmaduke wrote:
| This is an easy argument to make, because it was already made for
| you in the popular press months ago.
|
| Show me the grant announcements that identify reproducible,
| long-term code as a key deliverable, and I'll show you 19 out of
| 20 scientists who start worrying about it.
| amelius wrote:
| You can blame all the scientists, but shouldn't we blame the CS
| folks for not coming up with suitable languages and software
| engineering methods that prevent software from rotting in the
| first place?
|
| Why isn't there a common language that all other languages compile
| to, and that will be supported on all possible platforms, for the
| rest of time?
|
| (Perhaps WASM could be such a language, but the point is that this
| would be just coincidental and not a planned effort to conserve
| software.)
|
| And why aren't package managers structured such that packages will
| live forever (e.g. in IPFS) regardless of whether the package
| management system is online? Why is GitHub still a single point of
| failure in many cases?
| klyrs wrote:
| I do research for a private company, and open-source as much of my
| work as I can. It's _always_ a fight. So I'll take their side for
| the moment.
|
| Many years ago, a paper on the PageRank algorithm was written, and
| the code behind that paper was monetized to unprecedented levels.
| Should computer science journals also require working
| proof-of-concept code, even if that discourages companies from
| sharing their results; even if it prevents students from
| monetizing the fruits of their research?
| bartvbl wrote:
| The graphics community has started an interesting initiative to
| this end: http://www.replicabilitystamp.org/
|
| After a paper has been accepted, authors can submit a repository
| containing a script which automatically replicates results shown
| in the paper. After a reviewer confirms that the results were
| indeed replicable, the paper gets a small badge next to its title.
|
| While there could certainly be improvements, I think it's a step
| in the right direction.
| dandelion_lover wrote:
| But does this badge influence the scientific profile / resume of
| the researcher in any way?
| jpeloquin wrote:
| You can always put "certified by the Graphics Replicability Stamp
| Initiative" next to each paper on your CV. It might influence
| people a little, even if it isn't part of the formal review for
| employment / promotion. Although "Graphics Replicability Stamp
| Initiative" does not sound very impressive. And federal grant
| applications have rules about what can be included in your
| profile.
|
| Informal reputation does matter though. If you want to get things
| done and not just get promoted, you need the cooperation of people
| with a similar mindset, and collaboration is entirely voluntary.
| ranaexmachina wrote:
| In computer science a lot of researchers already publish their
| code (at least in the domain of software engineering), but my
| biggest problem is not the absence of tests but the absence of any
| documentation on how to run it. In the best case you can open it
| in an IDE and it will figure out how to run it, but I rarely see
| any indication of what the dependencies are. So once you figure
| out how to start the code, you run it until you hit the first
| import exception, install that dependency, run again until the
| next import exception, and so on. I spent way too much time on
| that instead of doing real research.
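| Even a minimal pinned recipe in the README would fix most of this.
| Something like the following (the file and script names are
| illustrative, and the versions would be captured with `pip
| freeze`):
|
|     python3 -m venv env && . env/bin/activate
|     pip install -r requirements.txt   # exact, frozen versions
|     python run_experiment.py          # the one documented entry point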
| justin66 wrote:
| John Carmack, who did some small amount of work on the code, had a
| short rebuttal of the "Lockdown Skeptics" attack on the Imperial
| College code that probably mirrors the feelings of some of us
| here:
|
| https://mobile.twitter.com/id_aa_carmack/status/125819213475...
| onhn wrote:
| There is a fundamental reason not to publish scientific code.
|
| If someone is trying to reproduce someone else's results, the data
| and methods are the only ingredients they need. If you add code
| into this mix, all you do is introduce new sources of bias.
|
| (Ideally the results would be blinded too.)
| alexeiz wrote:
| Pfff. Does my 3-month-old code still run? Uh, nope. And I don't
| remember what it was supposed to do!
| hpcjoe wrote:
| Short answer: yes, my 30-year-old Fortran code runs (with a few
| minor edits between f77 and modern Fortran), as did my ancient
| Perl code.
|
| It's been great watching the density functional theory based
| molecular dynamics zip along at ~2 seconds per time step on my
| two-year-old laptop, versus roughly 6k seconds per time step on an
| old Sun machine back in 1991. I remember the same code getting
| down to 60 seconds per time step on my desktop R8k machine in the
| late 90s.
|
| What's been really awesome about that is the fact that I wrote
| some binary data files on big-endian machines in the early 90s,
| and re-read them on the laptop (little-endian) by adding a single
| compiler switch.
|
| Perl code that worked with big XML file input in the mid 2000s
| continues to work, though I've largely abandoned using XML for
| data interchange.
|
| C code I wrote in the mid 90s compiled, albeit with errors that
| needed to be corrected. C++ code was less forgiving.
|
| Over the past 4 months, I had to forward-port a code base from
| Boost 1.41 to Boost 1.65. Enough changed over 9 years (the code
| was from 2011) that it presented a problem. So I had to follow the
| changes in the API and fix it.
|
| I am quite thankful I've avoided the various fads in platforms and
| languages over the years. Keep inputs in a simple textual format
| that can be trivially parsed.
| atrettel wrote:
| > What's been really awesome about that is the fact that I wrote
| some binary data files on big-endian machines in the early 90s,
| and re-read them on the laptop (little-endian) by adding a single
| compiler switch.
|
| I want to second the idea of just dumping your floating-point data
| as binary. It's basically the CSV of HPC data. It doesn't require
| any libraries, which could break or change, and even if the
| endianness changes you can still read it decades later. I've been
| writing a computational fluid dynamics code recently and decided
| to only write binary output for those reasons. I'm not convinced
| of the long-term stability of other formats. I've seen colleagues
| struggle to read data in proprietary formats even a few years
| after creating it. Binary is just simple and avoids all of that.
| Anybody can read it if needed.
| petschge wrote:
| Counter-argument: binary dumps are horrible, because usually the
| documentation that allows you to read the data is missing. Using a
| self-documenting format such as HDF5 is far superior. It will tell
| you whether the bits are floating-point numbers in single or
| double precision, which endianness they use, and what the layout
| of the 3D array was. (No surprise that HDF was invented for the
| Voyager mission, where they had to ensure readability of the data
| for half a century.)
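| For anyone who hasn't used it, the self-description looks roughly
| like this (a minimal sketch with h5py; the dataset name and
| attributes are invented):
|
|     import h5py
|     import numpy as np
|
|     field = np.random.rand(64, 64, 64)
|     with h5py.File("run42.h5", "w") as f:
|         # dtype, shape, and endianness are stored in the file itself
|         dset = f.create_dataset("pressure", data=field)
|         dset.attrs["units"] = "Pa"
|         dset.attrs["created_by"] = "solver v1.2, run 42"
|
|     # Decades later, no sidecar notes needed:
|     with h5py.File("run42.h5", "r") as f:
|         d = f["pressure"]
|         print(d.dtype, d.shape, dict(d.attrs))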
| iagovar wrote:
| Why not dump it into SQLite? It makes everything easy, and we will
| be able to use sqlite3 for a long time, IMO.
| petschge wrote:
| Because parallel IO from a lot of different MPI ranks is not
| supported. And filesystems tend to look unhappy when 100k
| processes try to open a new file at the same time.
| atrettel wrote:
| Your argument raises a lot of good points. I actually agree that
| binary does lose all of the metadata and documentation that goes
| with it. That is a big problem. That is why I think it is also
| important to include some sort of documentation like an Xdmf file
| [1]. That is what I use to tie everything together in my
| particular project. HDF5 is fine. In fact, I would have strongly
| preferred my colleagues using HDF5 over the proprietary format
| that they did end up using. But HDF5 requires an additional
| library. I did not want to use any external libraries in my
| particular project (other than MPI), so I tried to look for a
| solution that achieves close to what HDF5 can achieve but without
| requiring something as "heavy" as HDF5. I have to admit that
| perhaps my design choice does not work for more complex
| situations, but I think it is something people should consider
| before tying themselves down too much.
|
| [1] http://www.xdmf.org/index.php/Main_Page
| petschge wrote:
| Having an Xdmf file alongside is nice, but the breaking changes
| between v2 and v3 are very annoying. And I understand the desire
| to have few external dependencies, but at least HDF5 is
| straightforward to compile and available as a pre-compiled module
| on all supercomputers that I have ever seen.
| hpcjoe wrote:
| I got into the habit of documenting each file with a file.meta
| that I could view later on.
|
| I did binary dumps in the past because ASCII dumps (remember, the
| 90s) were far more time/space expensive. HDF wasn't quite an
| option then, neither HDF4 nor HDF5.
|
| These days I would probably look at something like that, though,
| to be honest, there is always a danger of choosing something that
| may not be supported over the long term. This is why I generally
| prefer open and simple formats for everything. HDF5 is nice and
| open.
|
| One needs to look carefully at the total risk of using a
| proprietary format/system for any part of their storage. Chances
| are you will not be able to even read older data within a small
| number of decades if any of the format/system-dependent
| technologies goes away.
|
| I've got old word processor files from the mid 80s that I can't
| read. What I wrote there (mostly college papers) is lost (which
| may be a net positive for humanity).
|
| My tarballs and zip files, though, are readable 30+ years later.
| That is pretty amazing.
|
| Simple, documented, and open formats. Picture a time when you
| can't read/open your pptx/xlsx/docx files any more. Same with
| data. Simple binary formats are like CSV files, but you do need to
| maintain metadata on their contents, and document extensively in
| the code what you are reading/writing, why you are doing this, and
| how you are doing this.
|
| I think this will get more important over time as we start asking
| questions about how to maintain open artefact repositories for
| data and code. The fewer dependencies the better.
|
| And unlike the recent gene-renaming snafu in biology [1], you
| really never want your tool to get in the way of the science,
| either in terms of formats or interpretation of data.
|
| [1] https://www.theverge.com/2020/8/6/21355674/human-genes-renam...
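| The file.meta habit is maybe ten lines to automate (a Python/NumPy
| sketch; the field and file names are invented):
|
|     import json
|     import numpy as np
|
|     field = np.random.rand(64, 64, 64)        # some 3D result
|     field.astype("<f8").tofile("field.bin")   # raw little-endian doubles
|
|     meta = {"dtype": "<f8", "shape": list(field.shape), "order": "C",
|             "description": "pressure field, run 42"}
|     with open("field.bin.meta", "w") as f:
|         json.dump(meta, f, indent=2)
|
|     # Reading it back later needs only the sidecar file:
|     m = json.load(open("field.bin.meta"))
|     data = np.fromfile("field.bin", dtype=m["dtype"]).reshape(m["shape"])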
| Rochus wrote:
| Yes, I know a couple of Fortran 77 apps and libraries which were
| developed more than 25 years ago and which are still in use today.
|
| My C++ Qt GUI application for NMR spectrum analysis
| (https://github.com/rochus-keller/CARA) has been running for 20
| years now with continuing high download and citation rates.
|
| So obviously C++/Qt and Fortran 77 are very well suited to
| standing the test of time.
| O_H_E wrote:
| Nice. Interesting to know that GitHub stars aren't always a
| representative metric.
| Rochus wrote:
| Yes, many of my apps and libs were more than ten years old when I
| pushed them to GitHub. Some projects started before git was
| invented.
| lumost wrote:
| I'm continuously surprised that code review isn't a part of the
| review process for journal acceptance. The majority of academic
| code for a given paper isn't particularly large - and the benefits
| are significant.
| myself248 wrote:
| Plenty of actual professional programmers can't manage this, so
| how is it a fair standard to hold scientists to, when the code is
| just one of the many tools they're trying to use to get their real
| job done?
|
| I think moving away from the cesspool of imported remote libraries
| that update at random times and can vanish off the internet
| without warning would help a lot in both cases.
| minkzilla wrote:
| I think we have to hold scientists to higher standards for code
| quality because it has a direct impact on the findings of their
| results. How many off-by-one or other subtle errors, found only
| later in testing, has the typical software engineer written in
| their career? Is it fine to just say, eh, scientific results can
| be off by one, because the standards should be lower?
| proverbialbunny wrote:
| > Plenty of actual professional programmers can't manage this, how
| is it a fair standard to hold scientists to
|
| That's a good point. On a tangential note, prototype code tends to
| be written at a higher level than production code, so there is a
| better chance that 10-year-old code will continue to run on the
| scientist's side, as long as the imported libraries haven't
| vanished.
| rudolph9 wrote:
| Professional programmers should adopt package managers that focus
| on reproducibility, like Guix and Nix, and make them accessible
| enough for non-programmers to use.
|
| Neither of these is perfect, but in my experience they are worlds
| better than apk, Dockerfiles, and many other commonly used
| solutions.
|
| http://guix.gnu.org/
|
| https://nixos.org/
| userbinator wrote:
| I still use Windows binaries daily which I wrote and last modified
| over 20 years ago. I don't expect that to change in the next ten
| years either.
| dr-detroit wrote:
| sounds neat please link the open source repo so we can check it
| out
| xipho wrote:
| Yes. 110% attributed to learning about unit tests and gems/CPAN in
| grad school.
|
| IMO there is a big fallacy in the "just get it to work" approach.
| Most serious scientific code, i.e. code supporting months to years
| of research, is used and modified _a lot_. It's also not really
| one-off; it's a core part of a dissertation or research program -
| if it fails, you do. I'd argue (and I found) that using unit
| tests, a deployment strategy, etc. ultimately allowed me to do
| more, and better, science, because in the long run I didn't spend
| as much time figuring out why my code didn't run when I tweaked
| stuff. This is really liberating stuff. I suspect this is all
| obvious to those who have gone down that path.
|
| Frankly, every reasonably tricky problem benefits from unit tests
| for another reason as well. Don't know how to code it, but know
| the answer? Assert lots of stuff, not just one thing at a time,
| red-green style. Then code, and see what happens. So powerful for
| scientific approaches.
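| In practice that might look like this (pytest-style; the function
| and the asserted properties are an invented example):
|
|     import numpy as np
|
|     def center_of_mass(masses, positions):
|         """The routine being developed."""
|         m = np.asarray(masses, dtype=float)
|         x = np.asarray(positions, dtype=float)
|         return (m[:, None] * x).sum(axis=0) / m.sum()
|
|     def test_everything_i_know_about_the_answer():
|         # two equal masses -> midpoint
|         assert np.allclose(center_of_mass([1, 1],
|                            [[0, 0, 0], [2, 0, 0]]), [1, 0, 0])
|         # heavier particle pulls the answer toward itself
|         com = center_of_mass([3, 1], [[0, 0, 0], [4, 0, 0]])
|         assert np.allclose(com, [1, 0, 0])
|         # translation invariance
|         com2 = center_of_mass([3, 1], [[5, 0, 0], [9, 0, 0]])
|         assert np.allclose(com2 - [5, 0, 0], com)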
| xorfish wrote:
| And bugs can have quite big implications:
|
| https://smw.ch/article/doi/smw.2020.20336
| wdwvt1 wrote:
| An excellent article full of good suggestions. I appreciated that
| it's less certain of the Best Practices(TM) than many comments on
| this subject.
|
| I am curious how the goals/techniques for reproducibility change
| with the percentage of software/computational work that a
| scientific project contains. It feels like as the percentage of a
| paper's ultimate conclusions that are computationally derived
| increases, the importance of strict "the tests pass and the
| numerical results are identical" reproducibility also increases.
|
| Most of my projects are mixed wet-lab/dry-lab - a fair amount of
| custom code is required, but it's usually less than 50% of the
| work. When I'm relying on other papers that have a similar mix of
| things, I'm often not interested in whether the continuous
| integration tests of their code pass. I am more interested in
| understanding well the specific steps they take computationally,
| and in a sensitivity analysis of their computational portion (if
| you slightly alter your binning threshold, do you still get that
| fantastic clustering?). I believe this is because in my field
| (microbiology), computational tools can guide, but physical
| reality and demonstrated biology are the only robust evidence of a
| phenomenon/mechanism/etc.
|
| For most research I do not demand tests of all the analytical
| pieces they are relying on (was their incubator actually set to
| 37 C? was the pH of the media +/- 0.2? etc.) - I trust they've done
| good science. Why would I demand their code meet a higher
| standard?
| majewsky wrote:
|     CMake Error at /usr/share/cmake-3.18/Modules/FindQt4.cmake:1314
|     (message): Found unsuitable Qt version "5.15.0" from
|     /usr/bin/qmake, this code requires Qt 4.x
|
| Well fuck.
| Gatsky wrote:
| I think code is remarkably persistent in the scheme of things. Try
| reproducing a wet-lab experimental technique from 5 years ago.
| closeparen wrote:
| Almost certainly not, because it would have been written in
| Python 2.
| fizzled wrote:
| Yep. I wrote a netlist analyzer in Perl that provides
| statistics... in 1997. It is still part of a regression suite
| because it is very small, very fast, and callable through the
| command line without loading hundreds of megabytes of libraries
| (unlike foundation tools). I reconnected with a peer on LinkedIn
| who still works at the company, and he joked that he still sees my
| silly script's name in verification flows. The only change I made
| to it in 20+ years was moving to Perl 5.6.1 so that I could parse
| files >1 GB, but it has been maintained and kept to standard
| practices.
| fourseventy wrote:
| Forget 10-year-old code. Try to get your 2-year-old javascript +
| webpack + react setup running...
| slhck wrote:
| The two main problems in academia are that a) few researchers have
| formal training in best practices of software engineering, and
| that b) time pressure leads to "whatever worked two minutes before
| the submission deadline" becoming what is kept for posterity.
| When I started working as a full-time researcher, I had come from
| working two years in a software shop, only to find people at the
| research lab who had never used VCS, object-oriented programming,
| etc. Everyone just put together a few text files and Python or
| MATLAB scripts that output some numbers that went into Excel or
| gnuplot scripts that got copy-pasted into LaTeX documents with
| suffixes like "v2_final_modified.tex", shared over Dropbox.
|
| It took a long time to establish some coding standards, but even
| then it took me a while to figure out that that alone didn't help:
| you need a proper way to lock dependencies, which, at the time,
| was mostly unknown (think requirements.txt, packrat for R, ...).
| justinmeiners wrote:
| Don't you think Docker, dependencies, unit test frameworks, etc.
| actually increase the need for ongoing maintenance, as opposed to
| spitting out some C files or Python scripts which last "forever"?
| tanilama wrote:
| No.
|
| Python/C files don't work in a vacuum. They need dependencies;
| that is the point of Docker, after all.
|
| Capture all necessary dependencies into a single image.
| justinmeiners wrote:
| > Python/C files don't work in a vacuum
|
| They do if you use the standard library (which for Python is quite
| extensive), and copy any dependencies into your own source, as if
| they are your own. By "in a vacuum" we can mean: if Python is
| installed, it will work.
|
| > Capture all necessary dependencies
|
| Docker doesn't capture any dependencies. They still exist on the
| internet. It just captures a list of which ones to download when
| you build the image.
|
| Do you think software we write now has more longevity than older
| software that uses make or a shell script?
| slhck wrote:
| I don't think so. The source code is the same, but there's now
| metadata that helps in setting up the same environment again, even
| years later. You still have the original code in case, e.g.,
| Docker is no longer available.
|
| For instance, if you just have a Python script importing a
| statistical library, what version are you going to use? SciPy had
| a pretty nasty change in one of its statistical functions,
| changing the outcome of significance tests in our project.
| Depending on which version you happened to have installed, it'd
| give you a positive or negative result.
| justinmeiners wrote:
| It makes sense that having more information is better than less.
|
| I would argue that they should use no dependencies to avoid this
| problem entirely, or download them and include them as source in
| the project, or at least include a note of which version of a
| major library they used in a README or comment. I think this is
| what is often done in practice currently.
|
| Perhaps, as you are saying, Docker is just a stable way to
| document this stuff formally. But it is a large moving part that
| assumes a lot of stuff is still on the internet. What if the
| Docker Hub image is removed or dramatically changed? What if that
| OS package manager no longer exists? It just doesn't seem like our
| software is getting more longevity, but less. I don't know why we
| would bring that extra complexity to academic research if the goal
| is longevity.
| hobofan wrote:
| requirements.txt is not a lockfile
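| (That is: a requirements.txt usually records what you asked for,
| while a lockfile records exactly what you got - the version
| numbers here are illustrative:)
|
|     # requirements.txt - the request
|     scipy>=1.0
|
|     # lockfile / `pip freeze` output - the reproducible snapshot
|     numpy==1.18.5
|     scipy==1.4.1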
| neuromantik8086 wrote:
| Just as a quick bit of context here, Konrad Hinsen has a specific
| agenda that he is trying to push with this challenge. It's not
| clear from this summary article, but if you look at the original
| abstract soliciting entries for the challenge
| (https://www.nature.com/articles/d41586-019-03296-8), it's a bit
| clearer that Hinsen is using this to challenge the technical
| merits of Common Workflow Language (https://www.commonwl.org/;
| currently used in bioinformatics by the Broad Institute via the
| Cromwell workflow manager).
|
| Hinsen has created his own DSL, Leibniz
| (https://github.com/khinsen/leibniz ;
| http://dirac.cnrs-orleans.fr/~hinsen/leibniz-20161124.pdf), which
| he believes is a better alternative to Common Workflow Language.
| This reproducibility challenge is in support of this agenda in
| particular, which is worth keeping in mind; it is not an unbiased
| thought experiment.
| jnxx wrote:
| Konrad Hinsen is an expert in molecular bioinformatics, has
| significantly contributed to Numerical Python, for example, and
| has published extensively around the topic of reproducible science
| and algorithms - see his blog.
|
| The fact that he might favor different solutions from you does not
| mean that he is pushing some kind of hidden agenda.
|
| If you think that Common Workflow Language is a better solution,
| you are free to explain in a blog post why you think this.
|
| Are you saying that the reproducibility challenge poses a
| difficulty to Common Workflow Language? If so, would that not
| rather support Hinsen's point - without implying that what he
| suggests is already a perfect solution?
| neuromantik8086 wrote:
| I never said that Konrad Hinsen's agenda was hidden; in fact, it's
| not at all hidden (which is why I linked the abstract). It's just
| that this context isn't at all clear in the Nature write-up, and
| it's relevant to take into account.
|
| I haven't taken the time to seriously contemplate the merits of
| CWL vs. Leibniz, although my gut instinct is that we don't really
| need another domain-specific language for science, given the
| profusion of such languages that already exist (Mathematica,
| Maple, R, MATLAB, etc.). That's the extent of my bias, but again,
| it's a gut instinct and not a comprehensive, well-reasoned
| argument against Leibniz.
| rkagerer wrote:
| _" Visual Basic," Maggi writes in his report, "is a dead language
| and long since has been replaced..."_
|
| In fact I still have the VB6 IDE installed on my primary
| workstation and use it for quick and dirty projects from time to
| time.
| garden_hermit wrote:
| I favor open code, but like everything, there are issues. For
| example, the EPA years ago required that research can only inform
| policy when its data is open; open data, however, takes a lot of
| effort to document and provide. Companies with a vested interest
| in EPA policy, however, can easily produce open (and often very
| biased) data.
|
| Requirements for open code can lead to similar issues - what
| happens when a government agency rejects the outcome of a
| supercomputer simulation because the code wasn't documented well
| enough? What happens when those with vested interests are the ones
| best able to produce scientific code?
|
| Scientists already wear many hats. Any shift in policy and norms
| needs to consider that they have limited time, a fact that can
| have far-reaching consequences.
| adornedCupcake wrote:
| "Python 2.7 puts 'at our disposal an advanced programming language
| that is guaranteed not to evolve anymore', Rougier writes." Oh no.
| That's not at all what was intended.
| Regarding my own research: I'm doing theoretical biophysics. Often
| I do simulations. If conda stays stable enough, my code should be
| reproducible. There are, however, some external binaries (like
| lammps) I have not turned into a conda package yet. There's no
| official package that fits my use case in conda, since compilation
| is fine-grained to each user's needs.
| rekado wrote:
| I added different variants of lammps to a Guix channel we maintain
| at our institute:
|
| https://github.com/BIMSBbioinfo/guix-bimsb/blob/master/bimsb...
|
| Thankfully, Guix makes it easy to take an existing package
| definition and create an altered variant of it.
| wenc wrote:
| The easiest way to preserve code for posterity is to wrap up the
| runtime environment in a VM. I can boot up a VM from 15 years ago
| (when I was in grad school) and it will run.
|
| When you're writing code for science, preserving code for
| posterity is rarely a priority. Your priority is to iterate
| quickly, because the goal is scientific results, not code.
|
| (This is, in fact, the correct prioritization. Under most
| circumstances, though not all, grad students who try to write
| pristine code find themselves progressing more slowly than those
| who don't.)
| Sebb767 wrote:
| As someone who has worked with bits of scientific code: does the
| code you write _right now_ work on another machine might be the
| more appropriate challenge. I've seen a lot of hardcoded paths,
| unmentioned dependencies and monkey-patched libraries downloaded
| from somewhere; just getting the new code to work is hard enough.
| And let's not even begin to talk about versioning or magic
| numbers.
|
| Similar to other comments, I don't mean to fault scientists for
| that - their job is not coding, and some of the dependencies come
| from earlier papers or proprietary cluster setups and are
| therefore hard to avoid - but the situation is not good.
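| The hardcoded-path problem at least has a cheap fix (a Python
| sketch; the argument and path names are arbitrary):
|
|     import argparse
|     from pathlib import Path
|
|     parser = argparse.ArgumentParser()
|     parser.add_argument("data_dir", type=Path,
|                         help="directory containing the input CSVs")
|     args = parser.parse_args()
|
|     # Instead of open("C:/Users/alice/phd/final2/data.csv")...
|     for csv_file in sorted(args.data_dir.glob("*.csv")):
|         print("processing", csv_file)  # stand-in for the real analysis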
| TheJoeMan wrote:
| I emailed an author of a 5-year-old paper and they said they had
| lost their original MATLAB code, which certainly calls the paper
| into question.
| James_Henry wrote:
| It definitely makes you question it more. Does the paper not
| explain the contents of the MATLAB code? That's all that is
| usually needed for reproducibility. You should be able to get the
| same results no matter who writes the code to do what is explained
| in their methods.
|
| Of course, I have no idea about the paper you're talking about and
| just want to say that reproducibility isn't dependent on releasing
| code. There could even be a case where it's better if someone
| reproduces a result without having been biased by someone else's
| code.
| dunefox wrote:
| If a scientist needs to write code then it's part of their job.
| It's as easy as that.
| magv wrote:
| I think the idea that scientific code should be judged by the same
| standards as production code is a bit unfair. The point when the
| code works the first time is when an industry programmer starts to
| refactor it - because he expects to use and work on it in the
| future. The point when the code works the first time is when a
| scientist abandons it - because it has fulfilled its purpose. This
| is why the quality is lower: lots of scientific code is the first
| iteration that never got a second.
|
| (Of course, not all scientific code is discardable; large
| quantities of reusable code are reused every day. We have many
| frameworks, and the code quality of those is completely
| different.)
| dunefox wrote:
| That's not the point, though. If you obtain your results by
| writing and executing code then code quality matters - to
| reproduce and validate them.
| abdullahkhalids wrote:
| Lots of people are saying it is the scientist's job to produce
| reproducible code. It is, and the benefits of reproducible code
| are many. I have been a big proponent of it in my own work.
|
| But not with the current mess of software frameworks. If I am to
| produce reproducible scientific code, I need an idiot-proof method
| of doing it. Yes, I can put in the 50-100 hours to learn how to do
| it [1], but guess what: in about 3-5 years a lot of that knowledge
| will be outdated. People compare it with math, but the math proofs
| I produce will still be readable and understandable a century from
| now.
|
| Regularly used scientific computing frameworks like the
| matlab/R/Python ecosystem/mathematica need a dumb, guided method
| of producing releasable and reproducible code. I want to go
| through a bunch of "next" buttons that help me fix the problems
| you indicate, and finally release a final version that has all the
| information necessary for someone else to reproduce the results.
|
| [1] I have. I would put myself in the 90th percentile of
| physicists familiar with best practices for coding. I speak for
| the 50th percentile.
| zelphirkalt wrote:
| The dumb guide is the following:
|
| (1) Use a package manager which stores hash sums in a lock file.
| (2) Install your dependencies from a lock file as spec.
| (3) Do not trust version numbers. Trust hash sums. Do not believe
| in "But I set the version number!".
| (4) Do not rely on downloads. Again, trust hash sums, not URLs.
| (5) Hash sums!!!
| (6) Wherever there is randomness, as in random number generators,
| use a seed. If the interface does not allow specifying the seed,
| throw the trash away and use another generator. Be careful when
| concurrency is involved; it might destroy reproducibility. For
| example, this was the case with Tensorflow. Not sure it still is.
| (7) Use a version control system.
| hobofan wrote:
| > in about 3-5 years a lot of that knowledge will be outdated
|
| Yup, and most of the points you mentioned will probably not be
| outdated for quite a while. Every package manager I'm aware of
| with lock files that are that old can still consume them today.
| hobofan wrote:
| > their job is not coding
|
| But it often is. For most non-CS papers (mostly biosciences) I've
| read, there are specific authors whose contribution to a large
| degree was mainly "coding".
| BeetleB wrote:
| > their job is not coding
|
| To me, that's like a theoretical physicist saying "My job is not
| to do mathematics" when asked for a derivation of a formula he put
| in the paper.
|
| Or an experimental physicist saying "My job is not mechanical
| engineering" when asked for details of their lab equipment (almost
| all of which is typically custom-built for the experiment).
| Sebb767 wrote:
| On one hand, yes. But on the other hand, reusable code, dependency
| management, linting, portability etc. are not _that_ easy, and are
| something junior developers tend to struggle with (and it's not
| like that problem never pops up for seniors, either). I really
| can't fault non-compsci scientists for not handling that problem
| well. Of course, part of it (like publishing the relevant code) is
| far easier and should be done, but some aspects are really hard.
| IMO the incentive problem in science (basically, the number of
| papers and new results is what counts) also plays into this, as
| investing tons of time in your code gives you hardly any reward.
| dunefox wrote:
| There are tons of tutorials on using conda for dependency
| management; it's not rocket science. And using a linter is
| difficult? If a scientist needs to read and write code as part of
| their job then they should learn the basics of programming - that
| includes tools and 'best practices'.
| BeetleB wrote:
| > But on the other hand, reusable code, dependency management,
| linting, portability etc. are not that easy, and are something
| junior developers tend to struggle with
|
| On the original hand, these are easier problems than all the years
| of math education they have. Once you're relying on simulations to
| get results that explain natural phenomena, coding needs to be put
| on the same pedestal as mathematics.
| djaque wrote:
| The point is that as a scientist your code is a tool to get the
| job done, not the product. I can't spend 48 hours writing unit
| tests for my library (even though I want to) if it's not going to
| give me results. It's literally not my job and is not an efficient
| use of my time.
| TimothyBJacobs wrote:
| This is the same as any other argument against testing. Unless you
| are actually selling a library, code is not the product. Customers
| are buying results, not your code base. Yet we've discovered the
| importance of testing to make sure customers get the right results
| without issues.
|
| If you want your results to be usable by others, the quality of
| the code matters. If all you care about is publishing a paper,
| then I guess sure, it doesn't matter if anyone else can build off
| your work.
| PeterisP wrote:
| But the results _are_ usable by others; in most fields of science
| the code is not part of these results and is not needed to enjoy,
| use and build upon the research results.
|
| The only case where the code would be used (which is a valid
| reason why it should be available _somehow_) is to check whether
| your particular results are flawed or fraudulent; otherwise the
| quality of the code (or its availability, or even existence -
| perhaps you could have had a bunch of people do all of it on paper
| without any code) is simply irrelevant if you want your results to
| be usable by others.
| BeetleB wrote:
| > The only case where the code would be used (which is a valid
| reason why it should be available somehow) is to check whether
| your particular results are flawed or fraudulent;
|
| Not true. Code is often used and reused to churn out a lot more
| results than the initial paper. A flaw in the code doesn't just
| show one paper/result as problematic. It can show a large chunk of
| a researcher's work in his area of expertise to be problematic.
| [deleted]
| RandoHolmes wrote:
| > I can't spend 48 hours writing unit tests for my library
|
| No one is insisting on top-quality code, but there has to be an
| acceptance that code can be flawed and that this needs to be
| tested for.
| dunefox wrote:
| If the code you base your work on is horrible, it definitely makes
| me question your results. That's why it's called the
| _reproducibility_ crisis.
|
| Writing some tests, using a linter, commenting your code, and
| learning about best programming practices doesn't take long and
| pays off - even for yourself, when writing the code or when you
| need to touch it again. "48 hours writing unit tests" is a
| ridiculous comparison.
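| For scale, a useful test can be this small (pytest-style sketch;
| the function and the hand-checked value are illustrative):
|
|     import numpy as np
|
|     def growth_rate(cases):
|         """Average day-over-day growth factor (the code under test)."""
|         cases = np.asarray(cases, dtype=float)
|         return (cases[1:] / cases[:-1]).mean()
|
|     def test_growth_rate_matches_hand_computed_value():
|         # hand-checked once, now pinned forever
|         assert np.isclose(growth_rate([100, 110, 121]), 1.1)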
| BeetleB wrote:
| > The point is that as a scientist your code is a tool to get the
| job done and not the product.
|
| Everything you say is as true for experimental equipment and
| mathematical tools. Physicists are fantastic at mathematics, yet
| they are some of the most anti-math people I know - in the sense
| of "Mathematics is just a tool to get results that explain nature!
| Doing mathematics for its own sake is a waste of time!"
|
| The equation is not the product - the explanation of physical
| phenomena is. If the attitude of "I don't need to show how I got
| this equation" is unacceptable, the same should go for code.
| Jabbles wrote:
| How do you know it won't give you results? Maybe it will find a
| bug that would have resulted in an embarrassing retraction.
|
| Maybe it wouldn't find any bugs, but it would give confidence to
| and encourage other users, increasing your citations and "impact".
|
| Maybe it will just save you 48h later on when you need to adapt
| the code.
|
| Software engineering has generally accepted that unit testing is a
| good practice and well worth the time taken. Why do you think
| science is different?
| dunefox wrote:
| > Why do you think science is different?
|
| It's really not; I guess his focus lies on cranking out
| irreproducible papers.
| westurner wrote:
| "Ten Simple Rules for Reproducible Computational Research"
| http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fj... :
|
| > _Rule 1: For Every Result, Keep Track of How It Was Produced_
|
| > _Rule 2: Avoid Manual Data Manipulation Steps_
|
| > _Rule 3: Archive the Exact Versions of All External Programs
| Used_
|
| > _Rule 4: Version Control All Custom Scripts_
|
| > _Rule 5: Record All Intermediate Results, When Possible in
| Standardized Formats_
|
| > _Rule 6: For Analyses That Include Randomness, Note Underlying
| Random Seeds_
|
| > _Rule 7: Always Store Raw Data behind Plots_
|
| > _Rule 8: Generate Hierarchical Analysis Output, Allowing Layers
| of Increasing Detail to Be Inspected_
|
| > _Rule 9: Connect Textual Statements to Underlying Results_
|
| > _Rule 10: Provide Public Access to Scripts, Runs, and Results_
|
| ... You can archive a tag of a Git repo, and get a free DOI for
| it, with FigShare or Zenodo.
|
| ... re: [Conda and] Docker container images
| https://news.ycombinator.com/item?id=24226604 :
|
| > _- repo2docker (and thus BinderHub) can build an up-to-date
| container from requirements.txt, environment.yml, install.R,
| postBuild and any of the other dependency specification formats
| supported by REES: Reproducible Execution Environment Standard;
| which may be helpful as Docker Hub images will soon be deleted if
| they're not retrieved at least once every 6 months (possibly with
| a GitHub Actions cron task)_
|
| BinderHub builds a container with the specified versions of
| software and installs a current version of Jupyter Notebook with
| repo2docker, and then launches an instance of that container in a
| cloud.
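| Concretely, a single file like this in the repo root is enough for
| repo2docker to rebuild the stack (the pinned versions here are an
| illustrative example):
|
|     # environment.yml
|     name: paper-env
|     channels:
|       - conda-forge
|     dependencies:
|       - python=3.8
|       - numpy=1.18.1
|       - scipy=1.4.1
|       - matplotlib=3.1.3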
| "Ten Simple Rules for Creating a Good Data Management Plan"
| http://journals.plos.org/ploscompbiol/article?id=10.1371/jou... :
|
| > _Rule 6: Present a Sound Data Storage and Preservation Strategy_
|
| > _Rule 8: Describe How the Data Will Be Disseminated_
|
| ... DVC: https://github.com/iterative/dvc
|
| > _Data Version Control or DVC is an open-source tool for data
| science and machine learning projects. Key features:_
|
| > _- Simple command line Git-like experience. Does not require
| installing and maintaining any databases. Does not depend on any
| proprietary online services._
|
| > _- Management and versioning of datasets and machine learning
| models. Data is saved in S3, Google cloud, Azure, Alibaba cloud,
| SSH server, HDFS, or even local HDD RAID._
|
| > _- Makes projects reproducible and shareable; helping to answer
| questions about how a model was built._
|
| There are a number of great solutions for storing and sharing
| datasets.
|
| ... "#LinkedReproducibility"
| jnxx wrote:
| Open textual formats for data, and open-source application and
| system software (more precisely, FLOSS), are just as important.
|
| Imagine that x86 - and with it, the PC platform - gets replaced by
| ARM within a decade. For binary-only software, this would be a
| kind of geological extinction event.
| westurner wrote:
| The likelihood of there being a [security] bug discovered in a
| given software project over any significant period of time is near
| 100%.
|
| It's definitely a good idea to archive source and binaries and
| later confirm that the output hasn't changed with and without
| upgrading the kernel, build userspace, execution userspace, and
| PUT/SUT (Package/Software Under Test).
|
| - Specify which versions of which constituent software libraries
| are utilized. (And hope that a package repository continues to
| serve those versions of those packages indefinitely.) Examples:
| software dependency specification formats like requirements.txt,
| environment.yml, install.R
|
| - Mirror and archive _all_ dependencies and sign the collection.
| Examples: {z3c.pypimirror, eggbasket, bandersnatch, devpi as a
| transparent proxy cache}, apt-cacher-ng, pulp, squid as a
| transparent proxy cache
|
| - Produce a signed archive which includes all requisite software.
| (And host that download on a server such that data integrity can
| be verified with cryptographic checksums and/or signatures.)
| Examples: Docker image, statically-linked binaries, GPG-signed
| tarball of a virtualenv (which can be made into a proper package
| with e.g. fpm), ZIP + GPG signature of a directory which includes
| all dependencies
|
| - Archive (1) the data, (2) the source code of all libraries, (3)
| the compiled binary packages, (4) the compiler and build
| userspace, (5) the execution userspace, and (6) the kernel.
| Examples: Docker can solve for 1-5, but not 6. A VM (virtual
| machine) can solve for 1-6. OVF (Open Virtualization Format) is an
| open spec for virtual machine images, which can be built with a
| tool like Vagrant or Packer (optionally in conjunction with a
| configuration management tool like Puppet, Salt, or Ansible).
|
| When the application requires (7) a multi-node distributed-system
| configuration, something like docker-compose/vagrant/terraform
| and/or a configuration management tool is pretty much necessary to
| ensure that it will be possible to reproducibly confirm the
| experiment output at a different point in spacetime.
___________________________________________________________________
(page generated 2020-08-24 23:01 UTC)