## brcon2023 - 2023-08-07

title: Using Unix and Bitreich tools for preparing scientific papers
author: Anders Damsgaard (adc)
contact: anders@adamsgaard.dk
	gopher://adamsgaard.dk
	https://adamsgaard.dk

#pause

# Previously...

* brcon2020: Energy efficient programming in science
* brcon2021: Unix principles for science simulations
* brcon2022: absent due to fieldwork
	- Bitreich Arctic Vault


## About me

#curriculum-vitae
- JOIN #bitreich-en 2019-12-16
- bitreich member 2020-08-28
- bitreich council 2023-07-09

#pause

* CV
	- Geology Ph.D. in 2015, Aarhus University

	Employment:
	- Geo (Denmark)
	- Aarhus University (Denmark)
	- Stanford University (US)
	- Danish Environmental Protection Agency (Denmark)
	- Scripps Institution of Oceanography (US)
	- National Oceanic and Atmospheric Administration (NOAA, US)
	- Princeton University (US)


## Background

* Publishing papers is the primary goal in academia
* Revision and draft formats with consistent styling
* Word/LaTeX templates from journals/publishers
* Collaboration is usually primitive

#pause

Horror examples:
- main_AD_DLE_AD_ST_LHB_AD_JP_NKL_AD_DLE_AD.docx
- article3_LHB_DLE_NKL_JAP_DLE_AD_dont_use.docx
- CorrectionsSuggestions-Morgane-20170828.docx
- si-REV1-track-changes.docx


## Outline

This talk:
	Treating the manuscript as source code
	Show how I structured a recent paper
	Discuss attempts at optimizing coauthor collaboration and reproducibility of results
	Introduce ideas for improving the publication process

#pause

Why should you care?
	Insight into academic publishing
	Maybe learn some tricks for producing other kinds of documentation


## Publication example

Anders Damsgaard, Liran Goren, and Jenny Suckale
"Water pressure fluctuations control variability in sediment flux and slip dynamics beneath glaciers and ice streams"
Communications Earth & Environment, vol. 1(66)
https://doi.org/10.1038/s43247-020-00074-7
https://www.nature.com/articles/s43247-020-00074-7

Published December 2020
Open access, CC-BY-4.0

- Main article (8 pages, 4 figures, 64 references)
- Supplementary information (8 pages)

gophers://adamsgaard.dk/9/tmp/damsgaard2020.pdf
gophers://adamsgaard.dk/9/tmp/damsgaard2020si.pdf


## Science behind the publication

Ice sheets (Antarctica, Greenland) flow on soils where the pore space is saturated by meltwater.

The sliding over these soils ("sediments") and their mechanics are of primary importance for understanding how the ice sheets will flow in the future.


## Science behind the publication - Cont.

A program called cngf-pf(1) simulates the mechanical conditions under ice sheets.
(see my brcon2020 talk)

The program is run many times with different inputs of meltwater, glacier flow velocities, and other parameters; each run is called a "simulation".

cngf-pf(1) produces numerical results, which are further analyzed and/or plotted as figures.

The resulting figures broaden our understanding of the physical processes and of how ice sheets flow when more meltwater reaches their beds.


## The publication process

0. Author(s) produce a significant scientific result
1. Author(s) document the result in figures and text (.tex or .docx)
2. Main author submits draft .pdf to scientific journal
3. The journal assigns a scientific editor

#pause

4. ???

#pause

5. Profit


## The publication process

0. Author(s) produce a significant scientific result
1. Author(s) document the result in figures and text (.tex or .docx)
2. Main author submits draft .pdf to scientific journal
3. The journal assigns a scientific editor

#pause

4. The scientific editor assigns 2-3 peer reviewers who review the submitted .pdf
5. The editor compiles a response based on the reviewer feedback and sends it to the author(s)

#pause

6. The author(s) revise the manuscript, responding to the reviewer comments in a response letter
7. The author(s) submit a new .pdf, source .tex, figure .pdf, response-to-reviewers.pdf, and tracked-changes.pdf

#pause

8. Editor decides if GOTO 4
9. The author(s) pay the publication fee
10. Source code is put into an online archive (e.g., zenodo.org)
11. The publisher typesets from the source .tex and figure .pdf
12. Online publication on the website (DOI assignment)

#pause

Not unlike the iterative process of source code review.


## Paper: technical details

Thinking of a manuscript as source code.

Four git repositories:

* git://src.adamsgaard.dk/cngf-pf
	- Numerical code cngf-pf(1)
	- Permanently archived at: doi:10.5281/zenodo.4106566
* git://src.adamsgaard.dk/cngf-pf-exp1
	- Simulation setups for cngf-pf(1)
	- Plotting of results with gnuplot(1)
	- Permanently archived at: doi:10.5281/zenodo.4106575
* https://git.overleaf.com/ (not public)
	- Main article
	- LaTeX and BibTeX, Nature journal template
* https://git.overleaf.com/ (not public)
	- Supplementary information
	- Own LaTeX setup


## Source code structure

    paper/
        .gitmodules
        cngf-pf-exp1/          # Experiments dir
            .git/
            cngf-pf/           # Source for cngf-pf(1)
                .git/
                Makefile
                cngf-pf.c
                ...
            fig1/              # Runs cngf-pf(1) for first figure
                Makefile
                plot.gp
            fig2/              # Runs cngf-pf(1) for second figure
                Makefile
                plot.gp
                change.patch
            figN/
            Makefile           # make -C figN/ (mostly parallel)
            drist/             # drist(1) setup
        main/                  # Copy figures and produce pdf
            .git/
            Makefile
            main.tex
            references.bib
        si/                    # Copy figures and produce pdf
            .git/
            Makefile
            si.tex


## Producing a figure

The cngf-pf(1) program is designed so that simulations are configured by command-line flags.

The cngf-pf(1) stdout and output files are tsv formatted and can easily be plotted with gnuplot(1), etc.

Most figures are produced by running cngf-pf(1) with different command-line arguments.

Larger changes to the model can be included as source code patches stored alongside each figure (e.g., fig2/change.patch).
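A minimal sh(1) sketch of the figure pattern described above, assuming each run writes its tsv results to stdout. The program and its flag are passed in as arguments, so the function names `sweep` and `tsvcol` are made up for this example and no real cngf-pf(1) option names are assumed. In the actual setup this logic lives in each figN/Makefile; a shell function is used here only to keep the example self-contained.

```sh
#!/bin/sh
# Sketch: run one simulation per parameter value and keep each run's
# tsv stdout in its own file, ready for plotting with gnuplot(1).
# Hypothetical helpers; the program and flag are arguments, so no real
# cngf-pf(1) option names are assumed here.

# sweep PROGRAM FLAG VALUE...
# Runs "PROGRAM FLAG VALUE" for every VALUE and writes its stdout to
# run_VALUE.tsv in the current directory. Stops at the first failure.
sweep() {
	prog=$1
	flag=$2
	shift 2
	for v in "$@"; do
		"$prog" "$flag" "$v" > "run_$v.tsv" || return 1
	done
}

# tsvcol FILE N
# Prints column N of a tab-separated file, e.g. to inspect a run
# before plotting it.
tsvcol() {
	awk -F '\t' -v n="$2" '{ print $n }' "$1"
}
```

With the real program, PROGRAM would be cngf-pf and FLAG one of the options documented in cngf-pf(1); a per-figure plot.gp can then read the run_*.tsv files. Keeping the flag as an argument means the same helper works for any parameter being varied.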
## Producing the main paper

Sync with Overleaf via git-remote(1)

Makefile:
- copies ../cngf-pf-exp1/fig*.pdf to cwd
- merges with pdfjam(1)
- crops if necessary with pdfcrop(1)
- runs pdflatex(1) with cdoc(1)
	gophers://adamsgaard.dk/0/tmp/cdoc


## Tips for writing LaTeX (transferable to other documentation)

* Don't (if possible)
* Version control
* One sentence per line of code
* Makefiles
* Overleaf useful for collaborating with less-technical coauthors
* git-tag(1) submitted versions
* Use latexdiff(1) and git to produce tracked-changes.pdf


## Ideas for improving the publication process

* Think of text documentation like source code
* `make publish`
	- rsync(1) to journal/publisher
* Journals as git remote
	- Pros and cons of archiving manuscript history
* Journals or publishers should host and automatically archive source along with submissions
* Journals should use git branches in the review process


## The end

* Modularity is good
* make(1), sh(1), drist(1), gnuplot(1), git(1) for the future!
* Difficulties collaborating across technical levels
* LaTeX is inefficient, but far better than Word
* Learn from tgtimes
* What are your ideas?
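## Backup: one sentence per line

Backup for the "One sentence per line" tip: a minimal sh(1)/sed(1) sketch, assuming a sentence ends at ".", "!" or "?" followed by one or more spaces. With one sentence per line, git diff and latexdiff(1) report changed sentences rather than whole reflowed paragraphs. The function name `onesentence` is made up for this example, and abbreviations such as "e.g. " will also be split, so review the result before committing.

```sh
#!/bin/sh
# onesentence (hypothetical helper): read prose on stdin and write it
# with one sentence per line.
# Assumption: a sentence ends at '.', '!' or '?' followed by spaces;
# those spaces are replaced by a single newline.
# The escaped newline inside the sed replacement must start at column 0.
onesentence() {
	sed 's/\([.!?]\) \{1,\}/\1\
/g'
}
```

For example, `onesentence < draft.tex > draft-split.tex` followed by a diff of the two files shows exactly which lines were broken before the original is replaced.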