site

git clone https://git.beauhilton.com/site.git

commit cd0aae7ec7976a0071936a5f67922219bf46821c
parent 8f01f836eff5753a3608ad84aef8cafc7373d3ee
Author: Beau <cbeauhilton@gmail.com>
Date:   Fri,  2 Dec 2022 17:20:02 -0600

add old post on rmd

Diffstat:
A site/posts/rmd_py.md | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 80 insertions(+), 0 deletions(-)

diff --git a/site/posts/rmd_py.md b/site/posts/rmd_py.md
@@ -0,0 +1,80 @@
+# R Markdown is my spirit animal
+
+<time id="post-date">2019-10-20</time>
+
+## update 2022-12-02
+
+The R Markdown folks have a new project called [Quarto](https://quarto.org),
+looks like it does all the things RMD does plus more.
+
+## original post
+
+In a [previous post]({% post_url 2019-06-10-python-write-my-paper %}) I talked about how easy it is, if you're already doing your own stats anyway in some research project, to have a Python script output paragraphs with all the stats written out and updated for you to add into your paper.
+
+The main problem with the approach I outlined was how to get those nicely updated paragraphs into the document you are sharing with colleagues.
+
+Medicine, in particular, seems wed to Microsoft Word documents for manuscripts. Word does not have a great way to include text from arbitrary files, forcing the physician-scientist to manually copy and paste those beautifully automated paragraphs. As I struggled with this, I thought (here cue Raymond Hettinger), "There must be a better way."
+
+<p id="post-excerpt">
+Turns out that a better way does exist, and it is R Markdown.
+</p>
+
+Though I was at first resistant to learning about R Markdown, mostly because I am proficient in Python and thought the opportunity cost for learning R at this point would be too high, as soon as I saw it demoed I changed my tune. Here's why.
+
+## Writing text
+- R Markdown is mostly markdown.
+    - Markdown is by far the easiest way to write plaintext documents, especially if you want to apply formatting later on without worrying about the specifics while you're writing (e.g. `#` just specifies a header - you can decide how you want the headers to look later, and that styling will automatically be applied).
+    - Plaintext is beautiful. It costs nearly nothing in terms of raw storage, and is easy to keep within a version control system. Markdown plaintext is human-readable whether or not the styling has been applied. Your ideas will never be hidden in a proprietary format that requires special software to read.
+    - I had been transitioning to writing in Markdown anyway, so +1 for R Markdown.
+- R Markdown is also a little LaTeX.
+    - LaTeX is [gorgeous](https://tex.stackexchange.com/questions/1319/showcase-of-beautiful-typography-done-in-tex-friends) and wonderful, the most flexible and expressive of all the typesetting tools (though not as fast as our old friend Groff...). It also has a steeper learning curve than Markdown, and is not so pretty on the screen in its raw form. R Markdown lets you do the bulk of your work in simple Markdown, then seamlessly invoke LaTeX when you need something a little fancier.
+- R Markdown is also a little HTML.
+    - HTML is also expressive, and can be gorgeous and wonderful. It is a pain to write. As with LaTeX, you can simply drop in some HTML where you need it, and R Markdown will deal with it as necessary.
+- R Markdown is academic-friendly.
+    - Citations and formatting guidelines for different journals are the tedious banes of any academic's existence. R Markdown has robust support for adding in citations that will be properly formatted in any desired style, just by changing a tag at the top of the document. Got a rejection from Journal 1 and want to submit to Journal 2, which has a completely different set of citation styles and manuscript formatting? NBD.
+
+## Writing code
+R Markdown, as the name implies, can also run R code.
+Any analysis you can dream of in R can be included in your document, and you can choose whether you want to show the code and its output, the output alone, or the code alone.
+People will think you went through all the work of making that figure, editing it in PowerPoint, screenshotting it to a .png, then dropping that .png file into your manuscript, but the truth is...
+you scripted all of that, so the manuscript itself made the .png and included it where it needed to go.
+
+R Markdown is by no means restricted to R code.
+This is the killer app that won me over.
+Simply by specifying that a given code block is Python,
+and installing a little tool (`reticulate`) that allows R to interface with Python,
+I can run arbitrary Python code within the document and capture the output however I want.
+That results paragraph? Sure.
+Fancy images of predictions from my machine learning model? But of course.
+
+If you don't want to use any R code ever, that's fine. R Markdown doesn't mind.
+Use SAS, MATLAB (via Octave), heck, even bash scripts - the range of language support is fantastic.
+
+## Working with friends
+R Markdown can be compiled to pretty much any format you can dream of.
+My current setup simultaneously puts out an HTML document (that can be opened in any web browser), a PDF (because I love PDFs), and (AND!) a .docx Word file,
+all beautifully formatted, on demand, whenever I hit my keyboard shortcut. I can preview the PDF or HTML as I write, have a .docx to send to my PI, and life is good.
+
+Also, because you can write in any programming language, you can easily collaborate between researchers that are comfortable in different paradigms.
+You can pass data back and forth between your chosen languages (for me, R and Python),
+either directly or by saving intermediate data to a format that both languages can read.
+
+## Automating tasks
+Many analyses and their manuscripts, especially if they use similar techniques (e.g. survival modeling), are rather formulaic.
+Many researchers have scripts they keep around and tweak for new analyses revolving around the same basic subject matter or approach.
+With R Markdown, your entire manuscript becomes a runnable program, further automating the boring parts of getting research out into the open.
+
+One of the [first introductions](https://www.youtube.com/watch?v=MIlzQpXlJNk) I had to R Markdown shared the remarkable idea of setting the file to run on a regular basis,
+generating a report based on any updated data,
+and then sending this report to all the interested parties automatically.
+While much academic work could not be so fully automated, parts of it certainly can be.
+
+Perhaps your team is building a database for outcomes in a given disease, and has specified the analysis in great detail beforehand.
+One of my mentors gives the advice that in any project proposal you should go as far as to mock up the results section,
+including all figures,
+so you make sure you are collecting the right data.
+If this was done in an R Markdown document rather than a simple Word document,
+you could have large parts of the template manuscript
+become the real manuscript as the database fleshes out over time.
+Then when it's done, look over the data, make additions and subtractions as needed,
+write the discussion sections, and send it in.
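
The mocked-up results section that becomes the real one as the database fills in can be sketched in Python, the language the post's author says he already works in. This is a minimal sketch under stated assumptions, not the author's actual script: the function name `results_paragraph`, the record keys `months` and `died`, the `min_n` threshold, and the wording of the sentence are all hypothetical. In an R Markdown document, code like this would presumably live in a Python chunk run through `reticulate`, with the returned string placed in the manuscript text.

```python
from statistics import median


def results_paragraph(records, min_n=3):
    """Return a results sentence, or a placeholder while data are still sparse.

    `records` is a list of dicts with two hypothetical keys:
    'months' (follow-up time as a float) and 'died' (bool).
    """
    n = len(records)
    if n < min_n:
        # Not enough data yet: keep the template slot visibly unfilled.
        return f"[Results pending: {n} of at least {min_n} patients enrolled.]"

    deaths = sum(1 for r in records if r["died"])
    pct = 100 * deaths / n
    med = median(r["months"] for r in records)
    return (
        f"Of {n} patients, {deaths} ({pct:.1f}%) died during follow-up; "
        f"median follow-up was {med:.1f} months."
    )


if __name__ == "__main__":
    # Made-up records, for illustration only.
    early = [{"months": 4.0, "died": False}]
    later = early + [
        {"months": 12.0, "died": True},
        {"months": 9.0, "died": False},
        {"months": 20.0, "died": True},
    ]
    print(results_paragraph(early))  # placeholder text
    print(results_paragraph(later))  # filled-in sentence
```

Rerunning the document as records accumulate swaps the placeholder for the real sentence without any copy and paste, which is the point the post makes about template manuscripts.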