Build reproducible and shareable data analyses using R packages
Material of this course is on GitHub: statnmap/teach-package-dev-rmdfirst
One step package building
All conference participants agree to:
What is Bakacode?
BakACode is the home-made ThinkR e-learning platform
Please take this tour to get used to the platform
Now we give you 3 minutes to read the following slides and answer the quiz at the end of this chapter
Connect to https://bakacode.io
We recommend that you export your full project at the end of the course
Can you download the PDF of the course?
Did you find the "shared/hello.Rmd" file?
Can you use the table of contents or the search bar to find my email address somewhere near the end of the slides?
Start with the documentation
Let's say you don't!
install.packages('attachment')
You don't have to think about dependencies...
The developers prepared everything for you
?att_amend_desc
?att_amend_desc => Examples
Unit tests and CI are set up by developers to ensure reproducibility and maintainability
| Questions | Answers |
|---|---|
| What does it do? | CRAN page |
| How do I install it with its dependencies? | install.packages('attachment') |
| What are its functions? | ?attachment => Index |
| How do I fill in the parameters of this function? | ?att_amend_desc |
| Is there an example of how to use this function? | ?att_amend_desc => Examples |
| Is there an overview of how to use the package as a whole? | Vignettes, GitHub |
| Will it work with the latest version of R and dependencies? | README check |
There is a dedicated website that gathers all these answers: https://thinkr-open.github.io/attachment/
We will explore this website later
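Besides the Index page of ?attachment, you can list the objects a package exports directly from the console. This is a minimal base-R sketch: it uses the {tools} package only because it ships with R, so swap in "attachment" once that package is installed. list_exports() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: list the objects (mostly functions) a package exports
list_exports <- function(pkg) {
  # getNamespaceExports() loads the namespace if needed and returns export names
  sort(getNamespaceExports(pkg))
}

tools_exports <- list_exports("tools")
head(tools_exports)
```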
You will build all of this documentation along the way
vignette(package = "thepackage")
vignette(topic = "colwise", package = "dplyr")
The story of the package as a whole
Discover the structure of a package with {fusen}
What if there was a package that could take an Rmd file, like a sheet of paper, and, if you follow the right folds, blow it up into a package?
We suggest that you:
Watch your trainers create a package {mytools} by following these short steps, without practicing yourself
Redo this package {mytools} on your own in a new project
We won't explain everything here, but you will see the components of a package and where they fit. The details are what your complete training is for!
In RStudio:
no capital letters, underscores, spaces or special characters
Choose the {fusen} template in the dropdown menu: "teaching"
Choose the directory where to save the project
You are about to build a package. This is a set of tools for testing package structure.
Thus, {mytools}
On our platform, the directory is your "Home", written ~
The project opens on a flat template file: "flat_teaching.Rmd"
Here are the main components of a package, in a single Rmd file
You can see that {fusen} opens the "flat_teaching.Rmd" file in RStudio.
There are a few additional files that we will explore later.
This Rmd file is the "teaching" template with different chunks, which you will not modify this time.
Let's quickly explore the content of this Rmd: a description, some functions with their examples and tests, and a final 'development' chunk asking you to inflate the package
Describe your future package in the description chunk
Yeah, you started with documentation!
There are some more fields in the DESCRIPTION file, but we'll see them later
fusen::inflate(flat_file = "dev/flat_teaching.Rmd")
You built a package!
The 'Build' tab should already appear in RStudio
Install the package
Test the package directly in the console
mytools::add_one(value = 56)
Test knitting the vignette
Check that the help for your function appears
?add_one
If you verified everything listed above, your RStudio should look like this
Package express with {fusen}
To create a package, we will use:
install.packages(c("fusen", "devtools", "usethis", "pkgbuild", "roxygen2", "attachment", "testthat"))
Rtools is available here: https://cran.r-project.org/bin/windows/Rtools/
Once properly installed, pkgbuild::has_rtools() should return TRUE.
Rtools (Windows only) installs everything needed to compile C/C++ code, among other things.
All of these should already be installed on the server
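Before starting, you may want to check which of these packages are still missing. This is a minimal base-R sketch; find_missing() is a hypothetical helper that relies on system.file(package = ) returning "" for packages that are not installed.

```r
# Packages used in this course
course_pkgs <- c("fusen", "devtools", "usethis", "pkgbuild",
                 "roxygen2", "attachment", "testthat")

# Hypothetical helper: return the packages of `pkgs` that are not installed
find_missing <- function(pkgs) {
  # system.file(package = p) is "" when p is not found in the library paths
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- find_missing(course_pkgs)
to_install
# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```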
This procedure has 10 steps, to be done in order. Some points are not explained in detail so that you get a working R package quickly.
We suggest that you:
The procedure is split into two parts to let you practice, so you will go through "Watch - Do" twice
In RStudio:
no capital letters, underscores, spaces or special characters
Choose the {fusen} template in the dropdown menu: "minimal"
Choose the directory where to save the project
You could also directly run:
fusen::create_fusen(path = "~/hello", template = "minimal")
You are about to build a package. This is a set of tools to be polite with other people, starting by saying hello.
Thus, {hello}
This time the template is divided into two flat Rmd files, one of them dedicated to the description part.
You can see that {fusen} opens the "0-dev_history.Rmd" file in RStudio.
There are a few additional files that we will explore later.
This Rmd file is the "minimal" template with different chunks, some empty and some not, that we will fill in together in the next steps
The description of the package goes in the first description chunk of the "dev_history.Rmd" file
The first fields to fill in:
Authors@R: a vector of one or more person() calls
person("Sébastien", "Rochette", email = "sebastien@thinkr.fr", role = c("aut", "cre"))
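Authors@R takes a vector of person() objects, so several authors can be combined with c(). A short sketch using utils::person(); the second author, "Jane Doe", is hypothetical and only there to show the syntax, with the standard "ctb" (contributor) role.

```r
# Combine several person() entries for the Authors@R field
authors <- c(
  utils::person("Sébastien", "Rochette", email = "sebastien@thinkr.fr",
                role = c("aut", "cre")),
  # Hypothetical co-author, listed as contributor
  utils::person("Jane", "Doe", role = "ctb")
)

# One line per person, as R renders them
format(authors)
```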
Run the description chunk, then observe the content of the "DESCRIPTION" file it creates
You are about to build a package, you need to inform the user about its aim. Here, the aim is a set of tools to be polite with other people, starting by saying hello.
You also need to say who you are, so that users know who to call in case of problems.
Finally, the license lets you say how you want your package to be used and shared. Without a license, nobody is allowed to use your package.
Observe the content of DESCRIPTION. Note that what you wrote in the Rmd has been copied into this file without you leaving your Rmd. So we stay in the Rmd and continue the development.
A package is created to automate some operations. Starting with the documentation forces you to think about the structure of the package and the logical sequence of operations.
function chunk, development chunk
## Say hello to someone

You can say hello to someone in particular using `say_hello()`.

```{r development}
library(glue) # On top, with the other library() calls
message("Hello someone")
someone <- "Seb"
message(glue("Hello {someone}"))
```
library() calls go in the development chunk only, at the very beginning, with the other library() calls
Here we take a simple example with code to say hello. This will be the first tool of our package to be polite with the people around us.
Forget that you are about to build a package. Now we develop in an Rmd as usual.
First, we write down what we are about to do.
Then, we write some code to say hello. But I want it to be able to say hello to someone other than me.
So I add a parameter.
Use a function chunk as soon as you have transformed your code into a function, and an examples chunk to try it

## Say hello to someone

You can say hello to someone in particular using `say_hello()`.

```{r function}
say_hello <- function(someone) {
  message(glue("Hello {someone}"))
}
```

```{r examples}
say_hello(someone = "Seb")
```
Put the code in the function chunk to make it available, and use the examples chunk of the Rmd to try it.
Wrap the code in function() and use the parameter as an argument of the function.
In the examples chunk, we have a reproducible example.
The created function can now be documented
Add the doc in {roxygen2} format
@param to describe the inputs
@export for the function to be accessible to the users
@importFrom package function for functions coming from other packages
@return to describe the object returned by the function
Use the RStudio menu: Code > Insert Roxygen Skeleton
You must declare the dependencies of the package with @importFrom. The calls to library() are only there for development, like in a classic Rmd, and are not used by the package once inflated
As you just wrote the code of your function, you know exactly what it does
Write it now. In an hour, you will have forgotten it!
Note that:
Let's see how it looks on the next slide
## Say hello to someone

You can say hello to someone in particular using `say_hello()`.

```{r function}
#' Show a message in the console to say Hello to someone
#'
#' @param someone Character. Name of the person to say hello to
#' @importFrom glue glue
#' @return Used for side effect. Outputs a message in the console
#' @examples
#' @export
say_hello <- function(someone) {
  message(glue("Hello {someone}"))
}
```

```{r examples}
say_hello(someone = "Seb")
```
Here you can see the minimal roxygen content.
development chunk. That's ok.
Tell us in the Chatroom when you are done
```{r tests}
test_that("say_hello works", {
  expect_message(say_hello(someone = "Seb"), "Hello Seb")
})
```
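Under the hood, expect_message() checks that the code signals a message whose text matches the pattern. Below is a rough base-R sketch of that idea, not {testthat}'s actual implementation: say_hello_base() is a hypothetical twin of say_hello() built with paste0() instead of glue(), so the block runs without {glue}.

```r
# Hypothetical base-R twin of say_hello(), using paste0() instead of glue()
say_hello_base <- function(someone) {
  message(paste0("Hello ", someone))
}

# Capture the text of the first message signalled by `expr` (NULL if none)
capture_first_message <- function(expr) {
  tryCatch(
    {
      expr  # forcing the promise here runs the code inside tryCatch()
      NULL
    },
    message = function(m) conditionMessage(m)
  )
}

msg <- capture_first_message(say_hello_base(someone = "Seb"))
grepl("Hello Seb", msg)
```

Note that message() appends a newline by default, so the captured text is "Hello Seb\n"; matching with a pattern rather than strict equality avoids tripping over it.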
fusen::inflate(flat_file = "dev/flat_minimal.Rmd", vignette_name = "Say Hello!")
If there are any errors or warnings, read them carefully, address them in the "flat_minimal.Rmd" and inflate the package again.
Link files to their description:
Files and folders
DESCRIPTION
dev_history / flat_minimal
vignettes
script with roxygen
testthat
Documentation
A. Development process for developers
B. Present all the functions of the package and its story
C. How to use each function (for the user)
D. Testing the functions for the developers
E. Content and objectives of the package for all
Possible answers:
a: ACBDE
b: EDCBA
c: EABCD
d: BAECD
Generate documentation
attachment::att_amend_desc()
Check that the package follows the package rules
devtools::check()
Solve potential problems in the "flat_minimal.Rmd"
Re-inflate the package if necessary
Reach 0 errors, 0 warnings, 0 notes
Store these commands in the "0-dev_history.Rmd" file
Note that fusen::inflate() already runs these two commands, but who knows!
It is always good to know these commands, even though inflate() already runs them. You may need them if you go back to a classical way of maintaining your package
The 'Build' tab should already appear in RStudio
Note that you can "check" in the 'Build' panel
Install the package
Test the package directly in the console
hello::say_hello("Toto")
Test knitting the vignette
Check that the help for your function appears
?say_hello
Edit "flat_minimal.Rmd" and re-execute inflate() as many times as necessary until everything runs smoothly.
Bonus: if you are motivated, you can start again from the beginning of the 10-step procedure with a new package name, {hello2}
Going further would require starting over from 'Step 4' to:
Upgrade your existing function
Add a new function in the current flat template, with its function, examples and tests chunks
There is an RStudio Addin to "Add {fusen} chunks"
Or create a new flat template using fusen::add_flat_template("add") for a new family of functions, and thus a new vignette
There is an RStudio Addin to "Add {fusen} flat template"
And inflate() this new "flat_additional.Rmd"
usethis::use_*_license()
usethis::use_build_ignore("dev/")
usethis::use_r("my_function")
usethis::use_testthat()
usethis::use_test("my_function")
usethis::use_vignette("my-vignette", title = "The title of my vignette")
attachment::att_amend_desc()
roxygen2::roxygenise()
+ Fill in DESCRIPTION (Suggests, Imports)
devtools::check()
=> 0 errors, 0 warnings, 0 notes
You need to fill in the different files yourself
While developing you could
Note that you can still do these actions using {fusen} after inflate()
Be careful: when using {fusen}, if you want to modify some code, go back to the "flat_*.Rmd" and run inflate() again
Show that we can do this with {fusen} too, as it is a real classical package
Where were the pieces of code from these chunks moved?
description ?
function ?
example ?
tests ?
development ?
Verify and update your previous drawings. See everything that {fusen} does for you!
description: in DESCRIPTION
function: in the independent .R file named after the function
example: in the independent .R file
Include datasets in your package
In a package, it can be useful to include data.
There are three types of datasets to include, thus three ways to include them into a package.
A raw data file (xlsx, csv or other), NOT available to the end user but accessible to the developers only, stored in the data-raw/ folder
An example dataset in rda format, to be loaded as is, available to the user, stored in the data/ folder (such as iris or mtcars, for example)
A data file (xlsx, csv or other) not transformed into rda, available to the end user, stored in the inst/ folder
Let's see how and why
data-raw/
Steps:
Create the data-raw/ folder at the root of the package using: usethis::use_data_raw()
Information:
data-raw/ is not installed with the package
development chunk.
data/
Store the dataset in data/ using: usethis::use_data(my_dataset)
The user loads my_dataset using: data(my_dataset)
Steps:
Create a script in data-raw/ to prepare your dataset using: usethis::use_data_raw("my_dataset")
Save my_dataset as package data in data/
# Read some raw data
# my_data_to_clean <- readr::read_csv("my_raw_data.csv")
# Or use an existing dataset like `diamonds` from {ggplot2}
my_dataset <- dplyr::slice_sample(ggplot2::diamonds, prop = 0.2)
usethis::use_data(my_dataset, overwrite = TRUE)
overwrite = TRUE overwrites the dataset if it already exists, for instance for an update
The code that creates the dataset in data/ lives in the data-raw/ script previously created to prepare it
use_data() stores the reproducible example at the right place and in the right format
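To see what ends up in data/, here is a rough base-R approximation of usethis::use_data(): one .rda file per object, saved with save(). It is a sketch, not usethis' exact code (usethis may set other options, such as the serialization version). The temporary folder stands in for the package root, and the built-in mtcars replaces diamonds so the block needs no extra package.

```r
# Temporary folder standing in for the package root (hypothetical layout)
pkg_root <- file.path(tempdir(), "mypackage")
dir.create(file.path(pkg_root, "data"), recursive = TRUE, showWarnings = FALSE)

# A small example dataset (built-in mtcars instead of diamonds)
my_dataset <- head(mtcars, 10)

# Roughly what use_data(my_dataset) does: data/<object name>.rda
save(my_dataset, file = file.path(pkg_root, "data", "my_dataset.rda"),
     compress = "bzip2")

# Reload into a fresh environment to check the round trip
check_env <- new.env()
load(file.path(pkg_root, "data", "my_dataset.rda"), envir = check_env)
```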
Information:
my_dataset is accessible to the user after package installation:
library(mypackage)
data(my_dataset)
my_dataset is stored as an .rda file, only readable by R, similar to .RData files
The package needs to be installed to access the dataset
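data() looks up a dataset shipped in an installed package and loads it into the chosen environment. A minimal sketch where the {datasets} package that comes with R stands in for the hypothetical "mypackage".

```r
# Load a dataset from an installed package into a dedicated environment
data_env <- new.env()
data("mtcars", package = "datasets", envir = data_env)

# The object is now available under its own name
ls(data_env)
```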
data/
my_dataset <- dplyr::slice_sample(ggplot2::diamonds, prop = 0.2)
usethis::use_data(my_dataset, overwrite = TRUE)
cat(sinew::makeOxygen("my_dataset"), file = "R/doc_my_dataset.R")
rstudioapi::navigateToFile("R/doc_my_dataset.R")
To add to "data-raw/my_dataset.R"
Documentation is always the key for a correct package. Check will remind you of missing data documentation.
data/
Information:
2 main tags:
@format (required): a summary of the data. If @format is not specified, roxygen generates a standard one.
@source: the origin of the data.
We do not @export a dataset.
inst/
Store the file directly in the inst/ folder, as is
Access it with: system.file("my_dataset.csv", package = "mypackage")
Steps
Create the inst/ folder using: dir.create(here::here("inst"))
# Store "my_dataset.csv" in the "inst/" folder
the_data_path <- system.file("my_dataset.csv", package = "mypackage")
the_data <- readr::read_csv(the_data_path)
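system.file() silently returns "" when the file is not found, which can lead to confusing read errors later. A minimal base-R sketch of that behaviour, using the {stats} package that ships with R; read_pkg_csv() is a hypothetical helper that uses utils::read.csv() instead of readr::read_csv() to stay dependency-free, and mustWork = TRUE to fail early.

```r
# An existing file inside an installed package: a full path is returned
desc_path <- system.file("DESCRIPTION", package = "stats")

# A missing file: system.file() returns "" instead of an error
missing_path <- system.file("my_dataset.csv", package = "stats")

# Hypothetical helper: mustWork = TRUE turns a missing file into an explicit error
read_pkg_csv <- function(file, package) {
  path <- system.file(file, package = package, mustWork = TRUE)
  utils::read.csv(path)
}
```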
Information
Organize inst/ as you want
Datasets in data/ or inst/ are only available after package installation
Use pkgload::load_all() during development
system.file() then temporarily returns the development path (not the installed path)
Steps
# For development only
pkgload::load_all()
# Same code to add in your `examples` or `tests`
# Can be tested directly during development
the_data_path <- system.file("my_dataset.csv", package = "mypackage")
the_data <- readr::read_csv(the_data_path)
Note that pkgload::load_all() also loads the functions in development, as it simulates a real installation
If I want to provide a dataset to the user in any format I want, where do I store it?
extdata/
inst/
data/
data-raw/
Steps
full template
description chunk in the "dev_history.Rmd"
development chunk in the "Read data" section of the "flat_full.Rmd"
inst/ during development
Bonus
Create a function check_data_integrity() that reads a dataset like nyc_squirrels and checks its integrity
The primary_fur_color column should only contain one color: there should not be any + sign inside this column
Read the data stored in inst/ with system.file()
https://github.com/statnmap/squirrels.fusen/blob/main/dev/dev_history.Rmd
#' Check data integrity
#'
#' @param x dataframe with at least columns "lat", "long" and "primary_fur_color"
#'
#' @return Original dataframe if all tests are good. Otherwise stops.
#' @export
check_data_integrity <- function(x) {
  # Verify points are in New York around Central Park
  all_coords_ok <- all(
    c(
      min(x[["lat"]]) > 40.76400,
      max(x[["lat"]]) < 40.80100,
      min(x[["long"]]) > -73.98300,
      max(x[["long"]]) < -73.94735
    )
  )
  if (!all_coords_ok) {
    stop("Not all data are in Central Park")
  }
  # Verify there is only one color in primary_fur_color.
  # A `+` in the column is a sign of multiple colours
  if (any(grepl("+", x[["primary_fur_color"]], fixed = TRUE))) {
    stop("There are multiple colors in some 'primary_fur_color'")
  }
  message("All tests are good!")
  # Return the original data, as documented in @return
  return(x)
}
# A working example
my_data_example <- data.frame(
  lat = c(40.77, 40.78),
  long = c(-73.95, -73.96),
  primary_fur_color = c("grey", "black")
)
check_data_integrity(my_data_example)
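The colour check relies on fixed = TRUE so that "+" is matched as a plain character rather than as a regular-expression quantifier. Below is a small sketch that isolates that part of the logic so it can be tried on its own. has_multiple_colors() and the value "Cinnamon+White" are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: TRUE if any value contains a literal "+"
has_multiple_colors <- function(colors) {
  # fixed = TRUE: "+" is a plain character here, not a regex quantifier
  any(grepl("+", colors, fixed = TRUE))
}

has_multiple_colors(c("Gray", "Black"))
has_multiple_colors(c("Gray", "Cinnamon+White"))
```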
Versioning a {fusen} package with git
2 possibilities:
Your {fusen} package has not been created yet, and you want to set up the versioning before starting the developments
Your {fusen} package has already been created, and you now want to version it
Create an empty project on GitLab (using the name of your future package as the project name)
Get the https link to your project by clicking on the Clone button
In RStudio, create the new project File > New Project > Version Control > git and link to the Repository URL
Initiate your new {fusen} package:
fusen::create_fusen(path = ".", template = "minimal", overwrite = TRUE)
In the Terminal,
git switch -c main
git add .
git commit -m "Init fusen package"
git push -u origin main
You can now begin your developments.
This procedure can be used for any type of R project, and therefore for any type of package (not only {fusen} packages)
Create an empty project on GitLab (using the name of your package as the project name)
Retrieve the https link to your project by clicking on the Clone button
Open the RStudio project and run usethis::use_git() in the console (the {usethis} package must be installed)
Answer yes to all the questions asked in the console, if relevant. A first "Initial Commit" is done for you.
RStudio restarts and git is operational locally
WARNING: if you modify this slide, also modify M14S01C06 for git1jour
Now you have to link this project to your remote:
usethis::use_git_remote("origin", url = "https://gitlab.com/my_name/mypackage.git", overwrite = TRUE)
In the Terminal, type:
git push -u origin main
You just added a remote and made a first push on the main branch.
Set up git versioning for your {hello} package
What about data analyses in a package?
Then, two possible ways to build your analysis reports:
The compendium logic is to separate code from the report output:
This is similar to package logic:
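# install.packages("fusen")
# "my.package" is a placeholder: create_fusen() needs a path for the new package
fusen::create_fusen(path = "my.package", template = "full", with_git = TRUE)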
dir.create("reports")
usethis::use_build_ignore("reports")
Note: you can add the Compendium structure in the "reports/" sub-directory, with package {rcompendium}
# install.packages("rcompendium")
rcompendium::add_compendium("reports")
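usethis::use_build_ignore() adds entries to .Rbuildignore, which R CMD build reads as regular expressions. To our understanding, usethis escapes the path and anchors it with ^...$ by default. The sketch below reproduces that idea in base R with a hypothetical build_ignore_pattern() helper and writes the result into a .Rbuildignore inside a temporary folder that stands in for the package root.

```r
# Hypothetical helper: turn a path into an anchored, escaped regex
build_ignore_pattern <- function(path) {
  escaped <- gsub("\\.", "\\\\.", path)  # a literal "." must not match any character
  paste0("^", escaped, "$")
}

pattern <- build_ignore_pattern("reports")

# Write it to a .Rbuildignore in a temporary stand-in for the package root
pkg_root <- file.path(tempdir(), "mypackage_build")
dir.create(pkg_root, showWarnings = FALSE)
writeLines(pattern, file.path(pkg_root, ".Rbuildignore"))
readLines(file.path(pkg_root, ".Rbuildignore"))
```

Anchoring matters: without ^ and $, the pattern "reports" would also exclude any other path that merely contains that word.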
library(my.package)
Because of:
Develop your project in a controlled environment, with fixed versions of packages
Use attachment::create_renv_for_dev() ({attachment} >= 0.2.5) to build your "renv.lock" file
Allow other users to create their analyses in the same environment, by cloning your repository
You can build a Docker container to also fix the system dependencies needed by your package. {dockerfiler} can help you set up the container.
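Since create_renv_for_dev() requires {attachment} >= 0.2.5, it can help to check the installed version first. A minimal base-R sketch; has_min_version() is a hypothetical helper that returns FALSE when the package is not installed at all.

```r
# Hypothetical helper: TRUE if `pkg` is installed with at least `min_version`
has_min_version <- function(pkg, min_version) {
  # system.file(package = pkg) is "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) return(FALSE)
  # package_version objects compare correctly with version strings
  utils::packageVersion(pkg) >= min_version
}

has_min_version("attachment", "0.2.5")
```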
# Have a proper Readme - Fill and knit
usethis::use_readme_rmd()
# Allow {pkgdown}
usethis::use_pkgdown()
# Try it locally
pkgdown::build_site()
# GitHub
# Add your credentials for GitHub
gitcreds::gitcreds_set()
# Send your project to a new GitHub project
usethis::use_github()
# Build and publish with GitHub Actions
usethis::use_github_action("pkgdown")
# Build and publish on GitLab
gitlabr::use_gitlab_ci()
Material of this course is on GitHub (with answers): statnmap/teach-package-dev-rmdfirst