6 Continuous integration
Continuous integration is the process of assigning an external service to verify a package, produce Markdown documents for web pages in a GitHub repository, or completely knit a website from code.
All of these tasks can be done locally on the desktop but are time consuming and may not be repeated with each update. In the context of continuous integration, they are systematically performed, in a transparent way for the user. In case of failure, an alert message is sent.
The implementation of continuous integration is justified for heavy projects, with regular updates. rather than for projects containing a simple Markdown document that is rarely modified.
6.1 Tools
6.1.1 GitHub Actions
The most frequently used tool for R projects filed on GitHub was Travis CI110 but the service became fee-based in 2021.
GitHub Shares is a good replacement for Travis. This service is integrated with GitHub.
6.1.2 Codecov
To evaluate the code coverage rate of R packages, i.e. the proportion of code tested in some way (examples, unit tests, vignette), the Codecov111 service integrates perfectly with GitHub.
You need to open an account, preferably by authenticating through GitHub.
6.2 Principles
The production of a document is treated as an example.
The objective is to have GitHub knit a Markdown project.
This practice is appropriate for book projects, which require a lot of resources for their construction.
In this type of project, the code is knit by knitr to produce several documents, typically in HTML and PDF formats, accessible on GitHub pages.
When documents are produced locally, they are placed in the docs
folder and pushed to GitHub.
In order for GitHub to do this, a few settings are required.
6.2.1 Getting a personal access token
To write to GitHub, the continuous integration service will have to authenticate with a Personal Access Token (PAT) whose creation is described in section 1.4.4.
Generate a new token, describe it as “GitHub Actions” and give it the following authorizations:
- “repo”, i.e. modify all repositories (it is not possible to limit access to a particular repository).
- “workflow”, i.e. run continuous integration scripts.
6.2.2 Project secrets
On GitHub, display the project settings and select “Secrets”. The “New Repository Secret” button allows you to store variables used in GitHub Actions scripts (publicly visible) without exposing their value. The personal access token is essential for GitHub Actions to write their production in the project. Create a secret named “GH_PAT” and enter the previously saved token value. After clicking on “Add Secret”, the token cannot be read anymore.
To allow sending success or failure messages without exposing your email address, create a secret named “EMAIL” which contains it.
6.2.3 Activation of the repository on CodeCov
The analysis of the code coverage of packages is useful to detect untested code portions. On the other hand, the analysis of the coverage of document projects is not useful.
To activate a repository, you need to authenticate on the CodeCov website with your GitHub account. The list of repositories is displayed and can be updated. If the repositories to be processed are hosted by an organization, for example the repositories of a GitHub classroom, you have to update the list of organizations by following the instructions (a link allows you to quickly modify the GitHub options to authorize Codecov to read the data of an organization) and update the list of repositories again. Finally, when the repository you are looking for is visible, you have to activate it. Ignore Codecov’s token system.
6.2.4 Scripting GitHub actions
A GitHub workflow is a succession of jobs with steps. A workflow is triggered by an event, usually every push of the project, but also at regular intervals (cron).
Typically, the workflows created here contain two jobs: the first one installs R and the necessary components and executes R scripts (which constitute its successive steps); the second one publishes the obtained files in GitHub pages.
The workflows are configured in a YAML file placed in the .github/workflows/
folder of the project.
The different parts of the script are presented below.
The full script is that of this document, accessible on GitHub112.
6.2.4.1 Trigger
The action is triggered whenever updates are pushed to GitHub:
on:
push:
branches:
- master
The branch taken into account is master (to be replaced by main if necessary).
To trigger the action periodically, use the syntax of cron (the Unix job scheduling system):
on:
schedule:
- cron: '0 22 * * 0' # every sunday at 22:00
The successive values are for minutes, hours, day (day of the month), month and day of the week (0 for Sunday to 6 for Saturday).
The *
allow to ignore a value.
The push
and schedule
entries can be used together:
on:
push:
branches:
- master
schedule:
- cron: '0 22 * * 0'
Currently, scheduling is only taken into account in the master branch.
6.2.4.2 Workflow name
The name of the workflow is free.
It will be displayed by the badge that will be added to the project’s README.md
file (see section 6.4).
name: bookdown
6.2.4.3 First job
The jobs are described in the jobs
section.
renderbook
is the name of the first job: it is free.
Here, the main action will be to produce a bookdown with the render_book()
function, hence the name.
jobs:
renderbook:
runs-on: macOS-latest
The runs-on
statement describes the operating system on which the job should run.
The possible choices are Windows, Ubuntu or MacOS113.
Continuous integration of R on GitHub usually uses MacOS which has the advantage of using compiled R packages so much simpler (some packages require libraries outside of R for their compilation) and quicker to install, while allowing the use of scripts.
6.2.4.4 First steps
The steps are described in the steps
section.
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Setup R
uses: r-lib/actions/setup-r@v2
- name: Install pandoc
run: |
brew install pandoc
Each step is described by its name (free) and what it does.
The strength of GitHub Actions is to allow the use of actions written by other people and stored in a public GitHub project.
An action is a script with metadata describing its use.
Its development is accompanied by successive version numbers.
An action is called by the statement uses:
, the GitHub project that contains it and its version.
In their respective GitHub project, actions exist in their development version (@master
) and in step versions (release) accessible by their number (@v1
).
These stage versions are preferable because they are stable.
General actions are made available by GitHub in the GitHub Actions organization114. The “actions/checkout” action is used to get into the main branch of the project processed by the workflow: it is usually the first step of all workflows.
The next action is the installation of R, provided by the R infrastructure organization115.
The installation of pandoc (software external to R but necessary for R Markdown) can be done by a command executed by MacOS.
It is called by run:
and can contain several lines (hence the |
).
This script depends on the operating system: brew
is the package manager of MacOS.
To avoid system specifics, it is better to use an action:
- name: Install pandoc
uses: r-lib/actions/setup-pandoc@v2
It is often possible to choose between calling an action or writing the corresponding code. The choice made here is to favour actions for everything to do with the system, such as installing software, but to use scripts for R commands, such as checking packages. The aim is to control the R code precisely and limit dependencies on additional packages. The opposite strategy is developed in Wickham and Bryan (2023) which relies entirely on actions to perform R tasks.
6.2.4.5 Packages
This step uses Rscript
as its command environment, which allows it to execute R commands directly.
- name: Install packages
env:
GITHUB_PAT: ${{ secrets.GH_PAT }}
run: |
options(pkgType = "binary")
options(install.packages.check.source = "no")
install.packages(c("remotes", "bookdown", "formatR", "tinytex"))
tinytex::install_tinytex(bundle = "TinyTeX")
remotes::install_deps(dependencies = TRUE)
shell: Rscript {0}
The packages used to produce the document are listed:
-
remotes for its
install_deps()
function. - bookdown for knitting.
-
formatR for formatting code snippets (
tidy=TRUE
). - tinytex to have a LaTeX distribution.
The other packages, those used by the project, are read in the DESCRIPTION
file by the install_deps()
function.
On MacOS, packages are installed by default in binary version, but from their source code if it is more recent.
The creation of binary packages takes a few days at CRAN: this situation is not rare.
Packages containing only R code or C++ code without reference to external libraries can be installed without problems.
On the other hand, if the package requires libraries external to R or a compilation of Fortran code, the installation fails.
It would therefore be necessary to install the necessary libraries (and possibly a Fortran compiler) for all the packages on which the project depends: this solution is not realistic because it implies an inventory of all the dependencies, which may change, and a large number of time-consuming and useless installations most of the time, when the binary packages are up to date.
A better solution is to force the installation of binary packages even if the source code is newer: this is the purpose of the two R options defined before calling install.packages()
.
Finally, the secret GH_PAT
is passed to R in an environment variable, GITHUB_PAT
, necessary to install packages from their source code on GitHub.
GitHub limits the rate of anonymous access for all GitHub actions (all accounts) and may reject the request: in practice, using install_github()
in a GitHub action is only possible with this environment variable.
6.2.4.6 Knitting
The production of the document is started by an R command.
- name: Render pdf book
run: |
bookdown::render_book("index.Rmd", "bookdown::pdf_book")
shell: Rscript {0}
- name: Render gitbook
run: |
bookdown::render_book("index.Rmd", "bookdown::gitbook")
shell: Rscript {0}
The parameters declared in _output.yml
are used.
The PDF file must be produced before the GitBook format in order for its download link to be added to the menu bar of the GitBook site. On the other hand, R must be closed and reopened between the two renderings otherwise the tables are not created correctly in GitBook116. The two steps should not be combined into one.
6.2.4.7 Backup
The result of the knitting, placed in the docs
folder of the virtual machine in charge of continuous integration, must be preserved so that the next job can use it.
The last step of the production job uses the upload-artifact
action for this.
- name: Upload artifact
uses: actions/upload-artifact@v4
with:
name: _book
path: docs/
The content of docs
is saved as an artifact named “_book”.
Artifacts are publicly visible on the GitHub Project Actions page.
After its last step, the virtual machine is destroyed.
6.2.4.8 Publication
Publishing the artifact to the gh-pages
branch of the project requires another job.
deploy:
runs-on: ubuntu-latest
needs: renderbook
permissions:
contents: write
steps:
- name: Download artifact
uses: actions/download-artifact@v4
with:
# Artifact name
name: _book
# Destination path
path: docs
- name: Deploy to GitHub Pages
uses: Cecilapp/GitHub-Pages-deploy@v3
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
email: ${{ secrets.EMAIL }}
build_dir: docs
jekyll: no
The job is named “deploy” (the name is free). It runs on a virtual machine on Ubuntu. It can only be launched if the “renderbook” job has succeeded. Its steps are the following:
-
Download artifact: Restore the
docs
folder; -
Deploy to GitHub Pages: copy the
docs
folder to thegh-pages
branch.
This last step uses the GitHub-Pages-deploy
action provided by the Cecilapp organization.
It uses an environment variable, GITHUB_TOKEN
, to authenticate and parameters:
- email: the email address to which the run report is sent. To avoid exposing the address publicly, it has been stored in a project secret.
- buid_dir: the directory to publish.
-
jekyll:no to create an empty file named
.nojekyll
that tells GitHub pages not to try to treat their content as a Jekyll website.
GITHUB_TOKENis an authentication token provided by Github Actions for the execution of this script. Its rights are assigned in the script by the
permission` entry: here, the right to write content to the project.
6.2.5 Confidential data in a public repository
If the project contains confidential data (section 3.6), GitHub Actions must use the project’s private key to extract it from their vault.
The private key must be stored in a project secret, named “RSA”. The next step, to be inserted before the knitting step, writes the key to a file for the project code to access.
- name: Private key
run: |
cat("${{ secrets.rsa }}", file="<RepoID>.rsa"
shell: Rscript {0}
6.3 Script templates
Script templates for all types of projects are presented here.
The gh-pages
branch is created automatically by the scripts.
Check after the first execution that GitHub pages are enabled on this branch (section 3.7).
Then delete the docs
folder if it existed, push the modification on GitHub and finally add the following line to the .gitignore
file to be able to knit the projects locally without disturbing GitHub:
docs/
6.3.1 memoiR
The build_ghworkflow()
function of the memoiR package automatically creates the scripts needed to produce the package’s templates.
The script is always named memoir.yml
.
These scripts do not need a DESCRIPTION
file for the installation of the dependencies but each document must contain in its parameter code (Options
) the list of all packages needed for its knitting (stored in the Packages
variable).
They all require the same preparation: the GH_PAT
and EMAIL
secrets must be registered in the GitHub project (section 6.2.2).
6.3.1.1 Book or thesis
The workflow is called rmarkdown
; its production job is render
.
on:
push:
branches:
- master
name: rmarkdown
jobs:
render:
runs-on: macOS-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Setup R
uses: r-lib/actions/setup-r@v2
- name: Install pandoc
uses: r-lib/actions/setup-pandoc@v2
- name: Install dependencies
run: |
options(pkgType = "binary")
options(install.packages.check.source = "no")
install.packages(c("distill", "downlit", "memoiR", "rmdformats", "tinytex"))
tinytex::install_tinytex(bundle = "TinyTeX")
shell: Rscript {0}
- name: Render pdf book
run: |
bookdown::render_book("index.Rmd", "bookdown::pdf_book")
shell: Rscript {0}
- name: Render gitbook
run: |
bookdown::render_book("index.Rmd", "bookdown::gitbook")
shell: Rscript {0}
- name: Upload artifact
uses: actions/upload-artifact@v4
with:
name: ghpages
path: docs
deploy:
runs-on: ubuntu-latest
needs: render
permissions:
contents: write
steps:
- name: Download artifact
uses: actions/download-artifact@v4
with:
name: ghpages
path: docs
- name: Deploy to GitHub Pages
uses: Cecilapp/GitHub-Pages-deploy@v3
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
email: ${{ secrets.EMAIL }}
build_dir: docs
jekyll: no
6.3.1.2 Articles or slideshows
The workflow is called rmarkdown
; its production job is render
.
on:
push:
branches:
- master
name: rmarkdown
jobs:
render:
runs-on: macOS-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Setup R
uses: r-lib/actions/setup-r@v2
- name: Install pandoc
uses: r-lib/actions/setup-pandoc@v2
- name: Install dependencies
run: |
options(pkgType = "binary")
options(install.packages.check.source = "no")
install.packages(c("memoiR", "rmdformats", "tinytex"))
tinytex::install_tinytex()
shell: Rscript {0}
- name: Render Rmarkdown files
run: |
RMD_PATH=($(ls | grep "[.]Rmd$"))
Rscript -e 'for (file in commandArgs(TRUE)) |>
rmarkdown::render(file, "all")' ${RMD_PATH[*]}
Rscript -e 'memoiR::build_githubpages(bundle = "TinyTeX")'
- name: Upload artifact
uses: actions/upload-artifact@v4
with:
name: ghpages
path: docs
deploy:
runs-on: ubuntu-latest
needs: render
permissions:
contents: write
steps:
- name: Download artifact
uses: actions/download-artifact@v4
with:
name: ghpages
path: docs
- name: Deploy to GitHub Pages
uses: Cecilapp/GitHub-Pages-deploy@v3
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
email: ${{ secrets.EMAIL }}
build_dir: docs
jekyll: yes
The knitting stage uses a script to list all the .Rmd
files, process them (all the output formats listed in their yaml header are produced).
The build_githubpages()
function (see section 4.3.2) places the results in docs
.
The deployment job tells GitHub pages to use Jekyll, i.e. to use the README.md
file as the home page.
6.3.1.3 Localization
If the knitting step needs to change the language used by R, for example to correctly display the production date of documents, it can be changed like this:
- name: Render Rmarkdown files
run: |
Sys.setlocale("LC_TIME", "fr_FR")
lapply(list.files(pattern="*.Rmd"),
function(file) rmarkdown::render(file, "all"))
memoiR::build_githubpages()
shell: Rscript {0}
The selection of files is done by an R script, which includes a localization command, here in French.
This step can be completed by selecting a GitHub Pages theme so that the home page contains a link to the code:
run: |
echo 'theme: jekyll-theme-slate' > docs/_config.yml
The theme here is “Slate”, one of the choices offered by the GitHub pages.
6.3.1.4 Debugging
It can happen that the compilation of the .tex
file to produce the PDF file fails, although knitting in HTML does not generate an error.
The LaTeX compiler is indeed more demanding than pandoc (which produces the HTML file).
The first check consists in knitting the problematic document into PDF locally, on your workstation, with tinytex. The LaTeX packages must be updated to be the same as those used by the GitHub actions: to do this, run:
tinytex::tlmgr_update()
If the compilation works locally but not on Github, you have to inspect the .log
file which records all the events generated by the compiler, but this file is not kept after the failure of GitHub Actions.
So you have to modify the script to copy the file in docs
and then save the result despite the error.
The “Upload artifact” step is modified to run despite the failure of the previous step by adding the line if:
:
- name: Upload artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: _book
path: docs/
A step is added before “Upload artifact” to copy the result of the knitting and the .log
file:
- name: Move files to docs
if: always()
run: |
Rscript -e 'memoiR::build_githubpages()'
cp *.log docs
After the action fails, the saved artifact can be downloaded from GitHub: it is on the action summary page.
It is a compressed file that contains the whole docs
folder.
6.3.2 Blogdown website
The file called blogdown.yml
is very similar.
The name of the workflow is blogdown
and the name of the production job is buildsite
.
on:
push:
branches:
- master
schedule:
- cron: '0 22 * * 0'
name: blogdown
jobs:
buildsite:
runs-on: macOS-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Setup R
uses: r-lib/actions/setup-r@v2
- name: Install pandoc
uses: r-lib/actions/setup-pandoc@v2
- name: Install packages
run: |
options(pkgType = "binary")
options(install.packages.check.source = "no")
install.packages(c("remotes", "blogdown", "formatR"))
remotes::install_deps(dependencies = TRUE)
shell: Rscript {0}
- name: Build website
run: |
blogdown::install_hugo(force = TRUE)
blogdown::build_site(local = TRUE, build_rmd = TRUE)
shell: Rscript {0}
- name: Upload artifact
uses: actions/upload-artifact@v4
with:
name: _website
path: public/
deploy:
runs-on: ubuntu-latest
needs: buildsite
permissions:
contents: write
steps:
- name: Download artifact
uses: actions/download-artifact@v4
with:
# Artifact name
name: _website
# Destination path
path: public
- name: Deploy to GitHub Pages
uses: Cecilapp/GitHub-Pages-deploy@v3
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
build_dir: public
email: ${{ secrets.EMAIL }}
jekyll: no
The Build website
step uses the blogdown package to install Hugo (the website generator) and then build the website.
If the website uses online data that warrants periodic updating, GitHub Actions can be run daily, weekly or monthly in addition to the rebuilds triggered by a repository change (see section 6.2.4.1). Here, the site is rebuilt every Sunday at 10pm.
Example: the page that displays the bibliometrics of the author’s website117 queries Google Scholar to display the citations of the publications. The site is updated weekly to keep the statistics current.
6.3.3 R Packages
An optimal script for checking a package is as follows:
on:
push:
branches:
- master
name: R-CMD-check
jobs:
R-CMD-check:
runs-on: macOS-latest
steps:
- name: Pull the repository
uses: actions/checkout@v4
- name: Install R
uses: r-lib/actions/setup-r@v2
- name: Install pandoc
uses: r-lib/actions/setup-pandoc@v2
- name: Install R packages
run: |
options(pkgType = "binary")
options(install.packages.check.source = "no")
install.packages(c("remotes", "roxygen2", "rcmdcheck", "covr", "pkgdown"))
remotes::install_deps(dependencies = TRUE)
shell: Rscript {0}
- name: Update the documentation
run: roxygen2::roxygenize()
shell: Rscript {0}
- name: Commit and push the repository
uses: EndBug/add-and-commit@v9
- name: Check the package
run: rcmdcheck::rcmdcheck(args = "--no-manual", error_on = "warning")
shell: Rscript {0}
- name: Test coverage
run: covr::codecov(type="all")
shell: Rscript {0}
- name: Install the package
run: R CMD INSTALL .
- name: Pkgdown
run: |
git config --local user.email "actions@github.com"
git config --local user.name "GitHub Actions"
Rscript -e 'pkgdown::deploy_to_branch(new_process = FALSE)'
The file is named check.yml
.
It contains only one job, named R-CMD-check
, as the workflow.
The script does not use Renv to handle packages because package checking must work with the current versions on CRAN.
Remotes installs the necessary packages from the DESCRIPTION
file.
The Roxygenize
step updates the package documentation.
The updated files are pushed into the main project branch by the add-and-commit
action.
These two steps ensure that the package is in a consistent state, even if the author failed to execute the roxygenize()
function before pushing his code to GitHub.
To avoid triggering a loop to check code pushed in this way, the access token used must be that of the current script, created by GitHub each time it is run.
By default, this token does not have the right to modify the repository.
You therefore need to give it this right: on GitHub, display the project parameters and select ‘Actions’, ‘General’.
In the “Workflow permissions” section, select “Read and write permissions”.
The Check
step checks the package.
Warnings are treated as errors.
The Test coverage
step uses the covr package to measure the coverage rate and uploads the results to the Codecov site.
Finally, the last two steps install the package and then use pkgdown to create the documentation site for the package and push it into the gh-pages
branch of the project.
This script contains only one job: the deployment of the documentation site is directly executed by pkgdown.
Its success is displayed by a badge in the README.md
file (see section 6.4)
More complex scripts are proposed by R-lib118, in particular to run the tests on several operating systems and several versions of R. These advanced tests are to be performed before submitting to CRAN (section 5.11) but consume too much resource for systematic use.
6.3.4 Pull requests
Pull requests can be tested by very similar scripts to check that they do not generate errors before merging them.
One effective method is to create a new script in the .github/workflows/
folder, starting from a copy of the existing script.
The new script will be named pr.yml
.
The trigger must be changed: pull_request
replaces push
:
on:
pull_request:
branches:
- master
6.3.4.1 memoiR
The scripts for checking documents created by memoiR must be cut after the Render gitbook
step: the artefact must not be saved and the deployment task must be deleted.
The script is as follows:
on:
pull_request:
branches:
- master
name: rmarkdown
jobs:
render:
runs-on: macOS-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Setup R
uses: r-lib/actions/setup-r@v2
- name: Install pandoc
uses: r-lib/actions/setup-pandoc@v2
- name: Install dependencies
run: |
options(pkgType = "binary")
options(install.packages.check.source = "no")
install.packages(c("distill", "downlit", "memoiR", "rmdformats", "tinytex"))
tinytex::install_tinytex(bundle = "TinyTeX")
shell: Rscript {0}
- name: Render pdf book
env:
GITHUB_PAT: ${{ secrets.GH_PAT }}
run: |
bookdown::render_book("index.Rmd", "bookdown::pdf_book")
shell: Rscript {0}
- name: Render gitbook
env:
GITHUB_PAT: ${{ secrets.GH_PAT }}
run: |
bookdown::render_book("index.Rmd", "bookdown::bs4_book")
shell: Rscript {0}
# Don't upload the artifact and don't deploy
6.3.4.2 R Packages
Scripts dedicated to checking packages should not push updates to their documentation through Roxygenize2, nor should they deploy their pkgdown updates to GitHub pages. The coverage rate does not need to be measured. The script is as follows:
on:
pull_request:
branches:
- master
name: R-CMD-check
jobs:
R-CMD-check:
runs-on: macOS-latest
steps:
- name: Pull the repository
uses: actions/checkout@v4
- name: Install R
uses: r-lib/actions/setup-r@v2
- name: Install pandoc
uses: r-lib/actions/setup-pandoc@v2
- name: Install R packages
run: |
options(pkgType = "binary")
options(install.packages.check.source = "no")
install.packages(c("remotes", "roxygen2", "rcmdcheck", "covr", "pkgdown"))
remotes::install_deps(dependencies = TRUE)
shell: Rscript {0}
- name: Update the documentation
run: roxygen2::roxygenize()
shell: Rscript {0}
# Don't push
- name: Check the package
run: rcmdcheck::rcmdcheck(args = "--no-manual", error_on = "warning")
shell: Rscript {0}
# Don't test coverage
- name: Install the package
run: R CMD INSTALL .
- name: Pkgdown
# Build the package site locally
run: Rscript -e 'pkgdown::build_site()'
When pull requests are submitted, the corresponding test is run and its results included in the discussion.
6.4 Add badges
The success of GitHub Actions can be seen by adding a badge to the README.md
file, right after the file title.
On the project page, choose “Actions” then select the action (in “Workflows”).
Click on the “…” button and then on “Create Status Badge”.
Paste the Markdown code:
# Project name
![bookdown](https://github.com/<GitHubID>/<RepoID>/workflows/<Workflow>/badge.svg)
The name of the workflow was declared in the name:
entry in the GitHub actions configuration file.
The coverage rate measured by Codecov can also be displayed by a badge:
[![codecov](https://codecov.io/github/<GitHubID>/
<RepoID>/branch/master/graphs/badge.svg)]
(https://codecov.io/github/<GitHubID>/<RepoID>)