Working Smarter With {targets}

Author

Miles McBain

Introduction

The objective of this workshop is to change the way you work with R. Rather than sacrificing the fluidity and immediacy of R’s REPL-based programming in the name of ‘reproducibility’, we will find there is a middle way that lets us have both. That way is the {targets} way.

We start by setting some context:

What brought you to this workshop?
Have you had any experience with {targets}?
What problems do you have that you hope {targets} can solve?
Do you anticipate any barriers to moving forward with {targets} in your workplace?

What is {targets}?

{targets} is a framemork created by Will Landau for building data science pipelines with R. It is part of a growing niche of ‘data orchestration’ tools that conceive of data processing pipelines as graph structures. What sets {targets} apart from the rest is its incredible ergonomics and extensibility facilitated by features of the R programming language.

You can view {targets} as a replacement for the classic make tool, which is a famed computation time saver in software development. Make’s most important feature is that it can detect outputs (or ‘targets’) of a software build that have not changed since the previous build, and so can re-use them. This greatly accelerates the development / test cycle by minimising compilation time.

{targets} shares this feature, but it goes far beyond this, giving the user the ability shape the computational structure of the pipeline so that it can be run optimally within the bounds of resource constraints. Importantly for data science, its features also defeat classes of bugs that affect project reproducibility.

In practice the value {targets} delivers as seen by teams is around:

Increasing the speed of iteration on data science methodology.
Inducing a structure which makes projects more comprehensible, and more easily peer-reviwed.

The {targets} R package has cleared the high bar set by the rOpenSci peer review process peer review process, and has been accepted on CRAN.

Overview of the workshop

Trying to be foundational or like a ‘gentle introduction’. The knowledge you need to get value from {targets} is surprisingly small.
We’ll spend time up front understanding the core problems {targets} solves. This will help us articulate the value to our teams.
Over the course of the workshop we’ll progressively refactor an existing R project, written in a classic style, into a modern {targets} pipeline. This will let us see the benefits accumulate, as we deploy more advanced techniques.
We may run out of time, so sections are in priority order. There should be enough instructions to work through the stuff we don’t get to as homework.

Notation

In this workshop material {targets} is used to refer to the R package, while target or targets (no braces) refers to a node in the pipeline graph. We say targets are ‘built’ to refer to executing the code associated with a target to generate its value. The value of target can be any R object. This is a notable difference to make, where targets are files.