Debugging {targets} with new access panels

In her highly recommended talk Object of type ‘closure’ is not subsettable, Jenny Bryan discusses leaving yourself ‘access panels’, like options or arguments that turn on features that help your future self in debugging endeavours. As we shall see {targets} powerful new debugging access panels.

First we look at how problems present and then we review a spectrum of increasingly powerful debugging techniques made available by {targets}

What it looks like when things go bad

Errors

By default, when an error occurs in targets the pipeline stops. It should be pretty clear from {targets}’ output which target has thrown the error:

▶ dispatched target species_classification_model
✖ errored target species_classification_model
✖ errored pipeline [0.184 seconds]
Error:
! Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
    I'm broken
Last error traceback:
    fit_final_species_classification_model(training_data = training(test_tra...
    stop("I'm broken")
    .handleSimpleError(function (condition)  {     state$error <- build_mess...
    h(simpleError(msg, call))

Warnings

If your problem results in warnings appearing they won’t stop the pipeline. Instead you’ll see something like:

▶ dispatched target species_classification_model
● completed target species_classification_model [3.249 seconds]
✔ skipped target species_model_validation_data
✔ skipped target base_plot_model_roc_object
✔ skipped target gg_species_class_accuracy_hexes
✔ skipped target report
▶ ended pipeline [5.439 seconds]
Warning messages:
1: I'm warning you
2: 1 targets produced warnings. Run targets::tar_meta(fields = warnings, complete_only = TRUE) for the messages.
NULL

So although we don’t immediately see which target threw the warnings, {targets} does tell us how to find that out. If we run the suggested code:

targets::tar_meta(fields = warnings, complete_only = TRUE)

we get precisely the metadata we need:

# A tibble: 1 × 2
  name                         warnings
  <chr>                        <chr>
1 species_classification_model Im warning you

If we get neither

If we just got some nonsense results we might have to work a bit harder to figure out where to start looking for the problem. The process we described for peer reviewing the pipeline in the ‘targets plan’ section is similar to how we could approach finding the logic problem efficiently.

The debugging arsenal

Call tar_load() and tinker

You’ll very quickly be able to populate all a target’s inputs in your global environment by using tar_load().

  • This is why having functions that use the same argument names as the targets they take as arguments is quite beneficial.
  • If this is not the case you might enjoy loading all the input targets and then calling debugonce, before manually running the problematic target’s expression interactively.

Use browser()

R’s classic can be brought to bear! - Just one obstacle, the targets are typically built in a separate session that we don’t have interactive access to! - We can actually run the pipeline in the current interactive R session. - Just make sure the session is pretty ‘fresh’ or you may create more problems than you solve.

By way of example:

  1. put browser() on the first line off fit_final_species_classification_model()

  2. To build the pipeline run tar_make(callr_function = NULL)

  • We’re saying “Don’t use {callr}” which is the method of creating child sessions for our pipeline execution.

End up interactively debugging the target:

✔ skipped target gg_species_distribution_hexes
✔ skipped target gg_species_distribution_months
✔ skipped target test_train_split
✔ skipped target species_classification_model_training_summary
▶ dispatched target species_classification_model
Called from: fit_final_species_classification_model(training_data = training(test_train_split),
    species_classification_model_training_summary)
Browse[1]>

Use the ‘debug’ option

This behaves like using browser() above, but is a bit better since you don’t have to make a change to your code that you could forget to undo!

  • Does anyone else commit browser() to repos embarrassingly frequently?

If you add to tar_option_set() in _targets.R

tar_option_set(
  seed = 2048,
  debug = "species_classification_model"
)

Then you can call tar_make() and the pipeline will pause for interactive debugging when species_classification_model is reached.

If you’d like to speed things up by skipping processing any other targets you can do:

tar_make(species_classification_model, callr_function = NULL, shortcut = TRUE)

And {targets} will immediately begin debugging this target.1

Being able to name a target to debug increases in usefulness once we understand a more advanced concept called ‘branching’.

Use the workspace option

This is my personal go-to when things just aren’t making sense. A ‘workspace’ is the set of all of a target’s inputs. Since targets should be pure functions, this should be all the state we need to investigate, reproduce, and fix bugs occurring in that target.

The first way to use workspaces is to set an option that automatically saves them on error:

tar_option_set(
  seed = 2048,
  workspace_on_error = TRUE
)

When an error occurs we will get a slightly different output:

✔ skipped target test_train_split
✔ skipped target species_classification_model_training_summary
▶ dispatched target species_classification_model
▶ recorded workspace species_classification_model
✖ errored target species_classification_model
✖ errored pipeline [0.215 seconds]

If we call tar_workspace(species_classification_model), all of the dependencies of species_classification_model will be loaded into the global environment. These are:

  • test_train_split
  • species_classification_model_training_summary

But isn’t this just the same as calling tar_load?

  • Hopefully / Mostly yes!
  • But occasionally through contrived circumstances you may not be tar_loading what you think you are. In this case there’s no way for that mistake to happen.
  • There are also circumstances where you might not know the names of a specific target’s inputs, and so cannot tar_load them at all.
    • More on this when we talk about ‘branching’

There’s also another way to use workspaces, when you might not be getting an error, but you want record a workspace to check on suspicious behaviour. We can instead do:

tar_option_set(
  seed = 2048,
  workspaces = c("species_classification_model", "occurrences_weather_hexes")
)

And workspaces for these targets will be recorded, whether they error or not.

In practice

In my personal experience > 90% of targets bugs can be quickly dispatched by the ‘Call tar_load() and tinker’ approach.

If that fails I reach straight for workspaces. When I am using this mysterious ‘branching’ thing I keep referring to I’ll rely on workspaces more frequently.

So if you take one thing from this section it should be:

  • There’s this ‘workspaces’ concept that will probably help if you’re having a hard time debugging something.

Footnotes

  1. It may be tempting to use shortcut more frequently to speed things up, but using shortcut is equivalent to running a numbered pipeline stage script without running the prior scripts in the ‘classic R project’ we started with. Do it too often and you’ll have reproducibility debt that needs to be paid down in bulk.↩︎