R News

Maintenance Updates of Future Backends and doFuture

by JottR · January 7, 2019

This article was originally published at https://www.jottr.org/

New versions of the following future backends are available on CRAN:

future.callr - parallelization via callr, i.e. on the local machine
future.batchtools - parallelization via batchtools, i.e. on a compute cluster with job schedulers (SLURM, SGE, Torque/PBS, etc.) but also on the local machine
future.BatchJobs - (maintained for legacy reasons) parallelization via BatchJobs, which is the predecessor of batchtools
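As a minimal sketch of how one of these backends is selected, assuming the future and future.callr packages are installed, the following evaluates a single future via callr and checks that it ran in a separate R process. The other backends are chosen the same way through plan(), for instance plan(future.batchtools::batchtools_local) to run batchtools jobs on the local machine.

```r
library(future)

# Evaluate each future in a fresh background R process launched via callr
plan(future.callr::callr)

# Sys.getpid() is evaluated on the worker, not in the current R session
f <- future(Sys.getpid())
worker_pid <- value(f)

cat("main session PID:", Sys.getpid(), " worker PID:", worker_pid, "\n")

# Switch back to sequential (non-parallel) evaluation
plan(sequential)
```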

These releases fix a few small bugs and inconsistencies that were identified with the help of the future.tests framework, which is being developed with support from the R Consortium.

I also released a new version of:

doFuture - use any future backend for foreach() parallelization

which comes with a few improvements and bug fixes.
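A minimal sketch of the doFuture approach, assuming the foreach, future, and doFuture packages are installed: registerDoFuture() makes %dopar% hand its iterations over to the future framework, so the backend is then picked with plan() like any other future code. The function my_slow_function is a hypothetical stand-in for real work.

```r
library(foreach)
library(future)
library(doFuture)

# Let foreach's %dopar% delegate its iterations to futures
registerDoFuture()

# The future backend is chosen with plan(); here two background R sessions
plan(multisession, workers = 2)

# Hypothetical stand-in for a slow computation
my_slow_function <- function(x) {
  Sys.sleep(0.5)
  x^2
}

X <- 1:4
y <- foreach(x = X) %dopar% {
  my_slow_function(x)
}
print(unlist(y))

# Switch back to sequential evaluation
plan(sequential)
```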

The future is now.

The future is … what?

If you have never heard of the future framework before, here is a simple example. Assume that you want to run

y <- lapply(X, FUN = my_slow_function)

in parallel on your local computer. The most straightforward way to achieve this is to use:

library(future.apply)
plan(multisession)  # evaluate futures in parallel background R sessions
y <- future_lapply(X, FUN = my_slow_function)

If you have SSH access to a few machines here and there with R installed, you can use:

library(future.apply)
plan(cluster, workers = c("localhost", "gandalf.remote.edu", "server.cloud.org"))
y <- future_lapply(X, FUN = my_slow_function)

Even better, if you have access to a compute cluster with an SGE job scheduler, you could use:

library(future.apply)
plan(future.batchtools::batchtools_sge)
y <- future_lapply(X, FUN = my_slow_function)

The future is … why?

The future package provides a simple, cross-platform, and lightweight API for parallel processing in R. At its core are three building blocks, future(), resolved() and value(), which are used for launching the asynchronous evaluation of an R expression, querying whether it is done or not, and collecting its result. With these fundamental building blocks, a large variety of parallel tasks can be performed, either by using these functions directly or indirectly via more feature-rich, higher-level parallelization APIs such as future.apply, foreach, BiocParallel or plyr (all three via doFuture), and furrr. In all cases, how and where future R expressions are evaluated, that is, how and where the parallelization is performed, depends solely on which future backend is currently in use, which is controlled by the plan() function.
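The three building blocks can be sketched as follows, assuming the future package is installed; the expression inside future() is a toy stand-in for a slow computation, and multisession is just one possible backend.

```r
library(future)

# Use two background R sessions; any other backend could be set here instead
plan(multisession, workers = 2)

# future() launches the evaluation asynchronously and returns immediately
f <- future({
  Sys.sleep(1)   # stand-in for a slow computation
  sum(1:100)
})

# resolved() checks, without blocking, whether the future is done
while (!resolved(f)) {
  Sys.sleep(0.1)
}

# value() collects the result (it would block if the future were not yet done)
v <- value(f)
print(v)

# Switch back to sequential evaluation
plan(sequential)
```

Because the polling loop only uses resolved(), the main R session stays free to do other work while the future is being evaluated.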

One advantage of the Future API, whether it is used directly or via one of the higher-level APIs, is that it encapsulates the details of how and where the code is parallelized, allowing the developer to focus instead on what to parallelize. Another advantage is that the end user controls which future backend to use. For instance, one user may choose to run an analysis in parallel on their notebook or in the cloud, whereas another may want to run it via a job scheduler in a high-performance compute (HPC) environment.
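This division of labor can be sketched as below, assuming the future package is installed. The function analyze() is hypothetical: it states what to parallelize but never calls plan(), so the caller decides how and where it runs, and the result should be the same under any backend.

```r
library(future)

# Developer side: a hypothetical analysis function that creates futures
# but does not choose a backend
analyze <- function(X) {
  fs <- lapply(X, function(x) future(sqrt(x)))
  lapply(fs, value)
}

# User side: the same call, run with two different backends
plan(sequential)
res_seq <- analyze(1:5)

plan(multisession, workers = 2)
res_par <- analyze(1:5)

# Switch back to sequential evaluation
plan(sequential)

# The results do not depend on the backend
stopifnot(identical(res_seq, res_par))
```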

What’s next?

I’ve spent a fair bit of time working on future.tests, a single framework for testing future backends. It will allow developers of future backends to validate that their backends fully conform to the Future API. This will lower the barrier for creating a new backend (e.g. future.clustermq on top of clustermq, or one on top of Redis), and it will increase trust in existing ones, so that end users can reliably switch between backends without having to worry about the results being different or even corrupted. So, backed by future.tests, I feel more comfortable attacking some of the feature requests, and there are quite a few of them. Indeed, I’ve already implemented one of them. More news coming soon …

Happy futuring!
