R News

readxl 1.2.0

by Articles on Tidyverse · December 20, 2018

This article is originally published at https://www.tidyverse.org/articles/

We’re stoked to announce that readxl 1.2.0 is now available on CRAN. Learn more about readxl at http://readxl.tidyverse.org. Detailed notes are always in the change log.

The readxl package makes it easy to get tabular data out of Excel files and into R with code, not mouse clicks. It supports both the legacy .xls format and the modern XML-based .xlsx format. readxl is expressly designed to be easy to install and use on all operating systems. Therefore it has no external dependencies, such as Java or Perl, which have historically been a source of aggravation with some R packages that read Excel files.

The easiest way to install the latest version from CRAN is to install the whole tidyverse.

install.packages("tidyverse")

Alternatively, install just readxl from CRAN:

install.packages("readxl")

Regardless, you will still need to attach readxl explicitly. It is not a core tidyverse package, i.e. readxl is NOT attached via library(tidyverse). Instead, do this in your script:

library(readxl)

Column name repair

The most exciting change in the v1.2.0 release is the introduction of the new .name_repair argument to read_excel(), read_xlsx(), and read_xls(). readxl exposes the .name_repair argument that is coming soon to version 2.0.0 of the tibble package. Note: the following examples were executed with the not-yet-released version 2.0.0 of the tibble package.

First, rest assured that if your sheet has unique column names, readxl leaves them alone, as always:

read_excel(readxl_example("datasets.xlsx"), sheet = "iris", n_max = 3)
#> # A tibble: 3 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa

However, spreadsheet column names frequently leave much to be desired. .name_repair is a more flexible alternative to passing a specific vector of col_names. You can express what you want in two main ways:

Levels of name repair:
- "minimal": use "" for any missing names
- "unique": names are made unique readxl’s default
- "universal": names are made "unique" and syntactic
Name repair strategy, as a function that takes (bad) names in and returns (good) names:
- Function defined in base R, by another package, or by you
- Anonymous function, specified using a purrr-style ~ formula

Here are two examples of specifying a name repair strategy.

## pass custom function to implement "lower_snake_case"
my_custom_name_repair <- function(nms) tolower(gsub("[.]", "_", nms))
read_excel(
  readxl_example("datasets.xlsx"), n_max = 3,
  .name_repair = my_custom_name_repair
)
#> # A tibble: 3 x 5
#>   sepal_length sepal_width petal_length petal_width species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa

## use purrr-style formula to truncate names at 3 characters
read_excel(
  readxl_example("datasets.xlsx"), sheet = "chickwts", n_max = 3,
  .name_repair = ~ substr(.x, start = 1, stop = 3)
)
#> # A tibble: 3 x 2
#>     wei fee      
#>   <dbl> <chr>    
#> 1   179 horsebean
#> 2   160 horsebean
#> 3   136 horsebean

Read more in readxl’s new Column Names article.

Hello, is any one there?

readxl now displays a progress spinner in interactive sessions if it looks like the operation might take several seconds or more. This should provide some measure of reassurance when reading large sheets. This was accompanied by a rationalization of when readxl checks for user interrupts.

Updated libxls

The last user-visible change is that all known .xls regressions have been fixed. The previous version, readxl v1.1.0, included some big updates in the embedded libxls library, which were overwhelmingly positive. But there were a few reports of .xls files that went from “readable” to “unreadable”. To the best of my knowledge, those regressions have now all been addressed upstream and in readxl v1.2.0.

Acknowledgements

Special thanks to Evan Miller for his recent work on libxls.

Thank you to the 47 contributors who made this release possible: @2005m, @ajdamico, @alfredojavier5, @antuki, @apreshill, @awwsmm, @batpigandme, @berkorbay, @billdenney, @breichholf, @brianwdavis, @chrowe, @ddheart, @doctsh, @dominicshore, @Gillis, @gorkang, @gregdutkowski, @gregleleu, @hidekoji, @hlenka, @hroptatyr, @j6t, @jamesdalg, @jameshunterbr, @jennybc, @jeroen, @jimhester, @KS2907, @KyleHaynes, @llrs, @ltierney, @LTLA, @mcSamuelDataSci, @mdekstrand, @msgoussi, @N1h1l1sT, @pm321, @ptoche, @randallhelms, @rnuske, @roualdes, @rrohwer, @SebVen, @siemersn, @VincentGuyader, and @yurasmol.

Thanks for visiting r-craft.org
This article is originally published at https://www.tidyverse.org/articles/
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

readxl 1.2.0

You may also like...

Categories

readxl 1.2.0

Column name repair

Hello, is any one there?

Updated libxls

Acknowledgements

You may also like...

pkgdown 1.0.0

Tips for Ellipse Summary Plot

Frank Harrell – Controversies in Predictive Modeling and Machine Learning

Categories