Update:
Good news: it looks like rowwise() is coming back to life so you don't have to
— Hadley Wickham (@hadleywickham) January 20, 2020
I recently came across this post on alternatives to rowwise and it got me thinking about the situations where I tend to use rowwise() and how I might use alternatives. One of the first things I realized is that nearly every instance where I use rowwise() could be replaced by group_by(id_column). As someone who always found the apply family fairly straightforward and intuitive, I must confess that I’ve struggled more with the purrr::map family. The map functions do, however, have a number of advantages over the apply family, namely a unified syntax and the ability to specify the output type. I also really like the ability to bind data frame rows with a column for the source (map_dfr(..., .id = 'source')). This post is my attempt to think through some of the ways I can use pmap and nested columns with map in place of rowwise.
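As a small illustration of the source-column feature mentioned above, here is a sketch using two made-up data frames. The names `sources`, `a`, and `b` are hypothetical and not part of this post's data, and map_dfr needs dplyr to be installed.

```r
library(purrr)
library(tibble)

# two hypothetical data frames stored in a named list
sources <- list(
  a = tibble(x = 1:2),
  b = tibble(x = 3:4)
)

# identity() returns each data frame unchanged; map_dfr() row-binds the
# results and stores the list names in a new "source" column
combined <- map_dfr(sources, identity, .id = "source")
combined
```

Each row of `combined` carries the name of the list element it came from, which is handy when the list holds one data frame per file or per model.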
the process
Set up dummy data frame
I’ve set up a data frame with integer, logical and character columns because most of what I use rowwise for is testing for conditions, checking for the presence of a value, or numeric manipulation (sum, min, etc.).
library(tidyverse)
set.seed(1313)
dat <- tibble(!!!c(id = list(1:10),
                   int_ = replicate(3, sample.int(10), simplify = FALSE),
                   lgl_ = replicate(3, sample(c(TRUE, FALSE), 10, replace = TRUE),
                                    simplify = FALSE),
                   chr_ = replicate(3, sample(c("A", "B", "C"), 10, replace = TRUE),
                                    simplify = FALSE)
                   ), .name_repair = 'universal') %>%
  rename_all(str_remove_all, pattern = '\\.')
dat
## # A tibble: 10 x 10
## id int_1 int_2 int_3 lgl_1 lgl_2 lgl_3 chr_1 chr_2 chr_3
## <int> <int> <int> <int> <lgl> <lgl> <lgl> <chr> <chr> <chr>
## 1 1 3 6 9 FALSE TRUE TRUE B B C
## 2 2 6 10 3 FALSE TRUE TRUE A A B
## 3 3 10 2 7 FALSE FALSE TRUE B A C
## 4 4 1 4 2 FALSE TRUE TRUE C C A
## 5 5 4 1 1 FALSE FALSE TRUE A C B
## 6 6 7 7 10 FALSE FALSE FALSE B C B
## 7 7 8 8 4 TRUE TRUE FALSE C C A
## 8 8 2 3 6 FALSE TRUE FALSE C C B
## 9 9 9 5 8 TRUE FALSE TRUE C C C
## 10 10 5 9 5 FALSE FALSE FALSE C C B
use pmap
pmap seems to be what I want when I really don’t have a grouping variable, when I want to do something rowwise even if I have multiple observations (lines) per id, and/or when I really want a vector output. For each row, pmap passes the values to the function as arguments named after the columns.
dat %>%
  select(starts_with('lgl')) %>%
  pmap_lgl(all)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
In the above example, the pmap call for the first row would correspond to:
all(lgl_1 = FALSE, lgl_2 = TRUE, lgl_3 = TRUE)
and thus it returns FALSE. A consequence of this is that if your function's named arguments differ from the column names, you will need to wrap it in an anonymous function. The arguments of that anonymous function must, in turn, match the column names.
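To make the correspondence concrete, the sketch below reproduces the per-row call by hand with do.call on a small hypothetical tibble (`toy` is not the `dat` used in this post). It only approximates what pmap does and is not purrr's actual implementation.

```r
library(purrr)
library(tibble)

# hypothetical two-row tibble mirroring the lgl_ columns
toy <- tibble(
  lgl_1 = c(FALSE, TRUE),
  lgl_2 = c(TRUE, TRUE),
  lgl_3 = c(TRUE, TRUE)
)

by_pmap <- pmap_lgl(toy, all)

# by hand: turn row i into a named list and call all() with those
# named arguments, e.g. all(lgl_1 = FALSE, lgl_2 = TRUE, lgl_3 = TRUE)
by_hand <- vapply(
  seq_len(nrow(toy)),
  function(i) do.call(all, as.list(toy[i, ])),
  logical(1)
)

identical(by_pmap, by_hand)
```

Because the arguments arrive named, a function with its own fixed argument names will reject them unless the names happen to match.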
# a function whose argument names differ from our column names
some_fxn <- function(x, y, z){
  paste(z, y, x, sep = ',')
}
dat %>%
  select(starts_with('chr')) %>%
  pmap_chr(some_fxn)
## Error in .f(chr_1 = .l[[1L]][[i]], chr_2 = .l[[2L]][[i]], chr_3 = .l[[3L]][[i]], : unused arguments (chr_1 = .l[[1]][[i]], chr_2 = .l[[2]][[i]], chr_3 = .l[[3]][[i]])
We can use an anonymous function to fix this.
dat %>%
  select(starts_with('chr')) %>%
  pmap_chr(function(chr_1, chr_2, chr_3){
    some_fxn(chr_1, chr_2, chr_3)
  })
## [1] "C,B,B" "B,A,A" "C,A,B" "A,C,C" "B,C,A" "B,C,B" "A,C,C" "B,C,C" "C,C,C"
## [10] "B,C,C"
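Another option, sketched here on a hypothetical `toy` tibble, is to change the names that pmap sees instead of wrapping the function: either rename the columns to the function's argument names, or drop the names so that values are matched by position. Both assume the column order lines up with the argument order.

```r
library(purrr)
library(tibble)

some_fxn <- function(x, y, z) {
  paste(z, y, x, sep = ',')
}

# hypothetical stand-in for the chr_ columns
toy <- tibble(
  chr_1 = c("B", "A"),
  chr_2 = c("B", "A"),
  chr_3 = c("C", "B")
)

# option 1: rename the columns to the argument names
renamed <- pmap_chr(setNames(toy, c("x", "y", "z")), some_fxn)

# option 2: drop the names so the values are matched by position
positional <- pmap_chr(unname(as.list(toy)), some_fxn)

renamed
positional
```

The anonymous-function wrapper is more explicit about which column feeds which argument, so it is probably the safer choice when the column order might change.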
The arguments of an anonymous function must likewise be named after the columns.
dat %>%
  select(starts_with('chr')) %>%
  pmap_chr(function(chr_1, chr_2, chr_3){
    paste(chr_1, chr_2, chr_3, sep = ',')
  })
## [1] "B,B,C" "A,A,B" "B,A,C" "C,C,A" "A,C,B" "B,C,B" "C,C,A" "C,C,B" "C,C,C"
## [10] "C,C,B"
An ellipsis (...) can also be used in an anonymous function to capture all of the columns at once.
dat %>%
  select(starts_with('chr')) %>%
  pmap_lgl(function(...){
    arguments <- list(...)
    any(arguments == 'B')
  })
## [1] TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE
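The comparison above works because R coerces the list of length-one values before comparing. A variation, shown here as a sketch on a hypothetical `toy` tibble, collects the arguments into an atomic vector with c(...) first, which makes the intent a little more explicit.

```r
library(purrr)
library(tibble)

# hypothetical stand-in for the chr_ columns
toy <- tibble(
  chr_1 = c("B", "C"),
  chr_2 = c("A", "C"),
  chr_3 = c("C", "A")
)

# c(...) gathers one row's values into a character vector
has_b <- pmap_lgl(toy, function(...) "B" %in% c(...))
has_b
```

This only makes sense when the captured columns share a type; mixing integer and character columns in c(...) would coerce everything to character.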
If our output is a vector, we can use pmap combined with bind_cols to append the vector as a new column.
dat %>%
  select(starts_with('int')) %>%
  pmap_int(sum) %>%
  bind_cols(dat, sum = .) %>%
  select(int_1:int_3, lgl_1, sum)
## # A tibble: 10 x 5
## int_1 int_2 int_3 lgl_1 sum
## <int> <int> <int> <lgl> <int>
## 1 3 6 9 FALSE 18
## 2 6 10 3 FALSE 19
## 3 10 2 7 FALSE 19
## 4 1 4 2 FALSE 7
## 5 4 1 1 FALSE 6
## 6 7 7 10 FALSE 24
## 7 8 8 4 TRUE 20
## 8 2 3 6 FALSE 11
## 9 9 5 8 TRUE 22
## 10 5 9 5 FALSE 19
dat %>%
  select(starts_with('lgl')) %>%
  pmap_lgl(all) %>%
  bind_cols(dat, all_true = .) %>%
  select(int_1:int_2, lgl_1:lgl_3, all_true)
## # A tibble: 10 x 6
## int_1 int_2 lgl_1 lgl_2 lgl_3 all_true
## <int> <int> <lgl> <lgl> <lgl> <lgl>
## 1 3 6 FALSE TRUE TRUE FALSE
## 2 6 10 FALSE TRUE TRUE FALSE
## 3 10 2 FALSE FALSE TRUE FALSE
## 4 1 4 FALSE TRUE TRUE FALSE
## 5 4 1 FALSE FALSE TRUE FALSE
## 6 7 7 FALSE FALSE FALSE FALSE
## 7 8 8 TRUE TRUE FALSE FALSE
## 8 2 3 FALSE TRUE FALSE FALSE
## 9 9 5 TRUE FALSE TRUE FALSE
## 10 5 9 FALSE FALSE FALSE FALSE
dat %>%
  select(starts_with('lgl')) %>%
  pmap_lgl(any)
## [1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
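As an alternative to bind_cols, the pmap call can also sit directly inside mutate. The sketch below uses a hypothetical `toy` tibble and shows two ways to hand the columns to pmap_int: listing them explicitly, or selecting them from the data frame by name.

```r
library(dplyr)
library(purrr)

# hypothetical stand-in for the int_ columns
toy <- tibble(
  int_1 = c(3L, 6L),
  int_2 = c(6L, 10L),
  int_3 = c(9L, 3L)
)

# list the columns explicitly; each row becomes sum(int_1, int_2, int_3)
explicit <- toy %>%
  mutate(total = pmap_int(list(int_1, int_2, int_3), sum))

# or select the columns from the data frame with a select helper
selected <- mutate(toy, total = pmap_int(select(toy, starts_with("int")), sum))

explicit
```

Keeping the computation inside mutate avoids relying on the row order of two separate objects staying in sync.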
use nest and map
The examples below use group_by(id) and map. Nesting the columns lets us take advantage of the tidyselect::select_helpers to group our variables.
dat %>%
  group_by(id) %>%
  nest(int_vars = starts_with('int'),
       lgl_vars = contains('lgl'),
       chr_vars = c(chr_1, chr_2, chr_3)) %>%
  mutate(sum = map_int(int_vars, sum),
         all_true = map_lgl(lgl_vars, pmap_lgl, all),
         any_b = map_lgl(chr_vars, function(x) {
           any(map_lgl(x, ~. == 'B'))
         }),
         # c(...) collects every column; a bare `.` would refer only to chr_1
         any_c = map_lgl(chr_vars, pmap_lgl, ~any(c(...) == 'C')),
         any_a = map_lgl(chr_vars, function(x) any(unlist(x) == 'A'))
  ) %>%
  unnest(cols = c(int_vars, lgl_vars, chr_vars)) %>%
  ungroup()
## # A tibble: 10 x 15
## id int_1 int_2 int_3 lgl_1 lgl_2 lgl_3 chr_1 chr_2 chr_3 sum all_true
## <int> <int> <int> <int> <lgl> <lgl> <lgl> <chr> <chr> <chr> <int> <lgl>
## 1 1 3 6 9 FALSE TRUE TRUE B B C 18 FALSE
## 2 2 6 10 3 FALSE TRUE TRUE A A B 19 FALSE
## 3 3 10 2 7 FALSE FALSE TRUE B A C 19 FALSE
## 4 4 1 4 2 FALSE TRUE TRUE C C A 7 FALSE
## 5 5 4 1 1 FALSE FALSE TRUE A C B 6 FALSE
## 6 6 7 7 10 FALSE FALSE FALSE B C B 24 FALSE
## 7 7 8 8 4 TRUE TRUE FALSE C C A 20 FALSE
## 8 8 2 3 6 FALSE TRUE FALSE C C B 11 FALSE
## 9 9 9 5 8 TRUE FALSE TRUE C C C 22 FALSE
## 10 10 5 9 5 FALSE FALSE FALSE C C B 19 FALSE
## # … with 3 more variables: any_b <lgl>, any_c <lgl>, any_a <lgl>
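To see what the nested columns look like on their own, here is a stripped-down sketch on a hypothetical `toy` tibble. It assumes tidyr 1.0.0 or later for the `nest(new_col = selection)` syntax used in this post.

```r
library(dplyr)
library(tidyr)
library(purrr)

# hypothetical stand-in with an id and two integer columns
toy <- tibble(
  id = 1:2,
  int_1 = c(3L, 6L),
  int_2 = c(6L, 10L)
)

# each element of int_vars is a one-row tibble holding that id's int_ columns
nested <- toy %>%
  nest(int_vars = starts_with("int"))
nested$int_vars[[1]]

# map over the list column, then unnest to restore the original columns
with_sum <- nested %>%
  mutate(total = map_int(int_vars, ~ sum(unlist(.x)))) %>%
  unnest(cols = int_vars)
with_sum
```

Since every list element is itself a small data frame, any function that accepts a data frame (including pmap) can be mapped over it.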
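We can then use unnest to return our variables to their original columns. Note that the select helpers cannot be used directly inside mutate; both of the attempts below fail.
dat %>%
  group_by(id) %>%
  mutate(sum = sum(starts_with('int')))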
## Error: `starts_with()` must be used within a *selecting* function.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
dat %>%
  group_by(id) %>%
  mutate(sum = sum(select(starts_with('int'))))
## Error: `starts_with()` must be used within a *selecting* function.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
## note as of the dev version of dplyr 0.8.99.9000 this no longer returns an
## error but it sums the column positions and thus returns 9 (2 + 3 + 4).
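Following up on the update at the top of this post, here is a sketch of how the failing calls above might be written with a newer dplyr. It assumes dplyr 1.0.0 or later, where rowwise() works together with c_across(), which accepts select helpers; the `toy` tibble is hypothetical.

```r
library(dplyr)

# hypothetical stand-in for the id and int_ columns
toy <- tibble(
  id = 1:2,
  int_1 = c(3L, 6L),
  int_2 = c(6L, 10L),
  int_3 = c(9L, 3L)
)

res <- toy %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into a vector
  mutate(total = sum(c_across(starts_with("int")))) %>%
  ungroup()
res
```

For a plain sum, a vectorised helper such as rowSums would likely be faster on large data; the rowwise form is shown because it generalises to functions that have no row-wise equivalent.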
Finally, don’t forget to ungroup! (Artwork by @allison_horst.)