Update:

I recently came across this post on alternatives to rowwise and it got me thinking about the situations where I tend to use rowwise() and how I might use alternatives. One of the first things I realized is that nearly every instance where I use rowwise() could be replaced by group_by(id_column). As someone who always found the apply family fairly straightforward and intuitive, I must confess that I’ve struggled more with the purrr::map family. The map functions do, however, have a number of advantages over the apply family, namely a unified syntax and the ability to specify the output type. I also really like the ability to bind data frame rows with a column for the source (map_dfr(..., .id = 'source')). This post is my attempt to think through some of the ways I can use pmap, and nested columns with map, in place of rowwise.

the process

Setup dummy data frame
I’ve set up a data frame with integer, logical and character columns because most of what I use rowwise for is testing for conditions, the presence of a value or numeric manipulation (sum, min, etc.).

library(tidyverse)
set.seed(1313)
# build the columns as one named list, then splice it into tibble() with !!!
dat <- tibble(!!!c(id = list(1:10), 
  int_ = replicate(3, sample.int(10), simplify = FALSE), 
  lgl_ = replicate(3, sample(c(TRUE, FALSE), 10, replace = TRUE), 
                   simplify = FALSE), 
  chr_ = replicate(3, sample(c("A", "B", "C"), 10, replace = TRUE), 
                   simplify = FALSE)
), .name_repair = 'universal') %>% 
  # strip any dots from the repaired column names
  rename_all(str_remove_all, pattern = '\\.')
dat
## # A tibble: 10 x 10
##       id int_1 int_2 int_3 lgl_1 lgl_2 lgl_3 chr_1 chr_2 chr_3
##    <int> <int> <int> <int> <lgl> <lgl> <lgl> <chr> <chr> <chr>
##  1     1     3     6     9 FALSE TRUE  TRUE  B     B     C    
##  2     2     6    10     3 FALSE TRUE  TRUE  A     A     B    
##  3     3    10     2     7 FALSE FALSE TRUE  B     A     C    
##  4     4     1     4     2 FALSE TRUE  TRUE  C     C     A    
##  5     5     4     1     1 FALSE FALSE TRUE  A     C     B    
##  6     6     7     7    10 FALSE FALSE FALSE B     C     B    
##  7     7     8     8     4 TRUE  TRUE  FALSE C     C     A    
##  8     8     2     3     6 FALSE TRUE  FALSE C     C     B    
##  9     9     9     5     8 TRUE  FALSE TRUE  C     C     C    
## 10    10     5     9     5 FALSE FALSE FALSE C     C     B

use pmap
pmap seems to be what I want when I really don’t have a grouping variable, when I want to do something rowwise even if I have multiple observations (lines) per id, and/or when I really want a vector output. pmap calls the function once per row, passing each column’s value as an argument named after that column.

dat %>% 
  select(starts_with('lgl')) %>% 
  pmap_lgl(all)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

In the above example, the call pmap makes for the first row corresponds to:
all(lgl_1 = FALSE, lgl_2 = TRUE, lgl_3 = TRUE) and thus it returns FALSE. A consequence of this is that if you have a function whose named arguments differ from your column names, you will need to wrap it in an anonymous function. You will also need to account for this when writing an anonymous function of your own.

# a function with named arguments that are different from our column names
some_fxn <- function(x, y, z){
  paste(z, y, x, sep = ',')
}
dat %>% 
  select(starts_with('chr')) %>% 
  pmap_chr(some_fxn)
## Error in .f(chr_1 = .l[[1L]][[i]], chr_2 = .l[[2L]][[i]], chr_3 = .l[[3L]][[i]], : unused arguments (chr_1 = .l[[1]][[i]], chr_2 = .l[[2]][[i]], chr_3 = .l[[3]][[i]])

We can use an anonymous function to fix this.

dat %>% 
  select(starts_with('chr')) %>% 
  pmap_chr(function(chr_1, chr_2, chr_3){
    some_fxn(chr_1, chr_2, chr_3)
    })
##  [1] "C,B,B" "B,A,A" "C,A,B" "A,C,C" "B,C,A" "B,C,B" "A,C,C" "B,C,C" "C,C,C"
## [10] "B,C,C"
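
Another way around the name mismatch is to drop the names altogether, so that pmap falls back on matching arguments by position. The following is a minimal sketch rather than part of the original analysis: chr_cols is a hypothetical three-row tibble copied from the first rows of dat above, and stripping the names with unname(as.list(...)) is my assumption of a reasonable way to do it.

```r
library(purrr)
library(tibble)

# hypothetical stand-in for the chr columns of `dat` (first three rows)
chr_cols <- tibble(
  chr_1 = c("B", "A", "B"),
  chr_2 = c("B", "A", "A"),
  chr_3 = c("C", "B", "C")
)

some_fxn <- function(x, y, z) {
  paste(z, y, x, sep = ",")
}

# an unnamed list makes pmap match by position: x = chr_1, y = chr_2, z = chr_3
reversed <- pmap_chr(unname(as.list(chr_cols)), some_fxn)
reversed
```

Matching by position is shorter, but it silently depends on the column order, so the named anonymous function is the safer choice when the columns might be rearranged.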

The arguments of an anonymous function must likewise be named after the columns.

dat %>% 
  select(starts_with('chr')) %>% 
  pmap_chr(function(chr_1, chr_2, chr_3){
    paste(chr_1, chr_2, chr_3, sep = ',')
    })
##  [1] "B,B,C" "A,A,B" "B,A,C" "C,C,A" "A,C,B" "B,C,B" "C,C,A" "C,C,B" "C,C,C"
## [10] "C,C,B"

An ellipsis (...) can also be used in an anonymous function to capture every column without naming each one.

dat %>% 
  select(starts_with('chr')) %>% 
  pmap_lgl(function(...){
    arguments <- list(...)
    any(arguments == 'B')
    })
##  [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
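
To make it clearer what the ellipsis captures, here is a small sketch under my own assumptions: chr_cols is a hypothetical tibble holding rows 1, 2 and 4 of the chr columns of dat, and c(...) is used in place of list(...) so that each row becomes a plain character vector before the comparison.

```r
library(purrr)
library(tibble)

# hypothetical stand-in for rows 1, 2 and 4 of the chr columns of `dat`
chr_cols <- tibble(
  chr_1 = c("B", "A", "C"),
  chr_2 = c("B", "A", "C"),
  chr_3 = c("C", "B", "A")
)

# the arguments captured by ... carry the column names
arg_names <- pmap(chr_cols, function(...) names(list(...)))
arg_names[[1]]

# c(...) collapses one row into a named character vector before comparing
any_b <- pmap_lgl(chr_cols, function(...) any(c(...) == "B"))
any_b
```

Because the function never refers to a column by name, the same code works for any number of selected columns.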

If our output is a vector, we can use pmap combined with bind_cols to append the vector as a new column.

dat %>% 
  select(starts_with('int')) %>% 
  pmap_int(sum) %>% 
  bind_cols(dat, sum = .) %>% 
  select(int_1:int_3, lgl_1, sum)
## # A tibble: 10 x 5
##    int_1 int_2 int_3 lgl_1   sum
##    <int> <int> <int> <lgl> <int>
##  1     3     6     9 FALSE    18
##  2     6    10     3 FALSE    19
##  3    10     2     7 FALSE    19
##  4     1     4     2 FALSE     7
##  5     4     1     1 FALSE     6
##  6     7     7    10 FALSE    24
##  7     8     8     4 TRUE     20
##  8     2     3     6 FALSE    11
##  9     9     5     8 TRUE     22
## 10     5     9     5 FALSE    19
dat %>% 
  select(starts_with('lgl')) %>% 
  pmap_lgl(all)  %>% 
  bind_cols(dat, all_true = .) %>% 
  select(int_1:int_2, lgl_1:lgl_3, all_true)
## # A tibble: 10 x 6
##    int_1 int_2 lgl_1 lgl_2 lgl_3 all_true
##    <int> <int> <lgl> <lgl> <lgl> <lgl>   
##  1     3     6 FALSE TRUE  TRUE  FALSE   
##  2     6    10 FALSE TRUE  TRUE  FALSE   
##  3    10     2 FALSE FALSE TRUE  FALSE   
##  4     1     4 FALSE TRUE  TRUE  FALSE   
##  5     4     1 FALSE FALSE TRUE  FALSE   
##  6     7     7 FALSE FALSE FALSE FALSE   
##  7     8     8 TRUE  TRUE  FALSE FALSE   
##  8     2     3 FALSE TRUE  FALSE FALSE   
##  9     9     5 TRUE  FALSE TRUE  FALSE   
## 10     5     9 FALSE FALSE FALSE FALSE
dat %>% 
  select(starts_with('lgl')) %>% 
  pmap_lgl(any)
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
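
When only a handful of columns are involved, the select and bind_cols steps can be skipped by calling pmap inside mutate on a list of the columns. This is a sketch of that alternative, not something from the examples above: d is a hypothetical tibble holding rows 1, 6 and 7 of dat, and the new column names total, all_true and any_true are my own.

```r
library(dplyr)
library(purrr)
library(tibble)

# hypothetical stand-in for rows 1, 6 and 7 of `dat`
d <- tibble(
  int_1 = c(3L, 7L, 8L),
  int_2 = c(6L, 7L, 8L),
  int_3 = c(9L, 10L, 4L),
  lgl_1 = c(FALSE, FALSE, TRUE),
  lgl_2 = c(TRUE, FALSE, TRUE),
  lgl_3 = c(TRUE, FALSE, FALSE)
)

# list() bundles the columns so pmap walks them in parallel, one row at a time
with_summaries <- d %>%
  mutate(
    total    = pmap_int(list(int_1, int_2, int_3), sum),
    all_true = pmap_lgl(list(lgl_1, lgl_2, lgl_3), all),
    any_true = pmap_lgl(list(lgl_1, lgl_2, lgl_3), any)
  )
with_summaries
```

The trade-off is that every column has to be typed out, so this does not scale the way select(starts_with(...)) does.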

use nest and map
The examples below use group_by(id), nest and map. Nesting allows us to take advantage of the tidyselect::select_helpers to group our variables.

dat %>%
  group_by(id) %>%
  # one list-column of single-row tibbles per variable type
  nest(int_vars = starts_with('int'), 
       lgl_vars = contains('lgl'), 
       chr_vars = c(chr_1, chr_2, chr_3)) %>% 
  mutate(sum = map_int(int_vars, sum), 
         all_true = map_lgl(lgl_vars, pmap_lgl, all), 
         any_b = map_lgl(chr_vars, function(x) {
           any(map_lgl(x, ~. == 'B'))
           }), 
         # inside pmap, `.` is only the first argument, so collect all with c(...)
         any_c = map_lgl(chr_vars, pmap_lgl, ~any(c(...) == 'C')), 
         any_a = map_lgl(chr_vars, function(x) any(unlist(x) == 'A'))
         ) %>% 
  unnest(cols = c(int_vars, lgl_vars, chr_vars)) %>% 
  ungroup()
## # A tibble: 10 x 15
##       id int_1 int_2 int_3 lgl_1 lgl_2 lgl_3 chr_1 chr_2 chr_3   sum all_true
##    <int> <int> <int> <int> <lgl> <lgl> <lgl> <chr> <chr> <chr> <int> <lgl>   
##  1     1     3     6     9 FALSE TRUE  TRUE  B     B     C        18 FALSE   
##  2     2     6    10     3 FALSE TRUE  TRUE  A     A     B        19 FALSE   
##  3     3    10     2     7 FALSE FALSE TRUE  B     A     C        19 FALSE   
##  4     4     1     4     2 FALSE TRUE  TRUE  C     C     A         7 FALSE   
##  5     5     4     1     1 FALSE FALSE TRUE  A     C     B         6 FALSE   
##  6     6     7     7    10 FALSE FALSE FALSE B     C     B        24 FALSE   
##  7     7     8     8     4 TRUE  TRUE  FALSE C     C     A        20 FALSE   
##  8     8     2     3     6 FALSE TRUE  FALSE C     C     B        11 FALSE   
##  9     9     9     5     8 TRUE  FALSE TRUE  C     C     C        22 FALSE   
## 10    10     5     9     5 FALSE FALSE FALSE C     C     B        19 FALSE   
## # … with 3 more variables: any_b <lgl>, any_c <lgl>, any_a <lgl>
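
To see why map_int(int_vars, sum) works in the example above, it can help to look at what nest() produces. The following is a minimal sketch assuming tidyr 1.0.0 or later (for the name = selection form of nest); d is a hypothetical tibble copied from the first three rows of dat, and total is a column name of my own.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# hypothetical stand-in for the id and int columns of `dat` (first three rows)
d <- tibble(
  id    = 1:3,
  int_1 = c(3L, 6L, 10L),
  int_2 = c(6L, 10L, 2L),
  int_3 = c(9L, 3L, 7L)
)

# nest() packs the selected columns into a list-column, one cell per id
nested <- d %>% nest(int_vars = starts_with("int"))

# each cell is itself a one-row tibble holding that id's int columns
first_cell <- nested$int_vars[[1]]
first_cell

# sum() over a one-row integer tibble adds up its values
totals <- nested %>% mutate(total = map_int(int_vars, sum))
totals
```

Each element that map sees is therefore a small data frame, which is also why pmap_lgl can be applied to it in the lgl_vars and chr_vars examples.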

We can then use unnest to return our variables. The nesting step is needed because the select helpers cannot be used directly inside mutate:

dat %>% 
  group_by(id) %>% 
  mutate(sum = sum(starts_with('int')))
## Error: `starts_with()` must be used within a *selecting* function.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
dat %>% 
  group_by(id) %>% 
  mutate(sum = sum(select(starts_with('int'))))
## Error: `starts_with()` must be used within a *selecting* function.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
## Note: as of the dev version of dplyr (0.8.99.9000) this no longer returns an
## error; instead it sums the column positions and thus returns 9 (2 + 3 + 4). 
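
The errors above arise because starts_with() is only understood inside a selecting function. One possible way to get a selecting context inside mutate, without nesting, is across(), which was added in dplyr 1.0.0 and so is newer than the version used above. This is a hedged sketch under that assumption: d is a hypothetical tibble copied from the first three rows of dat, and total is my own column name.

```r
library(dplyr)
library(purrr)
library(tibble)

# hypothetical stand-in for the id and int columns of `dat` (first three rows)
d <- tibble(
  id    = 1:3,
  int_1 = c(3L, 6L, 10L),
  int_2 = c(6L, 10L, 2L),
  int_3 = c(9L, 3L, 7L)
)

# across() with only a selection returns the chosen columns as a tibble,
# which pmap_int() can then walk row by row
with_total <- d %>%
  mutate(total = pmap_int(across(starts_with("int")), sum))
with_total
```

More recent dplyr releases also offer pick() for the same purpose, so which of the two is preferable depends on the installed version.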

Finally, don’t forget to ungroup!

If you bring group_by() to the party, don’t forget dplyr::ungroup()
Artwork by @allison_horst