Underrated Tidyverse Functions
The Assignment
I’m teaching an R Programming course next term. Jessica Minnier and I are developing the Ready for R Materials into a longer and more involved course.
I think one of the most important things is to teach people how to self-learn. Because learning to program is a lifelong activity, it’s critically important to give students these meta-learning skills. That’s the motivation behind the Tidyverse Function of the Week assignment.
I asked on Twitter:
Hi Everyone. I'm teaching an #rstats course next quarter.
One assignment is to have each student write about a #tidyverse function. What it's for and an example.
What are some less known #tidyverse functions that do a job you find useful?
— Ted Laderas, PhD 🏳️🌈 (@tladeras) November 30, 2020
Some of my favorite suggestions
Here are some of the highlights from the thread.
I loved all of these. Danielle Quinn wins the MVP award for naming so many useful functions:
dplyr::uncount()
tidyr::complete()
tidyr::fill() / replace_na()
stringr::str_detect() / str_which()
lubridate::ymd_hms() and related functions
ggplot2::labs() - so simple, yet under appreciated!
— Danielle Quinn (she/her) (@daniellequinn88) December 1, 2020
fill() was highly suggested:
tidyr::fill() - extremely useful when creating a usable dataset out of a spreadsheet originally built for data entry, in which redundant informations are only reported once at the beginning of the group they refer to, rather than in every row as needed for the analysis.
— Luca Foppoli (@foppoli_luca) December 1, 2020
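To make the spreadsheet scenario in that tweet concrete, here is a minimal sketch of tidyr::fill(). The field_notes data frame and its column names are made up for illustration; only fill() itself comes from the thread.

```r
library(tidyr)
library(tibble)

# Hypothetical data-entry sheet: the site is recorded only on the
# first row of each group, as described in the tweet above
field_notes <- tribble(
  ~site,   ~sample, ~count,
  "North", 1,       12,
  NA,      2,       7,
  NA,      3,       9,
  "South", 1,       4,
  NA,      2,       11
)

# fill() carries the last non-missing value downward by default
filled_notes <- fill(field_notes, site)
filled_notes
```

After filling, every row carries its own site label, so the data can be grouped or filtered by site.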
Many people suggested the window functions, including lead() and lag() and the cumulative functions:
Check out the dplyr window functions, cummin, cummax, cumany and cumall. They don't seen useful at first but they can solve really tricky aggregation problems. https://t.co/aDpXqSB2Vx
— Robert Kubinec (@rmkubinec) December 1, 2020
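As a rough illustration of these window functions, the sketch below applies lag(), lead(), cummax(), and cumany() to a small invented data frame (daily and its columns are assumptions, not data from the thread).

```r
library(dplyr)

# Hypothetical daily sales figures
daily <- tibble(day = 1:5, sales = c(10, 15, 12, 20, 18))

daily_windows <- daily %>%
  mutate(
    previous    = lag(sales),          # value from the row before (NA on the first row)
    next_day    = lead(sales),         # value from the row after (NA on the last row)
    change      = sales - lag(sales),  # day-over-day difference
    best_so_far = cummax(sales),       # running maximum
    hit_20      = cumany(sales >= 20)  # TRUE from the first day sales reach 20
  )
daily_windows
```

Inside a group_by(), the same calls restart within each group, which is where the NA values on first and last rows need the most attention.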
Alison Hill suggested problems(), which helps you diagnose why your data isn’t loading:
Ooh problems is a good function for importing rx https://t.co/P4ZR57PgOG
— Alison Presmanes Hill (@apreshill) December 1, 2020
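Here is a small sketch of how problems() might be used after an import. The inline CSV text and column names are invented, and wrapping the string in I() to mark it as literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical file contents; "heavy" cannot be parsed as a number.
# I() marks the string as literal data (assumes readr >= 2.0).
raw_csv <- I("id,weight\n1,4.2\n2,heavy\n3,5.1\n")

# The parsing warning is suppressed here only to keep the output short
measurements <- suppressWarnings(
  read_csv(raw_csv, col_types = cols(id = col_integer(), weight = col_double()))
)

# problems() returns one row per value that failed to parse
import_issues <- problems(measurements)
import_issues
```

The unparseable value becomes NA in the data, and the tibble returned by problems() records where it occurred and what was expected.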
I think that deframe() and enframe() are really exciting, since I do this operation all the time:
tibble::deframe(), tibble::deframe()
coercing a two-column df to named vector, which I prefer immensely to names(df) <- vec_of_names
— E. David Aja (@PeeltothePithy) December 1, 2020
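A minimal sketch of that round trip, using a made-up two-column lookup table (lookup and its columns are assumptions for illustration):

```r
library(tibble)

# Hypothetical two-column lookup table
lookup <- tibble(code = c("OR", "WA"), state = c("Oregon", "Washington"))

# deframe(): first column becomes the names, second column the values
state_names <- deframe(lookup)
state_names[["WA"]]

# enframe(): back from a named vector to a two-column tibble
round_trip <- enframe(state_names, name = "code", value = "state")
round_trip
```

A named vector like state_names is handy for recoding, for example state_names[some_codes].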
unite(), separate() and separate_rows() also had their own contingent:
I find myself using tidyr::unite() a lot to clean messy data - particularly useful for making unique and informative ID's for each row. coalesce() and fill() are also little known gems! :)
— Guy Sutton🐝🌾🇿🇦🇿🇼 (@Guy_F_Sutton) December 1, 2020
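The sketch below combines three of the functions mentioned here: unite() to build a row ID, coalesce() to take the first non-missing value, and separate_rows() to split a delimited column. The visits data frame is invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical messy data: two contact columns and a comma-separated tests column
visits <- tibble(
  site  = c("A", "A", "B"),
  year  = c(2019, 2020, 2020),
  phone = c(NA, "555-0100", NA),
  email = c("a@example.org", NA, NA),
  tests = c("cbc, lipid", "cbc", "a1c, cbc, lipid")
)

tidy_visits <- visits %>%
  # build an informative ID from site and year, keeping the originals
  unite("visit_id", site, year, sep = "_", remove = FALSE) %>%
  # first non-missing of phone, email, then a fallback label
  mutate(contact = coalesce(phone, email, "unknown")) %>%
  # one row per test instead of a comma-separated string
  separate_rows(tests, sep = ", ")
tidy_visits
```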
Wow! Let’s Grab All the Tweets and Replies
I was bowled over by all of the replies. This was an unexpectedly fun thread, with lots of recommendations from others.
I thought I would try to summarize everyone’s suggestions and compile a list of recommended functions. I used this script with some modifications to pull all the replies to my tweet. In particular, I had to request extended tweet mode, and I extracted a few more fields from the returned JSON.
This wrote the tweet information into a CSV file.
Then I started parsing the data. I wrote a couple of functions: remove_users_from_text(), which removes the users from a tweet (by looking for words that begin with @), and get_funcs(), which uses a relatively simple regular expression to try to return the functions (it looks for a word followed by paired parentheses () or a word containing an underscore “_”). It actually works pretty well, and grabs most of the functions.
Then I use separate_rows() to split the multiple functions into their separate rows. This makes it easier to tally all the functions.
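library(tidyverse)

# Strip Twitter handles (words starting with @) from the tweet text
remove_users_from_text <- function(col){
  str_replace_all(col, "\\@\\w*", "")
}

# Pull out anything that looks like a function: a word followed by "()",
# or a word containing an underscore. Returns one comma-separated string.
get_funcs <- function(col){
  out <- str_extract_all(col, "\\w*\\(\\)|\\w*_\\w*")
  paste(out[[1]], collapse=", ")
}

# `tweets` is the data frame read from the CSV written by the scraping script
parsed_tweets <- tweets %>%
  rowwise() %>%   # get_funcs() only looks at out[[1]], so process one tweet at a time
  mutate(text = remove_users_from_text(text)) %>%
  mutate(funcs = get_funcs(text)) %>%
  ungroup() %>%
  separate_rows(funcs, sep=", ") %>%   # one row per suggested function
  select(date, user, funcs, text, reply, parent_thread) %>%
  distinct()

write_csv(parsed_tweets, file = "cleaned_tweets_incomplete.csv")
# Show the first ten rows, dropping the reply and parent_thread columns
knitr::kable(parsed_tweets[1:10,-c(5:6)])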
date | user | funcs | text |
---|---|---|---|
02/12/2020 16:12:48 | NathanKhadaroo | expand_grid() | tidyr::expand_grid() is really useful for creating new datasets to see how fitted models perform on new data! |
02/12/2020 06:43:45 | sleepydatum | anti_join() | dplyr::anti_join() is my personal favorite. |
02/12/2020 01:19:24 | dragonflystats | | out of curiosity - who are the students? CS? Health Science? |
02/12/2020 01:22:25 | tladeras | | Biostatistics students. |
01/12/2020 19:15:14 | eulerdiditfirst | | Writing your own tidy verse functions from chaining tidy verse functions using {{}} . Seriously feels like a super power sometimes |
01/12/2020 18:34:13 | pedro_tfonseca | | dplyr::near is one of my favorite |
01/12/2020 18:28:52 | daniellequinn88 | uncount() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
01/12/2020 18:28:52 | daniellequinn88 | complete() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
01/12/2020 18:28:52 | daniellequinn88 | fill() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
01/12/2020 18:28:52 | daniellequinn88 | replace_na() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
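To see why some rows in the table above have an empty funcs column, here is a small sketch that runs the same two regular expressions on a single made-up tweet (sample_tweet is an assumption, not a real reply):

```r
library(stringr)

# Hypothetical tweet text
sample_tweet <- "@foppoli_luca tidyr::fill() / replace_na() and slice_max, plus dplyr::near"

# Step 1: strip @handles first; otherwise "foppoli_luca" would match the
# underscore rule and be counted as a function
no_handles <- str_replace_all(sample_tweet, "\\@\\w*", "")

# Step 2: same pattern as get_funcs(): word + "()" OR word containing "_"
found <- str_extract_all(no_handles, "\\w*\\(\\)|\\w*_\\w*")[[1]]
found

# Collapsed the way get_funcs() returns it
paste(found, collapse = ", ")
```

Note that dplyr::near is missed because it has neither parentheses nor an underscore, which matches the empty funcs entry for that tweet in the table and is the kind of case that needed hand annotation.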
At this point, I realized that it was quicker to hand-annotate the remaining tweets than to keep trying to parse the rest of the cases. So I pulled everything into Excel and annotated the tweets that the regular expression couldn’t handle.
Functions by frequency
Here are the function suggestions by frequency. Unsurprisingly, case_when() (which I cover in the main course) has the most suggestions, because it’s so useful. tidyr::pivot_wider() and tidyr::pivot_longer() are also covered in the course.
There are some others which were new to me, and a bit of a surprise, such as coalesce() and fill().
cleaned_tweets <- read_csv("cleaned_tweets.csv") %>%
  select(-parent_thread) %>%
  # turn each user name into a markdown link to the reply
  mutate(user = paste0("[",user,"](",reply,")")) %>%
  select(-reply)
##
## -- Column specification --------------------------------------------------------
## cols(
## date = col_character(),
## user = col_character(),
## funcs = col_character(),
## text = col_character(),
## reply = col_character(),
## parent_thread = col_character()
## )
functions_by_freq <- cleaned_tweets %>%
  janitor::tabyl(funcs) %>%
  filter(!is.na(funcs)) %>%
  arrange(desc(n))
write_csv(functions_by_freq, "functions_by_frequency.csv")
functions_by_freq %>%
  knitr::kable()
funcs | n | percent | valid_percent |
---|---|---|---|
case_when() | 16 | 0.0601504 | 0.0720721 |
pivot_longer() | 7 | 0.0263158 | 0.0315315 |
pivot_wider() | 6 | 0.0225564 | 0.0270270 |
coalesce() | 5 | 0.0187970 | 0.0225225 |
fill() | 5 | 0.0187970 | 0.0225225 |
across() | 4 | 0.0150376 | 0.0180180 |
lag() | 4 | 0.0150376 | 0.0180180 |
separate() | 4 | 0.0150376 | 0.0180180 |
separate_rows() | 4 | 0.0150376 | 0.0180180 |
str_detect() | 4 | 0.0150376 | 0.0180180 |
uncount() | 4 | 0.0150376 | 0.0180180 |
anti_join() | 3 | 0.0112782 | 0.0135135 |
complete() | 3 | 0.0112782 | 0.0135135 |
fct_reorder() | 3 | 0.0112782 | 0.0135135 |
lead() | 3 | 0.0112782 | 0.0135135 |
map() | 3 | 0.0112782 | 0.0135135 |
recode() | 3 | 0.0112782 | 0.0135135 |
replace_na() | 3 | 0.0112782 | 0.0135135 |
slice() | 3 | 0.0112782 | 0.0135135 |
str_wrap() | 3 | 0.0112782 | 0.0135135 |
{forcats} | 2 | 0.0075188 | 0.0090090 |
{tidyeval} | 2 | 0.0075188 | 0.0090090 |
add_count() | 2 | 0.0075188 | 0.0090090 |
between() | 2 | 0.0075188 | 0.0090090 |
breaks_pretty() | 2 | 0.0075188 | 0.0090090 |
distinct() | 2 | 0.0075188 | 0.0090090 |
enframe() | 2 | 0.0075188 | 0.0090090 |
fct_infreq() | 2 | 0.0075188 | 0.0090090 |
floor_date() | 2 | 0.0075188 | 0.0090090 |
gather() | 2 | 0.0075188 | 0.0090090 |
group_indices() | 2 | 0.0075188 | 0.0090090 |
group_map() | 2 | 0.0075188 | 0.0090090 |
left_join() | 2 | 0.0075188 | 0.0090090 |
mutate() | 2 | 0.0075188 | 0.0090090 |
n_distinct() | 2 | 0.0075188 | 0.0090090 |
nest() | 2 | 0.0075188 | 0.0090090 |
partial() | 2 | 0.0075188 | 0.0090090 |
pluck() | 2 | 0.0075188 | 0.0090090 |
pull() | 2 | 0.0075188 | 0.0090090 |
safely() | 2 | 0.0075188 | 0.0090090 |
tabyl() | 2 | 0.0075188 | 0.0090090 |
unite() | 2 | 0.0075188 | 0.0090090 |
unnest() | 2 | 0.0075188 | 0.0090090 |
walk() | 2 | 0.0075188 | 0.0090090 |
*_join() | 1 | 0.0037594 | 0.0045045 |
{janitor} | 1 | 0.0037594 | 0.0045045 |
{readr} | 1 | 0.0037594 | 0.0045045 |
{tsibble} | 1 | 0.0037594 | 0.0045045 |
add_predictions() | 1 | 0.0037594 | 0.0045045 |
arrange() | 1 | 0.0037594 | 0.0045045 |
as_mapper() | 1 | 0.0037594 | 0.0045045 |
ceiling_date() | 1 | 0.0037594 | 0.0045045 |
count() | 1 | 0.0037594 | 0.0045045 |
crossing() | 1 | 0.0037594 | 0.0045045 |
cut_interval() | 1 | 0.0037594 | 0.0045045 |
cut_number () | 1 | 0.0037594 | 0.0045045 |
cut_width() | 1 | 0.0037594 | 0.0045045 |
deframe() | 1 | 0.0037594 | 0.0045045 |
dense_rank() | 1 | 0.0037594 | 0.0045045 |
dplyr::first() | 1 | 0.0037594 | 0.0045045 |
dplyr::last() | 1 | 0.0037594 | 0.0045045 |
drop_na() | 1 | 0.0037594 | 0.0045045 |
every() | 1 | 0.0037594 | 0.0045045 |
expand_grid() | 1 | 0.0037594 | 0.0045045 |
fct_explicit_na() | 1 | 0.0037594 | 0.0045045 |
fct_inorder() | 1 | 0.0037594 | 0.0045045 |
fct_relevel() | 1 | 0.0037594 | 0.0045045 |
filter() | 1 | 0.0037594 | 0.0045045 |
first() | 1 | 0.0037594 | 0.0045045 |
force_tz() | 1 | 0.0037594 | 0.0045045 |
geom_count() | 1 | 0.0037594 | 0.0045045 |
glimpse() | 1 | 0.0037594 | 0.0045045 |
grepl() | 1 | 0.0037594 | 0.0045045 |
group_*() | 1 | 0.0037594 | 0.0045045 |
group_by() | 1 | 0.0037594 | 0.0045045 |
group_walk() | 1 | 0.0037594 | 0.0045045 |
hoist() | 1 | 0.0037594 | 0.0045045 |
if() | 1 | 0.0037594 | 0.0045045 |
if_else() | 1 | 0.0037594 | 0.0045045 |
janitor::clean_names() | 1 | 0.0037594 | 0.0045045 |
keep_all() | 1 | 0.0037594 | 0.0045045 |
labs() | 1 | 0.0037594 | 0.0045045 |
last() | 1 | 0.0037594 | 0.0045045 |
left_join | 1 | 0.0037594 | 0.0045045 |
make_valid() | 1 | 0.0037594 | 0.0045045 |
map_*() | 1 | 0.0037594 | 0.0045045 |
map_dfr() | 1 | 0.0037594 | 0.0045045 |
mutate_at() | 1 | 0.0037594 | 0.0045045 |
mutate_if() | 1 | 0.0037594 | 0.0045045 |
n() | 1 | 0.0037594 | 0.0045045 |
n_tile() | 1 | 0.0037594 | 0.0045045 |
na_if() | 1 | 0.0037594 | 0.0045045 |
near() | 1 | 0.0037594 | 0.0045045 |
nest_by() | 1 | 0.0037594 | 0.0045045 |
none() | 1 | 0.0037594 | 0.0045045 |
nth() | 1 | 0.0037594 | 0.0045045 |
ntile() | 1 | 0.0037594 | 0.0045045 |
parse_*() | 1 | 0.0037594 | 0.0045045 |
parse_date_time() | 1 | 0.0037594 | 0.0045045 |
paste() | 1 | 0.0037594 | 0.0045045 |
possibly() | 1 | 0.0037594 | 0.0045045 |
problems() | 1 | 0.0037594 | 0.0045045 |
read_csv() | 1 | 0.0037594 | 0.0045045 |
read_delim() | 1 | 0.0037594 | 0.0045045 |
reduce() | 1 | 0.0037594 | 0.0045045 |
relocate() | 1 | 0.0037594 | 0.0045045 |
select() | 1 | 0.0037594 | 0.0045045 |
skim() | 1 | 0.0037594 | 0.0045045 |
slice_max() | 1 | 0.0037594 | 0.0045045 |
slice_min() | 1 | 0.0037594 | 0.0045045 |
some() | 1 | 0.0037594 | 0.0045045 |
spread() | 1 | 0.0037594 | 0.0045045 |
stat_summary | 1 | 0.0037594 | 0.0045045 |
str_glue() | 1 | 0.0037594 | 0.0045045 |
str_match() | 1 | 0.0037594 | 0.0045045 |
str_remove() | 1 | 0.0037594 | 0.0045045 |
str_trim() | 1 | 0.0037594 | 0.0045045 |
str_which() | 1 | 0.0037594 | 0.0045045 |
string_extract() | 1 | 0.0037594 | 0.0045045 |
summarise() | 1 | 0.0037594 | 0.0045045 |
tidy() | 1 | 0.0037594 | 0.0045045 |
View() | 1 | 0.0037594 | 0.0045045 |
with_groups() | 1 | 0.0037594 | 0.0045045 |
with_tz() | 1 | 0.0037594 | 0.0045045 |
write_csv() | 1 | 0.0037594 | 0.0045045 |
ymd*() | 1 | 0.0037594 | 0.0045045 |
ymd_hms() | 1 | 0.0037594 | 0.0045045 |
zap_label() | 1 | 0.0037594 | 0.0045045 |
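The percent and valid_percent columns differ because tabyl() computes percent over all rows, including tweets with no function (NA), while valid_percent uses only the non-missing rows. The numbers above are consistent with 266 rows in total, 222 of which name a function (16/266 ≈ 0.0602 and 16/222 ≈ 0.0721). A minimal sketch with invented data:

```r
library(janitor)

# Hypothetical annotations: one tweet has no function attached
suggestions <- data.frame(funcs = c("fill()", "fill()", "lag()", NA))

freq <- tabyl(suggestions, funcs)
freq
# percent       uses all 4 rows          -> fill() = 2/4
# valid_percent uses the 3 non-NA rows   -> fill() = 2/3
```

Filtering out the NA row afterwards, as in the code above, removes that line from the table but leaves percent computed against the full row count.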
Cleaned Tweets and Threads
Here are all of the tweets from this thread (naysayers included). They are only roughly ordered (longer threads are grouped together).
Here’s a link to the cleaned CSV file
knitr::kable(cleaned_tweets)
date | user | funcs | text |
---|---|---|---|
2/12/2020 16:12 | NathanKhadaroo | expand_grid() | tidyr::expand_grid() is really useful for creating new datasets to see how fitted models perform on new data! |
2/12/2020 6:43 | sleepydatum | anti_join() | dplyr::anti_join() is my personal favorite. |
2/12/2020 1:19 | dragonflystats | NA | out of curiosity - who are the students? CS? Health Science? |
2/12/2020 1:22 | tladeras | NA | Biostatistics students. |
1/12/2020 19:15 | eulerdiditfirst | NA | Writing your own tidy verse functions from chaining tidy verse functions using {{}} . Seriously feels like a super power sometimes |
1/12/2020 18:34 | pedro_tfonseca | near() | dplyr::near is one of my favorite |
1/12/2020 18:28 | daniellequinn88 | uncount() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
1/12/2020 18:28 | daniellequinn88 | complete() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
1/12/2020 18:28 | daniellequinn88 | fill() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
1/12/2020 18:28 | daniellequinn88 | replace_na() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
1/12/2020 18:28 | daniellequinn88 | str_detect() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
1/12/2020 18:28 | daniellequinn88 | str_which() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
1/12/2020 18:28 | daniellequinn88 | ymd_hms() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
1/12/2020 18:28 | daniellequinn88 | labs() | dplyr::uncount(); tidyr::complete(); tidyr::fill() / replace_na(); stringr::str_detect() / str_which(); lubridate::ymd_hms() and related functions; ggplot2::labs() - so simple, yet under appreciated! |
1/12/2020 17:52 | AmeliaMN | separate() | I don’t know if it’s less known or not, by tidyr::separate() is very useful |
1/12/2020 18:04 | tladeras | pivot_wider() | Yes! Very useful. I do think that {tidyr} in general is less known outside of pivot_wider() and pivot_longer(). |
1/12/2020 18:04 | tladeras | pivot_longer() | Yes! Very useful. I do think that {tidyr} in general is less known outside of pivot_wider() and pivot_longer(). |
1/12/2020 18:32 | ElinWaring | fct_infreq() | Forcats is also not as well known but has tons of handy functions like fct_infreq(). |
1/12/2020 17:43 | rdh_CLE | dplyr::first() | Dplyr:: first and last |
1/12/2020 17:43 | rdh_CLE | dplyr::last() | Dplyr:: first and last |
1/12/2020 16:38 | IamBugsPotter | tabyl() | I know it’s not official but several times when starting out I would start collapsing things using dplyr before remembering I could just use janitor::tabyl() |
1/12/2020 16:42 | tladeras | tabyl() | tabyl() is the best, along with clean_names(). |
1/12/2020 16:42 | tladeras | janitor::clean_names() | tabyl() is the best, along with clean_names(). |
2/12/2020 1:18 | dragonflystats | {janitor} | i love Love LOVE Janitor |
1/12/2020 16:29 | aosmith16 | n_distinct() | I try to squeeze in dplyr::n_distinct() to my basic intro. In my experience mostly useful for data checking/qaqc (i.e., becoming one with your dataset). |
1/12/2020 16:43 | tladeras | skim() | Yes! Super helpful. I currently use skimr to give students an overview, but that’s super helpful in giving single variable summaries. |
1/12/2020 15:47 | harlananelson | NA | Why would you want students who are just learning r to write about some obscure function? |
1/12/2020 19:31 | tladeras | NA | The point is for them to learn on their own and teach others. These aren’t obscure functions, they’re just lesser known ones. ; ; I can’t teach them everything, so the more I can teach the meta-learning, the better they will be off in the future. |
1/12/2020 15:37 | WireMonkey | {forcats} | I use forcats all the time. It’s especially helpful in ggplot when reordering an axis. |
1/12/2020 16:44 | tladeras | {forcats} | Definitely! {forcats} has so many useful functions. |
1/12/2020 15:29 | foppoli_luca | fill() | tidyr::fill() - extremely useful when creating a usable dataset out of a spreadsheet originally built for data entry, in which redundant informations are only reported once at the beginning of the group they refer to, rather than in every row as needed for the analysis. |
1/12/2020 15:17 | SorensenOystein | uncount() | tidyr::uncount(); tidyr::unnest(); dplyr::ntile() |
1/12/2020 15:17 | SorensenOystein | unnest() | tidyr::uncount(); tidyr::unnest(); dplyr::ntile() |
1/12/2020 15:17 | SorensenOystein | ntile() | tidyr::uncount(); tidyr::unnest(); dplyr::ntile() |
1/12/2020 14:51 | randyboyes | coalesce() | dplyr::coalesce() is so handy when you need it |
1/12/2020 14:39 | Airrock_TheRed | case_when() | I personally find case_when(), select(), slice(), and separate_rows() very helpful. |
1/12/2020 14:39 | Airrock_TheRed | select() | I personally find case_when(), select(), slice(), and separate_rows() very helpful. |
1/12/2020 14:39 | Airrock_TheRed | slice() | I personally find case_when(), select(), slice(), and separate_rows() very helpful. |
1/12/2020 14:39 | Airrock_TheRed | separate_rows() | I personally find case_when(), select(), slice(), and separate_rows() very helpful. |
1/12/2020 14:28 | TooSweetGeek | NA | unroll |
1/12/2020 14:28 | threadreaderapp | NA | Saluti, you can read it here: : Hi Everyone. I’m teaching an #rstats course next quarter. One assignment is to have each student write about… https://t.co/PJH3wqv7aO Share this if you think it’s interesting. 🤖 |
1/12/2020 14:23 | rmkubinec | NA | Check out the dplyr window functions, cummin, cummax, cumany and cumall. They don’t seen useful at first but they can solve really tricky aggregation problems. https://t.co/aDpXqSB2Vx |
1/12/2020 14:12 | InflationSquare | paste() | %$% makes using paste() easier (among other things). ; I use %T>% View() at the end of %>% chains a lot as well.; dplyr::dense_rank() is another good one that I wouldn’t have come across if I didn’t know the SQL equivalent.; dplyr::group_[keys, rows, indices] are neat as well |
1/12/2020 14:12 | InflationSquare | View() | %$% makes using paste() easier (among other things). ; I use %T>% View() at the end of %>% chains a lot as well.; dplyr::dense_rank() is another good one that I wouldn’t have come across if I didn’t know the SQL equivalent.; dplyr::group_[keys, rows, indices] are neat as well |
1/12/2020 14:12 | InflationSquare | dense_rank() | %$% makes using paste() easier (among other things). ; I use %T>% View() at the end of %>% chains a lot as well.; dplyr::dense_rank() is another good one that I wouldn’t have come across if I didn’t know the SQL equivalent.; dplyr::group_[keys, rows, indices] are neat as well |
1/12/2020 14:12 | InflationSquare | group_*() | %$% makes using paste() easier (among other things). ; I use %T>% View() at the end of %>% chains a lot as well.; dplyr::dense_rank() is another good one that I wouldn’t have come across if I didn’t know the SQL equivalent.; dplyr::group_[keys, rows, indices] are neat as well |
1/12/2020 14:10 | PRLPoliSci | drop_na() | A lot of great ones in the thread so far! I’d also toss in drop_na |
1/12/2020 14:09 | stateofstats | pivot_wider() | pivot_wider() and pivot_longer(), formerly spread() and gather(). Incredibly useful in converting messy data into something useable |
1/12/2020 14:09 | stateofstats | pivot_longer() | pivot_wider() and pivot_longer(), formerly spread() and gather(). Incredibly useful in converting messy data into something useable |
1/12/2020 14:09 | stateofstats | spread() | pivot_wider() and pivot_longer(), formerly spread() and gather(). Incredibly useful in converting messy data into something useable |
1/12/2020 14:09 | stateofstats | gather() | pivot_wider() and pivot_longer(), formerly spread() and gather(). Incredibly useful in converting messy data into something useable |
1/12/2020 14:04 | Smith80D | fill() | tidyr::fill() is the one that I find especially useful, for all those imported Excel files with row headings that are merged. Always assumed it existed, but didn’t know its name until a colleague introduced us. |
1/12/2020 14:03 | wzzerd | case_when() | When you don’t teach case_when, students will go years nesting ifelse like absolute chumps! Alternatively, to relabel discrete data I like to left_join with a crosswalk table so the associations are not hardcoded in the script. |
1/12/2020 14:03 | wzzerd | left_join | When you don’t teach case_when, students will go years nesting ifelse like absolute chumps! Alternatively, to relabel discrete data I like to left_join with a crosswalk table so the associations are not hardcoded in the script. |
1/12/2020 14:02 | Dorialexander | lead() | dplyr::lead and dplyr::lag Very practical especially within groups and yet they can be a bit tricky since it obviously raise NAs on first/last rows. |
1/12/2020 14:02 | Dorialexander | lag() | dplyr::lead and dplyr::lag Very practical especially within groups and yet they can be a bit tricky since it obviously raise NAs on first/last rows. |
1/12/2020 14:01 | LuisDVerde | enframe() | tibble::enframe() |
1/12/2020 13:58 | sebvanliempd | pivot_longer() | pivot_longer(); pivot_wider() |
1/12/2020 13:58 | sebvanliempd | pivot_wider() | pivot_longer(); pivot_wider() |
1/12/2020 13:57 | kjhealy | case_when() | case_when() |
1/12/2020 13:51 | Laserhedvig | distinct() | Oh man I use distinct() so much, especially with arrange() before and .keep_all = T |
1/12/2020 13:51 | Laserhedvig | arrange() | Oh man I use distinct() so much, especially with arrange() before and .keep_all = T |
1/12/2020 13:51 | Laserhedvig | keep_all() | Oh man I use distinct() so much, especially with arrange() before and .keep_all = T |
1/12/2020 10:28 | ChloeFouilloux | gather() | tidyr:: gather () has saved me many a time when wrangling unruly data |
1/12/2020 10:19 | VizMonkey | add_count() | I haven’t seen add_count but that’s a good one. Also keep and discard. And string_extract |
1/12/2020 10:19 | VizMonkey | string_extract() | I haven’t seen add_count but that’s a good one. Also keep and discard. And string_extract |
1/12/2020 9:59 | MattAlhonte | separate() | separate is pretty awesome and something I covet from Python, so much so that I made a blog post about writing a hacky Pandas approximation! https://t.co/se5O4nR1sa |
1/12/2020 9:35 | Stephenpedj | NA | - You should add this founder to your candid interview list!! |
1/12/2020 8:04 | Guy_F_Sutton | unite() | I find myself using tidyr::unite() a lot to clean messy data - particularly useful for making unique and informative ID’s for each row. coalesce() and fill() are also little known gems! :) |
1/12/2020 8:04 | Guy_F_Sutton | coalesce() | I find myself using tidyr::unite() a lot to clean messy data - particularly useful for making unique and informative ID’s for each row. coalesce() and fill() are also little known gems! :) |
1/12/2020 8:04 | Guy_F_Sutton | fill() | I find myself using tidyr::unite() a lot to clean messy data - particularly useful for making unique and informative ID’s for each row. coalesce() and fill() are also little known gems! :) |
1/12/2020 7:25 | ephorie | NA | Neither of them: https://t.co/Fbw9RHE3YF |
1/12/2020 7:22 | Amit_Levinson | group_indices() | Found myself using group_indices() several times in the past weeks. Great for giving groups sequential ids. |
1/12/2020 7:10 | ReillyInnes | pivot_longer() | tidyr::pivot_longer/wider; dplyr::n_distinct; tibble::glimpse; Are some of my most used (as well as %>% ) |
1/12/2020 7:10 | ReillyInnes | n_distinct() | tidyr::pivot_longer/wider; dplyr::n_distinct; tibble::glimpse; Are some of my most used (as well as %>% ) |
1/12/2020 7:00 | bmwiernik | nest() | nest() / unnest() |
1/12/2020 7:00 | bmwiernik | unnest() | nest() / unnest() |
1/12/2020 6:51 | BenInquiring | map() | The map() family from {purrr} was a game changer for me. as_mapper() is a nifty little function, but might be a bit advanced. |
1/12/2020 6:51 | BenInquiring | as_mapper() | The map() family from {purrr} was a game changer for me. as_mapper() is a nifty little function, but might be a bit advanced. |
1/12/2020 6:18 | brodriguesco | NA | anything from {purrr} |
1/12/2020 5:24 | vishal_katti | case_when() | case_when() is one of my favourite #rstats dplyr functions. The formula-like syntax needs more explaining usually. This would be a good candidate for your assignment. |
1/12/2020 5:37 | tladeras | NA | We cover it in class. It’s way too useful to not cover it. |
1/12/2020 5:20 | EOTWorld28 | {tidyeval} | One more thing that is beneficial to users would be Non Standard Evaluation(NSE); How to send columns name/ column names are strings to user functions.; ; I am yet to get my head around sym/syms ! :) |
1/12/2020 5:22 | tladeras | {tidyeval} | We may get to curly-curly {{ }}, but it will probably be after we work with {purrr}. |
1/12/2020 4:46 | Breza | partial() | partial() is so useful! |
1/12/2020 5:18 | tladeras | partial() | pryr::partial()? |
1/12/2020 4:40 | dh_slone | tidy() | One more and then I’ll shut up. sf is not part of the tidyverse, but it might as well be. Spatial file processing that is completely seamless with dplyr, ggplot, etc. I make all my maps with it these days.; And finally, the tidy() function from broom. |
1/12/2020 4:42 | tladeras | NA | Yes, {sf} is fantastic! Makes complicated spatial queries and joins much easier. |
1/12/2020 6:06 | dh_slone | make_valid() | make_valid() is the sf magic wand that solves random polygon slivers that often exist in data. |
1/12/2020 4:20 | dh_slone | NA | Not tidyverse per se, but lots of these cover the ’verse:; https://t.co/lPLTvRO02z; I keep a binder of these on my desk. |
1/12/2020 4:17 | dh_slone | between() | I have not seen between() mentioned yet. Are you covering magrittr? %<>% ? |
1/12/2020 4:26 | tladeras | between() | We will cover {magrittr} - and yes between() can be very useful. ; ; I am a little leery of the assignment pipe, because it can cause mistakes due to overwriting the data frame. |
1/12/2020 4:29 | dh_slone | NA | I’ve never done that and had to start over from the beginning. |
1/12/2020 4:06 | EOTWorld28 | group_walk() | I just learnt about the function “group_walk”; My requirement was to store my groups into separate csv files and group_walk() helps in just that in just single line of code!!; ; Still face palming myself to learn this so late !!😀 |
1/12/2020 4:27 | tladeras | NA | That’s a nice one! Very cool. |
1/12/2020 3:58 | cote_energy | slice_max() | slice_max, slice-min |
1/12/2020 3:58 | cote_energy | slice_min() | slice_max, slice-min |
1/12/2020 18:07 | cote_energy | n_tile() | Or n_tile! |
1/12/2020 3:55 | ellis_hughes | parse_*() | The readr parse_* functions. One of the listeners of #TidyX brought it up, and I’ve now used it so many places!! |
1/12/2020 3:43 | KellyBodwin | add_predictions() | I think broom::add_predictions() is criminally underrated. |
1/12/2020 3:09 | lisalendway | complete() | complete() |
1/12/2020 3:14 | tladeras | complete() | Good one! Always forget about complete() |
1/12/2020 3:12 | alexcookson | NA | Just used this today! So handy! |
1/12/2020 2:34 | jeremy_data | first() | These were less known to me for a long time, but that may just be my own fault :) so, first() last() and nth() on grouped data that is arranged. |
1/12/2020 2:34 | jeremy_data | last() | These were less known to me for a long time, but that may just be my own fault :) so, first() last() and nth() on grouped data that is arranged. |
1/12/2020 2:34 | jeremy_data | nth() | These were less known to me for a long time, but that may just be my own fault :) so, first() last() and nth() on grouped data that is arranged. |
1/12/2020 2:21 | usansky | anti_join() | dplyr::anti_join(); dplyr::coalesce() |
1/12/2020 2:21 | usansky | coalesce() | dplyr::anti_join(); dplyr::coalesce() |
1/12/2020 2:07 | lopierra | mutate() | Not a function, but I recently discovered you can use .before and .after with mutate() to put the new column where you want it, rather than the default all the way at the end. |
1/12/2020 1:47 | wouldeye125 | nest() | Honestly? nest() makes a lot of higher level stuff super easy |
1/12/2020 2:07 | tladeras | nest_by() | For sure. nest_by()/map() is probably one of the most powerful combos in the tidyverse. |
1/12/2020 2:07 | tladeras | map() | For sure. nest_by()/map() is probably one of the most powerful combos in the tidyverse. |
1/12/2020 1:46 | iamericfletcher | every() | every(), some(), and none() from {purrr}. |
1/12/2020 1:46 | iamericfletcher | some() | every(), some(), and none() from {purrr}. |
1/12/2020 1:46 | iamericfletcher | none() | every(), some(), and none() from {purrr}. |
1/12/2020 1:06 | PeeltothePithy | deframe() | tibble::deframe(), tibble::deframe(); coercing a two-column df to named vector, which I prefer immensely to names(df) <- vec_of_names |
1/12/2020 1:27 | tladeras | NA | This one is super helpful. I didn’t know about this one. |
1/12/2020 1:32 | grrrck | reduce() | Oh that’s cool! I often use purrr::reduce() for this and feel both clever and sorry for whoever reads my code next |
1/12/2020 1:35 | PeeltothePithy | left_join() | There are some truly horrific reduce(left_join) statements hanging around in some old code of mine, and I apologize to my erstwhile colleagues. |
1/12/2020 1:09 | PeeltothePithy | enframe() | also enframe(); ; DAMN YOU LACK OF EDIT |
1/12/2020 0:42 | CPumarFrohberg | fct_reorder() | forcats::fct_reorder()! Probably quite well-known, but its contribution to ordering levels in a visually intuitive way is not to be underestimated! |
1/12/2020 0:36 | Bouzoulay | map_dfr() | If it hasn’t been mentioned already, purrr::map_dfr() or dplyr::case_when() |
1/12/2020 0:36 | Bouzoulay | case_when() | If it hasn’t been mentioned already, purrr::map_dfr() or dplyr::case_when() |
1/12/2020 0:15 | tw0handt0uch1 | crossing() | crossing() is pretty handy and str_glue() can be quite powerful |
1/12/2020 0:15 | tw0handt0uch1 | str_glue() | crossing() is pretty handy and str_glue() can be quite powerful |
30/11/2020 23:43:53 | Luisfreii | str_trim() | stringr::str_trim() is pretty good |
30/11/2020 23:15:32 | ludictech | *_join() | The dplyr *_join()s and, well, all of stringr! str_wrap() can be pretty useful for wrapping eg plot titles to a certain length, str_match() or str_detect() are so useful… |
30/11/2020 23:15:32 | ludictech | str_wrap() | The dplyr *_join()s and, well, all of stringr! str_wrap() can be pretty useful for wrapping eg plot titles to a certain length, str_match() or str_detect() are so useful… |
30/11/2020 23:15:32 | ludictech | str_match() | The dplyr *_join()s and, well, all of stringr! str_wrap() can be pretty useful for wrapping eg plot titles to a certain length, str_match() or str_detect() are so useful… |
30/11/2020 23:15:32 | ludictech | str_detect() | The dplyr *_join()s and, well, all of stringr! str_wrap() can be pretty useful for wrapping eg plot titles to a certain length, str_match() or str_detect() are so useful… |
30/11/2020 23:17:52 | tladeras | str_wrap() | Oh yeah, str_wrap()! I had to use this for tooltips on a plotly plot recently. |
30/11/2020 23:11:26 | ludictech | read_csv() | readr::read_csv() & write_csv() … (or read_delim() more generally) ? |
30/11/2020 23:11:26 | ludictech | write_csv() | readr::read_csv() & write_csv() … (or read_delim() more generally) ? |
30/11/2020 23:11:26 | ludictech | read_delim() | readr::read_csv() & write_csv() … (or read_delim() more generally) ? |
30/11/2020 23:13:14 | tladeras | {readr} | Certainly. We spend time with both {readr} and {readxl} because I think that loading data is the biggest point of frustration for students. |
1/12/2020 3:42 | apreshill | problems() | Ooh problems is a good function for importing rx https://t.co/P4ZR57PgOG |
1/12/2020 3:48 | tladeras | NA | Ooooh. That looks great. Learning so much from this thread! |
30/11/2020 23:05:28 | ArthurGailes | across() | Don’t know how well known it is because it’s new, but I never go a day without using across() anymore |
30/11/2020 23:11:46 | tladeras | across() | across() is super useful! |
30/11/2020 22:48:18 | Trabendo_daze | case_when() | case_when() but that’s pretty well known |
30/11/2020 23:23:31 | tladeras | NA | There’s a reason it’s well known! Super Useful. |
30/11/2020 22:32:24 | JKubale | str_detect() | I don’t think str_detect(), case_when(), and zap_label() have been mentioned yet. Highly recommend. |
30/11/2020 22:32:24 | JKubale | case_when() | I don’t think str_detect(), case_when(), and zap_label() have been mentioned yet. Highly recommend. |
30/11/2020 22:32:24 | JKubale | zap_label() | I don’t think str_detect(), case_when(), and zap_label() have been mentioned yet. Highly recommend. |
30/11/2020 22:43:02 | tladeras | NA | Nice! I am a little {haven} illiterate, so happy to include this. |
30/11/2020 21:51:49 | trentlikesstats | slice() | slice() |
30/11/2020 21:45:07 | cmdline_tips | unite() | like unite() and separate(). have a post based on ’s talk https://t.co/Qre4ACTRd6 #rstats |
30/11/2020 21:45:07 | cmdline_tips | separate() | like unite() and separate(). have a post based on ’s talk https://t.co/Qre4ACTRd6 #rstats |
30/11/2020 22:08:15 | tladeras | NA | Nice! Thanks for putting this together. |
30/11/2020 22:13:18 | cmdline_tips | NA | the post was written immediately after ’s talk. I believe video of the talk is available now. |
30/11/2020 21:38:32 | robinson_es | NA | has a good talk https://t.co/s3LBiZ95tR |
30/11/2020 23:04:46 | jaredlander | NA | And herself has the lesser known stars talk https://t.co/80zdiWhIn4 |
30/11/2020 21:41:40 | ameresv | NA | One of the favorite. Also his screencast are the best. So much things to learn from it |
30/11/2020 21:40:34 | tladeras | NA | Noice! Thanks, Emily. |
30/11/2020 21:34:40 | pj_ballantyne | mutate_at() | mutate_at() and mutate_if() |
30/11/2020 21:34:40 | pj_ballantyne | mutate_if() | mutate_at() and mutate_if() |
30/11/2020 21:30:23 | nathaneastwood_ | with_groups() | There are plenty of lesser known experimental functions in dplyr 1.0.0 like with_groups(). Also some experimental features like .keep in mutate() |
30/11/2020 21:30:23 | nathaneastwood_ | mutate() | There are plenty of lesser known experimental functions in dplyr 1.0.0 like with_groups(). Also some experimental features like .keep in mutate() |
30/11/2020 21:28:13 | MikeMahoney218 | map_*() | purrr (and furrr) in general imo! I don’t know that map_* is more complicated than loops, but I think they’re underutilized. Also tidyr::nest and forcats::fct_reorder |
30/11/2020 21:28:13 | MikeMahoney218 | fct_reorder() | purrr (and furrr) in general imo! I don’t know that map_* is more complicated than loops, but I think they’re underutilized. Also tidyr::nest and forcats::fct_reorder |
30/11/2020 21:35:57 | tladeras | NA | We will get to {purrr} eventually. I’ve been trying to slowly disentangle the use case so one concept is learned at a time. It’s been tricky. ; ; https://t.co/A6r9jWtsCV |
30/11/2020 21:41:48 | MikeMahoney218 | safely() | TIL about safely 😂 I’ve mostly been writing package code recently & am reluctant to include tidyverse dependencies, but boy oh boy do I have some horrifying tryCatch calls that could probably stand to be replaced… |
30/11/2020 21:47:36 | tladeras | safely() | Ha. safely()/possibly() can be super useful and I just learned about it by putting this section together… |
30/11/2020 21:47:36 | tladeras | possibly() | Ha. safely()/possibly() can be super useful and I just learned about it by putting this section together… |
30/11/2020 21:28:03 | allawayr | pluck() | I went for way too long not knowing about purrr::pluck() |
30/11/2020 21:29:25 | allawayr | case_when() | Oh oh and case_when() lets me be super lazy. |
30/11/2020 21:19:36 | ijeamaka_a | fct_relevel() | Forcats::fct_relevel() and forcats::fct_reorder() |
30/11/2020 21:19:36 | ijeamaka_a | fct_reorder() | Forcats::fct_relevel() and forcats::fct_reorder() |
30/11/2020 21:19:32 | piquergaming | hoist() | Hoist() - when you’re dealing with JSON (or dynamodb in my case) it’s a lifesaver. |
30/11/2020 21:19:30 | chrishanretty | if_else() | if_else (and an example of where you need to use it/where baseR ifelse breaks down) |
30/11/2020 21:23:39 | tladeras | NA | Super useful! |
30/11/2020 21:18:48 | maggiedalena123 | anti_join() | anti_join() |
30/11/2020 21:17:37 | JJVenky | across() | mutate(across()) as in; ; data.frame(a=c(q,w,e), b=c(1,2,-1)) %>% mutate(across(c(b), na_if, -1)); ; or; ; data.frame(a=c(q,w,e), b=c(1,2,-1)) %>% mutate(across(c(b), ~replace(., .<0,NA)) |
30/11/2020 21:17:37 | JJVenky | na_if() | mutate(across()) as in; ; data.frame(a=c(q,w,e), b=c(1,2,-1)) %>% mutate(across(c(b), na_if, -1)); ; or; ; data.frame(a=c(q,w,e), b=c(1,2,-1)) %>% mutate(across(c(b), ~replace(., .<0,NA)) |
30/11/2020 21:28:10 | tladeras | across() | Yup, mutate(across()) is great. I do cover {tidyselect} in my {tidyowl} tutorials: https://t.co/pRvC9YJZQG |
30/11/2020 21:15:55 | aecoppock | coalesce() | dplyr::coalesce() |
30/11/2020 21:15:42 | _echong | pull() | dplyr::pull(), to emphasize the difference between a vector and a one-column dataframe. |
30/11/2020 21:20:04 | tladeras | pull() | This is really one of the hardest concepts to teach, but agreed, pull() makes it much more clear. |
30/11/2020 21:11:01 | apreshill | breaks_pretty() | I think all of the scales package is helpful; ; https://t.co/s5WMZcWwYR; ; I especially like breaks_pretty and the label functions: https://t.co/PtrVT2R7dM |
30/11/2020 21:15:05 | tladeras | breaks_pretty() | For sure. I usually don’t get to scales when I teach {ggplot2}, but I think it might be worth highlighting the useful cases like breaks_pretty(). |
30/11/2020 21:20:12 | apreshill | group_indices() | Oh and one more! Sometimes dplyr::group_indices is helpful. The actual reference page is less helpful, but this discussion on the implementation is quite good: https://t.co/sD3iauuN9B |
30/11/2020 20:53:23 | gvwilson | lag() | I am frequently surprised by how few people know about lag() |
2/12/2020 5:54 | EvenKeely | lead() | lead() and lag() are awesome for working with transect point data. |
2/12/2020 5:54 | EvenKeely | lag() | lead() and lag() are awesome for working with transect point data. |
30/11/2020 20:57:51 | tladeras | lead() | Agreed. The documentation/examples are a little terse for lead()/lag(), which may be why few people use them. |
30/11/2020 20:57:51 | tladeras | lag() | Agreed. The documentation/examples are a little terse for lead()/lag(), which may be why few people use them. |
30/11/2020 21:07:38 | apreshill | NA | I think in general the window functions could use some love https://t.co/8z9DdvFQgt |
30/11/2020 21:10:02 | tladeras | NA | Agreed! I think the window functions are really useful. |
30/11/2020 20:49:45 | kaiz_p | left_join() | left_join() and other joins, separate(), recode(), pivot_longer(), pivot_wider(), filter() |
30/11/2020 20:49:45 | kaiz_p | separate() | left_join() and other joins, separate(), recode(), pivot_longer(), pivot_wider(), filter() |
30/11/2020 20:49:45 | kaiz_p | recode() | left_join() and other joins, separate(), recode(), pivot_longer(), pivot_wider(), filter() |
30/11/2020 20:49:45 | kaiz_p | pivot_longer() | left_join() and other joins, separate(), recode(), pivot_longer(), pivot_wider(), filter() |
30/11/2020 20:49:45 | kaiz_p | pivot_wider() | left_join() and other joins, separate(), recode(), pivot_longer(), pivot_wider(), filter() |
30/11/2020 20:49:45 | kaiz_p | filter() | left_join() and other joins, separate(), recode(), pivot_longer(), pivot_wider(), filter() |
30/11/2020 20:59:28 | tladeras | recode() | All very useful! I sometimes do get confused over whether to teach recode() vs case_when() - they’re both useful, but the use cases are different. |
30/11/2020 20:59:28 | tladeras | case_when() | All very useful! I sometimes do get confused over whether to teach recode() vs case_when() - they’re both useful, but the use cases are different. |
30/11/2020 21:08:24 | kaiz_p | case_when() | Good point! ; I previously used case_when() for all my recoding needs, but then I discovered recode() and it’s so much easier. Less code = fewer mistakes! |
30/11/2020 21:08:24 | kaiz_p | recode() | Good point! ; I previously used case_when() for all my recoding needs, but then I discovered recode() and it’s so much easier. Less code = fewer mistakes! |
30/11/2020 21:09:25 | kaiz_p | case_when() | I now use case_when() mostly when I’m looking for a string – grepl() or when working across multiple columns – case_when(A == "a" & B == "b" ~ "ab") |
30/11/2020 21:09:25 | kaiz_p | grepl() | I now use case_when() mostly when I’m looking for a string – grepl() or when working across multiple columns – case_when(A == "a" & B == "b" ~ "ab") |
30/11/2020 21:09:25 | kaiz_p | case_when() | I now use case_when() mostly when I’m looking for a string – grepl() or when working across multiple columns – case_when(A == "a" & B == "b" ~ "ab") |
2/12/2020 0:38 | EvenKeely | NA | (TRUE ~ You missed one) |
2/12/2020 0:41 | tladeras | NA | I feel seen. |
30/11/2020 21:11:16 | tladeras | case_when() | Very true. It can be hard to see which cases you missed when you write a case_when() statement, much like writing nested if() statements. |
30/11/2020 21:11:16 | tladeras | if() | Very true. It can be hard to see which cases you missed when you write a case_when() statement, much like writing nested if() statements. |
30/11/2020 20:43:48 | Corey_Yanofsky | replace_na() | dplyr::arrange & helper dplyr::desc; dplyr::coalesce; tidyr::replace_na; ; https://t.co/h9ew04PYcU |
30/11/2020 20:46:12 | tladeras | coalesce() | Ha! ; ; And coalesce()/replace_na() are great. Adding them to the list. |
30/11/2020 20:46:12 | tladeras | replace_na() | Ha! ; ; And coalesce()/replace_na() are great. Adding them to the list. |
30/11/2020 20:32:01 | Recon1974 | NA | do you teach tidyverse directly or base r first? |
30/11/2020 20:34:58 | tladeras | NA | I teach just enough base R for them to understand vectors, functions, and data.frames. ; ; You can see the previous class here: https://t.co/4WweuqMSl8; ; The rest is mostly tidyverse, except for the cases when they will encounter base-R a lot. |
30/11/2020 20:40:42 | tladeras | NA | I know this is probably controversial, but my goal is to get them up and working usefully as quickly as possible, rather than teach a standard programming course, which you have to learn a lot of things before you do something useful. |
30/11/2020 20:30:29 | BeltzEcology | floor_date() | lubridate::floor_date; ; I have recently become a HUGE fan! |
1/12/2020 19:17 | eulerdiditfirst | {tsibble} | Shout out to tsibble; if you’re using times series data you should def check it out |
30/11/2020 21:18:36 | JenRichmondPhD | parse_date_time() | also lubridate::parse_date_time() is kinda magic |
30/11/2020 21:49:37 | tladeras | NA | It is definitely magic. |
1/12/2020 4:12 | dh_slone | ymd*() | most of lubridate is magical. I use the heck out of ymd…() and similar, with_tz() and force_tz() take care of my biggest headaches, floor_date(), and ceiling_date(), etc. |
1/12/2020 4:12 | dh_slone | with_tz() | most of lubridate is magical. I use the heck out of ymd…() and similar, with_tz() and force_tz() take care of my biggest headaches, floor_date(), and ceiling_date(), etc. |
1/12/2020 4:12 | dh_slone | force_tz() | most of lubridate is magical. I use the heck out of ymd…() and similar, with_tz() and force_tz() take care of my biggest headaches, floor_date(), and ceiling_date(), etc. |
1/12/2020 4:12 | dh_slone | floor_date() | most of lubridate is magical. I use the heck out of ymd…() and similar, with_tz() and force_tz() take care of my biggest headaches, floor_date(), and ceiling_date(), etc. |
1/12/2020 4:12 | dh_slone | ceiling_date() | most of lubridate is magical. I use the heck out of ymd…() and similar, with_tz() and force_tz() take care of my biggest headaches, floor_date(), and ceiling_date(), etc. |
30/11/2020 20:35:21 | tladeras | NA | Ooh, this is great! Thanks! |
30/11/2020 20:37:36 | BeltzEcology | NA | Welcome! |
30/11/2020 20:15:42 | emilmalta | uncount() | I use these all the time:; tidyr::uncount(); tidyr::separate and tidyr::separate_rows(); forcats::fct_inorder(); forcats::fct_infreq() |
30/11/2020 20:15:42 | emilmalta | separate_rows() | I use these all the time:; tidyr::uncount(); tidyr::separate and tidyr::separate_rows(); forcats::fct_inorder(); forcats::fct_infreq() |
30/11/2020 20:15:42 | emilmalta | fct_inorder() | I use these all the time:; tidyr::uncount(); tidyr::separate and tidyr::separate_rows(); forcats::fct_inorder(); forcats::fct_infreq() |
30/11/2020 20:15:42 | emilmalta | fct_infreq() | I use these all the time:; tidyr::uncount(); tidyr::separate and tidyr::separate_rows(); forcats::fct_inorder(); forcats::fct_infreq() |
30/11/2020 23:49:22 | samclifford | fct_explicit_na() | Been using forcats::fct_explicit_na() of late. |
30/11/2020 20:17:34 | tladeras | separate_rows() | Yes, we cover tidyr a little bit. These are great suggestions, especially separate_rows() |
30/11/2020 20:37:29 | GenomeGal | separate_rows() | Yes! Separate_rows is my life - so useful!! |
30/11/2020 20:21:21 | emilmalta | uncount() | One thing that really made everything click for me, when learning tidy data, was that uncount() is in tidyr, and not dplyr.; ; It’s kinda subtle, but it was the thing that made me realize that tidying!=transforming. |
30/11/2020 20:31:05 | tladeras | NA | Yes, this distinction escaped me at first. I guess it’s like the form versus content distinction. |
30/11/2020 20:15:12 | kaija_bean | pivot_longer() | pivot_longer() and pivot_wider() are great! |
30/11/2020 20:15:12 | kaija_bean | pivot_wider() | pivot_longer() and pivot_wider() are great! |
30/11/2020 20:16:33 | tladeras | NA | They are great! |
30/11/2020 20:15:41 | kaija_bean | NA | And although they’re basically inverses of each other, each one had different arguments and different things to pay attention to, so I could easily see one student doing each of them without too much overlap. |
30/11/2020 20:11:38 | JayUlfelder | case_when() | A few that come to mind: dplyr::case_when(), purrr::map(), dplyr::group_map(), purrr::walk(), and purrr::pluck(). |
30/11/2020 20:11:38 | JayUlfelder | map() | A few that come to mind: dplyr::case_when(), purrr::map(), dplyr::group_map(), purrr::walk(), and purrr::pluck(). |
30/11/2020 20:11:38 | JayUlfelder | group_map() | A few that come to mind: dplyr::case_when(), purrr::map(), dplyr::group_map(), purrr::walk(), and purrr::pluck(). |
30/11/2020 20:11:38 | JayUlfelder | walk() | A few that come to mind: dplyr::case_when(), purrr::map(), dplyr::group_map(), purrr::walk(), and purrr::pluck(). |
30/11/2020 20:11:38 | JayUlfelder | pluck() | A few that come to mind: dplyr::case_when(), purrr::map(), dplyr::group_map(), purrr::walk(), and purrr::pluck(). |
30/11/2020 20:13:52 | tladeras | case_when() | Yup, these are really useful! I do cover case_when() because it’s so universally useful. ; ; And I’m going to cover {purrr} a little bit. group_map() and walk() are great suggestions. |
30/11/2020 20:13:52 | tladeras | group_map() | Yup, these are really useful! I do cover case_when() because it’s so universally useful. ; ; And I’m going to cover {purrr} a little bit. group_map() and walk() are great suggestions. |
30/11/2020 20:13:52 | tladeras | walk() | Yup, these are really useful! I do cover case_when() because it’s so universally useful. ; ; And I’m going to cover {purrr} a little bit. group_map() and walk() are great suggestions. |
30/11/2020 20:06:04 | ivelasq3 | fill() | Is tidyr::fill() lesser known? [regardless I love it]; ; Sharing just in case you haven’t seen this! https://t.co/cpJfD56rxB |
30/11/2020 20:06:44 | tladeras | NA | Ooh! This is perfect. Thanks! |
30/11/2020 19:42:31 | francisco_yira | cut_width() | ggplot2::cut_width, cut_number and cut_interval to transform continuous variables into discrete bins |
30/11/2020 19:42:31 | francisco_yira | cut_number() | ggplot2::cut_width, cut_number and cut_interval to transform continuous variables into discrete bins |
30/11/2020 19:42:31 | francisco_yira | cut_interval() | ggplot2::cut_width, cut_number and cut_interval to transform continuous variables into discrete bins |
30/11/2020 19:43:33 | tladeras | NA | Ah, very interesting! |
30/11/2020 19:40:55 | tladeras | relocate() | Here’s the list so far:; ; - dplyr::relocate(); - dplyr::count() / n(); - dplyr::distinct(); - dplyr::glimpse(); - dplyr::slice(); - ggplot2::geom_count() |
30/11/2020 19:40:55 | tladeras | count() | Here’s the list so far:; ; - dplyr::relocate(); - dplyr::count() / n(); - dplyr::distinct(); - dplyr::glimpse(); - dplyr::slice(); - ggplot2::geom_count() |
30/11/2020 19:40:55 | tladeras | n() | Here’s the list so far:; ; - dplyr::relocate(); - dplyr::count() / n(); - dplyr::distinct(); - dplyr::glimpse(); - dplyr::slice(); - ggplot2::geom_count() |
30/11/2020 19:40:55 | tladeras | distinct() | Here’s the list so far:; ; - dplyr::relocate(); - dplyr::count() / n(); - dplyr::distinct(); - dplyr::glimpse(); - dplyr::slice(); - ggplot2::geom_count() |
30/11/2020 19:40:55 | tladeras | glimpse() | Here’s the list so far:; ; - dplyr::relocate(); - dplyr::count() / n(); - dplyr::distinct(); - dplyr::glimpse(); - dplyr::slice(); - ggplot2::geom_count() |
30/11/2020 19:40:55 | tladeras | slice() | Here’s the list so far:; ; - dplyr::relocate(); - dplyr::count() / n(); - dplyr::distinct(); - dplyr::glimpse(); - dplyr::slice(); - ggplot2::geom_count() |
30/11/2020 19:40:55 | tladeras | geom_count() | Here’s the list so far:; ; - dplyr::relocate(); - dplyr::count() / n(); - dplyr::distinct(); - dplyr::glimpse(); - dplyr::slice(); - ggplot2::geom_count() |
1/12/2020 17:06 | robbins_ave | add_count() | dplyr::add_count() is often useful |
1/12/2020 14:23 | delaBJL | stat_summary() | ggplot2::stat_summary is fun; ; I use a lot of the stringr functions, str_remove(), str_detect(); ; pivot_longer() and pivot_wider() are fairly simple to grok and are incredibly useful |
1/12/2020 14:23 | delaBJL | str_remove() | ggplot2::stat_summary is fun; ; I use a lot of the stringr functions, str_remove(), str_detect(); ; pivot_longer() and pivot_wider() are fairly simple to grok and are incredibly useful |
1/12/2020 14:23 | delaBJL | str_detect() | ggplot2::stat_summary is fun; ; I use a lot of the stringr functions, str_remove(), str_detect(); ; pivot_longer() and pivot_wider() are fairly simple to grok and are incredibly useful |
1/12/2020 14:23 | delaBJL | pivot_longer() | ggplot2::stat_summary is fun; ; I use a lot of the stringr functions, str_remove(), str_detect(); ; pivot_longer() and pivot_wider() are fairly simple to grok and are incredibly useful |
1/12/2020 14:23 | delaBJL | pivot_wider() | ggplot2::stat_summary is fun; ; I use a lot of the stringr functions, str_remove(), str_detect(); ; pivot_longer() and pivot_wider() are fairly simple to grok and are incredibly useful |
1/12/2020 15:37 | toeb18 | str_wrap() | str_wrap is also a fantastic one |
1/12/2020 13:59 | GMFranceschini | case_when() | case_when(), group_by()/summarise() |
1/12/2020 13:59 | GMFranceschini | group_by() | case_when(), group_by()/summarise() |
1/12/2020 13:59 | GMFranceschini | summarise() | case_when(), group_by()/summarise() |
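Two of the snippets quoted in the thread lost their quotation marks when the tweets were cleaned, so here is a minimal runnable sketch of the same ideas. It assumes a recent dplyr (1.0.0 or later for across()); the toy data frame and the column name size are made up for illustration, and the lambda form `~ na_if(.x, -1)` is used in place of passing extra arguments through across().

```r
library(dplyr)

# Toy data; the -1 in column b stands in for a missing value.
df <- data.frame(a = c("q", "w", "e"), b = c(1, 2, -1))

# mutate(across()) with na_if(): turn the sentinel -1 into NA
# in the selected column.
df_na <- df %>%
  mutate(across(c(b), ~ na_if(.x, -1)))

# case_when() with an explicit catch-all (TRUE ~ ...), so rows that
# match no condition get a visible label instead of a silent NA.
df_labelled <- df_na %>%
  mutate(size = case_when(
    b == 1 ~ "one",
    b == 2 ~ "two",
    TRUE ~ "missed"
  ))
```

The catch-all branch is the "TRUE ~ You missed one" idea from the thread: it makes any case you forgot show up in the output.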
Source Code and Data
Feel free to use and modify.
- RMarkdown file used to generate this post
- Python Twitter Scraper (by Giovanni Mellini). I used this because there wasn’t a ready-made recipe in rtweet to extract replies: you have to use recursion to extract all of the thread replies that belong to a tweet, and this scraper was easily modifiable.
- Cleaned Tweets File (CSV)
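To show why recursion is needed for thread replies, here is a minimal sketch in base R. It is not the Python scraper's code and not an rtweet recipe. The table tweets, its columns id and in_reply_to, and the helper collect_replies() are all hypothetical names chosen for illustration.

```r
# Hypothetical reply table: one row per tweet, with the id of the
# tweet it replies to (NA for the original tweet).
tweets <- data.frame(
  id = c("1", "2", "3", "4", "5"),
  in_reply_to = c(NA, "1", "2", "1", "9"),
  stringsAsFactors = FALSE
)

# Recursively collect the ids of every reply below `root_id`.
collect_replies <- function(root_id, tweets) {
  is_child <- !is.na(tweets$in_reply_to) & tweets$in_reply_to == root_id
  children <- tweets$id[is_child]
  if (length(children) == 0) {
    return(character(0))
  }
  # Each direct reply is followed by its own replies (depth-first).
  unlist(lapply(children, function(child) {
    c(child, collect_replies(child, tweets))
  }))
}

thread_ids <- collect_replies("1", tweets)
```

Tweet "5" replies to a tweet outside the thread, so it is left out. Replies to replies (such as "3") are only found because the function calls itself on each child.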
Thank You
This post is my thank you to everyone who contributed to this thread. Thank you!