r - Day-to-day rolling correlations by matching name
Let's say I have the data frame below. (The real data set is not as small as this.)
    library(lubridate)

    x <- data.frame(
      date = c(rep(ymd(20160601), 4), rep(ymd(20160602), 3), rep(ymd(20160603), 3)),
      name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
      observation = sample(1:10)  # random; the output below is one particular draw
    )
    #          date name observation
    # 1  2016-06-01    a          10
    # 2  2016-06-01    b           7
    # 3  2016-06-01    c           3
    # 4  2016-06-01    d           2
    # 5  2016-06-02    a           8
    # 6  2016-06-02    b           6
    # 7  2016-06-02    c           4
    # 8  2016-06-03    b           5
    # 9  2016-06-03    c           1
    # 10 2016-06-03    d           9
I want to find the day-to-day correlation of observations with matching names. For example, for date 2016-06-02, I want the correlation between <8, 6, 4> and <10, 7, 3>, because a, b, and c are common to both 2016-06-02 and 2016-06-01. I can do it like this (there may be better ways to do this):
    library(dplyr)

    filter(x, date %in% ymd(20160601)) %>%
      left_join(filter(x, date %in% ymd(20160602)), by = "name") %>%
      transmute(
        date = ymd(20160602),
        correlation = cor(observation.x, observation.y, use = "complete.obs")) %>%
      `[`(1, )  # every row holds the same value, so keep only the first
    #         date correlation
    # 1 2016-06-02   0.9966159
But how do I do this for the entire data frame using window functions, so that the resulting data frame consists of the dates and each date's correlation with the previous date? I would prefer a dplyr/RcppRoll solution!
dplyr doesn't have rolling merges. Assuming you need one (it's not clear from the OP, since the sample data doesn't have holes), you can do:
    library(data.table)

    dt = as.data.table(x)  # or setDT to convert in place

    # not clear from the OP whether these are dates or datetimes;
    # let's make sure they're dates
    dt[, date := as.Date(date)]

    # rolling self-join: look up each name's value as of the previous day
    dt[.(name = name, old.date = date - 1, obs = observation),
       on = c(name = 'name', date = 'old.date'), roll = TRUE][
       , cor(obs, observation, use = 'pairwise.complete.obs'), by = date]
    #         date         V1
    #1: 2016-06-01         NA
    #2: 2016-06-02  0.9966159
    #3: 2016-06-03 -0.5000000
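If you don't actually need values carried forward across gaps, a plain dplyr self-join on the exact previous day may be enough. Here is a minimal sketch under that assumption. `prev`, `prev_observation`, and `res` are names made up for illustration, and the observations are hard-coded to the sample draw from the question so the result is reproducible.

```r
library(dplyr)

# Same sample data as in the question, with the random draw fixed.
x <- data.frame(
  date = as.Date(c(rep("2016-06-01", 4), rep("2016-06-02", 3), rep("2016-06-03", 3))),
  name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
  observation = c(10, 7, 3, 2, 8, 6, 4, 5, 1, 9)
)

# Shift every row one day forward so it lines up with the next day's rows.
prev <- x %>%
  transmute(date = date + 1, name, prev_observation = observation)

# Keep only names present on both days, then correlate within each date.
res <- x %>%
  inner_join(prev, by = c("date", "name")) %>%
  group_by(date) %>%
  summarise(correlation = cor(observation, prev_observation))

print(res)
```

On this sample, 2016-06-02 again gives 0.9966159, but 2016-06-03 gives 1 instead of -0.5: only b and c exist on both 2016-06-02 and 2016-06-03, whereas the rolling join also pairs d with its 2016-06-01 value. The first date drops out because it has no previous day to join against.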