subset - Transposing and Merging datasets in R
I am sure there are answers out there for this question, but I can't seem to find one that works, and I'm absolutely new to R, so apologies for any redundancy!
So I have a huge dataset: about 17k observations of 35 variables. It is a txt file that I imported and coerced with as.matrix. The 1st column has character values, and the remaining 34 columns have numeric values.
The structure is:
> str(data_m)
 chr [1:17933, 1:35] "rab12" "trim52" "c1orf86" "plac9" "morn3" "loc643783" "loc389541" "oaz2" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:35] "name" "x118" "x12" "x21" ...
Now there is a small long-form dataset with 2 columns, id and gender.
> str(data_maleids)
'data.frame': 24 obs. of 2 variables:
 $ id    : Factor w/ 34 levels "x118","x12","x21",..: 8 23 9 19 10 7 5 4 2 30 ...
 $ gender: Factor w/ 2 levels "female","male": 2 2 2 2 2 2 2 2 2 2 ...
E.g.:
  row.names id  gender
1 1         x37 male
2 2         x64 male
All I want is to subset the 1st dataset to the ids (x37, x64, etc.) that are present in the 2nd dataset.
I tried transposing the bigger dataset, but that gives me issues in terms of column names, and I can't seem to find a way around this.
The first comment is on your statement "the 1st column has character values, and the remaining 34 columns have numeric values". data_m is a matrix, so all of its columns are of the same type, in your case character. You can see that in the output of str(). Think of a matrix in R as a vector arranged in several columns.
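To see the coercion for yourself, here is a minimal sketch on a toy data frame. The values and the single x37 column are made up for illustration and are not the asker's data.

```r
# Toy data frame: one character column and one numeric column (made-up values).
df <- data.frame(name = c("rab12", "trim52"), x37 = c(1.5, 2.5),
                 stringsAsFactors = FALSE)

# In a data frame each column keeps its own type.
col_types <- sapply(df, class)

# as.matrix() must pick a single type for all cells, so the numbers become strings.
m <- as.matrix(df)
matrix_type <- typeof(m)

print(col_types)
print(matrix_type)

# Getting numbers back out of the matrix needs an explicit conversion.
x37_numeric <- as.numeric(m[, "x37"])
```

Keeping the data in a data frame (or a data.table) rather than a matrix avoids this conversion step altogether.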
Secondly, I would advise you to use the data.table package for this task (you have to install it if you do not have it yet). A sketch of the syntax is this:
- Read the data in. There is a nice function fread() in the data.table package that reads data from text files into a data.table object: data_m <- fread("file.name.txt")
- Key data_m by the variable id: setkey(data_m, id)
- Make a vector of ids from data_maleids: ids <- sort(unique(data_maleids$id))
- Select the cases you need: data_m[id %in% ids]
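The steps above filter rows on a column called id. In the str() output from the question the ids appear to be column names of data_m instead, so the sketch below shows two variants under that assumption: selecting the matching columns directly, and reshaping to long form first so that the row filter applies. The toy values, and the names data_male, data_long and data_long_male, are made up for illustration.

```r
library(data.table)

# Toy stand-in for the wide table: a "name" column plus one numeric
# column per id (values are invented).
data_m <- data.table(
  name = c("rab12", "trim52", "c1orf86"),
  x118 = c(0.1, 0.2, 0.3),
  x37  = c(1.1, 1.2, 1.3),
  x64  = c(2.1, 2.2, 2.3)
)

# Toy stand-in for the small id/gender table.
data_maleids <- data.frame(
  id     = factor(c("x37", "x64")),
  gender = factor(c("male", "male"))
)

# id is a factor, so convert it to character before using it as names.
ids <- sort(unique(as.character(data_maleids$id)))

# Variant 1: keep "name" plus the columns whose names are in ids.
keep <- intersect(ids, names(data_m))
data_male <- data_m[, c("name", keep), with = FALSE]

# Variant 2: reshape to long form so that id becomes a column,
# then key and filter rows as in the steps above.
data_long <- melt(data_m, id.vars = "name",
                  variable.name = "id", value.name = "value")
setkey(data_long, id)
data_long_male <- data_long[id %in% ids]
```

Variant 1 keeps the wide layout with one row per name. Variant 2 gives one row per name and id pair, which can be more convenient if the gender column is to be joined on afterwards.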