Accelerating one-to-many correlation calculations in Python


I'd like to calculate Pearson's correlation coefficient between a vector and each row of an array in Python (NumPy and/or SciPy assumed). Using the standard correlation matrix calculation functions is not possible because of the size of the real data arrays and memory constraints. Here's my naive implementation:

    import numpy as np
    import scipy.stats as sps

    np.random.seed(0)

    def correlateonewithmany(one, many):
        """Return Pearson's correlation coef of 'one' with each row of 'many'."""
        # Column 0 holds the coefficient, column 1 the p-value.
        pr_arr = np.zeros((many.shape[0], 2), dtype=np.float64)
        pr_arr[:] = np.nan
        for row_num in np.arange(many.shape[0]):
            pr_arr[row_num, :] = sps.pearsonr(one, many[row_num, :])
        return pr_arr

    obs, varz = 10 ** 3, 500
    x = np.random.uniform(size=(obs, varz))

    pr = correlateonewithmany(x[0, :], x)

    %timeit correlateonewithmany(x[0, :], x)
    # 10 loops, best of 3: 38.9 ms per loop

Any thoughts on accelerating this would be appreciated!

The module scipy.spatial.distance implements the "correlation distance", which is 1 minus the correlation coefficient. You can use the function cdist to compute the one-to-many distances, and then get the correlation coefficients by subtracting the result from 1.
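To make the quantity concrete, here is a minimal NumPy sketch of the coefficient itself: center the vector and each row on its own mean, then divide the dot products by the product of the norms. The helper name `pearson_one_to_many` is hypothetical and is not part of NumPy or SciPy. The sketch returns only the coefficients, without p-values.

```python
import numpy as np

def pearson_one_to_many(one, many):
    """Pearson's r between 1-D `one` and each row of 2-D `many` (hypothetical helper)."""
    one = np.asarray(one, dtype=np.float64)
    many = np.asarray(many, dtype=np.float64)
    # Center the vector and every row on its own mean.
    one_c = one - one.mean()
    many_c = many - many.mean(axis=1, keepdims=True)
    # Numerator: dot product of each centered row with the centered vector.
    num = many_c @ one_c
    # Denominator: product of the Euclidean norms of the centered data.
    den = np.sqrt((one_c ** 2).sum() * (many_c ** 2).sum(axis=1))
    return num / den

rng = np.random.default_rng(0)
x = rng.uniform(size=(5, 8))
r = pearson_one_to_many(x[0], x)
```

Note that centering allocates a temporary array the same size as `many`, so this form is for illustration rather than for memory-constrained data.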

Here's a modified version of your script that includes the calculation of the correlation coefficients using cdist:

    import numpy as np
    import scipy.stats as sps
    from scipy.spatial.distance import cdist

    np.random.seed(0)

    def correlateonewithmany(one, many):
        """Return Pearson's correlation coef of 'one' with each row of 'many'."""
        # Column 0 holds the coefficient, column 1 the p-value.
        pr_arr = np.zeros((many.shape[0], 2), dtype=np.float64)
        pr_arr[:] = np.nan
        for row_num in np.arange(many.shape[0]):
            pr_arr[row_num, :] = sps.pearsonr(one, many[row_num, :])
        return pr_arr

    obs, varz = 10 ** 3, 500
    x = np.random.uniform(size=(obs, varz))

    pr = correlateonewithmany(x[0, :], x)

    # cdist needs 2-D inputs, so pass the vector as a 1-row slice.
    c = 1 - cdist(x[0:1, :], x, metric='correlation')[0]

    # Compare against the coefficients (column 0) from the naive loop.
    print(np.allclose(c, pr[:, 0]))

Timing:

    In [133]: %timeit correlateonewithmany(x[0, :], x)
    10 loops, best of 3: 37.7 ms per loop

    In [134]: %timeit 1 - cdist(x[0:1, :], x, metric='correlation')[0]
    1000 loops, best of 3: 1.11 ms per loop
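Since the question mentions memory constraints, one possible variation is to apply the same cdist call to blocks of rows, so that only one block is handed to cdist at a time. This is a sketch rather than a tested solution for very large data: the function name `correlate_in_chunks` and the `chunk_size` parameter are hypothetical, and it assumes `many` supports ordinary row slicing.

```python
import numpy as np
from scipy.spatial.distance import cdist

def correlate_in_chunks(one, many, chunk_size=256):
    """Correlation of `one` with each row of `many`, computed block by block.

    Hypothetical helper: `chunk_size` is the number of rows passed to cdist per call.
    """
    # cdist expects 2-D inputs, so turn the vector into a 1-row array.
    one_2d = np.asarray(one, dtype=np.float64)[np.newaxis, :]
    n_rows = many.shape[0]
    out = np.empty(n_rows, dtype=np.float64)
    for start in range(0, n_rows, chunk_size):
        stop = start + chunk_size  # slicing clips at the end of the array
        dist = cdist(one_2d, many[start:stop], metric='correlation')[0]
        out[start:stop] = 1 - dist
    return out

rng = np.random.default_rng(0)
x = rng.uniform(size=(1000, 50))
r_chunked = correlate_in_chunks(x[0], x, chunk_size=300)
r_full = 1 - cdist(x[0:1], x, metric='correlation')[0]
```

Because each row is correlated with the vector independently, the chunked result should match the single-call result; the chunk size only trades the number of cdist calls against the amount of data processed per call.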
