Python - AgglomerativeClustering on a correlation matrix


I have a correlation matrix of typical structure, of size 288x288, defined by:

    from sklearn.cluster import AgglomerativeClustering

    df = read_returns()
    correl_matrix = df.corr()

where read_returns gives me a DataFrame with a date index and one column of returns per asset.

Now I want to cluster these correlations to reduce the population size.

By doing some reading and experimenting I discovered AgglomerativeClustering, and at first pass it appears to be an appropriate solution to my problem.

I define a distance metric as ((.5*(1-correl_matrix))**.5) and have:
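The distance sqrt(0.5 * (1 - rho)) maps a correlation of 1 to 0, a correlation of 0 to about 0.707, and a correlation of -1 to 1, so more strongly correlated assets are closer together. A minimal NumPy sketch of the transform; the helper name `corr_to_dist` is hypothetical and not part of the original code.

```python
import numpy as np


def corr_to_dist(corr):
    """Map correlations in [-1, 1] to distances in [0, 1] via sqrt(0.5 * (1 - rho))."""
    corr = np.asarray(corr, dtype=float)
    # Clip tiny negative values (from floating-point error when rho is ~1) before the sqrt.
    return np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))


print(corr_to_dist([1.0, 0.5, 0.0, -1.0]))
```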

    cluster = AgglomerativeClustering(n_clusters=40, linkage='average')
    cluster.fit(((.5*(1-correl_matrix))**.5).values)
    label_groups = cluster.labels_
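One thing worth checking in the call above: by default `AgglomerativeClustering` treats each row of the array passed to `fit` as a feature vector and computes Euclidean distances between those rows, so the 288x288 distance matrix is not used directly as pairwise distances. Passing `metric="precomputed"` (called `affinity` in scikit-learn releases before 1.2) makes it read the matrix as distances. The sketch below is a hedged illustration: the synthetic returns stand in for the asker's `read_returns()`, and the `make_model` helper with its try/except fallback is an assumption meant to cover both parameter names.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for real returns: assets 0-2 follow one factor, assets 3-5 another.
rng = np.random.default_rng(0)
n_obs = 500
factors = rng.normal(size=(n_obs, 2))
returns = np.column_stack(
    [factors[:, g] + 0.3 * rng.normal(size=n_obs) for g in (0, 0, 0, 1, 1, 1)]
)

corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
np.fill_diagonal(dist, 0.0)  # exact zeros on the diagonal of a distance matrix


def make_model(n_clusters):
    # scikit-learn >= 1.2 names the parameter `metric`; older releases name it `affinity`.
    try:
        return AgglomerativeClustering(
            n_clusters=n_clusters, metric="precomputed", linkage="average"
        )
    except TypeError:
        return AgglomerativeClustering(
            n_clusters=n_clusters, affinity="precomputed", linkage="average"
        )


labels = make_model(2).fit_predict(dist)
print(labels)
```

Note that `linkage="ward"` only works with Euclidean feature vectors, so it cannot be combined with a precomputed distance matrix.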

To look at some of the data and cross-check my work, I pick out cluster 1, observe its pairwise correlations, and find the minimum correlation between any two items in the group. For my dataset I use:

    single_cluster = []
    for i in range(0, correl_matrix.shape[0]):
        if label_groups[i] == 1:
            single_cluster.append(correl_matrix.index[i])

    min_correl = 1.0
    for x in single_cluster:
        for y in single_cluster:
            if x != y:
                if correl_matrix[x][y] < min_correl:
                    min_correl = correl_matrix[x][y]

    print(min_correl)
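The nested loop above can also be written with NumPy indexing, which avoids the pure-Python double loop and checks every cluster at once. A minimal sketch, assuming `corr` is a NumPy array (for a DataFrame, `correl_matrix.values`) and `labels` is an array such as `cluster.labels_`; the helper name `min_within_cluster_corr` is hypothetical.

```python
import numpy as np


def min_within_cluster_corr(corr, labels):
    """Return {cluster label: minimum off-diagonal correlation among its members}."""
    corr = np.asarray(corr, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        if idx.size < 2:
            continue  # a singleton cluster has no pairs to compare
        sub = corr[np.ix_(idx, idx)]
        # Mask out the diagonal (each asset's correlation with itself).
        off_diag = sub[~np.eye(idx.size, dtype=bool)]
        result[k.item()] = float(off_diag.min())
    return result


corr = np.array([
    [1.0, 0.9, 0.2, 0.0],
    [0.9, 1.0, 0.8, 0.1],
    [0.2, 0.8, 1.0, 0.0],
    [0.0, 0.1, 0.0, 1.0],
])
print(min_within_cluster_corr(corr, [1, 1, 1, 0]))
```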

and get a minimum pairwise correlation of .20.

To me this seems quite low, but "low based off what?" is a fair question that I have no answer to.

I anticipated (or would like to enforce) that every pairwise correlation within a cluster is >= .7 or so.
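Since the distance sqrt(0.5 * (1 - rho)) decreases monotonically as the correlation rho increases, a minimum-correlation requirement of .7 is equivalent to a maximum-distance requirement:

```latex
\rho_{ij} \ge 0.7
\iff
d_{ij} = \sqrt{\tfrac{1}{2}\,(1 - \rho_{ij})} \le \sqrt{\tfrac{1}{2}\,(1 - 0.7)} = \sqrt{0.15} \approx 0.387
```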

Is this possible with AgglomerativeClustering?

Am I accidentally going down the wrong path?

Hierarchical clustering supports different "linkage" strategies.

  • single-link: connects clusters based on the minimum distance between their members
  • complete-link: connects clusters based on the maximum distance between their members
  • ...

If you want a high minimum correlation (which means a small maximum distance), that calls for complete linkage.
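A minimal sketch of that idea using SciPy's hierarchical clustering, swapped in here for scikit-learn because `fcluster` exposes the distance cut directly. With complete linkage, cutting the tree at distance t means every pair inside a flat cluster is at most t apart; with the question's distance, a cut at sqrt(0.5 * (1 - 0.7)) therefore keeps every within-cluster correlation at or above .7. The helper name `cluster_min_corr` and the toy matrix are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_min_corr(corr, min_corr=0.7):
    """Complete-linkage clusters in which every pair has correlation >= min_corr."""
    corr = np.asarray(corr, dtype=float)
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    # linkage() expects a condensed (upper-triangle) distance vector.
    condensed = squareform(dist, checks=False)
    Z = linkage(condensed, method="complete")
    # Cut so that no flat cluster has a cophenetic distance above the threshold.
    t = np.sqrt(0.5 * (1.0 - min_corr))
    return fcluster(Z, t=t, criterion="distance")


corr = np.array([
    [1.0, 0.9, 0.75, 0.2],
    [0.9, 1.0, 0.8, 0.1],
    [0.75, 0.8, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
])
labels = cluster_min_corr(corr)
print(labels)
```

The number of clusters is then an output rather than an input. In scikit-learn, `AgglomerativeClustering(n_clusters=None, distance_threshold=t, linkage="complete", metric="precomputed")` should give a comparable cut (use `affinity=` instead of `metric=` before version 1.2); treat that as a pointer to verify against your installed version.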

You may want to treat negative correlations as "good", too, i.e. use dist = 1 - abs(corr).
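A minimal sketch of that alternative distance, under which strongly negatively correlated assets end up close together just like strongly positively correlated ones; the helper name `abs_corr_dist` is hypothetical.

```python
import numpy as np


def abs_corr_dist(corr):
    """Distance 1 - |rho|: strong positive and strong negative correlation both count as close."""
    dist = 1.0 - np.abs(np.asarray(corr, dtype=float))
    np.fill_diagonal(dist, 0.0)  # exact zeros on the diagonal for precomputed-distance APIs
    return dist


print(abs_corr_dist(np.array([[1.0, -0.9], [-0.9, 1.0]])))
```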

Make sure to look at the dendrogram. If you have outliers in your data, you want to cut it into (n_clusters + n_outliers) partitions.
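A sketch of cutting the tree into n_clusters + n_outliers groups with SciPy and flagging singleton groups as outliers. The values of `n_clusters` and `n_outliers` are assumptions you would read off the dendrogram; `scipy.cluster.hierarchy.dendrogram(Z)` draws it with matplotlib, while `no_plot=True` only returns the leaf order. The helper name and the toy distance matrix (points on a line rather than real correlations) are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform


def cut_with_outliers(dist, n_clusters, n_outliers, method="complete"):
    """Cut into at most n_clusters + n_outliers groups; singleton groups are reported as outliers."""
    Z = linkage(squareform(dist, checks=False), method=method)
    labels = fcluster(Z, t=n_clusters + n_outliers, criterion="maxclust")
    values, counts = np.unique(labels, return_counts=True)
    outliers = np.flatnonzero(np.isin(labels, values[counts == 1]))
    leaf_order = dendrogram(Z, no_plot=True)["leaves"]  # left-to-right order of the dendrogram
    return labels, outliers, leaf_order


# Toy distances: two tight groups {0, 1, 2} and {3, 4}, plus one far-away point 5.
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 20.0])
dist = np.abs(positions[:, None] - positions[None, :])
labels, outliers, leaf_order = cut_with_outliers(dist, n_clusters=2, n_outliers=1)
print(labels, outliers, leaf_order)
```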

