Python - pandas - rank elements of a DataFrame
    import numpy
    import pandas

    data = pandas.DataFrame(numpy.random.randn(4, 3))
    print(data)

              0         1         2
    0 -1.122880 -2.662009  1.180418
    1 -0.335768  0.162640  0.105928
    2 -1.282813  0.049638  1.532208
    3 -0.422884 -1.110049  0.031648
I'm working with a huge dataset and am trying to efficiently return tuples that rank the elements of a DataFrame. I've tried a few awkward sequences of apply(), rank(), and such, but I want something nicer.
I'm looking for a function get_ranks(data) that would return an ordered set of (row, col) tuples, largest value first. For the data above: (2,2), (0,2), (1,1), (1,2), ...
I searched around a bunch but haven't found any commentary on applying this in particular. Should I concatenate the rows or columns and rank there? Or is there a more direct path?
Here is what you can do:
    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame(np.random.randn(4, 3))
    >>> df
              0         1         2
    0  1.644294  1.476467 -0.137539
    1 -0.448040 -0.329539 -0.996425
    2 -1.015308 -1.397746  0.369095
    3 -0.570194 -0.989716 -1.489257
    >>> # Flatten row by row (C order) into a single column.
    >>> df2 = pd.DataFrame(df.values.flatten())
    >>> df2
               0
    0   1.644294
    1   1.476467
    2  -0.137539
    3  -0.448040
    4  -0.329539
    5  -0.996425
    6  -1.015308
    7  -1.397746
    8   0.369095
    9  -0.570194
    10 -0.989716
    11 -1.489257
    >>> df3 = df2.rank()
    >>> # flatten() is row-major, so with 3 columns the flat index i
    >>> # comes from row i // 3 and column i % 3.
    >>> df3['row'] = df3.index // 3
    >>> df3['column'] = df3.index % 3
    >>> df3
           0  row  column
    0   12.0    0       0
    1   11.0    0       1
    2    9.0    0       2
    3    7.0    1       0
    4    8.0    1       1
    5    4.0    1       2
    6    3.0    2       0
    7    2.0    2       1
    8   10.0    2       2
    9    6.0    3       0
    10   5.0    3       1
    11   1.0    3       2
Some explanations:
I flatten the original DataFrame and use rank() to get the rank of each value in the flattened array. I then use integer division and modulo on the flat index to recover the original position of each value.
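As a quick sanity check on that index arithmetic, here is a minimal sketch. It assumes the 4x3 shape from the example and NumPy's default row-major (C order) flattening; the variable names are illustrative only.

```python
import numpy as np

n_rows, n_cols = 4, 3
flat_index = np.arange(n_rows * n_cols)

# Row-major flattening: flat position i came from
# row i // n_cols and column i % n_cols.
rows, cols = np.divmod(flat_index, n_cols)

# np.unravel_index does the same bookkeeping for any shape.
rows_check, cols_check = np.unravel_index(flat_index, (n_rows, n_cols))
assert (rows == rows_check).all() and (cols == cols_check).all()

positions = list(zip(rows.tolist(), cols.tolist()))
print(positions[:4])  # [(0, 0), (0, 1), (0, 2), (1, 0)]
```

Note that the divisor is the number of columns, not the number of rows. Dividing by the number of rows would only be right for a column-major flatten, e.g. flatten('F').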
The resulting DataFrame has 3 columns: the first one is the rank of the value (12 -> max, 1 -> min), the second one is the index of the original row of the value, and the third is the index of the original column of the value.
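If what you need is the ordered (row, col) tuples rather than a rank column, the same idea can be wrapped into the get_ranks(data) function the question asks for. The following is only a sketch under a few assumptions: the ascending parameter is my own addition, ties are broken by position rather than averaged as rank() would do, and NaN handling is not considered. It uses the sample values from the question.

```python
import numpy as np
import pandas as pd


def get_ranks(data, ascending=False):
    """Return (row_label, col_label) tuples ordered by cell value."""
    values = data.to_numpy()
    # argsort over the flattened (row-major) values gives flat positions
    # in ascending order of value.
    order = np.argsort(values, axis=None, kind="stable")
    if not ascending:
        order = order[::-1]
    # Map flat positions back to 2-D positions, then to labels.
    rows, cols = np.unravel_index(order, values.shape)
    return list(zip(data.index[rows].tolist(), data.columns[cols].tolist()))


data = pd.DataFrame(
    [
        [-1.122880, -2.662009, 1.180418],
        [-0.335768, 0.162640, 0.105928],
        [-1.282813, 0.049638, 1.532208],
        [-0.422884, -1.110049, 0.031648],
    ]
)
print(get_ranks(data)[:4])  # [(2, 2), (0, 2), (1, 1), (1, 2)]
```

This skips the intermediate DataFrames entirely, which may matter on a huge dataset, though I have not benchmarked it.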
I hope it'll be helpful, and please let me know if it's not entirely clear.