python - Precision and Recall on PySpark DecisionTree model diverges from manual results


I trained a DecisionTree model on a PySpark DataFrame. The resulting DataFrame is simulated below:

    # Each tuple is (prediction, target_index)
    rdd = sc.parallelize(
        [
            (0., 1.),
            (0., 0.),
            (0., 0.),
            (1., 1.),
            (1., 0.),
            (1., 0.),
            (1., 1.),
            (1., 1.)
        ]
    )
    df = sqlContext.createDataFrame(rdd, ["prediction", "target_index"])
    df.show()

    +----------+------------+
    |prediction|target_index|
    +----------+------------+
    |       0.0|         1.0|
    |       0.0|         0.0|
    |       0.0|         0.0|
    |       1.0|         1.0|
    |       1.0|         0.0|
    |       1.0|         0.0|
    |       1.0|         1.0|
    |       1.0|         1.0|
    +----------+------------+

So let's calculate a metric, recall:

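    from pyspark.mllib.evaluation import MulticlassMetrics

    # MulticlassMetrics expects an RDD of (prediction, label) pairs
    metricsp = MulticlassMetrics(df.rdd)
    print metricsp.recall()

    0.625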

OK. Let's try to confirm that this is correct:

    tp = df[(df.target_index == 1) & (df.prediction == 1)].count()
    tn = df[(df.target_index == 0) & (df.prediction == 0)].count()
    fp = df[(df.target_index == 0) & (df.prediction == 1)].count()
    fn = df[(df.target_index == 1) & (df.prediction == 0)].count()
    print "true positives:", tp
    print "true negatives:", tn
    print "false positives:", fp
    print "false negatives:", fn
    print "total", df.count()

    true positives: 3
    true negatives: 2
    false positives: 2
    false negatives: 1
    total 8

And calculate recall:

    r = float(tp)/(tp + fn)
    print "recall", r

    recall 0.75

And the results differ. What am I doing wrong?

BTW, all of these functions of the metrics class give the same result:

    print metricsp.recall()
    print metricsp.precision()
    print metricsp.fMeasure()

    0.625
    0.625
    0.625

The problem is that you are using MulticlassMetrics to process the output of a binary classifier. From the docs:

    recall(): Returns recall (equals to precision for multiclass classifier because sum of all false positives is equal to sum of all false negatives)
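The reason all three unlabelled metrics come out as 0.625 can be checked by hand. The sketch below uses plain Python rather than Spark and assumes that the unlabelled metric pools true positives, false positives and false negatives over every class (a micro-average); the variable names are illustrative only. Under that assumption each misclassified row counts once as a false positive for the predicted class and once as a false negative for the true class, so the pooled precision and recall both reduce to overall accuracy, (3 + 2) / 8.

```python
# The (prediction, target_index) pairs from the question.
pairs = [(0., 1.), (0., 0.), (0., 0.), (1., 1.),
         (1., 0.), (1., 0.), (1., 1.), (1., 1.)]

labels = sorted({label for _, label in pairs})

# Per-class counts, treating each class in turn as the "positive" one.
tp = {c: sum(1 for p, l in pairs if p == c and l == c) for c in labels}
fp = {c: sum(1 for p, l in pairs if p == c and l != c) for c in labels}
fn = {c: sum(1 for p, l in pairs if p != c and l == c) for c in labels}

# Pool the counts over all classes (micro-averaging, an assumption here).
total_tp = sum(tp.values())
total_fp = sum(fp.values())
total_fn = sum(fn.values())

micro_recall = total_tp / float(total_tp + total_fn)
micro_precision = total_tp / float(total_tp + total_fp)
accuracy = total_tp / float(len(pairs))

print("recall=%s precision=%s accuracy=%s"
      % (micro_recall, micro_precision, accuracy))
# recall=0.625 precision=0.625 accuracy=0.625
```

The pooled false positives and false negatives are both 3 here, which is why the two denominators coincide.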

To get the correct results, use recall(label=1):

    >>> print metricsp.recall(label=1)
    0.75
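As a cross-check that does not need a Spark session, the per-label figures can be reproduced in plain Python. This is only a sketch: `per_label_metrics` is a hypothetical helper written for this example, not part of PySpark, and it uses the standard definitions of precision, recall and F1 for a single label.

```python
def per_label_metrics(pairs, label):
    """Return (precision, recall, f1) for one label.

    `pairs` is a list of (prediction, target) tuples. This helper is
    hypothetical and only mirrors the standard textbook definitions.
    """
    tp = sum(1 for p, t in pairs if p == label and t == label)
    fp = sum(1 for p, t in pairs if p == label and t != label)
    fn = sum(1 for p, t in pairs if p != label and t == label)
    precision = tp / float(tp + fp) if (tp + fp) else 0.0
    recall = tp / float(tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# The (prediction, target_index) pairs from the question.
pairs = [(0., 1.), (0., 0.), (0., 0.), (1., 1.),
         (1., 0.), (1., 0.), (1., 1.), (1., 1.)]

for label in (0., 1.):
    precision, recall, f1 = per_label_metrics(pairs, label)
    print("label=%s precision=%.3f recall=%.3f f1=%.3f"
          % (label, precision, recall, f1))
```

For label 1 this gives a recall of 3 / (3 + 1) = 0.75, matching the manual calculation in the question, while label 0 has a recall of 2 / (2 + 2) = 0.5.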

BTW, the headers in df.show() seem to be jumbled up; they should be:

    +----------+------------+
    |prediction|target_index|
    +----------+------------+
    |       0.0|         1.0|
    |       0.0|         0.0|
    |       0.0|         0.0|
    |       1.0|         1.0|
    |       1.0|         0.0|
    |       1.0|         0.0|
    |       1.0|         1.0|
    |       1.0|         1.0|
    +----------+------------+
