Mongodb aggregation - sort makes the query very slow -
mongodb 3.2, installed on centos 6, plenty of ram , disk. i've collection 10k documents of following structure:
{ "id":5752034, "score":7.6, "name":"asus x551 15.6-inch laptop", "categoryid":"803", "positiveaspects":[{ "id":30030525, "name":"price", "score":9.8, "frequency":139, "rank":100098 }, { "id":30028399, "name":"use", "score":9.9, "frequency":99, "rank":100099 } . . ] }
for each document, nested array positiveaspects has few hundreds of elements.
the collectoin has follwing indexes:
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "proddb.product_trees" } { "v" : 1, "key" : { "positiveaspects.id" : 1.0, "positiveaspects.score" : 1.0 }, "name" : "positiveaspects.id_1_positiveaspects.score_1", "ns" : "proddb.product_trees" } { "v" : 1, "key" : { "categoryid" : 1.0, "score" : 1.0 }, "name" : "categoryid_1_score_1", "ns" : "proddb.product_trees" } { "v" : 1, "key" : { "rank" : -1.0 }, "name" : "rank_-1", "ns" : "proddb.product_trees" } { "v" : 1, "key" : { "positiveaspects.rank" : -1.0 }, "name" : "positiveaspects.rank_-1", "ns" : "proddb.product_trees" }
i run following aggregation, takes 40 seconds:
{ aggregate:"product_trees", pipeline:[ { $match:{ categoryid:"803", score:{ $gte:8.0 } } }, { $unwind:"$positiveaspects" }, { $match:{ positiveaspects.id:30030525, positiveaspects.score:{ $gte:9.0 } } }, { $sort:{ positiveaspects.rank:-1 } }, { $project:{ _id:0, score:1, id:1, name:1, positiveaspects:1 } }, { $limit:10 } ] }
with following explain:
2016-06-01t16:10:49.140-0500 d query [conn47] beginning planning... ============================= options = no_blocking_sort index_intersection canonical query: ns=proddb.product_treestree: $and categoryid == "803" score $gte 8.0 sort: {} proj: {} ============================= 2016-06-01t16:10:49.140-0500 d query [conn47] index 0 kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query [conn47] index 1 kp: { positiveaspects.id: 1.0, positiveaspects.score: 1.0 } multikey name: 'positiveaspects.id_1_positiveaspects.score_1' io: { v: 1, key: { positiveaspects.id: 1.0, positiveaspects.score: 1.0 }, name: "positiveaspects.id_1_positiveaspects.score_1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query [conn47] index 2 kp: { categoryid: 1.0, score: 1.0 } name: 'categoryid_1_score_1' io: { v: 1, key: { categoryid: 1.0, score: 1.0 }, name: "categoryid_1_score_1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query [conn47] index 3 kp: { rank: -1.0 } name: 'rank_-1' io: { v: 1, key: { rank: -1.0 }, name: "rank_-1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query [conn47] index 4 kp: { positiveaspects.rank: -1.0 } multikey name: 'positiveaspects.rank_-1' io: { v: 1, key: { positiveaspects.rank: -1.0 }, name: "positiveaspects.rank_-1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query [conn47] predicate on field 'score' 2016-06-01t16:10:49.140-0500 d query [conn47] predicate on field 'categoryid' 2016-06-01t16:10:49.140-0500 d query [conn47] relevant index 0 kp: { categoryid: 1.0, score: 1.0 } name: 'categoryid_1_score_1' io: { v: 1, key: { categoryid: 1.0, score: 1.0 }, name: "categoryid_1_score_1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query [conn47] rated tree: $and categoryid == "803" || first: 0 notfirst: full path: categoryid score $gte 8.0 || first: notfirst: 0 full path: score 2016-06-01t16:10:49.140-0500 d query [conn47] tagging memoid 1 2016-06-01t16:10:49.140-0500 d query [conn47] enumerator: memo before moving: 2016-06-01t16:10:49.140-0500 d query [conn47] build solntree tagged tree: $and categoryid == "803" || selected index #0 pos 0 score $gte 8.0 || selected index #0 pos 1 2016-06-01t16:10:49.140-0500 d query [conn47] planner: adding solution: fetch ---fetched = 1 ---sortedbydiskloc = 0 ---getsort = [{ categoryid: 1 }, { categoryid: 1, score: 1 }, { score: 1 }, ] ---child: ------ixscan ---------keypattern = { categoryid: 1.0, score: 1.0 } ---------direction = 1 ---------bounds = field #0['categoryid']: ["803", "803"], field #1['score']: [8.0, inf.0] ---------fetched = 0 ---------sortedbydiskloc = 0 ---------getsort = [{ categoryid: 1 }, { categoryid: 1, score: 1 }, { score: 1 }, ] 2016-06-01t16:10:49.140-0500 d query [conn47] planner: outputted 1 indexed solutions. 2016-06-01t16:10:49.140-0500 d query [conn47] 1 plan available; run not cached. query: { categoryid: "803", score: { $gte: 8.0 } } sort: {} projection: {}, plansummary: ixscan { categoryid: 1.0, score: 1.0 } 2016-06-01t16:11:27.170-0500 command [conn47] command proddb.product_trees command: aggregate { aggregate: "product_trees", pipeline: [ { $match: { categoryid: "803", score: { $gte: 8.0 } } }, { $unwind: "$positiveaspects" }, { $match: { positiveaspects.id: 30030525, positiveaspects.score: { $gte: 9.0 } } }, { $sort: { positiveaspects.rank: -1 } }, { $project: { _id: 0, score: 1, id: 1, name: 1, positiveaspects: 1 } }, { $limit: 10 } ], cursor: {} } keyupdates:0 writeconflicts:0 numyields:226 reslen:7459 locks:{ global: { acquirecount: { r: 906 } }, database: { acquirecount: { r: 453 } }, collection: { acquirecount: { r: 453 } } } protocol:op_query 38030ms
taking out $sort
, query runs in 2 seconds.
can explain why $sort
cause such performance hit, considerig there index can use? there index missed can done in order fix?
thanks!
mongodb aggregarion - sort makes query slow
it's because $sort
not using index when not used in stage of aggregation framework. take advantage of indexing, $sort or $match must used first stage.
please see pipeline operators , indexes