Mongodb aggregation - sort makes the query very slow -


mongodb 3.2, installed on centos 6, plenty of ram , disk. i've collection 10k documents of following structure:

{   "id":5752034,   "score":7.6,   "name":"asus x551 15.6-inch laptop",    "categoryid":"803",   "positiveaspects":[{                        "id":30030525,                        "name":"price",                        "score":9.8,                        "frequency":139,                        "rank":100098                      },                      {                        "id":30028399,                        "name":"use",                        "score":9.9,                        "frequency":99,                        "rank":100099                      }                      .                      .                 ] } 

for each document, nested array positiveaspects has few hundreds of elements.

the collectoin has follwing indexes:

{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "proddb.product_trees" } { "v" : 1, "key" : { "positiveaspects.id" : 1.0, "positiveaspects.score" : 1.0 }, "name" : "positiveaspects.id_1_positiveaspects.score_1", "ns" : "proddb.product_trees" } { "v" : 1, "key" : { "categoryid" : 1.0, "score" : 1.0 }, "name" : "categoryid_1_score_1", "ns" : "proddb.product_trees" } { "v" : 1, "key" : { "rank" : -1.0 }, "name" : "rank_-1", "ns" : "proddb.product_trees" } { "v" : 1, "key" : { "positiveaspects.rank" : -1.0 }, "name" : "positiveaspects.rank_-1", "ns" : "proddb.product_trees" } 

i run following aggregation, takes 40 seconds:

{     aggregate:"product_trees",   pipeline:[     {        $match:{           categoryid:"803",         score:{              $gte:8.0         }      }   },   {        $unwind:"$positiveaspects"   },   {        $match:{           positiveaspects.id:30030525,         positiveaspects.score:{              $gte:9.0         }      }   },   {        $sort:{           positiveaspects.rank:-1      }   },   {        $project:{           _id:0,         score:1,         id:1,         name:1,         positiveaspects:1      }   },   {        $limit:10   }  ] } 

with following explain:

2016-06-01t16:10:49.140-0500 d query    [conn47] beginning planning... ============================= options = no_blocking_sort index_intersection canonical query: ns=proddb.product_treestree: $and     categoryid == "803"     score $gte 8.0 sort: {} proj: {} ============================= 2016-06-01t16:10:49.140-0500 d query    [conn47] index 0 kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query    [conn47] index 1 kp: { positiveaspects.id: 1.0, positiveaspects.score: 1.0 } multikey name: 'positiveaspects.id_1_positiveaspects.score_1' io: { v: 1, key: { positiveaspects.id: 1.0, positiveaspects.score: 1.0 }, name: "positiveaspects.id_1_positiveaspects.score_1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query    [conn47] index 2 kp: { categoryid: 1.0, score: 1.0 } name: 'categoryid_1_score_1' io: { v: 1, key: { categoryid: 1.0, score: 1.0 }, name: "categoryid_1_score_1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query    [conn47] index 3 kp: { rank: -1.0 } name: 'rank_-1' io: { v: 1, key: { rank: -1.0 }, name: "rank_-1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query    [conn47] index 4 kp: { positiveaspects.rank: -1.0 } multikey name: 'positiveaspects.rank_-1' io: { v: 1, key: { positiveaspects.rank: -1.0 }, name: "positiveaspects.rank_-1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query    [conn47] predicate on field 'score' 2016-06-01t16:10:49.140-0500 d query    [conn47] predicate on field 'categoryid' 2016-06-01t16:10:49.140-0500 d query    [conn47] relevant index 0 kp: { categoryid: 1.0, score: 1.0 } name: 'categoryid_1_score_1' io: { v: 1, key: { categoryid: 1.0, score: 1.0 }, name: "categoryid_1_score_1", ns: "proddb.product_trees" } 2016-06-01t16:10:49.140-0500 d query    [conn47] rated tree: $and     categoryid == "803"  || first: 0 notfirst: full path: categoryid     score $gte 8.0  || first: notfirst: 0 full path: score 2016-06-01t16:10:49.140-0500 d query    [conn47] tagging memoid 1 2016-06-01t16:10:49.140-0500 d query    [conn47] enumerator: memo before moving: 2016-06-01t16:10:49.140-0500 d query    [conn47] build solntree tagged tree: $and     categoryid == "803"  || selected index #0 pos 0     score $gte 8.0  || selected index #0 pos 1 2016-06-01t16:10:49.140-0500 d query    [conn47] planner: adding solution: fetch ---fetched = 1 ---sortedbydiskloc = 0 ---getsort = [{ categoryid: 1 }, { categoryid: 1, score: 1 }, { score: 1 }, ] ---child: ------ixscan ---------keypattern = { categoryid: 1.0, score: 1.0 } ---------direction = 1 ---------bounds = field #0['categoryid']: ["803", "803"], field #1['score']: [8.0, inf.0] ---------fetched = 0 ---------sortedbydiskloc = 0 ---------getsort = [{ categoryid: 1 }, { categoryid: 1, score: 1 }, { score: 1 }, ] 2016-06-01t16:10:49.140-0500 d query    [conn47] planner: outputted 1 indexed solutions. 2016-06-01t16:10:49.140-0500 d query    [conn47] 1 plan available; run not cached. query: { categoryid: "803", score: { $gte: 8.0 } } sort: {} projection: {}, plansummary: ixscan { categoryid: 1.0, score: 1.0 } 2016-06-01t16:11:27.170-0500 command  [conn47] command proddb.product_trees command: aggregate { aggregate: "product_trees", pipeline: [ { $match: { categoryid: "803", score: { $gte: 8.0 } } }, { $unwind: "$positiveaspects" }, { $match: { positiveaspects.id: 30030525, positiveaspects.score: { $gte: 9.0 } } }, { $sort: { positiveaspects.rank: -1 } }, { $project: { _id: 0, score: 1, id: 1, name: 1, positiveaspects: 1 } }, { $limit: 10 } ], cursor: {} } keyupdates:0 writeconflicts:0 numyields:226 reslen:7459 locks:{ global: { acquirecount: { r: 906 } }, database: { acquirecount: { r: 453 } }, collection: { acquirecount: { r: 453 } } } protocol:op_query 38030ms 

taking out $sort, query runs in 2 seconds.

can explain why $sort cause such performance hit, considerig there index can use? there index missed can done in order fix?

thanks!

mongodb aggregarion - sort makes query slow

it's because $sort not using index when not used in stage of aggregation framework. take advantage of indexing, $sort or $match must used first stage.

please see pipeline operators , indexes


Popular posts from this blog

php - How should I create my API for mobile applications (Needs Authentication) -

5 Reasons to Blog Anonymously (and 5 Reasons Not To)

Google AdWords and AdSense - A Dynamic Small Business Marketing Duo