optimization - Algorithm for >2D skyline query/efficient frontier -
the problem @ hand:
given set of n points in d dimensional space, coordinates >= 0 (in 2d points in 1st quadrant, in 3d in 1st octant, , on...), remove points have point has value bigger or equal in every coordinate.
in 2d, result this:
(image vincent zoonekynd's answer here) , there simple algorithm, detailed in answer, runs in n*log(n)
. chunking should have brought n*log(h)
, optimizations on question.
i interested in extending solution 3 dimensions (and possibly 4, if it's still reasonable), current 3d algorithm pretty slow, cumbersome , doesn't generalize 4d nicely:
- sort points on x axis, annotate position of each point
- initialize sort of segment tree n leaves, leaves hold points' y values , node hold
max(child1, child2)
- sort points on z axis
- for every point largest z:
- check position in x order, try put in segment tree in position
- check first if there point down (so has > z), @ higher place (so has > x) bigger y (this costs log(n), tree)
- if said point found, current point discarded, otherwise it's inserted , tree updated
this still runs in n*log(n)
, requires 2 different sorts , 2*n
-big structure.
extending require sort , prohibitive 2*n^2
-big quad tree.
are there more efficient (especially cpu-wise) approaches?
i don't think it's relevant, i'm writing in c, code here.
if doing in n-dimensions use nearest neighbor k-d tree. tree fast way sort points based on distance location in n-d space. default k-d tree sorts points based on euclidean distance location creating nested trees.
there exists way change distance metric match going for. after have tree built - want points "furthest" (according metric) origin.
euclidean distance metric:
sqrt( sum_over_dimensions (coord**2))
i suggest suggest metric (which may wrong):
sum_over_dimensions (coord)
links:
wiki k-d tree:
https://en.wikipedia.org/wiki/k-d_tree
overflow post k-d tree metrics:
can use arbitrary metrics search kd-trees?
definition of mathematical metric:
https://en.wikipedia.org/wiki/metric_(mathematics)
in summation - suspect if spent enough time on problem build robust n-dimensional solution problem has "worst-case complexity of o(kn log n). [where k number of dimensions]". suspect difficult better, because large-dimensional nearest neighbor algorithms known unsolved problem.