PostgreSQL - Slow query due to invalid planner stats, even after ANALYZE
I have 2 tables:
    create table sf.dir_current (
        id bigint primary key,
        volume_id integer not null,
        path varchar not null
    );
    create index dir_volid_path_indx on sf.dir_current (volume_id, path);

    create table sf.event (
        id bigint,  -- no primary key here!
        volume_id integer not null,
        parent_path varchar not null,
        type bigint,
        depth integer
    );
Table dir_current contains ~50 million rows, and in all of them volume_id = 1. Table event contains ~20k rows.
I execute the following query (in a PL/pgSQL function; vol_id, min_id and max_id are function params):
    select dir.id as parent_id, event as event_row
    from sf.event event
    left outer join sf.dir_current dir
        on dir.volume_id = vol_id and parent_path = dir.path
    where event.volume_id = vol_id
      and event.id between min_id and max_id
      and (depth_filter is null or event.depth = depth_filter)
      and (type_filter is null or event.type = type_filter)
    order by event.depth;
Everything works fine when all rows in the dir_current table have volume_id = 1. After adding a few thousand rows with volume_id = 2 (and running ANALYZE), the query takes a very long time. Here is the EXPLAIN of the long-running query: explain.depesz.com
As you can see, the query planner had no idea that there are many rows with volume_id = 2, and it created a plan that is far from optimal.
After debugging, I found out that ANALYZE did not find any rows with volume_id = 2. I confirmed it with this query:
    starfish=# select most_common_vals, n_distinct from pg_stats where tablename = 'dir_current' and attname = 'volume_id';
     most_common_vals | n_distinct
    ------------------+------------
     {1}              |          1
    (1 row)
After a few more ANALYZE runs it finally finds the volume_id = 2 values, and the query gets a normal execution time: explain.depesz.com
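Why ANALYZE can repeatedly miss the new rows can be estimated with a little arithmetic. ANALYZE does not read the whole table: it samples 300 × the column's statistics target rows (30,000 rows with the default target of 100). The sketch below treats that sample as a simple uniform random sample without replacement (an approximation; PostgreSQL actually samples blocks first and then rows) and computes the hypergeometric probability that none of the volume_id = 2 rows land in it. The figure of 2,000 such rows is a hypothetical stand-in for "a few thousand", and `prob_value_missed` is an illustrative helper, not a PostgreSQL function.

```python
import math


def prob_value_missed(table_rows: int, matching_rows: int, sample_rows: int) -> float:
    """Probability that a uniform random sample of `sample_rows` rows, drawn
    without replacement from `table_rows` rows, contains none of the
    `matching_rows` rows that carry the rare value (hypergeometric P(X = 0))."""
    if matching_rows == 0:
        return 1.0
    # A sample larger than the table just means the whole table is read.
    sample_rows = min(sample_rows, table_rows)
    if sample_rows > table_rows - matching_rows:
        return 0.0  # the sample is too big to avoid every matching row
    n_total, k, n = table_rows, matching_rows, sample_rows
    # P(X = 0) = C(N - k, n) / C(N, n), evaluated with log-gamma so the
    # huge binomial coefficients never have to be materialised.
    log_p = (math.lgamma(n_total - k + 1) - math.lgamma(n_total - k - n + 1)
             + math.lgamma(n_total - n + 1) - math.lgamma(n_total + 1))
    return math.exp(log_p)


if __name__ == "__main__":
    N = 50_000_000  # rows in sf.dir_current (from the question)
    K = 2_000       # hypothetical: "a few thousand" rows with volume_id = 2
    for target in (100, 1000, 10000):
        n = 300 * target  # rows sampled by ANALYZE for this statistics target
        print(f"statistics target {target:>5}: sample {n:>9,} rows, "
              f"P(no volume_id = 2 row sampled) = {prob_value_missed(N, K, n):.3g}")
```

Under these assumptions the default sample misses every volume_id = 2 row roughly 30% of the time, which fits the observation that several ANALYZE runs were needed, while a larger per-column statistics target (set with `ALTER TABLE ... ALTER COLUMN ... SET STATISTICS`) makes a miss very unlikely at the cost of a slower ANALYZE.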
Question: how can I prevent this extremely long query time? Is there a way to force ANALYZE to find these rows? Or maybe I can manually modify the stats for the column (setting n_distinct on the volume_id column did not help)?
I'm using PostgreSQL 9.5.