python - NTILE for SQLite from Pandas gives OperationalError
I'm trying to use the NTILE function while querying an SQLite database from Pandas, but I haven't succeeded, even though I've rechecked the syntax many times.
A self-contained example is below. Setup:
import pandas as pd
from sqlalchemy import create_engine

disk_engine = create_engine('sqlite:///test.db')
marks = pd.DataFrame({'studentid': ['s1', 's2', 's3', 's4', 's5'],
                      'marks': [75, 83, 91, 83, 93]})
marks.to_sql('marks_sql', disk_engine, if_exists='replace')
Now I try to use NTILE:
q = """SELECT studentid, marks, NTILE(2) OVER (ORDER BY marks DESC)
 AS groupexample FROM marks_sql"""
pd.read_sql_query(q, disk_engine)
The traceback is long; its main parts are:
OperationalError: near "(": syntax error
OperationalError: (sqlite3.OperationalError) near "(": syntax error [SQL: 'SELECT studentid, marks, NTILE(2) OVER (ORDER BY marks DESC)\n AS groupexample FROM marks_sql']
Thanks!
There is no NTILE() OVER functionality in SQLite; it gives me the same error. You need to build it yourself using a more complex query or functions.

Here is a list of analytical functions that are not available in SQLite; NTILE is one of them.
The parser goes through the query, finds OVER, and treats it as a column name (an alias for NTILE(2)). It does not expect ( to follow a column name, so it raises a syntax error.
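Window functions, NTILE included, were added to SQLite in version 3.25.0, so whether the original query parses depends on the SQLite library Python is linked against. A minimal sketch that checks the version and runs the question's query, assuming an in-memory copy of the example table built with the standard-library sqlite3 module instead of SQLAlchemy; the `studentid` tiebreaker and the outer ORDER BY are additions so the tied 83s get a deterministic order:

```python
import sqlite3

# In-memory copy of the question's marks_sql table (sqlite3 instead of SQLAlchemy).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks_sql (studentid TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO marks_sql VALUES (?, ?)",
    [("s1", 75), ("s2", 83), ("s3", 91), ("s4", 83), ("s5", 93)],
)

# OVER / NTILE are only understood by SQLite >= 3.25.0.
has_window_functions = sqlite3.sqlite_version_info >= (3, 25, 0)
print("SQLite", sqlite3.sqlite_version, "window functions:", has_window_functions)

query = """
SELECT studentid, marks,
       NTILE(2) OVER (ORDER BY marks DESC, studentid) AS groupexample
FROM marks_sql
ORDER BY marks DESC, studentid
"""

try:
    rows = conn.execute(query).fetchall()
except sqlite3.OperationalError as exc:
    # Older SQLite: parsing fails at the "(" that follows OVER.
    rows = None
    print("OperationalError:", exc)

print(rows)
```

On a new enough SQLite, NTILE(2) splits the five rows 3/2, with larger groups first. The same query string can be passed to `pd.read_sql_query` with the question's engine.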
To replicate NTILE, try this:
SELECT *,
       CASE WHEN (SELECT COUNT(*) + 0.0 FROM marks_sql b WHERE a.marks >= b.marks)
                 / (SELECT COUNT(*) FROM marks_sql) > 0.5
            THEN 1 ELSE 2 END AS groupexample
FROM marks_sql a;
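A runnable sketch of this workaround against the example data, assuming an in-memory table created with the standard-library sqlite3 module (the outer alias `a`, the `groupexample` column name and the final ORDER BY are choices made for this illustration). It uses no window functions, so it also parses on SQLite versions older than 3.25.0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks_sql (studentid TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO marks_sql VALUES (?, ?)",
    [("s1", 75), ("s2", 83), ("s3", 91), ("s4", 83), ("s5", 93)],
)

# Emulate NTILE(2): fraction of students at or below this mark, then bucket with CASE.
workaround = """
SELECT a.studentid, a.marks,
       CASE WHEN (SELECT COUNT(*) + 0.0 FROM marks_sql b WHERE a.marks >= b.marks)
                 / (SELECT COUNT(*) FROM marks_sql) > 0.5
            THEN 1 ELSE 2 END AS groupexample
FROM marks_sql a
ORDER BY a.studentid
"""
rows = conn.execute(workaround).fetchall()
print(rows)
# [('s1', 75, 2), ('s2', 83, 1), ('s3', 91, 1), ('s4', 83, 1), ('s5', 93, 1)]
```

Note that this is a percentile cut rather than an exact NTILE: both students with 83 share a fraction (0.6), so both land in group 1 and the split is 4/1, whereas NTILE(2) always splits five rows 3/2 and would separate the tied pair.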
To do it in such a way that the table can grow in size and the technique still applies, you have to do a few things:
So first, order the table by marks (essentially creating a ranking). For each row, this counts the rows whose marks are lower than or equal to that row's marks:
SELECT COUNT(*)+0.0 FROM marks_sql b WHERE a.marks >= b.marks -- rank of mark; a is the outer marks_sql row
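The rank step on its own can be sketched by putting that correlated subquery in the SELECT list, again assuming an in-memory copy of the example table via the standard-library sqlite3 module; the `rank_asc` alias is a hypothetical name. The highest mark gets the largest count, and tied marks share a count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks_sql (studentid TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO marks_sql VALUES (?, ?)",
    [("s1", 75), ("s2", 83), ("s3", 91), ("s4", 83), ("s5", 93)],
)

# For each outer row a, count the rows b whose mark is <= a's mark.
rank_query = """
SELECT a.studentid, a.marks,
       (SELECT COUNT(*) + 0.0 FROM marks_sql b WHERE a.marks >= b.marks) AS rank_asc
FROM marks_sql a
ORDER BY a.studentid
"""
rows = conn.execute(rank_query).fetchall()
print(rows)
# [('s1', 75, 1.0), ('s2', 83, 3.0), ('s3', 91, 4.0), ('s4', 83, 3.0), ('s5', 93, 5.0)]
```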
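We add 0.0 to make the number a float so that our fraction works in the next step (dividing two integers in SQLite performs integer division, which would truncate the fraction to 0).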
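A quick check of why the `+ 0.0` matters, assuming the standard-library sqlite3 module and an empty in-memory database: integer division truncates, while promoting the numerator to REAL keeps the fraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 3 / 5 uses integer division in SQLite; (3 + 0.0) / 5 is REAL division.
int_div, real_div = conn.execute("SELECT 3 / 5, (3 + 0.0) / 5").fetchone()
print(int_div, real_div)  # 0 0.6
```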
We then take the rank and divide it by the total row count:
SELECT COUNT(*) FROM marks_sql -- row count
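Dividing the rank by the row count gives each student's fraction. A sketch under the same assumptions as before (in-memory copy of the example table, stdlib sqlite3, hypothetical `pct` alias):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks_sql (studentid TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO marks_sql VALUES (?, ?)",
    [("s1", 75), ("s2", 83), ("s3", 91), ("s4", 83), ("s5", 93)],
)

# rank (as REAL) divided by the total number of rows = fraction at or below this mark
pct_query = """
SELECT a.studentid,
       (SELECT COUNT(*) + 0.0 FROM marks_sql b WHERE a.marks >= b.marks)
       / (SELECT COUNT(*) FROM marks_sql) AS pct
FROM marks_sql a
ORDER BY a.studentid
"""
rows = conn.execute(pct_query).fetchall()
print(rows)
# [('s1', 0.2), ('s2', 0.6), ('s3', 0.8), ('s4', 0.6), ('s5', 1.0)]
```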
This gives a distribution over the range of scores: a percentile for each student. We do not care about each exact percentile; we care about NTILE(2), that is, whether the student is in the top half.
That is where the CASE statement comes into play. If the student's percentile is over 50%, they fall in group #1, the top 50%; otherwise they fall in group #2.
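Because the CASE uses a strict `> 0.5`, a student sitting exactly at the 50% mark lands in group #2. A small sketch with a hypothetical four-row table (the ids `t1` to `t4` and their marks are made up for illustration, using the stdlib sqlite3 module) shows the boundary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks_sql (studentid TEXT, marks INTEGER)")
# Hypothetical data: four distinct marks, so the fractions are 0.25, 0.5, 0.75, 1.0.
conn.executemany(
    "INSERT INTO marks_sql VALUES (?, ?)",
    [("t1", 10), ("t2", 20), ("t3", 30), ("t4", 40)],
)

query = """
SELECT a.studentid,
       (SELECT COUNT(*) + 0.0 FROM marks_sql b WHERE a.marks >= b.marks)
       / (SELECT COUNT(*) FROM marks_sql) AS pct,
       CASE WHEN (SELECT COUNT(*) + 0.0 FROM marks_sql b WHERE a.marks >= b.marks)
                 / (SELECT COUNT(*) FROM marks_sql) > 0.5
            THEN 1 ELSE 2 END AS grp
FROM marks_sql a
ORDER BY a.studentid
"""
rows = conn.execute(query).fetchall()
print(rows)
# [('t1', 0.25, 2), ('t2', 0.5, 2), ('t3', 0.75, 1), ('t4', 1.0, 1)]
```

With distinct marks and an even row count, this 2/2 split is the same one NTILE(2) would produce.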