Python Pandas - Moving Average with uneven period lengths
I'm trying to figure out how to deal with time series data in pandas that has uneven period lengths. The first example I'm looking at is how to calculate the moving average over the last 15 days. Here is an example of the data (time is UTC):
```
index  date_time        data
46701  1/06/2016 19:27  15.00
46702  1/06/2016 19:28  18.25
46703  1/06/2016 19:30  16.50
46704  1/06/2016 19:33  17.20
46705  1/06/2016 19:34  18.18
```
I'm not sure whether I should fill in the data in 1-minute increments, or if there is a smarter way... If anyone has suggestions, they would be appreciated.
Thanks - KC
You can do this in a few steps:
- Resample at the frequency you want (upsampling or downsampling).
  - You have to pay attention here to the resampling strategy: it has to be consistent with the meaning of your data. Here I have arbitrarily used `bfill` (back fill: use the next valid value), but a strategy like `ffill` (forward fill: propagate the last valid value) may be more appropriate.
- Compute the moving average.
- Finally, you may have to deal with the index.
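To see what the fill strategy changes, here is a minimal sketch that upsamples the question's five readings to a 1-minute grid with both `bfill` and `ffill`. The names `s`, `back`, `forward` and `comparison` are just for illustration:

```python
import pandas as pd

# The question's sample readings, unevenly spaced in time
s = pd.Series(
    [15.0, 18.25, 16.5, 17.2, 18.18],
    index=pd.to_datetime(['2016-01-06 19:27', '2016-01-06 19:28',
                          '2016-01-06 19:30', '2016-01-06 19:33',
                          '2016-01-06 19:34']),
)

# Upsample to a regular 1-minute grid with the two fill strategies
back = s.resample('1min').bfill()     # empty minutes take the NEXT reading
forward = s.resample('1min').ffill()  # empty minutes keep the LAST reading

comparison = pd.DataFrame({'bfill': back, 'ffill': forward})
print(comparison)
```

At 19:29, for example, `bfill` gives 16.50 (the 19:30 reading) while `ffill` gives 18.25 (the 19:28 reading), so the two strategies produce different moving averages.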
Note: the `rolling` syntax was introduced in pandas 0.18.0. In previous versions, the same thing can be done with `pd.rolling_mean`.
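As an alternative to resampling (not part of the approach above), later pandas versions also accept a time offset string as the `rolling` window. This works directly on an unevenly spaced, sorted `DatetimeIndex`. Here is a minimal sketch on the question's raw readings, assuming a pandas version with offset-based windows. The 3-minute window is only chosen because the sample spans a few minutes:

```python
import pandas as pd

# The question's raw, unevenly spaced readings (no resampling)
s = pd.Series(
    [15.0, 18.25, 16.5, 17.2, 18.18],
    index=pd.to_datetime(['2016-01-06 19:27', '2016-01-06 19:28',
                          '2016-01-06 19:30', '2016-01-06 19:33',
                          '2016-01-06 19:34']),
)

# Offset-based window: each point averages the readings whose timestamps
# fall in (t - 3 minutes, t]. The index must be sorted.
moving = s.rolling('3min').mean()
print(moving)

# For the question's use case the window would be something like '15D'
```

Each reading counts once, however long it remained the current value. This is not the same as resampling and then averaging, where a value that held for three minutes is counted three times.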
```python
import pandas as pd

# test data
d = {'data': [15.0, 18.25, 16.5, 17.2, 18.18],
     'date_time': ['1/06/2016 19:27', '1/06/2016 19:28', '1/06/2016 19:30',
                   '1/06/2016 19:33', '1/06/2016 19:34'],
     'index': [46701, 46702, 46703, 46704, 46705]}
df = pd.DataFrame(d)
df['date_time'] = pd.to_datetime(df['date_time'])

# setting date index
df.set_index('date_time', inplace=True)

# resampling data to 1 minute, back-filling the missing minutes
df = df.resample('1min').bfill()

# performing moving average (3 minutes, centered)
df['moving'] = df['data'].rolling(window=3, center=True).mean()

df.plot(y=['data', 'moving'])
print(df)
```

```
                      data  index     moving
date_time
2016-01-06 19:27:00  15.00  46701        NaN
2016-01-06 19:28:00  18.25  46702  16.583333
2016-01-06 19:29:00  16.50  46703  17.083333
2016-01-06 19:30:00  16.50  46703  16.733333
2016-01-06 19:31:00  17.20  46704  16.966667
2016-01-06 19:32:00  17.20  46704  17.200000
2016-01-06 19:33:00  17.20  46704  17.526667
2016-01-06 19:34:00  18.18  46705        NaN
```
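Because `center=True` also uses the minute after each point, the centered average looks one step into the future. For a "moving average over the last N periods", a trailing window (the `rolling` default) may fit better. Here is a minimal sketch comparing both on the same back-filled 1-minute grid (the names `centered` and `trailing` are illustrative):

```python
import pandas as pd

# Same readings as above, back-filled onto a 1-minute grid
s = pd.Series(
    [15.0, 18.25, 16.5, 17.2, 18.18],
    index=pd.to_datetime(['2016-01-06 19:27', '2016-01-06 19:28',
                          '2016-01-06 19:30', '2016-01-06 19:33',
                          '2016-01-06 19:34']),
).resample('1min').bfill()

# Centered: the previous, current and next minute
centered = s.rolling(window=3, center=True).mean()
# Trailing (default): the current minute and the two preceding ones, past data only
trailing = s.rolling(window=3).mean()

print(pd.DataFrame({'centered': centered, 'trailing': trailing}))
```

With a window of 3, the trailing result is the centered one shifted one minute later, so the first two rows are NaN instead of the first and last.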
Edit: here is an example with missing data.
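```python
import numpy as np
import pandas as pd

# random data parameters
num_sample = (0, 100)   # value range for randint (upper bound exclusive)
nb_sample = 1000
start_date = '2016-06-02'
freq = '2min'           # spacing of the candidate timestamps
random_state = np.random.RandomState(0)

# generating random data: values at timestamps drawn at random from a
# regular 2-minute grid, which gives an uneven index
# (float so that NaN can be assigned below)
df = pd.DataFrame(
    {'data': random_state.randint(num_sample[0], num_sample[1], nb_sample).astype(float)},
    index=random_state.choice(
        pd.date_range(start=pd.to_datetime(start_date), periods=nb_sample * 3, freq=freq),
        nb_sample))

# removing duplicate index values (choice samples with replacement)
df = df.groupby(df.index).first()

# removing data during closed periods (22:00 to 07:59)
df.loc[(df.index.hour >= 22) | (df.index.hour <= 7), 'data'] = np.nan

# resampling to 1 minute, forward-filling the new slots
df = df.resample('1min').ffill()

# moving average over 60 minutes (one hour)
df['avg'] = df['data'].rolling(window=60).mean()

ax = df.plot(kind='line', subplots=True)
```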
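In this example `rolling(window=60)` returns NaN for any window that touches a NaN, because `min_periods` defaults to the window size. The average therefore only reappears once 60 valid minutes have accumulated after the data resumes. If partial windows are acceptable, `min_periods` can be lowered. Here is a minimal sketch on a hypothetical toy series (not the random data above):

```python
import numpy as np
import pandas as pd

# Toy series with one missing minute (hypothetical values)
s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
              index=pd.date_range('2016-06-02 08:00', periods=6, freq='1min'))

# Default: min_periods equals the window size, so any window
# containing a NaN (or not yet full) yields NaN
strict = s.rolling(window=3).mean()

# min_periods=1: average whatever valid values fall in the window
lenient = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({'strict': strict, 'lenient': lenient}))
```

Which behavior is right depends on the data. A lenient window produces values right after a gap, but they are based on fewer readings.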