Python Pandas - Moving Average with uneven period lengths
I'm trying to figure out how to deal with time series data in pandas that has uneven period lengths. The first example I'm looking at is how to calculate a moving average over the last 15 days. Here is an example of the data (times are UTC):

    index  date_time        data
    46701  1/06/2016 19:27  15.00
    46702  1/06/2016 19:28  18.25
    46703  1/06/2016 19:30  16.50
    46704  1/06/2016 19:33  17.20
    46705  1/06/2016 19:34  18.18

I'm not sure if I should fill in the data at 1-minute increments, or if there is a smarter way... If anyone has suggestions, they would be appreciated.
Thanks - kc
You can do it like this:
- Resample at the frequency you want (upsampling or downsampling).
  You have to pay attention to the resampling strategy here: it has to be consistent with the meaning of your data. Here I have arbitrarily used the bfill strategy (back fill, which uses the next valid value), but another strategy may be more appropriate, such as ffill (forward fill, which propagates the last valid value).
- Compute the moving average.
- You may also have to deal with the index.
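To make the difference between the two fill strategies concrete, here is a minimal sketch on the first three rows of the question's sample. The variable names are only illustrative, and it assumes a pandas version that accepts the '1min' frequency alias.

```python
import pandas as pd

# First three irregular timestamps from the question's sample.
s = pd.Series(
    [15.0, 18.25, 16.5],
    index=pd.to_datetime(
        ["2016-01-06 19:27", "2016-01-06 19:28", "2016-01-06 19:30"]
    ),
)

# Upsample to a regular 1-minute grid; 19:29 is the only newly created slot.
back_filled = s.resample("1min").bfill()     # 19:29 takes the next value (16.5)
forward_filled = s.resample("1min").ffill()  # 19:29 takes the previous value (18.25)

print(pd.DataFrame({"bfill": back_filled, "ffill": forward_filled}))
```

Which of the two is right depends on what a reading means: if a value stays valid until the next reading arrives, ffill is usually the natural choice.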
Note: the rolling syntax was introduced in pandas 0.18.0. The same thing is possible in previous versions with pd.rolling_mean.
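As a possible alternative to resampling first: as far as I know, later pandas versions (0.19 and up) also let rolling take a time offset as the window when the index is a sorted DatetimeIndex. The sketch below assumes such a version; the 3-minute window and the column name moving_3min are just choices made for illustration.

```python
import pandas as pd

# The question's sample, indexed by its (irregular) timestamps.
df = pd.DataFrame(
    {"data": [15.0, 18.25, 16.5, 17.2, 18.18]},
    index=pd.to_datetime([
        "2016-01-06 19:27", "2016-01-06 19:28", "2016-01-06 19:30",
        "2016-01-06 19:33", "2016-01-06 19:34",
    ]),
)

# Offset window: each row averages the observations that fall within the
# preceding 3 minutes (the row itself included), however many there are.
df["moving_3min"] = df["data"].rolling("3min").mean()

print(df)
```

With this form no filling strategy has to be chosen, and for the question's case the window would be written as rolling('15D').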
    import pandas as pd

    # test data
    d = {'data': [15.0, 18.25, 16.5, 17.2, 18.18],
         'date_time': ['1/06/2016 19:27', '1/06/2016 19:28', '1/06/2016 19:30',
                       '1/06/2016 19:33', '1/06/2016 19:34'],
         'index': [46701, 46702, 46703, 46704, 46705]}
    df = pd.DataFrame(d)
    df['date_time'] = pd.to_datetime(df['date_time'])

    # setting the date as the index
    df.set_index('date_time', inplace=True)

    # resampling the data to 1-minute steps ('1T' in older pandas), back filling the gaps
    df = df.resample('1min').bfill()

    # performing the moving average (centered, 3 samples wide)
    df['moving'] = df['data'].rolling(window=3, center=True).mean()

    df.plot(y=['data', 'moving'])
    df

                          data  index     moving
    date_time
    2016-01-06 19:27:00  15.00  46701        NaN
    2016-01-06 19:28:00  18.25  46702  16.583333
    2016-01-06 19:29:00  16.50  46703  17.083333
    2016-01-06 19:30:00  16.50  46703  16.733333
    2016-01-06 19:31:00  17.20  46704  16.966667
    2016-01-06 19:32:00  17.20  46704  17.200000
    2016-01-06 19:33:00  17.20  46704  17.526667
    2016-01-06 19:34:00  18.18  46705        NaN

Edit:
Here is an example with missing data:
    import numpy as np
    import pandas as pd

    # random data parameters
    num_sample = (0, 100)
    nb_sample = 1000
    start_date = '2016-06-02'
    freq = '2min'  # '2T' in older pandas

    random_state = np.random.RandomState(0)

    # generating random data (stored as floats so that NaN can be assigned later)
    df = pd.DataFrame(
        {'data': random_state.randint(num_sample[0], num_sample[1], nb_sample).astype(float)},
        index=random_state.choice(
            pd.date_range(start=pd.to_datetime(start_date), periods=nb_sample * 3, freq=freq),
            nb_sample))

    # removing duplicate index
    df = df.groupby(df.index).first()

    # removing data during closed periods
    df.loc[(df.index.hour >= 22) | (df.index.hour <= 7), 'data'] = np.nan

    # resampling
    df = df.resample('1min').ffill()

    # moving average over one hour (60 one-minute samples)
    df['avg'] = df['data'].rolling(window=60).mean()

    ax = df.plot(kind='line', subplots=True)
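One thing to keep in mind with missing data: with an integer window, rolling(...).mean() returns NaN for every window that holds fewer valid values than the window size, because min_periods defaults to the window size. If averaging over whatever values are available is acceptable for your data, min_periods can relax that. The series below is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute series with a two-minute gap of missing values.
idx = pd.date_range("2016-06-02 08:00", periods=7, freq="1min")
s = pd.Series([10.0, 20.0, 30.0, np.nan, np.nan, 40.0, 50.0], index=idx)

# Default: a window yields NaN unless all 3 of its slots hold a value.
strict = s.rolling(window=3).mean()

# min_periods=1: average the valid values the window contains.
lenient = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"data": s, "strict": strict, "lenient": lenient}))
```

Whether the lenient version is meaningful depends on the data: an average over one or two readings is much noisier than one over a full window.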
