Python Pandas - Moving Average with uneven period lengths


I'm trying to figure out how to deal with time series data in pandas that has uneven period lengths. The first example I'm looking at is how to calculate a moving average over the last 15 days. Here is an example of the data (times are in UTC):

index   date_time         data
46701   1/06/2016 19:27   15.00
46702   1/06/2016 19:28   18.25
46703   1/06/2016 19:30   16.50
46704   1/06/2016 19:33   17.20
46705   1/06/2016 19:34   18.18

I'm not sure if I should fill in the data at 1-minute increments, or if there is a smarter way... If anyone has suggestions, they would be appreciated.

Thanks - kc

You can do it like this:

  • Resample at the frequency you want (upsampling or downsampling).
    • You have to pay attention to the resampling strategy here: it has to be consistent with the meaning of your data. Here I have arbitrarily used bfill (back fill, which uses the next valid value), but another strategy may be more appropriate, such as ffill (forward fill, which propagates the last valid value).
  • Compute the moving average.
  • You may then have to deal with the index.
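To make the difference between the two fill strategies concrete, here is a minimal sketch that resamples the question's five readings to 1-minute frequency both ways. The names s and filled are only illustrative, and the dates are written in ISO form, assuming 1/06/2016 means 6 January 2016 as in the output further down.

```python
import pandas as pd

# The five readings from the question, at uneven intervals
s = pd.Series(
    [15.00, 18.25, 16.50, 17.20, 18.18],
    index=pd.to_datetime([
        "2016-01-06 19:27", "2016-01-06 19:28", "2016-01-06 19:30",
        "2016-01-06 19:33", "2016-01-06 19:34",
    ]),
)

# Upsample to one row per minute and fill the new rows in both directions
filled = pd.DataFrame({
    "bfill": s.resample("1min").bfill(),  # new rows take the next valid value
    "ffill": s.resample("1min").ffill(),  # new rows take the last valid value
})
print(filled)
```

At 19:29, for example, bfill takes the 19:30 reading while ffill repeats the 19:28 reading. Which one is right depends on what a reading means in your data.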

Note: the rolling syntax was introduced in pandas 0.18.0. It is possible to do the same thing in previous versions with pd.rolling_mean.

    import pandas as pd

    # Test data
    d = {'data': [15.0, 18.25, 16.5, 17.2, 18.18],
         'date_time': ['1/06/2016 19:27',
                       '1/06/2016 19:28',
                       '1/06/2016 19:30',
                       '1/06/2016 19:33',
                       '1/06/2016 19:34'],
         'index': [46701, 46702, 46703, 46704, 46705]}

    df = pd.DataFrame(d)
    df['date_time'] = pd.to_datetime(df['date_time'])

    # Set the date as the index
    df.set_index('date_time', inplace=True)
    # Resample to 1-minute frequency ('1min' is the current spelling of the older '1T' alias)
    df = df.resample('1min').bfill()
    # Compute a centered 3-sample moving average
    df['moving'] = df['data'].rolling(window=3, center=True).mean()
    df.plot(y=['data', 'moving'])
    df

                          data  index     moving
    date_time
    2016-01-06 19:27:00  15.00  46701        NaN
    2016-01-06 19:28:00  18.25  46702  16.583333
    2016-01-06 19:29:00  16.50  46703  17.083333
    2016-01-06 19:30:00  16.50  46703  16.733333
    2016-01-06 19:31:00  17.20  46704  16.966667
    2016-01-06 19:32:00  17.20  46704  17.200000
    2016-01-06 19:33:00  17.20  46704  17.526667
    2016-01-06 19:34:00  18.18  46705        NaN

[Plot of the data and moving columns]
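As an alternative to resampling first (not what the code above does), later pandas versions also accept a time offset as the rolling window when the index is a DatetimeIndex. The sketch below uses a 3-minute window so the result can be checked by hand on the sample data. For the 15-day average in the question the window would presumably be '15D', assuming the series covers enough history.

```python
import pandas as pd

# The five readings from the question, left at their original uneven timestamps
s = pd.Series(
    [15.00, 18.25, 16.50, 17.20, 18.18],
    index=pd.to_datetime([
        "2016-01-06 19:27", "2016-01-06 19:28", "2016-01-06 19:30",
        "2016-01-06 19:33", "2016-01-06 19:34",
    ]),
)

# Offset-based window: each value is the mean of the readings that fall in
# the 3 minutes ending at that timestamp (the window start is excluded)
rolling_3min = s.rolling("3min").mean()
print(rolling_3min)

# For the question's case the same call with a longer window would be:
# s.rolling("15D").mean()
```

With this form no filler rows are created, so each average only reflects readings that were actually recorded. The trade-off is that windows can contain very different numbers of readings.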

Edit

Here is an example with missing data.

    import numpy as np
    import pandas as pd

    # Random data parameters
    num_sample = (0, 100)
    nb_sample = 1000
    start_date = '2016-06-02'
    freq = '2min'

    random_state = np.random.RandomState(0)

    # Generate random data at randomly chosen timestamps
    # (cast to float so that NaN can be assigned below)
    df = pd.DataFrame({'data': random_state.randint(num_sample[0], num_sample[1],
                                                    nb_sample).astype(float)},
                      index=random_state.choice(
                          pd.date_range(start=pd.to_datetime(start_date), periods=nb_sample * 3,
                                        freq=freq),
                          nb_sample))
    # Remove duplicate index entries
    df = df.groupby(df.index).first()
    # Remove data during closed periods (22:00 to 07:59)
    df.loc[(df.index.hour >= 22) | (df.index.hour <= 7), 'data'] = np.nan
    # Resample to 1-minute frequency
    df = df.resample('1min').ffill()
    # Moving average over 60 samples (one hour at 1-minute frequency)
    df['avg'] = df['data'].rolling(window=60).mean()

    ax = df.plot(kind='line', subplots=True)
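One thing to keep in mind with missing data: by default a fixed-size rolling window returns NaN whenever any value inside it is NaN. If you would rather get an average from the values that are present, rolling also takes a min_periods argument. The small series below is made up purely for illustration and is not part of the example above.

```python
import numpy as np
import pandas as pd

# Tiny made-up series with one missing value
idx = pd.date_range("2016-06-02", periods=6, freq="1min")
s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0], index=idx)

# Default: all 3 values in the window must be present
strict = s.rolling(window=3).mean()

# Relaxed: at least 2 of the 3 values must be present
lenient = s.rolling(window=3, min_periods=2).mean()

print(pd.DataFrame({"data": s, "strict": strict, "lenient": lenient}))
```

Here strict only produces a value for the last row, while lenient fills in every row except the first. Whether a partial window is acceptable depends on how the average will be used.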
