python - Pandas DatetimeIndex vs to_datetime discrepancies
I'm trying to convert a pandas Series of epoch timestamps to human-readable times. There are at least two obvious ways to do this: pd.DatetimeIndex and pd.to_datetime(). They seem to work in quite different ways:

    In [1]: import pandas as pd

    In [3]: nanos = pd.Series([1462282258000000000, 1462282258100000000, 1462282258200000000])

    In [4]: pd.to_datetime(nanos)
    Out[4]:
    0   2016-05-03 13:30:58.000
    1   2016-05-03 13:30:58.100
    2   2016-05-03 13:30:58.200
    dtype: datetime64[ns]

    In [5]: pd.DatetimeIndex(nanos)
    Out[5]:
    DatetimeIndex([       '2016-05-03 13:30:58', '2016-05-03 13:30:58.100000',
                   '2016-05-03 13:30:58.200000'],
                  dtype='datetime64[ns]', freq=None)

With to_datetime(), the display resolution is milliseconds, and .000 is printed on whole seconds. With DatetimeIndex, the display resolution is microseconds (which I like), but the decimal part is omitted on whole seconds.
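
The difference described above is only in how the two objects are printed. A minimal sketch, assuming a reasonably recent pandas, that checks the stored values are identical and uses `.dt.strftime` to force a fixed microsecond display (the variable names are illustrative, not from the original post):

```python
import pandas as pd

nanos = pd.Series([1462282258000000000, 1462282258100000000, 1462282258200000000])

as_series = pd.to_datetime(nanos)   # Series of datetime64 values
as_index = pd.DatetimeIndex(nanos)  # DatetimeIndex built from the same integers

# The reprs differ, but the underlying values compare equal element by element.
same_values = bool((as_series.values == as_index.values).all())
print(same_values)

# To control the display explicitly, format the values as strings.
# %f always prints six fractional digits (microseconds).
formatted = as_series.dt.strftime("%Y-%m-%d %H:%M:%S.%f")
print(formatted.tolist())
```

Formatting with `strftime` returns strings, so it is best left as a final display step rather than applied before further datetime work.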
Then, I try converting the time zone:

    In [12]: pd.DatetimeIndex(nanos).tz_localize('UTC')
    Out[12]:
    DatetimeIndex([       '2016-05-03 13:30:58+00:00',
                   '2016-05-03 13:30:58.100000+00:00',
                   '2016-05-03 13:30:58.200000+00:00'],
                  dtype='datetime64[ns, UTC]', freq=None)

    In [13]: pd.to_datetime(nanos).tz_localize('UTC')
    TypeError: index is not a valid DatetimeIndex or PeriodIndex

This is strange: the time zone functions don't work with a plain datetime Series, only with a DatetimeIndex. Why would that be? The tz_localize() method exists and is documented here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.series.tz_localize.html
I've tried pandas 0.17.0 and 0.18.1 with the same results.
I'm not trying to make an actual index, so all else being equal I would have expected to use to_datetime(), but I can't get the time zone methods to work with it.
There is just one way to convert things: pd.to_datetime(). Yes, you can also directly construct a DatetimeIndex, but it is restrictive on purpose, while to_datetime is quite flexible.
So to_datetime will give you an object similar to what you input: if you input an array-like, you get a DatetimeIndex; if you input a Series, you get a Series.
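
The rule above can be checked directly. This is a small sketch (the variable names are illustrative) that passes the same integers to `pd.to_datetime` once as a list and once as a Series and inspects the result types:

```python
import pandas as pd

nanos = [1462282258000000000, 1462282258100000000, 1462282258200000000]

from_list = pd.to_datetime(nanos)               # array-like in, DatetimeIndex out
from_series = pd.to_datetime(pd.Series(nanos))  # Series in, Series out

print(type(from_list).__name__)
print(type(from_series).__name__)
```

This is why the call in the question failed: the input there was a Series, so the result was a Series, and `Series.tz_localize` looks at the (integer) index rather than the values.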

    In [5]: nanos = [1462282258000000000, 1462282258100000000, 1462282258200000000]

By default this will convert with unit='ns', as here:

    In [7]: pd.to_datetime(nanos)
    Out[7]:
    DatetimeIndex(['2016-05-03 13:30:58', '2016-05-03 13:30:58.100000',
                   '2016-05-03 13:30:58.200000'],
                  dtype='datetime64[ns]', freq=None)

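
If the epoch values are not in nanoseconds, the `unit` argument tells `to_datetime` how to interpret them. A brief sketch, with made-up second-resolution inputs that correspond to the same instants as the nanosecond values:

```python
import pandas as pd

seconds = [1462282258, 1462282259]
nanos = [1462282258000000000, 1462282259000000000]

from_seconds = pd.to_datetime(seconds, unit="s")  # integers read as seconds
from_nanos = pd.to_datetime(nanos)                # integers read as nanoseconds (default)

# Both describe the same instants.
print(bool((from_seconds == from_nanos).all()))
```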
So one thing to do is to make a Series out of this. The index is integer here, and the values are datetimes.

    In [10]: s = pd.Series(pd.to_datetime(nanos))

    In [11]: s
    Out[11]:
    0   2016-05-03 13:30:58.000
    1   2016-05-03 13:30:58.100
    2   2016-05-03 13:30:58.200
    dtype: datetime64[ns]

You can then use the .dt accessor to operate on the values. Series.tz_localize operates on the index.

    In [12]: s.dt.tz_localize('US/Eastern')
    Out[12]:
    0          2016-05-03 13:30:58-04:00
    1   2016-05-03 13:30:58.100000-04:00
    2   2016-05-03 13:30:58.200000-04:00
    dtype: datetime64[ns, US/Eastern]
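
Note that tz_localize attaches a zone to the existing wall-clock times without shifting them. Epoch timestamps are normally UTC, so a common pattern is to localize to UTC first and then convert. The sketch below assumes that is what is wanted; it uses `America/New_York` (the canonical name for the same zone as `US/Eastern`) and a hypothetical Series `by_time` to contrast the values-based and index-based methods:

```python
import pandas as pd

nanos = [1462282258000000000, 1462282258100000000, 1462282258200000000]
s = pd.Series(pd.to_datetime(nanos))

# Values: go through the .dt accessor.
utc = s.dt.tz_localize("UTC")                      # mark the naive values as UTC
eastern = utc.dt.tz_convert("America/New_York")    # shift to Eastern wall-clock time
print(eastern.iloc[0])

# Index: Series.tz_localize acts on the index, so the index must be datetime-like.
by_time = pd.Series([1, 2, 3], index=pd.to_datetime(nanos))
by_time_utc = by_time.tz_localize("UTC")
print(by_time_utc.index.tz)
```

Localizing directly to an Eastern zone, as in the session above, instead treats 13:30:58 as the local Eastern time.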