The future of time series data in sunpy#

In late 2022 I got a small development grant from NumFocus to scope the future of time series data in sunpy. The successful application can be read on the sunpy wiki. The application contains context that I won’t repeat here. This blog post is the key outcome of this grant, with a record of what I did, the recommendations I made, and any decisions we came to as a community.

User requirements#

The first stage of my work investigated what the user requirements are for a sunpy data container for time series data. As part of this I used my own experience and the following community engagement:

From these discussions came the following list of requirements:



Store data that is a function of time

This means the time column should be treated as the index or coordinates to the data, and be stored as a time-like type.

Handle different time scales

Data can have times defined in a variety of different time scales (e.g. UTC, TAI)

Store multi-dimensional data

Although time is a common index to timeseries data, it isn’t always the only one. As an example, velocity distribution functions measured in the solar wind are 4D datasets, with data as a function of time and three dimensions in velocity space.

Handle time scales with leap seconds

Some time scales can contain timestamps that occur within a leap second.

Store and use physical units with the data and any non-time indices

Store data in a format that can be used with scientific Python libraries

Support storing out-of-memory datasets

Store metadata alongside actual data

Have a way to store an observer coordinate alongside the time index

Have an easy way to do common data manipulation tasks

e.g. interpolating, resampling, rebinning

Have a way to combine multiple timeseries objects, and keep track of metadata

Ability to convert to other common time series objects (e.g. pandas.DataFrame)

Functionality for loading and saving out to common file formats
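To make the leap-second requirement concrete, here is a small sketch using only the Python standard library. It shows that `datetime.datetime` cannot represent a timestamp that falls inside a leap second, which is one reason a plain datetime-based index falls short of the requirements above. The helper name `can_represent` is made up for this illustration.

```python
from datetime import datetime


def can_represent(year, month, day, hour, minute, second):
    """Return True if datetime.datetime accepts the given wall-clock time."""
    try:
        datetime(year, month, day, hour, minute, second)
    except ValueError:
        return False
    return True


# A leap second was inserted at the end of 31 December 2016 (UTC),
# so 23:59:60 was a real instant on that day.
leap_second_ok = can_represent(2016, 12, 31, 23, 59, 60)
ordinary_ok = can_represent(2016, 12, 31, 23, 59, 59)

print(leap_second_ok)  # False: datetime only allows seconds in 0..59
print(ordinary_ok)     # True
```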

Existing options for a data container#

The next step was to identify a set of possible data containers that could be used to store time series data in sunpy. The identified options were:

  • astropy.timeseries.TimeSeries

  • pandas.DataFrame

  • xarray.DataArray (or xarray.Dataset)

  • numpy.ndarray

  • ndcube

What do other projects use?#

I also looked at what Python in Heliophysics projects use (as of writing, in Jan 2023):




Custom TimeSeries object, backed by pandas.DataFrame

HAPI Client



Not clear if users can access the data itself


Unclear if there is any specific timeseries container object












Custom DataContainer object, backed by numpy.ndarray

There is no common container in use. Of the possible options above, astropy.timeseries.TimeSeries is the only one that none of these projects use.

What datasets does sunpy currently support?#

sunpy currently has built-in support for reading CDF files that conform to the Space Physics Guidelines for CDF, as long as the dataset is one- or two-dimensional. Alongside this, several custom data readers have been written to support different data sources:

(links point to the data source information web page)

Data product(s)

File format




Text file

FERMI GBM summary




PROBA-2 LYRA lightcurve


NOAA solar cycle monthly indices


NOAA solar cycle predicted indices


NoRH radio


RHESSI x-ray summary


Evaluating options#

Having found possible options, in this section I’ve evaluated them against the criteria set out above.

| Requirement | numpy.ndarray |
| --- | --- |
| Time-like index data | Can store datetime64 data, but no support for indexes |
| Different time scales | No support |
| Multi-dimensional data | |
| Physical units | No support |
| Interop with scientific Python | |
| Out of memory | numpy arrays are always in memory |
| Metadata | No support |
| Observer coordinates | No support |
| Easy data manipulation | |
| I/O | Can save to binary .npy format or text file |
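As a rough illustration of what "no support for indexes" means in practice for numpy.ndarray, the sketch below keeps times and values in two separate arrays and selects a time range by hand. The variable names and sample values are invented for the example.

```python
import numpy as np

# Ten samples at a one-minute cadence, stored as two separate arrays.
times = np.datetime64("2023-01-01T00:00") + np.arange(10) * np.timedelta64(1, "m")
values = np.arange(10, dtype=float)

# With no index, selecting a time range means searching the (sorted) time
# array by hand and then slicing the data array with the resulting positions.
start = np.datetime64("2023-01-01T00:03")
stop = np.datetime64("2023-01-01T00:06")
i0, i1 = np.searchsorted(times, [start, stop])
selected = values[i0:i1]  # samples with start <= time < stop
```

Nothing ties `times` to `values` here, so any operation that reorders or filters one array has to be repeated on the other.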


| Requirement | pandas.DataFrame |
| --- | --- |
| Time-like index data | |
| Different time scales | No support |
| Multi-dimensional data | Possible, but recommended to use xarray instead |
| Physical units | No native support (tracking issue), could be possible with pint-pandas |
| Interop with scientific Python | |
| Out of memory | pandas DataFrames are always in memory |
| Metadata | Possible to add additional properties to a DataFrame |
| Observer coordinates | No support |
| Easy data manipulation | Many built-in methods for manipulating time-like data |
| I/O | Lots of I/O options |
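A minimal sketch of the pandas.DataFrame features noted above: a time index, metadata attached through the `attrs` dictionary (which pandas documents as experimental), and built-in time-aware manipulation. The column name and metadata value are placeholders.

```python
import numpy as np
import pandas as pd

# Six samples at a 30-second cadence, indexed by time.
index = pd.date_range("2023-01-01T00:00:00", periods=6, freq="30s")
df = pd.DataFrame({"flux": np.arange(6, dtype=float)}, index=index)

# Arbitrary metadata can be attached through the attrs dictionary.
df.attrs["instrument"] = "example-instrument"  # placeholder value

# Time-aware manipulation: rebin to one-minute means and select by time label.
per_minute = df.resample("1min").mean()
first_two = df.loc["2023-01-01T00:00:00":"2023-01-01T00:00:30"]
```

Note that label-based slicing on a time index includes both end points, unlike positional slicing.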


| Requirement | xarray.DataArray |
| --- | --- |
| Time-like index data | |
| Different time scales | No support |
| Multi-dimensional data | |
| Physical units | No native support (tracking issue), could be possible with pint-xarray |
| Interop with scientific Python | |
| Out of memory | Support for computing using dask |
| Metadata | Possible to add metadata to a DataArray |
| Observer coordinates | Support for adding “non-dimensional” coordinates (e.g. longitude/latitude), but not clear if storing astropy SkyCoord would work |
| Easy data manipulation | Many built-in methods for manipulating time-like data |
| I/O | Lots of I/O options |
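The following sketch shows how several of these points look with xarray.DataArray: a multi-dimensional array with a time index, a non-dimension coordinate that varies along time, and metadata in `attrs`. The data, the coordinate name `observer_lon`, and the metadata value are invented, and the observer coordinate is a plain float array rather than a SkyCoord.

```python
import numpy as np
import xarray as xr

# A toy 2D dataset: 4 time steps by 3 velocity bins.
time = np.datetime64("2023-01-01T00:00", "ns") + np.arange(4) * np.timedelta64(1, "h")
velocity = np.array([300.0, 400.0, 500.0])
data = np.arange(12, dtype=float).reshape(4, 3)

da = xr.DataArray(
    data,
    dims=("time", "velocity"),
    coords={
        "time": time,
        "velocity": velocity,
        # A non-dimension coordinate that varies along the time axis.
        "observer_lon": ("time", np.array([0.0, 1.0, 2.0, 3.0])),
    },
    attrs={"instrument": "example-instrument"},  # placeholder metadata
)

# Label-based selection on the time index, then a reduction over velocity.
subset = da.sel(time=slice("2023-01-01T01:00", "2023-01-01T02:00"))
mean_over_velocity = subset.mean(dim="velocity")
```

The non-dimension coordinate is sliced together with the data, so `subset` still carries the matching `observer_lon` values.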


| Requirement | astropy.timeseries.TimeSeries |
| --- | --- |
| Time-like index data | |
| Different time scales | |
| Multi-dimensional data | |
| Physical units | |
| Interop with scientific Python | |
| Out of memory | Apparently there is some support, but this is undocumented. |
| Metadata | Can store on the .meta attribute |
| Observer coordinates | |
| Easy data manipulation | |
| I/O | I/O is done via astropy.table.Table |


| Requirement | ndcube |
| --- | --- |
| Time-like index data | |
| Different time scales | |
| Multi-dimensional data | |
| Physical units | |
| Interop with scientific Python | |
| Out of memory | Seems to be supported in theory, but there is little documentation |
| Metadata | Can store arbitrary FITS metadata |
| Observer coordinates | No support for extra coordinates |
| Easy data manipulation | Very few manipulation methods implemented |



Initial recommendations#

  • numpy.ndarray doesn’t implement several key features, and these are almost certainly out of scope for future ndarray development, so I suggest ndarray is discounted.

  • xarray.DataArray builds on top of pandas.DataFrame with additional features that would be useful to us, so I suggest pandas.DataFrame is discounted.

  • NDCube is designed specifically to store data that is associated with a FITS world coordinate system (WCS). While some solar time series data is already in the FITS format, a large portion is in CDF format, which is tabular, and FITS is not primarily designed to represent tabular data. So I suggest NDCube is discounted.

At a SunPy community meeting there was a consensus that, going forward, astropy.timeseries.TimeSeries and xarray.DataArray are the two options we should consider.

These two options have the following comparison:



  • Time-like index data

  • Different time scales

  • Multi-dimensional data

  • Physical units

  • Interop with scientific Python

  • Out of memory

  • Observer coordinates

  • Easy data manipulation






My initial recommendation would be to adopt xarray.DataArray, as the two items marked red for it (different time scales and physical units) have a strong possibility of being solved:

  • It should (I haven’t confirmed this) be possible to convert times in different time scales (including ones with leap seconds) to a single timescale that doesn’t have leap seconds, and store this in an xarray.DataArray.

  • Although there is no native support for units in DataArray currently, there is interest and ongoing development to support them.

It is unclear to me (because I did not have time to investigate) how hard it would be to implement support for storing rich coordinates (i.e. astropy.coordinates.SkyCoord) as non-dimension coordinates of xarray data structures.

In contrast, I think implementing multi-dimensional data in astropy.timeseries.TimeSeries, adding documentation for out-of-memory datasets, and implementing easy data manipulation methods would take significantly more effort than this. Finally, xarray has a much bigger development community than astropy.timeseries.TimeSeries, so implementing bug fixes and new features would probably be much easier with xarray.

Putting astropy objects in xarray structures#

For the final part of the small development grant, I investigated the changes needed to put astropy objects in xarray structures.

As a model for doing this, it is currently possible to store unitful data created with pint in xarray structures. Support for doing this has two components:

  • xarray natively supports storing duck arrays

  • pint-xarray provides a set of accessors that can be used to serialise and deserialise unitful data so that it can be saved to a file and loaded again. It does this by converting the unit data into metadata, with strings representing units.

It is not currently possible to store astropy.units.Quantity objects in xarray structures, as they inherit directly from ndarray, and get coerced from Quantity to ndarray during the xarray structure initialisation. I think fixing this is (at least initially) a one-line change, changing (what was xarray/core/ on commit hash 51554f2638bc9e4a527492136fe6f54584ffa75d) from

data = np.asarray(data)

to

if not isinstance(data, np.ndarray):
    data = np.asarray(data)
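The effect of this change can be demonstrated with numpy alone. In the sketch below, `UnitArray` is a hypothetical stand-in for an ndarray subclass such as Quantity, and the two function names are invented. The guard tests against `np.ndarray`, the array class, since `np.array` is a function and cannot be used with `isinstance`.

```python
import numpy as np


class UnitArray(np.ndarray):
    """Hypothetical stand-in for an ndarray subclass such as Quantity."""


def coerce_always(data):
    # Unconditional coercion: subclasses are downcast to a plain ndarray.
    return np.asarray(data)


def coerce_if_needed(data):
    # Guarded coercion: ndarray instances (including subclasses) pass through.
    if not isinstance(data, np.ndarray):
        data = np.asarray(data)
    return data


values = np.arange(3.0).view(UnitArray)

print(type(coerce_always(values)).__name__)     # ndarray
print(type(coerce_if_needed(values)).__name__)  # UnitArray
print(type(coerce_if_needed([1, 2])).__name__)  # ndarray
```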

Before moving forward with this, it needs to be possible to run xarray's full unit test suite with astropy.units.Quantity. I started work on this in these two PRs: