pyFTS.data package

Module contents

Facilities for loading the standard pyFTS datasets

Submodules

pyFTS.data.common module

pyFTS.data.common.get_dataframe(filename, url, sep=';', compression='infer')[source]

Check whether filename already exists locally. If it does, read the file and return its data. If it does not, download it from url, decompress it, and then read it.

Parameters:
  • filename – dataset local filename
  • url – dataset internet URL
  • sep – CSV field separator
  • compression – type of compression
Returns:

Pandas DataFrame
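The check-then-download behaviour described above can be sketched with the standard library alone. This is an illustration, not the library's implementation: the name load_rows is hypothetical, it returns plain CSV rows instead of a Pandas DataFrame, and it skips the decompression step.

```python
import csv
import os
import urllib.request


def load_rows(filename, url, sep=";"):
    """Return the CSV rows of `filename`, fetching it from `url` if it is missing.

    Hypothetical helper for illustration only; decompression is not handled.
    """
    if not os.path.exists(filename):
        # Cache miss: download once and keep the local copy for later calls.
        urllib.request.urlretrieve(url, filename)
    with open(filename, newline="") as handle:
        return list(csv.reader(handle, delimiter=sep))
```

When the local file already exists, the URL is never contacted, so repeated calls work offline.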

Datasets

AirPassengers dataset

Monthly totals of airline passengers from the USA, from January 1949 through December 1960.

Source: Hyndman, R.J., Time Series Data Library, http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/.

pyFTS.data.AirPassengers.get_data()[source]

Get a simple univariate time series.

Returns:numpy array
pyFTS.data.AirPassengers.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

Bitcoin dataset

Bitcoin to USD quotations

Daily averaged index, by business day, from 2010 to 2018.

Source: https://finance.yahoo.com/quote/BTC-USD?p=BTC-USD

pyFTS.data.Bitcoin.get_data()[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.Bitcoin.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

DowJones dataset

DJI - Dow Jones

Daily averaged index, by business day, from 1985 to 2017.

Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC

pyFTS.data.DowJones.get_data()[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.DowJones.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

Enrollments dataset

Yearly University of Alabama enrollments from 1971 to 1992.

pyFTS.data.Enrollments.get_data()[source]

Get a simple univariate time series.

Returns:numpy array
pyFTS.data.Enrollments.get_dataframe()[source]

Ethereum dataset

Ethereum to USD quotations

Daily averaged index, by business day, from 2016 to 2018.

Source: https://finance.yahoo.com/quote/ETH-USD?p=ETH-USD

pyFTS.data.Ethereum.get_data()[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.Ethereum.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

EUR-GBP dataset

FOREX market EUR-GBP pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.EURGBP.get_data()[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.EURGBP.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

EUR-USD dataset

FOREX market EUR-USD pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.EURUSD.get_data()[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.EURUSD.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

GBP-USD dataset

FOREX market GBP-USD pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.GBPUSD.get_data()[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.GBPUSD.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

INMET dataset

INMET - Instituto Nacional de Meteorologia / Brasil

Belo Horizonte station, from 2000-01-01 to 2012-12-31

Source: http://www.inmet.gov.br

pyFTS.data.INMET.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

NASDAQ dataset

National Association of Securities Dealers Automated Quotations - Composite Index (NASDAQ IXIC)

Daily averaged index by business day, from 2000 to 2016.

Source: http://www.nasdaq.com/aspx/flashquotes.aspx?symbol=IXIC&selected=IXIC

pyFTS.data.NASDAQ.get_data(field='avg')[source]

Get a simple univariate time series.

Parameters:field – the dataset field name to extract
Returns:numpy array
pyFTS.data.NASDAQ.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

SONDA dataset

SONDA - Sistema de Organização Nacional de Dados Ambientais, from INPE - Instituto Nacional de Pesquisas Espaciais, Brasil.

Brasilia station

Source: http://sonda.ccst.inpe.br/

pyFTS.data.SONDA.get_data(field)[source]

Get a simple univariate time series.

Parameters:field – the dataset field name to extract
Returns:numpy array
pyFTS.data.SONDA.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

S&P 500 dataset

S&P500 - Standard & Poor’s 500

Daily averaged index, by business day, from 1950 to 2017.

Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC

pyFTS.data.SP500.get_data()[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.SP500.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

TAIEX dataset

The Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)

Daily averaged index by business day, from 1995 to 2014.

Source: http://www.twse.com.tw/en/products/indices/Index_Series.php

pyFTS.data.TAIEX.get_data()[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.TAIEX.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

pyFTS.data.artificial module

Facilities to generate synthetic stochastic processes

pyFTS.data.artificial.generate_gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc, it=100, num=10, vmin=None, vmax=None)[source]

Generate data sampled from a Gaussian distribution, with constant or linearly changing parameters

Parameters:
  • mu_ini – Initial mean
  • sigma_ini – Initial variance
  • mu_inc – Mean increment after ‘num’ samples
  • sigma_inc – Variance increment after ‘num’ samples
  • it – Number of iterations
  • num – Number of samples generated on each iteration
  • vmin – Lower bound value of generated data
  • vmax – Upper bound value of generated data
Returns:

A list of it*num float values
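A minimal sketch of the sampling scheme described by these parameters, using only the standard library. The name gaussian_linear is hypothetical. Two assumptions are made: bounds are enforced by clipping, and sigma is passed straight to random.gauss, which treats it as a standard deviation. Whether the library interprets it as a variance is not confirmed here.

```python
import random


def gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc,
                    it=100, num=10, vmin=None, vmax=None):
    """Draw `num` Gaussian samples per iteration, shifting mu and sigma each time."""
    mu, sigma = mu_ini, sigma_ini
    values = []
    for _ in range(it):
        for _ in range(num):
            sample = random.gauss(mu, sigma)
            # Assumption: out-of-range samples are clipped to the bounds.
            if vmin is not None:
                sample = max(vmin, sample)
            if vmax is not None:
                sample = min(vmax, sample)
            values.append(sample)
        # Parameters change linearly after every block of `num` samples.
        mu += mu_inc
        sigma += sigma_inc
    return values
```

With mu_inc and sigma_inc set to zero the process is stationary. A non-zero mu_inc produces a series whose mean drifts in steps.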

pyFTS.data.artificial.generate_uniform_linear(min_ini, max_ini, min_inc, max_inc, it=100, num=10, vmin=None, vmax=None)[source]

Generate data sampled from a Uniform distribution, with constant or linearly changing bounds

Parameters:
  • min_ini – Initial lower bound of the distribution
  • max_ini – Initial upper bound of the distribution
  • min_inc – Lower bound increment after ‘num’ samples
  • max_inc – Upper bound increment after ‘num’ samples
  • it – Number of iterations
  • num – Number of samples generated on each iteration
  • vmin – Lower bound value of generated data
  • vmax – Upper bound value of generated data
Returns:

A list of it*num float values
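The uniform variant can be sketched the same way. The name uniform_linear is hypothetical, and clipping to vmin and vmax is an assumption about how the bounds are applied.

```python
import random


def uniform_linear(min_ini, max_ini, min_inc, max_inc,
                   it=100, num=10, vmin=None, vmax=None):
    """Draw `num` uniform samples per iteration, shifting both bounds each time."""
    low, high = min_ini, max_ini
    values = []
    for _ in range(it):
        for _ in range(num):
            sample = random.uniform(low, high)
            # Assumption: out-of-range samples are clipped to the bounds.
            if vmin is not None:
                sample = max(vmin, sample)
            if vmax is not None:
                sample = min(vmax, sample)
            values.append(sample)
        # Bounds change linearly after every block of `num` samples.
        low += min_inc
        high += max_inc
    return values
```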

pyFTS.data.artificial.random_walk(n=500, type='gaussian')[source]
pyFTS.data.artificial.white_noise(n=500)[source]

Henon chaotic time series

M. Hénon. “A two-dimensional mapping with a strange attractor”. Commun. Math. Phys. 50, 69-77 (1976)

dx/dt = a + by(t-1) - x(t-1)^2
dy/dt = x

pyFTS.data.henon.get_data(var, a=1.4, b=0.3, initial_values=[1, 1], iterations=1000)[source]

Get a simple univariate time series.

Parameters:var – the dataset field name to extract
Returns:numpy array
pyFTS.data.henon.get_dataframe(a=1.4, b=0.3, initial_values=[1, 1], iterations=1000)[source]

Return a dataframe with the bivariate Henon Map time series (x, y).

Parameters:
  • a – Equation coefficient
  • b – Equation coefficient
  • initial_values – numpy array with the initial values of x and y. Default: [1, 1]
  • iterations – number of iterations. Default: 1000
Returns:

Pandas DataFrame with the x and y values
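As an illustration, the recurrence can be iterated directly. The sketch below reads the equations in this section as a discrete map, x(t) = a + b*y(t-1) - x(t-1)^2 and y(t) = x(t-1). The function name henon and the returned pair of lists are assumptions, and the library may use a different but equivalent form of the map internally.

```python
def henon(a=1.4, b=0.3, initial_values=(1.0, 1.0), iterations=1000):
    """Iterate the Henon map as written in this section; returns (xs, ys)."""
    xs = [initial_values[0]]
    ys = [initial_values[1]]
    for _ in range(iterations):
        x_prev, y_prev = xs[-1], ys[-1]
        xs.append(a + b * y_prev - x_prev ** 2)
        ys.append(x_prev)
    return xs, ys
```

Each list holds the initial value followed by one entry per iteration.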

Logistic_map chaotic time series

May, Robert M. (1976). “Simple mathematical models with very complicated dynamics”. Nature. 261 (5560): 459–467. doi:10.1038/261459a0.

x(t) = r * x(t-1) * (1 - x(t -1) )

pyFTS.data.logistic_map.get_data(r=4, initial_value=0.3, iterations=100)[source]

Return a list with the logistic map chaotic time series.

Parameters:
  • r – Equation coefficient
  • initial_value – Initial value of x. Default: 0.3
  • iterations – number of iterations. Default: 100
Returns:

A list with the logistic map time series
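The recurrence x(t) = r * x(t-1) * (1 - x(t-1)) is short enough to sketch directly. The function name logistic_map is hypothetical, and returning the initial value followed by one value per iteration is an assumption about the output length.

```python
def logistic_map(r=4, initial_value=0.3, iterations=100):
    """Iterate x(t) = r * x(t-1) * (1 - x(t-1)) starting from `initial_value`."""
    series = [initial_value]
    for _ in range(iterations):
        x = series[-1]
        series.append(r * x * (1 - x))
    return series
```

With r=4 and an initial value of 0.3, the first two iterates are 0.84 and 0.5376.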

Lorenz chaotic time series

Lorenz, Edward Norton (1963). “Deterministic nonperiodic flow”. Journal of the Atmospheric Sciences. 20 (2): 130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2

dx/dt = a(y - x)
dy/dt = x(b - z) - y
dz/dt = xy - cz

pyFTS.data.lorentz.get_data(var, a=10.0, b=28.0, c=2.6666666666666665, dt=0.01, initial_values=[0.1, 0, 0], iterations=1000)[source]

Get a simple univariate time series.

Parameters:var – the dataset field name to extract
Returns:numpy array
pyFTS.data.lorentz.get_dataframe(a=10.0, b=28.0, c=2.6666666666666665, dt=0.01, initial_values=[0.1, 0, 0], iterations=1000)[source]

Return a dataframe with the multivariate Lorenz Map time series (x, y, z).

Parameters:
  • a – Equation coefficient. Default value: 10
  • b – Equation coefficient. Default value: 28
  • c – Equation coefficient. Default value: 8.0/3.0
  • dt – Time differential for continuous time integration. Default value: 0.01
  • initial_values – numpy array with the initial values of x,y and z. Default: [0.1, 0, 0]
  • iterations – number of iterations. Default: 1000
Returns:

Pandas DataFrame with the x, y and z values
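The role of dt is easiest to see in a small integration loop. The sketch below assumes a simple forward Euler step, which may differ from the integrator the library uses. The function name lorenz and the returned triple of lists are hypothetical.

```python
def lorenz(a=10.0, b=28.0, c=8.0 / 3.0, dt=0.01,
           initial_values=(0.1, 0.0, 0.0), iterations=1000):
    """Integrate the Lorenz equations with forward Euler; returns (xs, ys, zs)."""
    xs = [initial_values[0]]
    ys = [initial_values[1]]
    zs = [initial_values[2]]
    for _ in range(iterations):
        x, y, z = xs[-1], ys[-1], zs[-1]
        # Each state advances by dt times its derivative at the previous step.
        xs.append(x + dt * a * (y - x))
        ys.append(y + dt * (x * (b - z) - y))
        zs.append(z + dt * (x * y - c * z))
    return xs, ys, zs
```

A smaller dt follows the continuous trajectory more closely but needs more iterations to cover the same time span.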

Mackey-Glass chaotic time series

Mackey, M. C. and Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197(4300):287-289.

dy/dt = -by(t) + cy(t - tau) / (1 + y(t - tau)^10)

pyFTS.data.mackey_glass.get_data(b=0.1, c=0.2, tau=17, initial_values=array([0.5, 0.55882353, 0.61764706, 0.67647059, 0.73529412, 0.79411765, 0.85294118, 0.91176471, 0.97058824, 1.02941176, 1.08823529, 1.14705882, 1.20588235, 1.26470588, 1.32352941, 1.38235294, 1.44117647, 1.5 ]), iterations=1000)[source]

Return a list with the Mackey-Glass chaotic time series.

Parameters:
  • b – Equation coefficient
  • c – Equation coefficient
  • tau – Lag parameter, default: 17
  • initial_values – numpy array with the initial values of y. Default: np.linspace(0.5,1.5,18)
  • iterations – number of iterations. Default: 1000
Returns:

A list with the Mackey-Glass time series
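The delay term explains why initial_values needs tau + 1 entries: each new value depends on the value tau steps earlier. The sketch below assumes a forward Euler step with a unit time step and returns the initial values followed by one value per iteration. Both choices and the function name mackey_glass are assumptions, not the library's confirmed behaviour.

```python
def mackey_glass(b=0.1, c=0.2, tau=17, initial_values=None, iterations=1000):
    """Iterate the Mackey-Glass delay equation with a unit Euler step."""
    if initial_values is None:
        # 18 evenly spaced values from 0.5 to 1.5, like np.linspace(0.5, 1.5, 18).
        initial_values = [0.5 + k / 17 for k in range(18)]
    series = list(initial_values)  # must hold at least tau + 1 values
    for _ in range(iterations):
        current = series[-1]
        delayed = series[-1 - tau]  # y(t - tau)
        series.append(current - b * current + c * delayed / (1 + delayed ** 10))
    return series
```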

Rossler chaotic time series

O. E. Rössler, Phys. Lett. 57A, 397 (1976).

dx/dt = -z - y
dy/dt = x + ay
dz/dt = b + z(x - c)

pyFTS.data.rossler.get_data(var, a=0.2, b=0.2, c=5.7, dt=0.01, initial_values=[0.001, 0.001, 0.001], iterations=5000)[source]

Get a simple univariate time series.

Parameters:var – the dataset field name to extract
Returns:numpy array
pyFTS.data.rossler.get_dataframe(a=0.2, b=0.2, c=5.7, dt=0.01, initial_values=[0.001, 0.001, 0.001], iterations=5000)[source]

Return a dataframe with the multivariate Rössler Map time series (x, y, z).

Parameters:
  • a – Equation coefficient. Default value: 0.2
  • b – Equation coefficient. Default value: 0.2
  • c – Equation coefficient. Default value: 5.7
  • dt – Time differential for continuous time integration. Default value: 0.01
  • initial_values – numpy array with the initial values of x,y and z. Default: [0.001, 0.001, 0.001]
  • iterations – number of iterations. Default: 5000
Returns:

Pandas DataFrame with the x, y and z values
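The three Rössler equations can be integrated in a few lines. The sketch below assumes a forward Euler step of size dt, which may not match the library's integrator. The function name rossler and the returned triple of lists are hypothetical.

```python
def rossler(a=0.2, b=0.2, c=5.7, dt=0.01,
            initial_values=(0.001, 0.001, 0.001), iterations=5000):
    """Integrate the Rossler equations with forward Euler; returns (xs, ys, zs)."""
    xs = [initial_values[0]]
    ys = [initial_values[1]]
    zs = [initial_values[2]]
    for _ in range(iterations):
        x, y, z = xs[-1], ys[-1], zs[-1]
        # Each state advances by dt times its derivative at the previous step.
        xs.append(x + dt * (-z - y))
        ys.append(y + dt * (x + a * y))
        zs.append(z + dt * (b + z * (x - c)))
    return xs, ys, zs
```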

Sunspots dataset

Monthly sunspot numbers from 1749 to May 2016

Source: https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/SUNSPOT/

pyFTS.data.sunspots.get_data()[source]

Get a simple univariate time series.

Returns:numpy array
pyFTS.data.sunspots.get_dataframe()[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame