pyFTS.data package¶
Module contents¶
Module for pyFTS standard dataset facilities
Submodules¶
pyFTS.data.common module¶
pyFTS.data.common.get_dataframe(filename, url, sep=';', compression='infer')
Checks whether filename already exists locally; if it does, reads the file and returns its data. If the file does not exist yet, it is downloaded and decompressed first.
Parameters: - filename – dataset local filename
- url – dataset internet URL
- sep – CSV field separator
- compression – type of compression
Returns: pandas DataFrame
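The check-then-download behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the pyFTS implementation: `load_cached_csv` is a hypothetical name, it returns a list of rows rather than a pandas object, and it skips the decompression step.

```python
import csv
import os
import urllib.request


def load_cached_csv(filename, url, sep=";"):
    """Read a delimited file, downloading it first if it is not cached locally.

    Hypothetical helper for illustration only; it is not part of pyFTS.
    """
    if not os.path.exists(filename):
        # Cache miss: fetch the remote file and store it under `filename`.
        urllib.request.urlretrieve(url, filename)
    with open(filename, newline="") as handle:
        # Each row comes back as a list of strings split on `sep`.
        return list(csv.reader(handle, delimiter=sep))
```

When the file is already present, the URL is never contacted, so repeated calls do not trigger repeated downloads.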
Datasets¶
Artificial and synthetic data generators¶
Facilities to generate synthetic stochastic processes
class pyFTS.data.artificial.SignalEmulator(**kwargs)
Bases: object
Emulate a complex signal built from several additive and non-additive components
blip(**kwargs)
Creates an outlier greater than the maximum or lower than the minimum of the previous signal values, and inserts it at a random location in the signal.
Returns: the current SignalEmulator instance, for method chaining
components = None
Components of the signal
incremental_gaussian(mu, sigma, **kwargs)
Creates an additive Gaussian interference on a previous signal
Parameters: - mu – increment on mean
- sigma – increment on variance
- start – lag index to start this signal, the default value is 0
- it – Number of iterations, the default value is 1
- length – Number of samples generated on each iteration, the default value is 100
- vmin – Lower bound value of generated data, the default value is None
- vmax – Upper bound value of generated data, the default value is None
Returns: the current SignalEmulator instance, for method chaining
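periodic_gaussian(type, period, mu_min, sigma_min, mu_max, sigma_max, **kwargs)
Creates an additive periodic Gaussian interference on a previous signal
Parameters: - type – ‘linear’ or ‘sinoidal’
- period – the period of recurrence
- mu_min – initial (and minimum) increment on mean of each period
- sigma_min – initial (and minimum) increment on variance of each period
- mu_max – final (and maximum) increment on mean of each period
- sigma_max – final (and maximum) increment on variance of each period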
- start – lag index to start this signal, the default value is 0
- it – Number of iterations, the default value is 1
- length – Number of samples generated on each iteration, the default value is 100
- vmin – Lower bound value of generated data, the default value is None
- vmax – Upper bound value of generated data, the default value is None
Returns: the current SignalEmulator instance, for method chaining
stationary_gaussian(mu, sigma, **kwargs)
Creates a continuous Gaussian signal with mean mu and variance sigma.
Parameters: - mu – mean
- sigma – variance
- additive – If False, the previous signal is cancelled and this one starts in its place; if True, this signal is added to the previous one
- start – lag index to start this signal, the default value is 0
- it – Number of iterations, the default value is 1
- length – Number of samples generated on each iteration, the default value is 100
- vmin – Lower bound value of generated data, the default value is None
- vmax – Upper bound value of generated data, the default value is None
Returns: the current SignalEmulator instance, for method chaining
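Because each method returns the current instance, calls can be chained. The pattern is sketched below with `MiniEmulator`, a hypothetical stand-in written for this example and not the pyFTS class. It assumes that an additive component is summed onto the existing signal from lag `start`, and that `sigma` is used as a standard deviation by `random.gauss`.

```python
import random


class MiniEmulator:
    """Hypothetical stand-in that mimics the chaining style; not the pyFTS class."""

    def __init__(self):
        self.signal = []

    def stationary_gaussian(self, mu, sigma, length=100, additive=True, start=0):
        # Assumption: sigma is handed to random.gauss as a standard deviation.
        samples = [random.gauss(mu, sigma) for _ in range(length)]
        if not additive:
            # Non-additive: discard the previous signal and start this one.
            self.signal = samples
        else:
            # Additive: pad with zeros if needed, then sum from lag `start`.
            needed = start + length
            self.signal.extend([0.0] * (needed - len(self.signal)))
            for i, value in enumerate(samples):
                self.signal[start + i] += value
        return self  # returning self is what enables method chaining


emulator = (
    MiniEmulator()
    .stationary_gaussian(0, 1, length=50)
    .stationary_gaussian(5, 1, length=50, start=25)
)
```

Here the second component overlaps the first from index 25 onward, so the resulting signal has 75 samples.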
pyFTS.data.artificial.generate_gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc, it=100, num=10, vmin=None, vmax=None)
Generate data sampled from a Gaussian distribution, with constant or linearly changing parameters
Parameters: - mu_ini – Initial mean
- sigma_ini – Initial variance
- mu_inc – Mean increment after ‘num’ samples
- sigma_inc – Variance increment after ‘num’ samples
- it – Number of iterations
- num – Number of samples generated on each iteration
- vmin – Lower bound value of generated data
- vmax – Upper bound value of generated data
Returns: A list of it*num float values
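The parameter list above suggests a simple loop: draw `num` samples, then shift the mean and spread by their increments, and repeat `it` times. The following is a minimal sketch of that reading, not the pyFTS source. `gaussian_linear_sketch` is a hypothetical name, `sigma` is treated as the standard deviation expected by `random.gauss`, and out-of-range samples are assumed to be clipped to `vmin`/`vmax`.

```python
import random


def gaussian_linear_sketch(mu_ini, sigma_ini, mu_inc, sigma_inc,
                           it=100, num=10, vmin=None, vmax=None):
    """Illustrative re-implementation; not the pyFTS source."""
    mu, sigma = mu_ini, sigma_ini
    values = []
    for _ in range(it):
        for _ in range(num):
            x = random.gauss(mu, sigma)  # assumption: sigma used as std. dev.
            if vmin is not None:
                x = max(x, vmin)  # assumption: values below vmin are clipped
            if vmax is not None:
                x = min(x, vmax)  # assumption: values above vmax are clipped
            values.append(x)
        # Parameters change only after each batch of `num` samples.
        mu += mu_inc
        sigma += sigma_inc
    return values
```

Setting both increments to zero gives the constant-parameter case, and the result always holds `it * num` values.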
pyFTS.data.artificial.generate_linear_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)
Generates a periodic linear variation on mean and variance
Parameters: - period – the period of recurrence
- mu_min – initial (and minimum) mean of each period
- sigma_min – initial (and minimum) variance of each period
- mu_max – final (and maximum) mean of each period
- sigma_max – final (and maximum) variance of each period
- it – Number of iterations
- num – Number of samples generated on each iteration
- vmin – Lower bound value of generated data
- vmax – Upper bound value of generated data
Returns: A list of it*num float values
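One way to read "initial (and minimum)" and "final (and maximum)" is as a ramp that restarts every `period` iterations. The sketch below computes only the per-iteration parameters under that assumption. `linear_periodic_schedule` is a hypothetical helper, and the exact waveform used by pyFTS (sawtooth or triangular, for instance) is not stated in this reference.

```python
def linear_periodic_schedule(period, mu_min, sigma_min, mu_max, sigma_max, it=100):
    """Return one (mu, sigma) pair per iteration, ramping linearly within each period.

    Assumption: a sawtooth that restarts at the minimum every `period` iterations.
    """
    schedule = []
    for k in range(it):
        # Fraction of the current period already elapsed, in [0, 1].
        frac = (k % period) / (period - 1) if period > 1 else 0.0
        mu = mu_min + frac * (mu_max - mu_min)
        sigma = sigma_min + frac * (sigma_max - sigma_min)
        schedule.append((mu, sigma))
    return schedule
```

Each pair would then parameterize the `num` Gaussian samples drawn in that iteration.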
pyFTS.data.artificial.generate_sinoidal_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)
Generates a periodic sinusoidal variation on mean and variance
Parameters: - period – the period of recurrence
- mu_min – initial (and minimum) mean of each period
- sigma_min – initial (and minimum) variance of each period
- mu_max – final (and maximum) mean of each period
- sigma_max – final (and maximum) variance of each period
- it – Number of iterations
- num – Number of samples generated on each iteration
- vmin – Lower bound value of generated data
- vmax – Upper bound value of generated data
Returns: A list of it*num float values
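A sinusoidal variation between the minimum and maximum parameters could look like the sketch below. `sinusoidal_periodic_schedule` is a hypothetical helper, and the choice of phase (minimum at the start of each period, maximum at its midpoint) is an assumption made for this example rather than something the reference specifies.

```python
import math


def sinusoidal_periodic_schedule(period, mu_min, sigma_min, mu_max, sigma_max, it=100):
    """Return one (mu, sigma) pair per iteration, varying sinusoidally.

    Assumption: the weight is 0 at the start of a period and 1 at its midpoint.
    """
    schedule = []
    for k in range(it):
        weight = (1 - math.cos(2 * math.pi * k / period)) / 2
        mu = mu_min + weight * (mu_max - mu_min)
        sigma = sigma_min + weight * (sigma_max - sigma_min)
        schedule.append((mu, sigma))
    return schedule
```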
pyFTS.data.artificial.generate_uniform_linear(min_ini, max_ini, min_inc, max_inc, it=100, num=10, vmin=None, vmax=None)
Generate data sampled from a Uniform distribution, with constant or linearly changing bounds
Parameters: - min_ini – Initial lower bound
- max_ini – Initial upper bound
- min_inc – Lower bound increment after ‘num’ samples
- max_inc – Upper bound increment after ‘num’ samples
- it – Number of iterations
- num – Number of samples generated on each iteration
- vmin – Lower bound value of generated data
- vmax – Upper bound value of generated data
Returns: A list of it*num float values
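The uniform variant follows the same batch structure, with the two bounds of the distribution shifting after every `num` samples. A minimal sketch, assuming that reading: `uniform_linear_sketch` is a hypothetical name, and the `vmin`/`vmax` limits are left out for brevity.

```python
import random


def uniform_linear_sketch(min_ini, max_ini, min_inc, max_inc, it=100, num=10):
    """Illustrative re-implementation; not the pyFTS source."""
    low, high = min_ini, max_ini
    values = []
    for _ in range(it):
        # Draw one batch of `num` samples from the current interval.
        values.extend(random.uniform(low, high) for _ in range(num))
        # Shift both bounds before the next batch.
        low += min_inc
        high += max_inc
    return values
```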
AirPassengers dataset¶
Monthly totals of airline passengers in the USA, from January 1949 through December 1960.
Source: Hyndman, R.J., Time Series Data Library, http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/.
Bitcoin dataset¶
Bitcoin to USD quotations
Daily averaged index, by business day, from 2010 to 2018.
DowJones dataset¶
DJI - Dow Jones
Daily averaged index, by business day, from 1985 to 2017.
Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC
Enrollments dataset¶
Yearly University of Alabama enrollments from 1971 to 1992.
Ethereum dataset¶
Ethereum to USD quotations
Daily averaged index, by business day, from 2016 to 2018.
EUR-GBP dataset¶
FOREX market EUR-GBP pair.
Daily averaged quotations, by business day, from 2016 to 2018.
EUR-USD dataset¶
FOREX market EUR-USD pair.
Daily averaged quotations, by business day, from 2016 to 2018.
GBP-USD dataset¶
FOREX market GBP-USD pair.
Daily averaged quotations, by business day, from 2016 to 2018.
INMET dataset¶
INMET - Instituto Nacional de Meteorologia, Brasil
Belo Horizonte station, from 2000-01-01 to 2012-12-31
Source: http://www.inmet.gov.br
Malaysia dataset¶
Hourly Malaysia electric load and temperature
NASDAQ module¶
National Association of Securities Dealers Automated Quotations - Composite Index (NASDAQ IXIC)
Daily averaged index by business day, from 2000 to 2016.
Source: http://www.nasdaq.com/aspx/flashquotes.aspx?symbol=IXIC&selected=IXIC
SONDA dataset¶
SONDA - Sistema de Organização Nacional de Dados Ambientais, from INPE - Instituto Nacional de Pesquisas Espaciais, Brasil.
Brasilia station
Source: http://sonda.ccst.inpe.br/
S&P 500 dataset¶
S&P500 - Standard & Poor’s 500
Daily averaged index, by business day, from 1950 to 2017.
Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC
TAIEX dataset¶
The Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)
Daily averaged index by business day, from 1995 to 2014.
Source: http://www.twse.com.tw/en/products/indices/Index_Series.php
Henon chaotic time series¶
M. Hénon. “A two-dimensional mapping with a strange attractor”. Commun. Math. Phys. 50, 69-77 (1976)
x(t) = a + b*y(t-1) - x(t-1)^2
y(t) = x(t-1)
pyFTS.data.henon.get_data(var, a=1.4, b=0.3, initial_values=[1, 1], iterations=1000)
Get a simple univariate time series data.
Parameters: var – the dataset field name to extract
Returns: numpy array
pyFTS.data.henon.get_dataframe(a=1.4, b=0.3, initial_values=[1, 1], iterations=1000)
Return a dataframe with the bivariate Henon Map time series (x, y).
Parameters: - a – Equation coefficient
- b – Equation coefficient
- initial_values – numpy array with the initial values of x and y. Default: [1, 1]
- iterations – number of iterations. Default: 1000
Returns: pandas DataFrame with the x and y values
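Read as a discrete map, the recurrence is x(t) = a + b*y(t-1) - x(t-1)^2 with y(t) = x(t-1), and it can be iterated directly. The sketch below is an illustration under that reading. `henon_series` is a hypothetical name, it returns plain lists instead of a dataframe, and including the initial point in the output is a choice made for this example.

```python
def henon_series(a=1.4, b=0.3, initial_values=(1.0, 1.0), iterations=1000):
    """Iterate the Henon map; returns the x and y series, initial point included."""
    x, y = initial_values
    xs, ys = [x], [y]
    for _ in range(iterations):
        # Tuple assignment evaluates the right-hand side with the old x and y.
        x, y = a + b * y - x ** 2, x
        xs.append(x)
        ys.append(y)
    return xs, ys
```

With the defaults, the first step gives x = 1.4 + 0.3 - 1 = 0.7 while y keeps the previous x value, 1.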
Logistic_map chaotic time series¶
May, Robert M. (1976). “Simple mathematical models with very complicated dynamics”. Nature. 261 (5560): 459–467. doi:10.1038/261459a0.
x(t) = r * x(t-1) * (1 - x(t -1) )
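The recurrence can be iterated in a few lines. This is a minimal sketch: the function name and the parameter names `r`, `x0` and `iterations` are chosen for this example and are not taken from the pyFTS module.

```python
def logistic_map(r=4.0, x0=0.3, iterations=100):
    """Iterate x(t) = r * x(t-1) * (1 - x(t-1)), starting from x0."""
    xs = [x0]
    for _ in range(iterations):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs
```

For r = 2 the value 0.5 is a fixed point, which makes a convenient sanity check; chaotic behaviour appears for values of r close to 4.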
Lorenz chaotic time series¶
Lorenz, Edward Norton (1963). “Deterministic nonperiodic flow”. Journal of the Atmospheric Sciences. 20 (2): 130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2
dx/dt = a(y - x)
dy/dt = x(b - z) - y
dz/dt = xy - cz
pyFTS.data.lorentz.get_data(var, a=10.0, b=28.0, c=2.6666666666666665, dt=0.01, initial_values=[0.1, 0, 0], iterations=1000)
Get a simple univariate time series data.
Parameters: var – the dataset field name to extract
Returns: numpy array
pyFTS.data.lorentz.get_dataframe(a=10.0, b=28.0, c=2.6666666666666665, dt=0.01, initial_values=[0.1, 0, 0], iterations=1000)
Return a dataframe with the multivariate Lorenz system time series (x, y, z).
Parameters: - a – Equation coefficient. Default value: 10
- b – Equation coefficient. Default value: 28
- c – Equation coefficient. Default value: 8.0/3.0
- dt – Time differential for continuous time integration. Default value: 0.01
- initial_values – numpy array with the initial values of x,y and z. Default: [0.1, 0, 0]
- iterations – number of iterations. Default: 1000
Returns: pandas DataFrame with the x, y and z values
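The `dt` parameter implies a numerical integration of the three differential equations. A forward Euler step is the simplest scheme and is assumed in the sketch below; this reference does not say which scheme pyFTS uses. `lorenz_euler` is a hypothetical name, and it returns a list of (x, y, z) tuples rather than a dataframe.

```python
def lorenz_euler(a=10.0, b=28.0, c=8.0 / 3.0, dt=0.01,
                 initial_values=(0.1, 0.0, 0.0), iterations=1000):
    """Integrate the Lorenz equations with forward Euler steps of size dt."""
    x, y, z = initial_values
    trajectory = [(x, y, z)]
    for _ in range(iterations):
        # Derivatives evaluated at the current state.
        dx = a * (y - x)
        dy = x * (b - z) - y
        dz = x * y - c * z
        # Advance all three variables together by one Euler step.
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        trajectory.append((x, y, z))
    return trajectory
```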
Mackey-Glass chaotic time series¶
Mackey, M. C. and Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197(4300):287-289.
dy/dt = -b*y(t) + c*y(t - tau) / (1 + y(t - tau)^10)
pyFTS.data.mackey_glass.get_data(b=0.1, c=0.2, tau=17, initial_values=np.linspace(0.5, 1.5, 18), iterations=1000)
Return a list with the Mackey-Glass chaotic time series.
Parameters: - b – Equation coefficient
- c – Equation coefficient
- tau – Lag parameter, default: 17
- initial_values – numpy array with the initial values of y. Default: np.linspace(0.5,1.5,18)
- iterations – number of iterations. Default: 1000
Returns: a list with the Mackey-Glass chaotic time series
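The delay term y(t - tau) is why tau + 1 initial values are needed: each update reads the value from tau steps back. The sketch below applies a forward Euler update with a unit time step, which is an assumption made for this example. `mackey_glass_sketch` is a hypothetical name, and the returned list includes the initial values.

```python
def mackey_glass_sketch(b=0.1, c=0.2, tau=17, initial_values=None, iterations=1000):
    """Iterate the Mackey-Glass delay equation with a unit Euler step."""
    if initial_values is None:
        # tau + 1 evenly spaced values from 0.5 to 1.5
        # (the same values as np.linspace(0.5, 1.5, 18) when tau is 17).
        initial_values = [0.5 + i / tau for i in range(tau + 1)]
    y = list(initial_values)  # must hold at least tau + 1 values
    for _ in range(iterations):
        delayed = y[-1 - tau]  # the value tau steps before the latest one
        dy = -b * y[-1] + c * delayed / (1 + delayed ** 10)
        y.append(y[-1] + dy)
    return y
```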
Rossler chaotic time series¶
O. E. Rössler, Phys. Lett. 57A, 397 (1976).
dx/dt = -z - y
dy/dt = x + ay
dz/dt = b + z(x - c)
pyFTS.data.rossler.get_data(var, a=0.2, b=0.2, c=5.7, dt=0.01, initial_values=[0.001, 0.001, 0.001], iterations=5000)
Get a simple univariate time series data.
Parameters: var – the dataset field name to extract
Returns: numpy array
pyFTS.data.rossler.get_dataframe(a=0.2, b=0.2, c=5.7, dt=0.01, initial_values=[0.001, 0.001, 0.001], iterations=5000)
Return a dataframe with the multivariate Rössler Map time series (x, y, z).
Parameters: - a – Equation coefficient. Default value: 0.2
- b – Equation coefficient. Default value: 0.2
- c – Equation coefficient. Default value: 5.7
- dt – Time differential for continuous time integration. Default value: 0.01
- initial_values – numpy array with the initial values of x,y and z. Default: [0.001, 0.001, 0.001]
- iterations – number of iterations. Default: 5000
Returns: pandas DataFrame with the x, y and z values
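As an illustration of how `dt` and `iterations` interact, the Rössler equations can be stepped with forward Euler. That scheme is an assumption here, since the reference does not name the integrator. `rossler_euler` is a hypothetical name, and the result is a list of (x, y, z) tuples rather than a dataframe.

```python
def rossler_euler(a=0.2, b=0.2, c=5.7, dt=0.01,
                  initial_values=(0.001, 0.001, 0.001), iterations=5000):
    """Integrate the Rossler equations with forward Euler steps of size dt."""
    x, y, z = initial_values
    trajectory = [(x, y, z)]
    for _ in range(iterations):
        # Derivatives evaluated at the current state.
        dx = -z - y
        dy = x + a * y
        dz = b + z * (x - c)
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        trajectory.append((x, y, z))
    return trajectory
```

The simulated time span is `iterations * dt`, so the defaults cover 50 time units.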
Sunspots dataset¶
Monthly sunspot numbers from 1749 to May 2016
Source: https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/SUNSPOT/