pyFTS.data package
Module contents
Module providing the standard dataset facilities of pyFTS.
Submodules
pyFTS.data.common module
- pyFTS.data.common.get_dataframe(filename: str, url: str, sep: str = ';', compression: str = 'infer') → pandas.core.frame.DataFrame
Checks whether filename already exists locally; if it does, the file is read and its data returned. If the file does not exist yet, it is first downloaded from url and decompressed.
- Parameters
filename – dataset local filename
url – dataset internet URL
sep – CSV field separator
compression – type of compression
- Returns
pandas DataFrame with the dataset contents
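The check-then-download behaviour described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the pyFTS implementation: `load_rows` is a hypothetical helper, it returns a list of rows rather than a pandas DataFrame, and it skips the decompression step.

```python
import csv
import os
import urllib.request


def load_rows(filename, url, sep=";"):
    """Hypothetical helper: read `filename`, downloading it from `url` first if missing."""
    if not os.path.isfile(filename):
        # Cache miss: fetch the remote file once and store it under the local name.
        urllib.request.urlretrieve(url, filename)
    # Cache hit (or freshly downloaded): parse the local copy.
    with open(filename, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=sep))
```

With pandas available, the `csv.reader` call would typically be replaced by `pandas.read_csv(filename, sep=sep, compression=compression)`, which lines up with the parameters listed above.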
Datasets
Artificial and synthetic data generators
Facilities to generate synthetic stochastic processes
- class pyFTS.data.artificial.SignalEmulator(**kwargs)
Bases: object
Emulate a complex signal built from several additive and non-additive components
- blip(**kwargs)
Creates an outlier greater than the maximum or lower than the minimum of the previous values of the signal, and inserts it at a random location in the signal.
- Returns
the current SignalEmulator instance, for method chaining
- components
Components of the signal
- incremental_gaussian(mu: float, sigma: float, **kwargs)
Creates an additive Gaussian interference on a previous signal
- Parameters
mu – increment on mean
sigma – increment on variance
start – lag index to start this signal, the default value is 0
it – Number of iterations, the default value is 1
length – Number of samples generated on each iteration, the default value is 100
vmin – Lower bound value of generated data, the default value is None
vmax – Upper bound value of generated data, the default value is None
- Returns
the current SignalEmulator instance, for method chaining
- periodic_gaussian(type: str, period: int, mu_min: float, sigma_min: float, mu_max: float, sigma_max: float, **kwargs)
Creates an additive periodic Gaussian interference on a previous signal
- Parameters
type – ‘linear’ or ‘sinoidal’
period – the period of recurrence
mu_min – initial (and minimum) mean of each period
sigma_min – initial (and minimum) variance of each period
mu_max – final (and maximum) mean of each period
sigma_max – final (and maximum) variance of each period
start – lag index to start this signal, the default value is 0
it – Number of iterations, the default value is 1
length – Number of samples generated on each iteration, the default value is 100
vmin – Lower bound value of generated data, the default value is None
vmax – Upper bound value of generated data, the default value is None
- Returns
the current SignalEmulator instance, for method chaining
- stationary_gaussian(mu: float, sigma: float, **kwargs)
Creates a continuous Gaussian signal with mean mu and variance sigma.
- Parameters
mu – mean
sigma – variance
additive – if False, the previous signal is cancelled and this one starts in its place; if True, this signal is added to the previous one
start – lag index to start this signal, the default value is 0
it – Number of iterations, the default value is 1
length – Number of samples generated on each iteration, the default value is 100
vmin – Lower bound value of generated data, the default value is None
vmax – Upper bound value of generated data, the default value is None
- Returns
the current SignalEmulator instance, for method chaining
- pyFTS.data.artificial.generate_gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc, it=100, num=10, vmin=None, vmax=None)
Generates data sampled from a Gaussian distribution, with constant or linearly changing parameters
- Parameters
mu_ini – Initial mean
sigma_ini – Initial variance
mu_inc – Mean increment after ‘num’ samples
sigma_inc – Variance increment after ‘num’ samples
it – Number of iterations
num – Number of samples generated on each iteration
vmin – Lower bound value of generated data
vmax – Upper bound value of generated data
- Returns
A list of it*num float values
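The batch-wise drift described above can be sketched in plain Python. This is a minimal illustration, not the pyFTS source: the name `gaussian_linear` and the `seed` argument are additions, `sigma` is passed to `random.gauss` as a standard deviation, and `vmin`/`vmax` are assumed to clip out-of-range samples.

```python
import random


def gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc, it=100, num=10,
                    vmin=None, vmax=None, seed=None):
    """Sketch: `it` batches of `num` Gaussian samples with linearly drifting parameters."""
    rng = random.Random(seed)  # `seed` is an addition, for reproducibility only
    mu, sigma = mu_ini, sigma_ini
    values = []
    for _ in range(it):
        for _ in range(num):
            sample = rng.gauss(mu, sigma)  # sigma used as a standard deviation (assumption)
            if vmin is not None:
                sample = max(vmin, sample)  # clip to the lower bound (assumption)
            if vmax is not None:
                sample = min(vmax, sample)  # clip to the upper bound (assumption)
            values.append(sample)
        # The parameters change only after each batch of `num` samples.
        mu += mu_inc
        sigma += sigma_inc
    return values
```

Setting both increments to zero gives a stationary signal; the result always holds `it * num` values.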
- pyFTS.data.artificial.generate_linear_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)
Generates a periodic linear variation on mean and variance
- Parameters
period – the period of recurrence
mu_min – initial (and minimum) mean of each period
sigma_min – initial (and minimum) variance of each period
mu_max – final (and maximum) mean of each period
sigma_max – final (and maximum) variance of each period
it – Number of iterations
num – Number of samples generated on each iteration
vmin – Lower bound value of generated data
vmax – Upper bound value of generated data
- Returns
A list of it*num float values
- pyFTS.data.artificial.generate_sinoidal_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)
Generates a periodic sinusoidal variation on mean and variance
- Parameters
period – the period of recurrence
mu_min – initial (and minimum) mean of each period
sigma_min – initial (and minimum) variance of each period
mu_max – final (and maximum) mean of each period
sigma_max – final (and maximum) variance of each period
it – Number of iterations
num – Number of samples generated on each iteration
vmin – Lower bound value of generated data
vmax – Upper bound value of generated data
- Returns
A list of it*num float values
- pyFTS.data.artificial.generate_uniform_linear(min_ini, max_ini, min_inc, max_inc, it=100, num=10, vmin=None, vmax=None)
Generates data sampled from a Uniform distribution, with constant or linearly changing bounds
- Parameters
min_ini – Initial lower bound
max_ini – Initial upper bound
min_inc – Lower bound increment after ‘num’ samples
max_inc – Upper bound increment after ‘num’ samples
it – Number of iterations
num – Number of samples generated on each iteration
vmin – Lower bound value of generated data
vmax – Upper bound value of generated data
- Returns
A list of it*num float values
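A comparable sketch for the uniform case, using only the standard library. The name `uniform_linear` and the `seed` argument are hypothetical, and clipping to `vmin`/`vmax` is an assumption about how the bounds are enforced.

```python
import random


def uniform_linear(min_ini, max_ini, min_inc, max_inc, it=100, num=10,
                   vmin=None, vmax=None, seed=None):
    """Sketch: `it` batches of `num` uniform samples whose bounds drift linearly."""
    rng = random.Random(seed)  # `seed` is an addition, for reproducibility only
    low, high = min_ini, max_ini
    values = []
    for _ in range(it):
        batch = [rng.uniform(low, high) for _ in range(num)]
        if vmin is not None:
            batch = [max(vmin, v) for v in batch]  # clip to the lower bound (assumption)
        if vmax is not None:
            batch = [min(vmax, v) for v in batch]  # clip to the upper bound (assumption)
        values.extend(batch)
        # Shift both bounds once per batch.
        low += min_inc
        high += max_inc
    return values
```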
AirPassengers dataset
Monthly totals of airline passengers in the USA, from January 1949 through December 1960.
Source: Hyndman, R.J., Time Series Data Library, http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/.
Bitcoin dataset
Bitcoin to USD quotations
Daily averaged index, by business day, from 2010 to 2018.
DowJones dataset
DJI - Dow Jones
Daily averaged index, by business day, from 1985 to 2017.
Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC
Enrollments dataset
Yearly University of Alabama enrollments from 1971 to 1992.
Ethereum dataset
Ethereum to USD quotations
Daily averaged index, by business day, from 2016 to 2018.
EUR-GBP dataset
FOREX market EUR-GBP pair.
Daily averaged quotations, by business day, from 2016 to 2018.
EUR-USD dataset
FOREX market EUR-USD pair.
Daily averaged quotations, by business day, from 2016 to 2018.
GBP-USD dataset
FOREX market GBP-USD pair.
Daily averaged quotations, by business day, from 2016 to 2018.
INMET dataset
INMET - Instituto Nacional de Meteorologia, Brazil
Belo Horizonte station, from 2000-01-01 to 2012-12-31
Source: http://www.inmet.gov.br
Malaysia dataset
Hourly Malaysia electric load and temperature
NASDAQ dataset
National Association of Securities Dealers Automated Quotations - Composite Index (NASDAQ IXIC)
Daily averaged index by business day, from 2000 to 2016.
Source: http://www.nasdaq.com/aspx/flashquotes.aspx?symbol=IXIC&selected=IXIC
SONDA dataset
SONDA - Sistema de Organização Nacional de Dados Ambientais, from INPE - Instituto Nacional de Pesquisas Espaciais, Brazil.
Brasilia station
Source: http://sonda.ccst.inpe.br/
S&P 500 dataset
S&P500 - Standard & Poor’s 500
Daily averaged index, by business day, from 1950 to 2017.
Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC
TAIEX dataset
The Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)
Daily averaged index by business day, from 1995 to 2014.
Source: http://www.twse.com.tw/en/products/indices/Index_Series.php
Henon chaotic time series
Hénon. “A two-dimensional mapping with a strange attractor”. Commun. Math. Phys. 50, 69-77 (1976)
dx/dt = a + by(t-1) - x(t-1)^2
dy/dt = x
- pyFTS.data.henon.get_data(var: str, a: float = 1.4, b: float = 0.3, initial_values: list = [1, 1], iterations: int = 1000) → pandas.core.frame.DataFrame
Returns a single variable of the Henon map as a univariate time series.
- Parameters
var – the dataset field name to extract
- Returns
numpy array
- pyFTS.data.henon.get_dataframe(a: float = 1.4, b: float = 0.3, initial_values: list = [1, 1], iterations: int = 1000) → pandas.core.frame.DataFrame
Return a dataframe with the bivariate Henon Map time series (x, y).
- Parameters
a – Equation coefficient
b – Equation coefficient
initial_values – numpy array with the initial values of x and y. Default: [1, 1]
iterations – number of iterations. Default: 1000
- Returns
pandas DataFrame with the x and y values
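Read as a discrete recurrence, the two equations given for this section can be iterated directly. The sketch below is an illustration only: `henon_series` is a hypothetical name, it returns two plain lists instead of a DataFrame, and it assumes the initial values are kept as the first entry of each series.

```python
def henon_series(a=1.4, b=0.3, initial_values=(1.0, 1.0), iterations=1000):
    """Sketch: iterate x(t) = a + b*y(t-1) - x(t-1)^2 and y(t) = x(t-1)."""
    x = [initial_values[0]]
    y = [initial_values[1]]
    for _ in range(iterations):
        x_prev, y_prev = x[-1], y[-1]
        # Both updates use the values from the previous step.
        x.append(a + b * y_prev - x_prev * x_prev)
        y.append(x_prev)
    return x, y
```

Extracting one variable, as get_data does through its var argument, then amounts to picking one of the two returned series.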
Logistic map chaotic time series
May, Robert M. (1976). “Simple mathematical models with very complicated dynamics”. Nature. 261 (5560): 459–467. doi:10.1038/261459a0.
x(t) = r * x(t-1) * (1 - x(t-1))
- pyFTS.data.logistic_map.get_data(r: float = 4, initial_value: float = 0.3, iterations: int = 100) → list
Return a list with the logistic map chaotic time series.
- Parameters
r – Equation coefficient
initial_value – Initial value of x. Default: 0.3
iterations – number of iterations. Default: 100
- Returns
a list with the logistic map time series values
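The logistic map recurrence is short enough to write out in full. This sketch uses a hypothetical name, `logistic_map`, and assumes the initial value is kept as the first element of the returned list.

```python
def logistic_map(r=4.0, initial_value=0.3, iterations=100):
    """Sketch: iterate x(t) = r * x(t-1) * (1 - x(t-1))."""
    series = [initial_value]
    for _ in range(iterations):
        x = series[-1]
        series.append(r * x * (1 - x))
    return series
```

With the defaults, the first two iterates are approximately 0.84 and 0.5376.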
Lorenz chaotic time series
Lorenz, Edward Norton (1963). “Deterministic nonperiodic flow”. Journal of the Atmospheric Sciences. 20 (2): 130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2
dx/dt = a(y - x)
dy/dt = x(b - z) - y
dz/dt = xy - cz
- pyFTS.data.lorentz.get_data(var: str, a: float = 10.0, b: float = 28.0, c: float = 2.6666666666666665, dt: float = 0.01, initial_values: list = [0.1, 0, 0], iterations: int = 1000) → pandas.core.frame.DataFrame
Returns a single variable of the Lorenz system as a univariate time series.
- Parameters
var – the dataset field name to extract
- Returns
numpy array
- pyFTS.data.lorentz.get_dataframe(a: float = 10.0, b: float = 28.0, c: float = 2.6666666666666665, dt: float = 0.01, initial_values: list = [0.1, 0, 0], iterations: int = 1000) → pandas.core.frame.DataFrame
Return a dataframe with the multivariate Lorenz system time series (x, y, z).
- Parameters
a – Equation coefficient. Default value: 10
b – Equation coefficient. Default value: 28
c – Equation coefficient. Default value: 8.0/3.0
dt – Time differential for continuous time integration. Default value: 0.01
initial_values – numpy array with the initial values of x,y and z. Default: [0.1, 0, 0]
iterations – number of iterations. Default: 1000
- Returns
pandas DataFrame with the x, y and z values
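The role of dt can be illustrated by integrating the three Lorenz equations with a fixed step. The sketch below assumes a forward Euler scheme, which the documentation does not state explicitly; `lorenz_series` is a hypothetical name and the result is a list of (x, y, z) tuples rather than a DataFrame.

```python
def lorenz_series(a=10.0, b=28.0, c=8.0 / 3.0, dt=0.01,
                  initial_values=(0.1, 0.0, 0.0), iterations=1000):
    """Sketch: forward Euler integration of the Lorenz equations (assumed scheme)."""
    x, y, z = initial_values
    rows = [(x, y, z)]
    for _ in range(iterations):
        # Derivatives evaluated at the current state.
        dx = a * (y - x)
        dy = x * (b - z) - y
        dz = x * y - c * z
        # One Euler step of size dt.
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        rows.append((x, y, z))
    return rows
```

A smaller dt tracks the continuous system more closely at the cost of more iterations for the same simulated time span.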
Mackey-Glass chaotic time series
Mackey, M. C. and Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197(4300):287-289.
dy/dt = -b y(t) + c y(t - tau) / (1 + y(t - tau)^10)
- pyFTS.data.mackey_glass.get_data(b: float = 0.1, c: float = 0.2, tau: float = 17, initial_values: numpy.ndarray = array([0.5, 0.55882353, 0.61764706, 0.67647059, 0.73529412, 0.79411765, 0.85294118, 0.91176471, 0.97058824, 1.02941176, 1.08823529, 1.14705882, 1.20588235, 1.26470588, 1.32352941, 1.38235294, 1.44117647, 1.5]), iterations: int = 1000) → list
Return a list with the Mackey-Glass chaotic time series.
- Parameters
b – Equation coefficient
c – Equation coefficient
tau – Lag parameter, default: 17
initial_values – numpy array with the initial values of y. Default: np.linspace(0.5,1.5,18)
iterations – number of iterations. Default: 1000
- Returns
a list with the Mackey-Glass time series values
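Because the Mackey-Glass equation depends on the value tau steps in the past, at least tau + 1 initial values are needed, which explains the 18-element default. The sketch below is an illustration under stated assumptions: it uses a unit-step Euler update, an integer tau, a hypothetical name `mackey_glass`, and it keeps the initial values at the head of the returned list.

```python
def mackey_glass(b=0.1, c=0.2, tau=17, initial_values=None, iterations=1000):
    """Sketch: unit-step Euler update of the Mackey-Glass delay equation (assumed scheme)."""
    if initial_values is None:
        # Pure-Python equivalent of np.linspace(0.5, 1.5, 18).
        initial_values = [0.5 + i / 17 for i in range(18)]
    y = list(initial_values)
    if len(y) <= tau:
        raise ValueError("need at least tau + 1 initial values")
    for _ in range(iterations):
        delayed = y[-1 - tau]  # value of y tau steps before the latest one
        y.append(y[-1] - b * y[-1] + c * delayed / (1 + delayed ** 10))
    return y
```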
Rössler chaotic time series
Rössler, Phys. Lett. 57A, 397 (1976).
dx/dt = -z - y
dy/dt = x + ay
dz/dt = b + z(x - c)
- pyFTS.data.rossler.get_data(var: str, a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01, initial_values: numpy.ndarray = [0.001, 0.001, 0.001], iterations: int = 5000) → numpy.ndarray
Returns a single variable of the Rössler system as a univariate time series.
- Parameters
var – the dataset field name to extract
- Returns
numpy array
- pyFTS.data.rossler.get_dataframe(a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01, initial_values: numpy.ndarray = [0.001, 0.001, 0.001], iterations: int = 5000) → pandas.core.frame.DataFrame
Return a dataframe with the multivariate Rössler system time series (x, y, z).
- Parameters
a – Equation coefficient. Default value: 0.2
b – Equation coefficient. Default value: 0.2
c – Equation coefficient. Default value: 5.7
dt – Time differential for continuous time integration. Default value: 0.01
initial_values – numpy array with the initial values of x,y and z. Default: [0.001, 0.001, 0.001]
iterations – number of iterations. Default: 5000
- Returns
pandas DataFrame with the x, y and z values
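The relationship between the multivariate result and the single-variable accessor can be sketched as follows. Everything here is an assumption made for illustration: the forward Euler scheme, the column names "x", "y" and "z", and the hypothetical helpers `rossler_columns` and `rossler_variable`, which use a dict of lists in place of a DataFrame.

```python
def rossler_columns(a=0.2, b=0.2, c=5.7, dt=0.01,
                    initial_values=(0.001, 0.001, 0.001), iterations=5000):
    """Sketch: forward Euler integration of the Rössler equations (assumed scheme)."""
    x, y, z = initial_values
    columns = {"x": [x], "y": [y], "z": [z]}
    for _ in range(iterations):
        # Derivatives evaluated at the current state.
        dx = -z - y
        dy = x + a * y
        dz = b + z * (x - c)
        # One Euler step of size dt.
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        columns["x"].append(x)
        columns["y"].append(y)
        columns["z"].append(z)
    return columns


def rossler_variable(var, **kwargs):
    """Sketch: pick one variable out of the multivariate series by its (assumed) name."""
    return rossler_columns(**kwargs)[var]
```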
Sunspots dataset
Monthly sunspot numbers from 1749 to May 2016
Source: https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/SUNSPOT/