This Data Science lab course was designed to be independent of any particular data science tool. However, in my opinion, the tool that currently offers the best balance between flexibility and results is Python, together with the set of packages created specifically for this context.
The course is organized in several modules, covering the two main topics: mining multidimensional data (tabular format) and mining time series.

| Data format | Lab | Topic | Procedures |
|---|---|---|---|
| Accessory Files | data folder | | |
| | config.py | | |
| | dslabs.mplstyle | | |
| | ds_charts.py | | |
| | ts_functions.py | | |
| Multidimensional data (tabular format) | Lab 0 | Python for DS | Loading data with pandas |
| | | | Basic charts with matplotlib.pyplot |
| | Lab 1 | Data profiling | Data dimensionality |
| | | | Data distribution |
| | | | Data granularity |
| | | | Data sparsity |
| | Lab 2 | Data preparation | Missing values imputation |
| | | | Scaling |
| | | | Dummification |
| | | | Data balancing |
| | Lab 3 | Classification | Training Strategies |
| | | | Naive Bayes |
| | | | KNN |
| | | | Decision Trees |
| | | | Random Forests |
| | | | Gradient Boosting |
| | | | Neural networks (MLP) |
| | Lab 4 | Clustering | Clustering |
| | | | Feature Extraction |
| | Lab 5 | Pattern Mining | |
| Time Series | Lab 6 | Profiling | |
| | | Transformation | |
| | | Forecasting | |
| | | Motif Discovery | |
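
To give a flavor of the first labs, here is a minimal sketch of loading data with pandas (Lab 0) and checking data dimensionality (Lab 1). It is not taken from the course materials: the inline dataset, its column names, and the helper functions `load_data` and `dimensionality` are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical inline dataset, standing in for a CSV file from the data folder.
CSV_TEXT = """id,age,income,class
1,34,52000,yes
2,,48000,no
3,45,,yes
4,29,61000,no
"""


def load_data(source) -> pd.DataFrame:
    """Load a CSV into a DataFrame, using the 'id' column as the index."""
    return pd.read_csv(source, index_col="id")


def dimensionality(data: pd.DataFrame) -> dict:
    """Count records, variables, and missing values per variable."""
    return {
        "records": data.shape[0],
        "variables": data.shape[1],
        "missing": data.isna().sum().to_dict(),
    }


data = load_data(io.StringIO(CSV_TEXT))
summary = dimensionality(data)
print(summary)
```

With a real dataset, `load_data` would receive a file path instead of the in-memory buffer; the per-variable missing-value counts are the kind of information the missing values imputation step in Lab 2 would start from.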