Task 5 shifts the focus from descriptive to predictive tasks, aiming to forecast traffic dynamics and anticipate congestion and other mobility problems. Three major activities will be pursued:

A1) Advanced predictive models of car traffic
This activity aims to develop models that robustly estimate mobility parameters (such as traffic congestion at a specific location) along a time horizon. To this end, we will first produce auto-regressive forecasters that provide a baseline for comparison against more advanced predictive models. Then, we will extend the spatiotemporal pattern-centric and generative views from the previous task (Task 4) to produce, respectively, associative classifiers and deep learning methods able to predict the circulation dynamics along the city streets.
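As an illustration of the auto-regressive baseline mentioned above, the following is a minimal sketch, assuming a univariate series of traffic counts sampled at regular intervals. The function names `fit_ar` and `forecast_ar` and the plain least-squares estimator are illustrative assumptions, not the forecasters the project will deliver.

```python
import numpy as np


def fit_ar(series, order):
    """Fit an AR(order) model with intercept by ordinary least squares.

    Returns [intercept, phi_1, ..., phi_order], where phi_k multiplies
    the value observed k steps before the target.
    """
    y = np.asarray(series, dtype=float)
    # Column k holds the lag-(k+1) value for every target y[order:].
    lags = np.column_stack(
        [y[order - k - 1 : len(y) - k - 1] for k in range(order)]
    )
    design = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(design, y[order:], rcond=None)
    return coef


def forecast_ar(series, coef, horizon):
    """Roll the fitted model forward `horizon` steps, feeding predictions back in."""
    order = len(coef) - 1
    history = list(np.asarray(series, dtype=float)[-order:])
    predictions = []
    for _ in range(horizon):
        recent_first = history[::-1][:order]  # lag 1, lag 2, ...
        nxt = coef[0] + float(np.dot(coef[1:], recent_first))
        predictions.append(nxt)
        history.append(nxt)
    return np.array(predictions)
```

Such a baseline would typically be fitted per sensor location, for example `forecast_ar(counts, fit_ar(counts, order=4), horizon=8)` for a hypothetical array `counts`, with its error serving as the reference that the more advanced models must improve upon.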

The previous goals will be pursued for different data sources: 
(a) car traffic data from two major types of sensors:
– fixed sensors: discrete data models of car traffic from loop counters, exploring data mining methods well suited to learning from multivariate time series;
– mobile sensors: continuous data models of car traffic from WAZE (geolocalized speed data), exploring data mining methods well suited to learning from trajectory data.
(b) data from the public bus network operated and maintained by CARRIS (http://www.carris.pt/). These data describe the active buses and subway stations, together with the number of entry validations per location along their routes. Such data can be mapped into collections of events. As a result, state-of-the-art methods for spatiotemporal pattern mining and classification of geolocalized event data can be readily applied and extended to the analysis of public transportation.
(c) bike sharing data (GIRA initiative by EMEL and CML entities – https://www.gira-bicicletasdelisboa.pt/). Bike sharing data can be mapped into collections of events, with the additional benefit that events can be grouped by users and origin-destination routes per user are well-defined. Such additional knowledge poses unique challenges and opportunities for the advanced analysis of circulation dynamics, which will be properly explored in the context of this activity.
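The mapping of bike sharing records into events grouped by user and by origin-destination route can be sketched as follows. This is a minimal illustration: the `TripEvent` fields and the helper names are hypothetical and do not reflect the actual GIRA data schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TripEvent:
    """One bike trip as an event (hypothetical fields, not the GIRA schema)."""
    user_id: str
    origin: str
    destination: str
    start: datetime


def od_counts(events):
    """Count trips per (origin, destination) pair."""
    return Counter((e.origin, e.destination) for e in events)


def trips_by_user(events):
    """Group events by user, each group ordered by start time."""
    grouped = {}
    for event in sorted(events, key=lambda e: e.start):
        grouped.setdefault(event.user_id, []).append(event)
    return grouped
```

Bus validations could be represented in the same way without the user and destination fields, which is why the origin-destination view is specific to the bike sharing data.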

A2) Predictions with guarantees of statistical significance 
Given the daily impact of traffic routing decisions, it is essential to assess whether the discovered relations deviate from expectations (against a null data model), so that outputs are reported only when they are statistically significant. In this context, we will explore: 1) new statistical tests to assess the degree of trustworthiness of each prediction, and 2) new heuristics to be incorporated within the learning process in order to minimize false positive and false negative decisions.
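To make the idea of testing a discovered relation against a null data model concrete, the sketch below uses a standard permutation test on the correlation between two variables. It is an assumption-laden illustration rather than one of the new tests proposed here: the function name `permutation_p_value` is hypothetical, and shuffling observations ignores temporal autocorrelation, so a block-wise or otherwise structure-preserving null model would likely be needed for real traffic series.

```python
import numpy as np


def permutation_p_value(x, y, n_permutations=1000, seed=0):
    """Estimate how often a shuffled (null) pairing of x and y yields an
    absolute correlation at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    hits = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real association with x (the null model).
        shuffled = rng.permutation(y)
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

A small p-value indicates that the observed relation is unlikely under the null model; the threshold applied to it governs the trade-off between false positive and false negative decisions.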

A3) Integrative predictive models of mobility from multiple transportation modalities
Principles from integrative data mining at the preprocessing, learning and postprocessing stages will be considered to learn comprehensive predictive models of city traffic. To this end, we will combine multi-modal traffic data and discover relevant cross-modality relations to enhance the target predictors.

This task builds upon the data collected in Task 2 and the exploratory analysis in Task 3. The models learned in this task will be enhanced in the next task (Task 6) in the presence of situational context data.
INESC-ID will be responsible for the execution of this task, aided by LNEC, CML and its mobility policy staff members. INESC-ID has extensive expertise in learning predictive models from complex data structures, including multi-period classification of time series data and multi-attribute event data (Henriques, 2015a). BPD-1 and BI-3 grant holders will also work on this task.