Étienne Simon

Kaggle: ECML/PKDD 15, Taxi Trajectory Prediction (I)



The task was to predict the destination of a taxi from the beginning of its trajectory, represented as a variable-length sequence of GPS points, together with associated meta-information such as the departure time, the driver ID and client information.

The provided training dataset comprised more than 1.7 million complete trajectories, corresponding to the activity of the taxis of Porto over a full year.

In the following videos you can see the prediction of our model (blue dot) as the taxi advances (a GPS coordinate was received every 15 seconds):


Figure 1: Multi-Layer Perceptron
Figure 2: Bidirectional Recurrent Neural Network
Figure 3: Memory Network

The model we won with was a simple multi-layer perceptron, presented in figure 1, which took as input the first 5 and last 5 points of the trajectory.

As the destination we aim to predict consists of two scalar values (latitude and longitude), it is natural to have two output neurons. However, we found such a simple model difficult to train, because it does not take into account any prior information on the distribution of the data. To tackle this issue, we integrated prior knowledge of the destinations directly into the architecture of our model: instead of predicting the destination position directly, we used a predefined set of a few thousand destination cluster centers and a hidden layer that associates a scalar value (similar to a probability) with each of these clusters. Since the network must output a single destination position, the final prediction is a weighted average of the predefined destination cluster centers.
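The output layer described above can be sketched as follows. This is a minimal NumPy illustration, not our actual training code: the number of clusters, the cluster coordinates and the hidden size are placeholder values, and the cluster centers would in practice come from clustering the training destinations beforehand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: C destination cluster centers (lat, lon) in a
# box roughly around Porto (values are illustrative only).
C = 3000
cluster_centers = rng.uniform(low=[41.0, -8.7], high=[41.3, -8.5], size=(C, 2))

def predict_destination(hidden, W, b):
    """Final layer: a softmax over cluster scores, then a weighted
    average of the fixed cluster centers as the (lat, lon) output."""
    logits = hidden @ W + b                # one scalar score per cluster
    p = np.exp(logits - logits.max())
    p /= p.sum()                           # probability-like weights
    return p @ cluster_centers             # convex combination of centers

# Toy usage with a random hidden state.
H = 500
hidden = rng.normal(size=H)
W = rng.normal(scale=0.01, size=(H, C))
b = np.zeros(C)
lat, lon = predict_destination(hidden, W, b)
```

Because the output is a convex combination of the cluster centers, the prediction is guaranteed to fall inside their convex hull, which encodes the prior that destinations lie in the covered area.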

By the end of the competition we had implemented two other models: a bidirectional RNN (figure 2) and a memory network (figure 3). However, we only managed to train them satisfactorily post-mortem; in the end, a variant of the bidirectional RNN performs best (see table 1).


Figure 4: t-SNE 2D projection of the quarter-of-hour embeddings (24×4 = 96 quarters of an hour in a day)
Figure 5: t-SNE 2D projection of the week-of-year embeddings (52 weeks in a year)

To use the meta-data, we embedded each variable with a lookup table. The timestamp was divided into higher-level variables that better describe human activity.
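A lookup-table embedding amounts to indexing a learned matrix with the discrete value. The sketch below is illustrative only: the embedding dimensions are placeholders, and the exact set of timestamp variables (here quarter of hour, day of week and week of year, the first and last matching figures 4 and 5) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding tables, one per discrete variable
# (10-dimensional embeddings are a placeholder choice).
emb_quarter_hour = rng.normal(size=(96, 10))   # 24*4 quarters of an hour
emb_day_of_week  = rng.normal(size=(7, 10))
emb_week_of_year = rng.normal(size=(52, 10))

def embed_metadata(quarter, day, week):
    """Look up each timestamp variable in its table and concatenate
    the resulting vectors into one feature vector."""
    return np.concatenate([
        emb_quarter_hour[quarter],
        emb_day_of_week[day],
        emb_week_of_year[week],
    ])

features = embed_metadata(quarter=37, day=4, week=23)
```

During training, the rows of these tables are updated by backpropagation like any other weights, which is how the structure seen in figures 4 and 5 emerges.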

Figures 4 and 5 show t-SNE projections of two of these embeddings, indicating that the model learned the structure of the day and of the year.

Model                              Custom test   Kaggle public   Kaggle private
MLP, clustering (winning model)    2.81          2.39            1.87
MLP, direct output                 2.97          3.44            3.88
MLP, clustering, no embeddings     2.93          2.64            2.17
MLP, clustering, embeddings only   4.29          3.52            3.76
RNN                                3.14          2.49            2.39
Bidirectional RNN                  3.01          2.65            2.33
Bidirectional RNN with window      2.60          3.15            2.06
Memory network                     2.87          2.77            2.20
Second-place team                  —             2.36            2.09
Third-place team                   —             2.45            2.11

Table 1: Error (mean distance error in kilometers) on the different test sets
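The errors in table 1 are mean great-circle distances between predicted and true destinations. A minimal sketch of such a metric, assuming the standard haversine formula is used:

```python
import math

def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon)
    points given in degrees, using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))  # Earth radius ~6371 km

def mean_error_km(preds, truths):
    """Mean distance error over paired predictions and ground truths."""
    return sum(haversine_km(p, t) for p, t in zip(preds, truths)) / len(preds)
```

For example, two points 0.2 degrees of latitude apart are roughly 22 km apart, on the same scale as the errors reported above.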

Our winning submission on Kaggle scored 2.03, but the model had not yet been trained to convergence.