DP Mobility Report

Madrid Dataset - no privacy

Configuration

Max. trips per user 20
Privacy budget -
User privacy True
Budget split Evenly distributed
Evaluation dev. mode False
Excluded analyses None

No noise has been added.

The following table shows the key figures of the dataset.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate and the 95% confidence interval.

privacy budget: None

estimate 95% CI: +/-
Number of records 445,488 0
Number of distinct trips 222,744 0
Number of complete trips (start and end point) 222,744 0
Number of incomplete trips (single point) 0 0
Number of distinct users 75,208 0
Number of distinct locations (lat & lon combination) 1,236 0
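All confidence intervals in this report are 0 because no privacy budget was set. As a rough sketch of how a noisy estimate and its 95% margin could be derived once a budget is configured, the following assumes the Laplace mechanism. The report does not state which mechanism it uses, and the `sensitivity` and `epsilon` values are hypothetical.

```python
import math
import random


def laplace_margin(sensitivity: float, epsilon: float, confidence: float = 0.95) -> float:
    """Half-width of the confidence interval of a Laplace-noised count."""
    scale = sensitivity / epsilon
    # For Laplace noise, P(|noise| > t) = exp(-t / scale),
    # so t = scale * ln(1 / (1 - confidence)).
    return scale * math.log(1.0 / (1.0 - confidence))


def noisy_count(true_count: float, sensitivity: float, epsilon: float,
                rng: random.Random) -> float:
    """Add Laplace(0, sensitivity / epsilon) noise to a count."""
    scale = sensitivity / epsilon
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical setting: one user contributes at most 20 trips, epsilon = 1.
margin = laplace_margin(sensitivity=20, epsilon=1.0)
estimate = noisy_count(222_744, sensitivity=20, epsilon=1.0, rng=random.Random(0))
print(f"estimate: {estimate:.0f} +/- {margin:.1f}")
```

A smaller `epsilon` widens the margin proportionally, which is why counts that are small relative to the margin become unreliable.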

The following table shows the number of missing values for each column of the dataset.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate and the 95% confidence interval.

privacy budget: None

estimate 95% CI: +/-
User ID (uid) 0 0
Trip ID (tid) 0 0
Timestamp (datetime) 0 0
Latitude (lat) 2,007 0
Longitude (lng) 2,007 0

This visualization shows the relative number of trips on a timeline. Depending on the timespan of the dataset, it is aggregated by day, week or month (indicated below the graph).

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (blue line) and the 95% confidence interval. The confidence interval is visualized as the shaded error band.

The y-axis shows the percentage of trips while the x-axis shows the timeline.

privacy budget: None

95% CI: +/- 0 %


Timestamps have been aggregated by week.

Min. 2018-02-08
Max. 2018-06-11
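The weekly aggregation described above can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name and the choice of Monday as the start of the week are assumptions, and the dates are toy data.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_share(trip_dates):
    """Return {week_start: percentage of trips}, with weeks starting on Monday."""
    # date.weekday() is 0 for Monday, so subtracting it yields the week's Monday.
    counts = Counter(d - timedelta(days=d.weekday()) for d in trip_dates)
    total = sum(counts.values())
    return {week: 100.0 * n / total for week, n in sorted(counts.items())}


toy_dates = [date(2018, 2, 8), date(2018, 2, 9), date(2018, 2, 12), date(2018, 6, 11)]
shares = weekly_share(toy_dates)
for week, share in shares.items():
    print(week, f"{share:.1f} %")
```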

This histogram shows the relative number of trips per weekday.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (bars) and the 95% confidence interval (error bar).

The y-axis shows the percentage of trips while the x-axis shows the weekdays.

privacy budget: None

95% CI: +/- 0 %


This line chart shows the relative number of trips per hour over the course of a day, disaggregated by weekday and weekend.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (lines). The confidence interval is indicated below but, for the sake of visual clarity, not drawn in the graph.

The legend shows the different time categories (weekday start, weekday end, weekend start, weekend end) indicating the start and end timestamp of each trip and if the trip was during the week or on the weekend.

The y-axis shows the percentage of trips while the x-axis shows the hour of the day.

privacy budget: None

95% CI: +/- 0 %


The following map shows the spatial distribution of the dataset according to the provided tessellation. Records outside the tessellation are indicated as number of outliers below the map.

The legend below shows the number of visits per tile ranging from 0 to the maximum number of visits per tile.

The allocated privacy budget for this map is shown below and noise is applied accordingly onto the counts. The confidence interval is indicated below.

Tiles below a certain threshold are grayed out: Due to the applied noise, tiles with a low visit count are likely to contain a high percentage of noise. For usability reasons, such unrealistic values are grayed out. More specifically: The threshold is set so that values for tiles with a 5% chance (or higher) of deviating more than 20 percentage points from the estimated value are not shown.

privacy budget: None

95% CI: +/- 0 visit(s)
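One possible reading of the threshold rule, sketched under assumptions: a tile is shown only if its 95% margin is at most 20% of its count. Treating "20 percentage points" as a 20% relative deviation is an interpretation, and the function names are hypothetical.

```python
def display_threshold(margin_95: float, max_rel_deviation: float = 0.2) -> float:
    """Smallest count whose 95% margin stays within the allowed relative deviation."""
    return margin_95 / max_rel_deviation


def is_displayed(count: float, margin_95: float, max_rel_deviation: float = 0.2) -> bool:
    """True if the tile would be shown rather than grayed out."""
    return count >= display_threshold(margin_95, max_rel_deviation)


# Without noise (margin 0, as in this report) every tile is shown.
print(is_displayed(1, margin_95=0.0))
# With a hypothetical margin of 60 visits, a tile with 100 visits is grayed out.
print(is_displayed(100, margin_95=60.0))
```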


2,007 (0.45%) points are outside the given tessellation (95% confidence interval ± 0).

This table shows the mean number of visits per tile as well as the five-number summary consisting of: the most extreme values in the dataset (the maximum and minimum values), the lower and upper quartiles, and the median.

These values are computed from the counts visualized above. Thus, no extra privacy budget is used.

Mean 352
Min. 0
25% 137
Median 334
75% 516
Max. 1,488
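A summary of this shape can be computed from a list of per-tile counts as sketched below. The quartile method (linear interpolation, `method="inclusive"`) is an assumption, since the report does not state how quartiles are interpolated, and the counts are toy data.

```python
import statistics


def five_number_summary(counts):
    """Mean plus min, quartiles, median and max of a list of counts."""
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(counts),
        "min": min(counts),
        "25%": q1,
        "median": median,
        "75%": q3,
        "max": max(counts),
    }


summary = five_number_summary([0, 10, 20, 30, 40])
print(summary)
```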

The following visualization shows the cumulated relative number of visits. This means that the tiles are sorted according to the number of visits in descending order and the relative number of visits are added tile by tile. Thus, you can use the graph to evaluate how many tiles are needed to cover a certain share of the visits.

If all tiles are visited equally, the cumulated sum follows a straight diagonal line (gray line). The larger the share of single tiles in the total number of visits, the steeper the curve.

These values are computed from the counts visualized above. Thus, no extra privacy budget is used.
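The cumulated curve can be reproduced from the per-tile counts with a sort and a running sum. A minimal sketch with toy counts; the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_share(counts):
    """Cumulated relative counts, with items sorted in descending order."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


curve = cumulative_share([1, 1, 6, 2])
print(curve)  # the first value is the share of the most visited tile
```

Reading the curve at position k gives the share of all visits covered by the k most visited tiles.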


The following visualization shows the ranking of most frequently visited tiles.

The y-axis shows the tile name (if provided) and tile ID in order of the ranking. The x-axis shows the number of visits per tile.

These values are computed from the counts visualized above. Thus, no extra privacy budget is used. The 95% confidence interval of the visits per tile indicated above also applies here and is visualized with error bars.


Each map shows the arrivals (destinations) per tile for the respective time window, split by weekday and weekend, as absolute counts and as the deviation from the tile average. The tile average is defined as the mean number of visits in one tile across all time windows. Thus, the deviation from the tile average indicates a higher or lower number of visits for this tile during certain time windows of a day.

Tiles below a certain threshold are grayed out: Due to the applied noise, tiles with a low visit count are likely to contain a high percentage of noise. For usability reasons, such unrealistic values are grayed out. More specifically: The threshold is set so that values for tiles with a 5% chance (or higher) of deviating more than 20 percentage points from the estimated value are not shown.

privacy budget: None

95% CI: +/- 0 visit(s)

Weekday

Number of visits

Deviation from tile average

The average of each tile over all time windows equals 1 (100% of average traffic). A value of < 1 (> 1) means that a tile is visited less (more) frequently in this time window than it is on average.

User configuration of timewindows: ['2 - 6', '6 - 10', '10 - 14', '14 - 18', '18 - 22']
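The deviation from the tile average follows directly from the definition above: each window's count is divided by the tile's mean over all windows. A small sketch with toy counts for the five configured time windows; returning zeros for a tile without visits is an assumption.

```python
def deviation_from_tile_average(visits_per_window):
    """Ratio of each time window's visits to the tile's mean over all windows."""
    average = sum(visits_per_window) / len(visits_per_window)
    if average == 0:
        # Tile without any visits: no meaningful ratio (assumed convention).
        return [0.0 for _ in visits_per_window]
    return [v / average for v in visits_per_window]


# One tile, five windows: '2 - 6', '6 - 10', '10 - 14', '14 - 18', '18 - 22'
ratios = deviation_from_tile_average([10, 30, 20, 20, 20])
print(ratios)
```

By construction the ratios of a visited tile average to 1, matching the statement that the tile average equals 100% of average traffic.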

The following map shows the origin-destination (OD) flows between the tiles according to the provided tessellation, meaning the number of trips between respective start and end tiles.

The origin of each OD flow is indicated by a small circle. Clicking on an OD connection shows the origin and destination cell names as well as the count for this OD connection.

The legend for the intra-tile flows is below. An intra-tile flow is defined as an OD connection that starts and ends in the same tile.

The allocated privacy budget for this map is shown below and noise is applied accordingly onto the counts. The confidence interval is indicated below.

Flows below a certain threshold are not displayed (grayed out for intra-tile flows): Due to the applied noise, flows with a low count are likely to contain a high percentage of noise. For usability reasons, such unrealistic values are not displayed/grayed out. More specifically: The threshold is set so that values for flows with a 5% chance (or higher) of deviating more than 20 percentage points from the estimated value are not shown.

privacy budget: None

95% CI: +/- 0 flow(s)

User configuration: display max. top 300 OD connections on map
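Counting OD flows amounts to counting trips per (origin tile, destination tile) pair. The sketch below also separates intra-tile flows and keeps only the top connections for display. Tile names and function names are hypothetical, and noise is left out.

```python
from collections import Counter


def od_flows(trips):
    """trips: iterable of (origin_tile, destination_tile) pairs, one per trip."""
    return Counter(trips)


def split_intra_tile(flows):
    """Separate flows that start and end in the same tile from all others."""
    intra = {od: n for od, n in flows.items() if od[0] == od[1]}
    inter = {od: n for od, n in flows.items() if od[0] != od[1]}
    return intra, inter


def top_flows(flows, limit=300):
    """The `limit` OD connections with the highest counts."""
    return Counter(flows).most_common(limit)


flows = od_flows([("A", "B"), ("A", "B"), ("B", "A"), ("C", "C")])
intra, inter = split_intra_tile(flows)
print(top_flows(inter, limit=1))
```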


This table shows the mean number of flows per OD pair as well as the five-number summary consisting of: the most extreme values in the dataset (the maximum and minimum values), the lower and upper quartiles, and the median.

These values are computed from the counts visualized above. Thus, no extra privacy budget is used.

Mean 2
Min. 1
25% 1
Median 1
75% 2
Max. 370

The following visualization shows the cumulated relative number of flows per OD pair. This means that the OD pairs are sorted according to the number of flows in descending order and the relative number of flows are added OD pair by OD pair. Thus, you can use the graph to evaluate how many OD pairs are needed to cover a certain share of the flows.

If all OD pairs carry the same number of flows, the cumulated sum follows a straight diagonal line (gray line). The larger the share of a single OD pair in the total number of flows, the steeper the curve.

These values are computed from the counts visualized above. Thus, no extra privacy budget is used.


The following visualization shows the ranking of the most frequent OD connections.

The y-axis shows the tile name of origin and destination in order of the ranking. The x-axis shows the number of flows per OD pair.

These values are computed from the counts visualized above. Thus, no extra privacy budget is used. The 95% confidence interval of the flows per OD pair indicated above also applies here and is visualized with error bars.


The following histogram shows the distribution of travel time. The travel time is computed as the time difference between start and end timestamp of a trip in minutes.

The y-axis indicates the relative counts of trips while the x-axis shows the range of histogram bins in minutes according to the user-configured bin size and maximum value.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (bars) and the 95% confidence interval (error bar).

privacy budget: None

95% CI: +/- 0 %


User configuration for histogram chart:
maximum value: 90
bin size: 5

Five number summary: travel time

Min. 0.00
25% 10.00
Median 20.00
75% 30.00
Max. 975.00
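The travel time and its histogram can be sketched as below, using the configured bin size of 5 and maximum of 90 minutes. Collecting all values at or above the maximum in one final overflow bin is an assumption about how such outliers (like the 975-minute maximum) are handled. The function names are hypothetical and the values are toy data.

```python
from datetime import datetime


def travel_time_minutes(start: datetime, end: datetime) -> float:
    """Time difference between the end and start timestamp of a trip in minutes."""
    return (end - start).total_seconds() / 60.0


def histogram_shares(values, bin_size, max_value):
    """Percentage of values per bin; the last bin collects values >= max_value."""
    n_bins = int(max_value // bin_size)
    counts = [0] * (n_bins + 1)
    for v in values:
        index = min(int(v // bin_size), n_bins)
        counts[index] += 1
    total = len(values)
    return [100.0 * c / total for c in counts]


minutes = travel_time_minutes(datetime(2018, 2, 8, 8, 0), datetime(2018, 2, 8, 8, 20))
shares = histogram_shares([0, 10, minutes, 30, 975], bin_size=5, max_value=90)
print(minutes, shares)
```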

The following histogram shows the distribution of jump length. The jump length is the straight-line distance between the origin and destination.

The y-axis indicates the relative counts of trips while the x-axis shows the range of histogram bins in kilometers according to the user-configured bin size and maximum value.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (bars) and the 95% confidence interval (error bar).

privacy budget: None

95% CI: +/- 0 %


User configuration for histogram chart:
maximum value: 30
bin size: 3

Five number summary: jump length

Min. 0.00
25% 0.64
Median 2.25
75% 7.56
Max. 87.17
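A common way to compute a straight-line distance between two latitude/longitude points is the haversine formula. The report does not state which distance formula it uses, so the following is only an illustrative sketch with an assumed mean Earth radius of 6,371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A trip that starts and ends at the same point has jump length 0.
print(haversine_km(40.4, -3.7, 40.4, -3.7))
# One degree of latitude is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 1))
```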

The following histogram shows the distribution of number of trips per user, i.e. how many trips a user contributed to the dataset.

The y-axis indicates the relative number of users and the x-axis shows the range of the histogram bins according to the user configured maximum of trips per user.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (bars) and the 95% confidence interval (error bar).

privacy budget: None

95% CI: +/- 0 %


Trips per user are limited according to the configured maximum of trips per user: 20

Five number summary: trips per user

Min. 1.00
25% 2.00
Median 2.00
75% 4.00
Max. 20.00
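Limiting the trips per user can be sketched as a random sample per user. Whether the report samples randomly or keeps, for example, the first trips is not stated, so the sampling strategy and the function name are assumptions.

```python
import random
from collections import defaultdict


def cap_trips_per_user(trips, max_trips, rng):
    """trips: list of (uid, tid) pairs. Keep at most `max_trips` trips per user."""
    by_user = defaultdict(list)
    for uid, tid in trips:
        by_user[uid].append(tid)
    kept = []
    for uid, tids in by_user.items():
        if len(tids) > max_trips:
            # Random sample without replacement (assumed strategy).
            tids = rng.sample(tids, max_trips)
        kept.extend((uid, tid) for tid in tids)
    return kept


toy_trips = [("u1", i) for i in range(25)] + [("u2", 0)]
kept = cap_trips_per_user(toy_trips, max_trips=20, rng=random.Random(0))
print(len(kept))
```

Capping the contribution of each user bounds how much a single user can change any count, which is what makes the maximum of 20 visible in the summary above.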

The following histogram shows the distribution of time between consecutive trips of a user, i.e. the time that passes between the end of one trip and the beginning of the following trip of one user.

The y-axis shows the relative number of trips and the x-axis shows the range of histogram bins in hours between trips of the same user according to the user configured bin size and maximum value.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (bars) and the 95% confidence interval (error bar).

privacy budget: None

95% CI: +/- 0 %


User configuration for histogram chart:
maximum value: None
bin size: None

Five number summary: time between consecutive trips of a user

Min. -1 days +23:50:00
25% 0 days 00:45:00
Median 0 days 02:10:00
75% 0 days 05:50:00
Max. 0 days 18:20:00

Plausibility check: There are overlapping trips in the dataset. The negative minimum time delta implies that there is a trip of a user that starts before the previous one has ended. This might be an indication of an error in the dataset.
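The plausibility check can be illustrated with a short sketch: sort one user's trips by start time, subtract each trip's end from the next trip's start, and flag negative gaps. The function names are hypothetical and the trips are toy data. A minimum printed as "-1 days +23:50:00" corresponds to a gap of minus 10 minutes.

```python
from datetime import datetime, timedelta


def gaps_between_trips(trips):
    """trips: list of (start, end) datetimes of one user. Returns gaps between trips."""
    ordered = sorted(trips, key=lambda trip: trip[0])
    return [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]


def has_overlap(gaps):
    """True if any trip starts before the previous one has ended."""
    return any(gap < timedelta(0) for gap in gaps)


toy_trips = [
    (datetime(2018, 2, 8, 8, 0), datetime(2018, 2, 8, 8, 30)),
    (datetime(2018, 2, 8, 8, 20), datetime(2018, 2, 8, 9, 0)),  # starts 10 min early
]
gaps = gaps_between_trips(toy_trips)
print(gaps[0].total_seconds() / 60, has_overlap(gaps))
```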

The following histogram shows the distribution of the radii of gyration. The radius of gyration is the characteristic distance traveled by an individual during a period of time.

The y-axis shows the relative number of users and the x-axis shows the range of histogram bins in kilometers according to the user configured bin size and maximum value.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (bars) and the 95% confidence interval (error bar).

privacy budget: None

95% CI: +/- 0 %


User configuration for histogram chart:
maximum value: 18
bin size: 1.5

Five number summary: radius of gyration

Min. 0.00
25% 0.56
Median 1.95
75% 5.23
Max. 43.58
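The radius of gyration is commonly defined as the root mean square distance of a user's points from their center of mass. The sketch below follows that common definition, using the arithmetic mean of the coordinates as the center and haversine distances. Both choices are assumptions, as the report does not give the formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def _haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """points: list of (lat, lng) visited by one user."""
    n = len(points)
    center_lat = sum(lat for lat, _ in points) / n
    center_lng = sum(lng for _, lng in points) / n
    squared = sum(_haversine_km(lat, lng, center_lat, center_lng) ** 2 for lat, lng in points)
    return math.sqrt(squared / n)


# Two points one degree of latitude on either side of the center: about 111 km.
print(round(radius_of_gyration_km([(0.0, 0.0), (2.0, 0.0)]), 1))
```

A user who only ever appears at one location has a radius of gyration of 0, which matches the minimum in the summary above.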

The following histogram shows the distribution of how many distinct tiles a user has visited. It describes the diversity of locations a user visits.

The y-axis shows the relative number of users and the x-axis shows the number of distinct tiles according to the user-configured bin size and maximum value.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (bars) and the 95% confidence interval (error bar).

privacy budget: None

95% CI: +/- 0 %


User configuration for histogram chart:
maximum value: None
bin size: None

Five number summary: distinct tiles per user

Min. 1.00
25% 2.00
Median 2.00
75% 3.00
Max. 9.00

The following histogram shows the distribution of the mobility entropy.

The mobility entropy characterizes the heterogeneity of a user's visitation patterns and can be interpreted as a measure of the predictability of a user's location. If a user only visits a single tile, the entropy is 0, i.e., their location is highly predictable. If a user visits, e.g., four different tiles 10 times each, the entropy is 1, i.e., their location is not predictable, as each of the four tiles has the same probability of being visited by the user. Intuitively, the more trips per user are contained in the data, the more meaningful the mobility entropy.

The y-axis shows the relative counts of users and the x-axis shows the range of histogram bins of the mobility entropy.

The allocated privacy budget for this statistic is shown below and noise is applied accordingly to compute the estimate (bars) and the 95% confidence interval (error bar).

privacy budget: None

95% CI: +/- 0 %


Five number summary: mobility entropy

Min. 0.00
25% 0.92
Median 1.00
75% 1.00
Max. 1.00
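The two examples given in the description (entropy 0 for a single tile, entropy 1 for four equally visited tiles) are consistent with a Shannon entropy normalized by the logarithm of the number of distinct tiles. The sketch below implements that reading; the exact formula used by the report is an assumption.

```python
import math
from collections import Counter


def mobility_entropy(visited_tiles):
    """Normalized Shannon entropy of one user's tile visits, between 0 and 1."""
    counts = Counter(visited_tiles)
    if len(counts) <= 1:
        return 0.0  # a single tile is perfectly predictable
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Dividing by the maximum possible entropy scales the result to [0, 1].
    return entropy / math.log2(len(counts))


print(mobility_entropy(["a"] * 5))                   # single tile
print(mobility_entropy(["a", "b", "c", "d"] * 10))   # four tiles, 10 visits each
print(mobility_entropy(["a"] * 9 + ["b"]))           # mostly one tile
```

With a median of only 2 trips per user, most users visit two tiles equally often, which would explain why the median and upper quartile both sit at 1.00.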