Usage¶
Introduction¶
R5py is a Python library for routing and calculating travel time matrices on multimodal transport networks (walk, bike, public transport and car).
It provides a simple and friendly interface to R5 (Rapid Realistic Routing on Real-world and Reimagined networks), a routing engine developed by Conveyal. R5py is designed to interact with GeoPandas GeoDataFrames, and it is inspired by r5r, a similar wrapper developed for R. R5py exposes some of R5’s functionality via its Python API, in a syntax similar to r5r’s. At the time of this writing, only the computation of travel time matrices has been fully implemented. Over time, r5py will be expanded to incorporate other functionality from R5.
Data requirements¶
Data for creating a routable network¶
When calculating travel times with r5py, you typically need a couple of datasets:
A road network dataset from OpenStreetMap (OSM) in Protocolbuffer Binary (.pbf) format:
This data is used for finding the fastest routes and calculating travel times for walking, cycling and driving. It is also used for the walking and cycling legs between stops when routing with transit.
Hint: Sometimes you might need to modify the OSM data beforehand, e.g. by cropping the data or adding special costs for travelling (e.g. to account for slope when cycling or walking). When doing this, you should follow the instructions on the Conveyal website. For adding customized costs for pedestrian and cycling analyses, see this repository.
A transit schedule dataset in General Transit Feed Specification (GTFS.zip) format (optional):
This data contains all the necessary information for calculating travel times based on public transport, such as stops, routes, trips and the schedules of when vehicles pass a specific stop. You can read about the GTFS standard here.
Hint: r5py can also combine multiple GTFS files, as sometimes you might have different GTFS feeds representing e.g. the bus and metro connections.
Data for origin and destination locations¶
In addition to OSM and GTFS datasets, you need data that represents the origin and destination locations (OD-data) for routing. This data is typically stored in one of the geospatial data formats, such as Shapefile, GeoJSON or GeoPackage. As r5py is built on top of geopandas, it is easy to read OD-data from many different data formats.
Where to get these datasets?¶
Here are a few places from where you can download the datasets for creating the routable network:
OpenStreetMap data in PBF-format:
pyrosm -library. Allows downloading data directly from Python (based on GeoFabrik and BBBike).
pydriosm -library. Allows downloading data directly from Python (based on GeoFabrik and BBBike).
GeoFabrik -website. Has data extracts for many pre-defined areas (countries, regions, etc).
BBBike -website. Has data extracts readily available for many cities across the world. Also supports downloading data by specifying your own area of interest.
Protomaps -website. Allows downloading data with a custom extent by specifying your own area of interest.
GTFS data:
Transitfeeds -website. Easy to navigate and find GTFS data for different countries and cities. Includes current and historical GTFS data. Notice: The site will be deprecated in the future.
Mobility Database -website. Will eventually replace TransitFeeds -website.
Transitland -website. Find data based on country, operator or feed name. Includes current and historical GTFS data.
Sample datasets¶
In the following tutorial, we use various open source datasets:
The point dataset for Helsinki has been obtained from Helsinki Region Environmental Services (HSY) and is licensed under Creative Commons Attribution 4.0.
The street network for Helsinki is a cropped and filtered extract of OpenStreetMap (© OpenStreetMap contributors, ODbL license).
The GTFS transport schedule dataset for Helsinki is a cropped and minimised copy of Helsingin seudun liikenne’s (HSL) open dataset (Creative Commons BY 4.0).
Installation¶
Before you can start using r5py, you need to install it and a few libraries. Check the installation instructions for more details.
Configuring r5py before using it¶
It is possible to configure r5py in a few different ways (see the configuration instructions for details). One option that you will most likely want to adjust is how much memory (RAM) r5py consumes during the calculations. r5py runs a powerful Java engine under the hood, and by default it uses 80 % of the available memory for the calculations. However, you can easily adjust this.
If you want to allocate e.g. a maximum of 5 GB of RAM for the tool, you can do so by running:
import sys
# The flag and its value must be two separate items in sys.argv
sys.argv.extend(["--max-memory", "5G"])
After running this, r5py will use at most 5 GB of memory. However, this does not mean that the tool will necessarily use all of this memory if it does not need it.
Important
Notice that changing the amount of allocated memory should always be done as the first thing in your script, i.e. it should be run before importing r5py.
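The order of operations described above can be wrapped in a small helper. This is a minimal sketch: `request_max_memory` is a hypothetical name and not part of r5py. It only relies on the `--max-memory` argument mentioned above, and it would have to be called before `import r5py`.

```python
import sys


def request_max_memory(amount, argv=None):
    """Add ``--max-memory <amount>`` to an argument list unless it is already set.

    Hypothetical helper for illustration; not part of r5py.
    """
    argv = sys.argv if argv is None else argv
    if "--max-memory" not in argv:
        # The flag and its value are two separate list items
        argv.extend(["--max-memory", amount])
    return argv


# Demonstrated on a stand-in list instead of the real sys.argv
example_argv = ["my_script.py"]
request_max_memory("5G", example_argv)
print(example_argv)
```

Calling the helper a second time leaves the list unchanged, so the first requested value wins.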
Getting started with r5py¶
Next, we will learn how to calculate travel times with r5py between locations spread around the city center area of Helsinki, Finland.
Load the origin and destination data¶
Let’s start by downloading a sample point dataset into a geopandas GeoDataFrame that we can use as our origin and destination locations. For the sake of this exercise, we have prepared a grid of points covering parts of Helsinki. The point data also contains information about the number of residents in each 250 meter cell:
import geopandas
points_url = "https://github.com/r5py/r5py/raw/main/docs/data/Helsinki/population_points_2020.gpkg"
points = geopandas.read_file(points_url)
points.head()
| | id | population | geometry |
---|---|---|---|
0 | 0 | 389 | POINT (24.90770 60.16199) |
1 | 1 | 296 | POINT (24.90771 60.15974) |
2 | 2 | 636 | POINT (24.90772 60.15750) |
3 | 3 | 1476 | POINT (24.90772 60.15526) |
4 | 4 | 23 | POINT (24.91219 60.16648) |
The points GeoDataFrame contains a few columns, namely id, population and geometry. The id column with unique values and the geometry column are required for r5py to work. If your input point dataset does not have an id column with unique values, r5py will throw an error.
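The uniqueness requirement can be checked before the data is handed to r5py. The following is a minimal sketch on a toy pandas DataFrame (the values are made up, with one deliberate duplicate); a GeoDataFrame exposes the same column methods because it builds on pandas.

```python
import pandas

# Toy stand-in for the attribute table of a point dataset (hypothetical values)
toy_points = pandas.DataFrame(
    {"id": [0, 1, 2, 2], "population": [389, 296, 636, 1476]}
)

# True only if no id value occurs more than once
ids_are_unique = toy_points["id"].is_unique

# All rows that share their id with another row
duplicated_rows = toy_points[toy_points["id"].duplicated(keep=False)]

# One possible repair: assign fresh consecutive ids
repaired = toy_points.assign(id=range(len(toy_points)))

print(ids_are_unique, list(duplicated_rows.index), repaired["id"].is_unique)
```

Re-numbering is only one option; if the ids carry meaning elsewhere in your workflow, fixing the duplicates at the source is usually preferable.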
To get a better sense of the data, let’s create a map that shows the locations of the points and visualise the number of people living in each cell (the cells are represented by their centre point):
points.explore("population", cmap="Reds", marker_kwds={"radius": 12})
Let’s pick one of these points to represent our origin and store it in a separate GeoDataFrame:
origin = points.loc[points["id"] == 54].copy()
origin.explore(color="blue", max_zoom=14, marker_kwds={"radius": 12})
Load transport network¶
Virtually all operations of r5py require a transport network. In this example, we use data from the Helsinki metropolitan area, which you can find in the source code repository of r5py in docs/data/ (see here). To import the street and public transport networks, instantiate an r5py.TransportNetwork with the file paths to the OSM extract and the GTFS files:
# Allow 8 GB of memory (must happen before r5py is imported)
import sys
sys.argv.extend(["--max-memory", "8G"])
from r5py import TransportNetwork
transport_network = TransportNetwork(
"../data/Helsinki/kantakaupunki.osm.pbf",
[
"../data/Helsinki/GTFS.zip"
]
)
At this stage, r5py has created the routable transport network and it is stored in the transport_network variable. We can now start using this network for the travel time calculations.
Compute travel time matrix from one to all locations¶
A travel time matrix is a dataset detailing the travel costs (e.g., time) between given locations (origins and destinations) in a study area. To compute a travel time matrix with r5py based on public transportation, we first need to initialize an r5py.TravelTimeMatrixComputer object. As inputs, we pass the following arguments to the TravelTimeMatrixComputer:
transport_network, which we created in the previous step and which represents the routable transport network.
origins, which is a GeoDataFrame with the one location that we created earlier (however, you can also use multiple locations as origins).
destinations, which is a GeoDataFrame representing the destinations (in our case, the points GeoDataFrame).
departure, which should be a Python datetime object (in our case standing for “22nd of February 2022 at 08:30”) to tell r5py that the schedules of this specific time and day should be used for the calculations.
Note: By default, r5py summarizes and calculates a median travel time from all possible connections within one hour from the given departure time (with 1 minute frequency). It is possible to adjust this time window using the departure_time_window parameter (see details here).
transport_modes, which determines the travel modes that will be used in the calculations. These can be passed using the options from the TransitMode and LegMode classes.
Hint: To see all available options, run help(TransitMode) or help(LegMode).
Note
In addition to these ones, the constructor also accepts many other parameters listed here, such as walking and cycling speed, maximum trip duration, maximum number of transit connections used during the trip, etc.
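To make the default behaviour of the departure time window more concrete, the sketch below enumerates one departure per minute within one hour of 08:30 and takes the median over a toy travel time model. The model (a bus every 10 minutes followed by a 20 minute ride) is purely hypothetical and only illustrates the summarizing step; it does not reproduce how R5 samples or routes trips.

```python
import datetime
import statistics

departure = datetime.datetime(2022, 2, 22, 8, 30)
window = datetime.timedelta(hours=1)
step = datetime.timedelta(minutes=1)

# One departure per minute in [departure, departure + window)
departure_times = [departure + i * step for i in range(window // step)]


def toy_travel_time(start):
    """Hypothetical trip: wait for a bus leaving every 10 minutes, then ride 20 minutes."""
    wait = -start.minute % 10
    return wait + 20


travel_times = [toy_travel_time(t) for t in departure_times]
median_travel_time = statistics.median(travel_times)

print(len(departure_times), departure_times[-1].time(), median_travel_time)
```

In this toy model the travel time depends on the exact departure minute, which is why a single summary value such as the median is reported per origin-destination pair.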
Now, we will first create a travel_time_matrix_computer instance as described above:
import datetime
from r5py import TravelTimeMatrixComputer, TransitMode, LegMode
travel_time_matrix_computer = TravelTimeMatrixComputer(
transport_network,
origins=origin,
destinations=points,
departure=datetime.datetime(2022,2,22,8,30),
transport_modes=[TransitMode.TRANSIT, LegMode.WALK]
)
Running this initializes the TravelTimeMatrixComputer, but no calculations have been done yet.
To actually run the computations, we need to call .compute_travel_times() on the instance, which will calculate the travel times between all points:
travel_time_matrix = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix.head()
| | from_id | to_id | travel_time |
---|---|---|---|
0 | 54 | 0 | 26 |
1 | 54 | 1 | 29 |
2 | 54 | 2 | 29 |
3 | 54 | 3 | 31 |
4 | 54 | 4 | 28 |
As a result, this returns a pandas.DataFrame which we stored in the travel_time_matrix variable. The values in the travel_time column are travel times in minutes between the points identified by from_id and to_id. As you can see, the id value in the from_id column is the same for all rows because we only used one origin location as input.
To get a better sense of the results, let’s create a travel time map. We can do this easily by making a table join between the points GeoDataFrame and the travel_time_matrix. The key in the travel_time_matrix table is the column to_id and the corresponding key in the points GeoDataFrame is the column id:
join = points.merge(travel_time_matrix, left_on="id", right_on="to_id")
join.head()
| | id | population | geometry | from_id | to_id | travel_time |
---|---|---|---|---|---|---|
0 | 0 | 389 | POINT (24.90770 60.16199) | 54 | 0 | 26 |
1 | 1 | 296 | POINT (24.90771 60.15974) | 54 | 1 | 29 |
2 | 2 | 636 | POINT (24.90772 60.15750) | 54 | 2 | 29 |
3 | 3 | 1476 | POINT (24.90772 60.15526) | 54 | 3 | 31 |
4 | 4 | 23 | POINT (24.91219 60.16648) | 54 | 4 | 28 |
Now we have the travel times attached to each point, and we can easily visualize them on a map:
join.explore("travel_time", cmap="Greens", marker_kwds={"radius": 12})
Compute travel time matrix from all to all locations¶
Running the calculations between all points in our sample dataset can be done in a similar manner as calculating the travel times from one origin to all destinations.
Since calculating this kind of all-to-all travel time matrix is quite typical in accessibility analyses, it is possible to calculate a cross-product between all points by using only the origins parameter (i.e. without specifying a separate set of destinations). r5py will use the same points as destinations and produce a full set of origins and destinations:
travel_time_matrix_computer = TravelTimeMatrixComputer(
transport_network,
origins=points,
departure=datetime.datetime(2022,2,22,8,30),
transport_modes=[TransitMode.TRANSIT, LegMode.WALK]
)
travel_time_matrix_all = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix_all.head()
| | from_id | to_id | travel_time |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 0 | 1 | 7 |
2 | 0 | 2 | 10 |
3 | 0 | 3 | 18 |
4 | 0 | 4 | 13 |
travel_time_matrix_all.tail()
| | from_id | to_id | travel_time |
---|---|---|---|
87 | 91 | 87 | 27 |
88 | 91 | 88 | 23 |
89 | 91 | 89 | 10 |
90 | 91 | 90 | 6 |
91 | 91 | 91 | 0 |
len(travel_time_matrix_all)
8464
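The row count follows directly from the cross-product of the point set with itself. A minimal sketch using only the standard library, assuming the 92 sample points carry the consecutive ids 0 to 91 (as the head and tail of the result suggest):

```python
import itertools

# Assumption: the sample points have consecutive ids 0..91
point_ids = range(92)

# Every origin paired with every destination, including each point with itself
od_pairs = list(itertools.product(point_ids, repeat=2))

print(len(od_pairs), od_pairs[0], od_pairs[-1])
```

Because the number of pairs grows with the square of the number of points, doubling the point set roughly quadruples the size of the matrix.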
As we can see from the outputs above, now we have calculated travel times between all points (n=92) in the study area. Hence, the resulting DataFrame has almost 8500 rows (92x92=8464). Based on these results, we can for example calculate the median travel time to or from a certain point, which gives a good estimate of the overall accessibility of the location in relation to other points:
median_times = travel_time_matrix_all.groupby("from_id")["travel_time"].median()
median_times
from_id
0 23.0
1 25.0
2 28.0
3 29.0
4 26.5
...
87 25.5
88 24.0
89 23.0
90 26.0
91 27.5
Name: travel_time, Length: 92, dtype: float64
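Grouping by from_id summarizes travel times from each point, while grouping by to_id summarizes travel times to each point. The two can differ, because transit travel times are not necessarily symmetric. A minimal sketch on a toy three-point matrix with made-up values:

```python
import pandas

# Toy all-to-all matrix for three points (hypothetical, asymmetric travel times)
toy_matrix = pandas.DataFrame(
    {
        "from_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "to_id": [0, 1, 2, 0, 1, 2, 0, 1, 2],
        "travel_time": [0, 7, 10, 9, 0, 4, 12, 5, 0],
    }
)

# Median travel time when starting from each point
median_from = toy_matrix.groupby("from_id")["travel_time"].median()

# Median travel time when travelling to each point
median_to = toy_matrix.groupby("to_id")["travel_time"].median()

print(median_from.tolist(), median_to.tolist())
```

Note that both summaries include the zero-minute trip from each point to itself; whether to drop those rows first is an analysis choice.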
To estimate how long it generally takes to travel between locations in our study area (i.e. the baseline accessibility in the area), we can calculate the mean (or median) of the median travel times, which shows that it is approximately 22 minutes:
median_times.mean()
22.119565217391305
Naturally, we can also visualize these values on a map:
overall_access = points.merge(median_times.reset_index(), left_on="id", right_on="from_id")
overall_access.head()
| | id | population | geometry | from_id | travel_time |
---|---|---|---|---|---|
0 | 0 | 389 | POINT (24.90770 60.16199) | 0 | 23.0 |
1 | 1 | 296 | POINT (24.90771 60.15974) | 1 | 25.0 |
2 | 2 | 636 | POINT (24.90772 60.15750) | 2 | 28.0 |
3 | 3 | 1476 | POINT (24.90772 60.15526) | 3 | 29.0 |
4 | 4 | 23 | POINT (24.91219 60.16648) | 4 | 26.5 |
overall_access.explore("travel_time", cmap="Blues", scheme="natural_breaks", k=4, marker_kwds={"radius": 12})
In our study area, accessibility seems to be somewhat poorer in the southern areas and on the edges of the region (i.e. we witness a classic edge effect here).
Advanced usage¶
Compute travel times with a detailed breakdown of the routing results¶
If you are interested in more detailed routing results, you can specify breakdown=True when initializing the TravelTimeMatrixComputer object. In addition to the information shown in the previous examples, this returns more detail about the routings. When breakdown is enabled, r5py reports the routes used for each origin-destination pair, as well as the total time disaggregated into access, waiting, in-vehicle and transfer times:
travel_time_matrix_computer = TravelTimeMatrixComputer(
transport_network,
origins=origin,
destinations=points,
departure=datetime.datetime(2022,2,22,8,30),
transport_modes=[TransitMode.TRANSIT, LegMode.WALK],
breakdown=True,
)
travel_time_matrix_detailed = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix_detailed.head()
| | from_id | to_id | travel_time | routes | board_stops | alight_stops | ride_times | access_time | egress_time | transfer_time | wait_times | total_time | n_iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 54 | 0 | 26 | [1030, 1021] | [1050106, 1040278] | [1040144, 1201129] | [4.0, 4.0] | 3.8 | 7.8 | 2.3 | [2.8, 1.2] | 26.0 | 1 |
1 | 54 | 1 | 29 | [31M2] | [1020602] | [1201602] | [3.0] | 15.9 | 12.4 | 0.0 | [1.5] | 32.9 | 3 |
2 | 54 | 2 | 29 | [31M2, 1008] | [1040602, 1201432] | [1201602, 1203401] | [2.0, 2.0] | 14.5 | 4.4 | 1.6 | [2.0, 4.7] | 31.2 | 1 |
3 | 54 | 3 | 31 | [1030, 31M1] | [1050106, 1040602] | [1040144, 1201602] | [4.0, 2.0] | 3.8 | 21.9 | 2.0 | [1.3, 1.2] | 36.3 | 1 |
4 | 54 | 4 | 28 | [31M2] | [1020602] | [1201602] | [3.0] | 15.9 | 11.4 | 0.0 | [1.5] | 31.8 | 4 |
As you can see, the result contains much more information than before. The following table explains the additional columns:
Column | Description | Data type |
---|---|---|
routes | The route-ids (lines) used during the trip | list |
board_stops | The stop-ids of the boarding stops | list |
alight_stops | The stop-ids of the alighting stops | list |
ride_times | In-vehicle ride times of the individual journey legs | list |
access_time | The time it takes for the “first mile” of a trip | float |
egress_time | The time it takes for the “last mile” of a trip | float |
transfer_time | The time it takes to transfer from one vehicle to another | float |
wait_times | The time(s) spent waiting for the vehicle at a stop | list |
total_time | Sum(ride_times, access_time, egress_time, transfer_time, wait_times) | float |
n_iterations | Number of iterations used for calculating the travel times | int |
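The definition of total_time can be checked by hand against the first row of the detailed result shown earlier. This is a minimal sketch in plain Python; the function name is illustrative, and the small difference to the reported 26.0 is presumably due to the components being displayed with one decimal.

```python
import math


def total_time(ride_times, access_time, egress_time, transfer_time, wait_times):
    """Sum the list-valued and scalar components of one breakdown row."""
    return (
        sum(ride_times)
        + access_time
        + egress_time
        + transfer_time
        + sum(wait_times)
    )


# Values copied from the first row (from_id 54, to_id 0) of the detailed result
first_row_total = total_time(
    ride_times=[4.0, 4.0],
    access_time=3.8,
    egress_time=7.8,
    transfer_time=2.3,
    wait_times=[2.8, 1.2],
)

# Compare against the reported total_time of 26.0, allowing for display rounding
matches_reported = math.isclose(first_row_total, 26.0, abs_tol=0.15)
print(round(first_row_total, 1), matches_reported)
```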
Compute travel times for different percentiles¶
Because r5py calculates travel times for all possible transit departures within an hour (at one-minute frequency), we effectively get a distribution of travel times. It is possible to return the travel times at different percentiles of this distribution, based on all computed trips (sorted from the fastest to the slowest connection). By default, the time returned by r5py is the median travel time (i.e. percentile 50). You can request other percentiles with the percentiles parameter, which accepts a list of integers such as [25, 50, 75] and returns the travel times at those percentiles:
travel_time_matrix_computer = TravelTimeMatrixComputer(
transport_network,
origins=origin,
destinations=points,
departure=datetime.datetime(2022,2,22,8,30),
transport_modes=[TransitMode.TRANSIT, LegMode.WALK],
percentiles=[25, 50, 75],
)
travel_time_matrix_detailed = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix_detailed.head()
| | from_id | to_id | travel_time_p25 | travel_time_p50 | travel_time_p75 |
---|---|---|---|---|---|
0 | 54 | 0 | 24 | 26 | 27 |
1 | 54 | 1 | 26 | 28 | 29 |
2 | 54 | 2 | 27 | 28 | 30 |
3 | 54 | 3 | 29 | 30 | 33 |
4 | 54 | 4 | 26 | 28 | 30 |
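What a percentile of the travel time distribution means can be illustrated with a small standalone sketch. The travel times below are made up, and the nearest-rank method used here is an assumption chosen for simplicity; the exact method R5 uses to derive percentiles may differ.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Return the smallest value such that at least `percentile` % of values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical travel times (minutes) for one origin-destination pair,
# one value per sampled departure time
toy_times = [24, 26, 27, 25, 30, 26, 28, 24, 27, 26, 29, 25]

summary = {
    f"travel_time_p{p}": nearest_rank_percentile(toy_times, p)
    for p in [25, 50, 75]
}
print(summary)
```

A wide gap between the lower and upper percentiles indicates that the travel time depends strongly on the exact departure minute, for example because of infrequent services.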