Usage

Introduction

r5py is a Python library for routing and calculating travel time matrices on multimodal transport networks (walking, cycling, public transport and car). It provides a simple and friendly interface to R5 (Rapid Realistic Routing on Real-world and Reimagined networks), a routing engine developed by Conveyal. r5py is designed to interact with GeoPandas GeoDataFrames and is inspired by r5r, a similar wrapper developed for R. r5py exposes some of R5’s functionality via its Python API, in a syntax similar to r5r’s. At the time of writing, only the computation of travel time matrices has been fully implemented. Over time, r5py will be expanded to incorporate other functionality from R5.

Data requirements

Data for creating a routable network

When calculating travel times with r5py, you typically need two datasets:

  • A road network dataset from OpenStreetMap (OSM) in Protocolbuffer Binary Format (.pbf):

    • This data is used for finding the fastest routes and calculating the travel times based on walking, cycling and driving. In addition, this data is used for walking/cycling legs between stops when routing with transit.

    • Hint: Sometimes you might need to modify the OSM data beforehand, e.g. by cropping the data or adding special costs for travelling (e.g. for considering slope when cycling/walking). When doing this, you should follow the instructions on the Conveyal website. For adding customized costs for pedestrian and cycling analyses, see this repository.

  • A transit schedule dataset in General Transit Feed Specification (GTFS) format, packaged as a .zip file (optional):

    • This data contains all the information needed for calculating travel times based on public transport, such as stops, routes, trips and the schedules that state when vehicles pass a specific stop. You can read more about the GTFS standard here.

    • Hint: r5py can also combine multiple GTFS files, as sometimes you might have different GTFS feeds representing e.g. the bus and metro connections.

Data for origin and destination locations

In addition to the OSM and GTFS datasets, you need data that represents the origin and destination locations (OD data) for routing. This data is typically stored in one of the geospatial data formats, such as Shapefile, GeoJSON or GeoPackage. As r5py is built on top of geopandas, it is easy to read OD data from many different data formats.

Where to get these datasets?

Here are a few places from where you can download the datasets for creating the routable network:

  • OpenStreetMap data in PBF format:

    • pyrosm library. Allows downloading data directly from Python (based on GeoFabrik and BBBike).

    • pydriosm library. Allows downloading data directly from Python (based on GeoFabrik and BBBike).

    • GeoFabrik website. Has data extracts for many pre-defined areas (countries, regions, etc.).

    • BBBike website. Has data extracts readily available for many cities across the world. Also supports downloading data by specifying your own area of interest.

    • Protomaps website. Allows downloading data with a custom extent by specifying your own area of interest.

  • GTFS data:

    • Transitfeeds website. Easy to navigate and find GTFS data for different countries and cities. Includes current and historical GTFS data. Notice: The site will be deprecated in the future.

    • Mobility Database website. Will eventually replace the TransitFeeds website.

    • Transitland website. Find data based on country, operator or feed name. Includes current and historical GTFS data.

Sample datasets

In the following tutorial, we use various open source datasets:

  • The point dataset for Helsinki has been obtained from Helsinki Region Environmental Services (HSY) and is licensed under Creative Commons Attribution 4.0 (CC BY 4.0).

  • The street network for Helsinki is a cropped and filtered extract of OpenStreetMap (© OpenStreetMap contributors, ODbL license).

  • The GTFS transport schedule dataset for Helsinki is a cropped and minimised copy of Helsingin seudun liikenne’s (HSL) open dataset, licensed under Creative Commons BY 4.0.

Installation

Before you can start using r5py, you need to install it and a few libraries. Check the installation instructions for more details.

Configuring r5py before using it

It is possible to configure r5py in a few different ways (see the configuration instructions for details). One option you will most likely want to adjust is how much memory (RAM) r5py may consume during the calculations. r5py runs a powerful Java engine under the hood, and by default it will use 80 % of the available memory for the calculations. However, you can easily adjust this.

If you want to allocate, for example, a maximum of 5 GB of RAM to the tool, you can do so by running:

import sys
sys.argv.extend(["--max-memory", "5G"])

After running this, r5py will use at most 5 GB of memory. This does not mean that the tool will necessarily use all of this memory if it does not need it.

Important

Note that changing the amount of allocated memory should always be the first thing in your script, i.e. it should be done before importing r5py.
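A flag and its value are normally passed as separate strings in an argument list. The sketch below illustrates the difference between the two list methods that could be used for this. It uses a plain list called argv as a hypothetical stand-in for sys.argv, so nothing here touches r5py or the real argument list:

```python
# A stand-in for sys.argv, so the real argument list stays untouched.
argv = ["my_script.py"]

# extend() adds each string as its own argument ...
extended = argv.copy()
extended.extend(["--max-memory", "5G"])

# ... while append() adds the whole list as one nested element.
appended = argv.copy()
appended.append(["--max-memory", "5G"])

print(extended)  # ['my_script.py', '--max-memory', '5G']
print(appended)  # ['my_script.py', ['--max-memory', '5G']]
```

Only the first form gives an argument parser the flag and its value as two separate strings.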

Getting started with r5py

Next, we will learn how to calculate travel times with r5py between locations spread around the city center area of Helsinki, Finland.

Load the origin and destination data

Let’s start by downloading a sample point dataset into a geopandas GeoDataFrame that we can use as our origin and destination locations. For the sake of this exercise, we have prepared a grid of points covering parts of Helsinki. The point data also contains the number of residents in each 250-metre grid cell:

import geopandas 

points_url = "https://github.com/r5py/r5py/raw/main/docs/data/Helsinki/population_points_2020.gpkg"
points = geopandas.read_file(points_url)
points.head()
id population geometry
0 0 389 POINT (24.90770 60.16199)
1 1 296 POINT (24.90771 60.15974)
2 2 636 POINT (24.90772 60.15750)
3 3 1476 POINT (24.90772 60.15526)
4 4 23 POINT (24.91219 60.16648)

The points GeoDataFrame contains a few columns, namely id, population and geometry. An id column with unique values and a geometry column are required for r5py to work. If your input point dataset does not have an id column with unique values, r5py will raise an error.
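It can be useful to check the id column yourself before passing the data on. The following is a minimal sketch using plain pandas; the table sample_points and its values are made up for illustration, and the same calls should work on a GeoDataFrame because it is a pandas DataFrame subclass:

```python
import pandas as pd

# Hypothetical point table in which the id 2 occurs twice.
sample_points = pd.DataFrame(
    {"id": [0, 1, 2, 2], "population": [389, 296, 636, 1476]}
)

# True only if every id occurs exactly once.
ids_are_unique = sample_points["id"].is_unique

# All rows that share their id with at least one other row.
duplicates = sample_points[sample_points["id"].duplicated(keep=False)]

print(ids_are_unique)  # False
print(duplicates["id"].tolist())  # [2, 2]
```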

To get a better sense of the data, let’s create a map that shows the locations of the points and visualise the number of people living in each cell (the cells are represented by their centre point):

points.explore("population", cmap="Reds", marker_kwds={"radius": 12})

Let’s pick one of these points to represent our origin and store it in a separate GeoDataFrame:

origin = points.loc[points["id"] == 54].copy()
origin.explore(color="blue", max_zoom=14, marker_kwds={"radius": 12})

Load transport network

Virtually all operations of r5py require a transport network. In this example, we use data from the Helsinki metropolitan area, which you can find in the source code repository of r5py in docs/data/ (see here). To import the street and public transport networks, instantiate an r5py.TransportNetwork with the file paths to the OSM extract and the GTFS files:

# Allow r5py to use at most 8 GB of memory (must be set before importing r5py)
import sys
sys.argv.extend(["--max-memory", "8G"])
from r5py import TransportNetwork

transport_network = TransportNetwork(
    "../data/Helsinki/kantakaupunki.osm.pbf",
    [
        "../data/Helsinki/GTFS.zip"
    ]
)

At this stage, r5py has created the routable transport network and it is stored in the transport_network variable. We can now start using this network for doing the travel time calculations.

Compute travel time matrix from one to all locations

A travel time matrix is a dataset detailing the travel costs (e.g. time) between given locations (origins and destinations) in a study area. To compute a travel time matrix with r5py based on public transport, we first need to initialize an r5py.TravelTimeMatrixComputer object. We pass the following arguments to the TravelTimeMatrixComputer:

  • transport_network, which we created in the previous step representing the routable transport network.

  • origins, which is a GeoDataFrame with one location that we created earlier (however, you can also use multiple locations as origins).

  • destinations, which is a GeoDataFrame representing the destinations (in our case, the points GeoDataFrame).

  • departure, which should be a Python datetime object (in our case standing for “22nd of February 2022 at 08:30”) to tell r5py that the schedules of this specific time and day should be used for the calculations.

    • Note: By default, r5py calculates the median travel time from all possible connections within one hour of the given departure time (at one-minute frequency). You can adjust this time window using the departure_time_window parameter (see details here).

  • transport_modes, which determines the travel modes that will be used in the calculations. These can be passed using the options from the TransitMode and LegMode classes.

    • Hint: To see all available options, run help(TransitMode) or help(LegMode).

Note

In addition to these, the constructor also accepts many other parameters listed here, such as walking and cycling speed, maximum trip duration, maximum number of transit connections used during the trip, etc.
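To make the departure time window more concrete, the sketch below enumerates one candidate departure per minute and takes the median of a set of travel times. It is an illustration only: it assumes that the window can be expressed as a datetime.timedelta and that it starts at the given departure time, and the travel times are invented rather than computed by r5py:

```python
import datetime
import statistics

departure = datetime.datetime(2022, 2, 22, 8, 30)
departure_time_window = datetime.timedelta(hours=1)  # assumed representation

# One candidate departure per minute within the window.
n_minutes = int(departure_time_window.total_seconds() // 60)
departure_times = [
    departure + datetime.timedelta(minutes=m) for m in range(n_minutes)
]

# Hypothetical travel times (minutes) for one origin-destination pair:
# with a service every 10 minutes, the waiting time shrinks and then resets.
travel_times = [20 + (9 - m % 10) for m in range(n_minutes)]

print(len(departure_times))  # 60
print(departure_times[-1].time())  # 09:29:00
print(statistics.median(travel_times))  # 24.5
```

A single reported travel time therefore summarises many individual departures, which is why it can differ from what a journey planner shows for one specific departure.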

Now, we will first create a travel_time_matrix_computer instance as described above:

import datetime
from r5py import TravelTimeMatrixComputer, TransitMode, LegMode


travel_time_matrix_computer = TravelTimeMatrixComputer(
    transport_network,
    origins=origin,
    destinations=points,
    departure=datetime.datetime(2022,2,22,8,30),
    transport_modes=[TransitMode.TRANSIT, LegMode.WALK]
)

Running this initializes the TravelTimeMatrixComputer, but no calculations have been run yet. To actually run the computations, we need to call .compute_travel_times() on the instance, which calculates the travel times between all points:

travel_time_matrix = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix.head()
from_id to_id travel_time
0 54 0 26
1 54 1 29
2 54 2 29
3 54 3 31
4 54 4 28

As a result, this returns a pandas.DataFrame, which we stored in the travel_time_matrix variable. The values in the travel_time column are travel times in minutes between the points identified by from_id and to_id. As you can see, the id value in the from_id column is the same for all rows because we only used one origin location as input.
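The result is in long format, with one row per origin-destination pair. If a classic matrix layout is easier to work with, plain pandas can reshape it. The sketch below uses a small made-up table (long_format) with the same three columns rather than real r5py output:

```python
import pandas as pd

# Hypothetical long-format result with two origins and two destinations.
long_format = pd.DataFrame(
    {
        "from_id": [0, 0, 1, 1],
        "to_id": [0, 1, 0, 1],
        "travel_time": [0, 7, 8, 0],
    }
)

# One row per origin, one column per destination.
wide_format = long_format.pivot(
    index="from_id", columns="to_id", values="travel_time"
)

print(wide_format.loc[0, 1])  # 7
print(wide_format.shape)  # (2, 2)
```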

To get a better sense of the results, let’s create a travel time map based on our results. We can do this easily by making a table join between the points GeoDataFrame and the travel_time_matrix. The key in the travel_time_matrix table is the column to_id and the corresponding key in points GeoDataFrame is the column id:

join = points.merge(travel_time_matrix, left_on="id", right_on="to_id")
join.head()
id population geometry from_id to_id travel_time
0 0 389 POINT (24.90770 60.16199) 54 0 26
1 1 296 POINT (24.90771 60.15974) 54 1 29
2 2 636 POINT (24.90772 60.15750) 54 2 29
3 3 1476 POINT (24.90772 60.15526) 54 3 31
4 4 23 POINT (24.91219 60.16648) 54 4 28

Now we have the travel times attached to each point, and we can easily visualize them on a map:

join.explore("travel_time", cmap="Greens", marker_kwds={"radius": 12})

Compute travel time matrix from all to all locations

Running the calculations between all points in our sample dataset works much like calculating travel times from one origin to all destinations. Since calculating this kind of all-to-all travel time matrix is quite typical in accessibility analyses, it is possible to calculate a cross-product between all points just by using the origins parameter (i.e. without specifying a separate set of destinations). r5py will then use the same points as destinations and produce a full set of origin-destination pairs:

travel_time_matrix_computer = TravelTimeMatrixComputer(
    transport_network,
    origins=points,
    departure=datetime.datetime(2022,2,22,8,30),
    transport_modes=[TransitMode.TRANSIT, LegMode.WALK]
)
travel_time_matrix_all = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix_all.head()
from_id to_id travel_time
0 0 0 0
1 0 1 7
2 0 2 10
3 0 3 18
4 0 4 13
travel_time_matrix_all.tail()
from_id to_id travel_time
87 91 87 27
88 91 88 23
89 91 89 10
90 91 90 6
91 91 91 0
len(travel_time_matrix_all)
8464

As we can see from the outputs above, now we have calculated travel times between all points (n=92) in the study area. Hence, the resulting DataFrame has almost 8500 rows (92x92=8464). Based on these results, we can for example calculate the median travel time to or from a certain point, which gives a good estimate of the overall accessibility of the location in relation to other points:

median_times = travel_time_matrix_all.groupby("from_id")["travel_time"].median()
median_times
from_id
0     23.0
1     25.0
2     28.0
3     29.0
4     26.5
      ... 
87    25.5
88    24.0
89    23.0
90    26.0
91    27.5
Name: travel_time, Length: 92, dtype: float64
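Grouping by from_id describes how well a point can reach the others; grouping by to_id describes how well it can be reached. Because travel times are not necessarily symmetric, the two can differ. A minimal sketch with an invented three-point matrix (not r5py output):

```python
import pandas as pd

# Hypothetical all-to-all matrix for three points (minutes).
matrix = pd.DataFrame(
    {
        "from_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "to_id": [0, 1, 2, 0, 1, 2, 0, 1, 2],
        "travel_time": [0, 7, 10, 9, 0, 4, 12, 5, 0],
    }
)

# Median time needed to travel away from each point ...
median_from = matrix.groupby("from_id")["travel_time"].median()
# ... and median time needed to travel to each point.
median_to = matrix.groupby("to_id")["travel_time"].median()

print(median_from.tolist())  # [7.0, 4.0, 5.0]
print(median_to.tolist())  # [9.0, 5.0, 4.0]
```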

To estimate how long it generally takes to travel between locations in our study area (i.e. the baseline accessibility of the area), we can calculate the mean (or median) of the median travel times, which is approximately 22 minutes:

median_times.mean()
22.119565217391305

Naturally, we can also visualize these values on a map:

overall_access = points.merge(median_times.reset_index(), left_on="id", right_on="from_id")
overall_access.head()
id population geometry from_id travel_time
0 0 389 POINT (24.90770 60.16199) 0 23.0
1 1 296 POINT (24.90771 60.15974) 1 25.0
2 2 636 POINT (24.90772 60.15750) 2 28.0
3 3 1476 POINT (24.90772 60.15526) 3 29.0
4 4 23 POINT (24.91219 60.16648) 4 26.5
overall_access.explore("travel_time", cmap="Blues", scheme="natural_breaks", k=4, marker_kwds={"radius": 12})

In our study area, accessibility seems to be somewhat poorer in the southern areas and at the edges of the region (i.e. we witness a classic edge effect here).

Advanced usage

Compute travel times with a detailed breakdown of the routing results

If you are interested in more detailed routing results, you can specify breakdown=True when initializing the TravelTimeMatrixComputer object. In addition to the information shown in the previous examples, this returns more detail about the routing. When breakdown is enabled, r5py reports the routes used for each origin-destination pair, as well as the total time disaggregated into access, waiting, in-vehicle and transfer times:

travel_time_matrix_computer = TravelTimeMatrixComputer(
    transport_network,
    origins=origin,
    destinations=points,
    departure=datetime.datetime(2022,2,22,8,30),
    transport_modes=[TransitMode.TRANSIT, LegMode.WALK],
    breakdown=True,
)
travel_time_matrix_detailed = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix_detailed.head()
from_id to_id travel_time routes board_stops alight_stops ride_times access_time egress_time transfer_time wait_times total_time n_iterations
0 54 0 26 [1030, 1021] [1050106, 1040278] [1040144, 1201129] [4.0, 4.0] 3.8 7.8 2.3 [2.8, 1.2] 26.0 1
1 54 1 29 [31M2] [1020602] [1201602] [3.0] 15.9 12.4 0.0 [1.5] 32.9 3
2 54 2 29 [31M2, 1008] [1040602, 1201432] [1201602, 1203401] [2.0, 2.0] 14.5 4.4 1.6 [2.0, 4.7] 31.2 1
3 54 3 31 [1030, 31M1] [1050106, 1040602] [1040144, 1201602] [4.0, 2.0] 3.8 21.9 2.0 [1.3, 1.2] 36.3 1
4 54 4 28 [31M2] [1020602] [1201602] [3.0] 15.9 11.4 0.0 [1.5] 31.8 4

As you can see, the result contains much more information than before. The following table explains the additional columns:

Column         Description                                                           Data type
routes         The route IDs (lines) used during the trip                            list
board_stops    The stop IDs of the boarding stops                                    list
alight_stops   The stop IDs of the alighting stops                                   list
ride_times     In-vehicle ride times of the individual journey legs                  list
access_time    The time it takes for the “first mile” of a trip                      float
egress_time    The time it takes for the “last mile” of a trip                       float
transfer_time  The time it takes to transfer from one vehicle to another             float
wait_times     The time(s) spent waiting for a vehicle at a stop                     list
total_time     Sum(ride_times, access_time, egress_time, transfer_time, wait_times)  float
n_iterations   Number of iterations used for calculating the travel times            int
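As a quick check of how total_time relates to its components, the sketch below adds up the values shown for the first row of the detailed result (to_id 0). The small gap to the reported 26.0 is presumably due to the components being displayed with one decimal place; that explanation is an assumption, not something the output itself confirms:

```python
# Values copied from the first row of the detailed result (minutes).
ride_times = [4.0, 4.0]
wait_times = [2.8, 1.2]
access_time = 3.8
egress_time = 7.8
transfer_time = 2.3

# List-valued columns are summed over all legs before adding the rest.
total_time = (
    sum(ride_times) + sum(wait_times) + access_time + egress_time + transfer_time
)

print(round(total_time, 1))  # 25.9
```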

Compute travel times for different percentiles

Because r5py calculates travel times for every possible transit departure within an hour (at one-minute frequency), we effectively get a distribution of travel times. It is possible to return the travel times at different percentiles of this distribution, based on all computed trips (sorted from the fastest to the slowest connection). By default, r5py returns the median travel time (i.e. the 50th percentile). You can request other percentiles with the percentiles parameter, which accepts a list of integers such as [25, 50, 75] and returns the travel times at those percentiles:

travel_time_matrix_computer = TravelTimeMatrixComputer(
    transport_network,
    origins=origin,
    destinations=points,
    departure=datetime.datetime(2022,2,22,8,30),
    transport_modes=[TransitMode.TRANSIT, LegMode.WALK],
    percentiles=[25, 50, 75],
)
travel_time_matrix_detailed = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix_detailed.head()
from_id to_id travel_time_p25 travel_time_p50 travel_time_p75
0 54 0 24 26 27
1 54 1 26 28 29
2 54 2 27 28 30
3 54 3 29 30 33
4 54 4 26 28 30
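One way to read these columns is to look at the spread between the 25th and 75th percentile for each destination. The sketch below does this in plain Python with the five rows shown above; interpreting a wider spread as stronger dependence on the exact departure minute is a suggestion for illustration, not a feature of r5py:

```python
# (to_id, p25, p50, p75) copied from the output above, in minutes.
rows = [
    (0, 24, 26, 27),
    (1, 26, 28, 29),
    (2, 27, 28, 30),
    (3, 29, 30, 33),
    (4, 26, 28, 30),
]

# Interquartile range per destination: the difference between the
# 75th and the 25th percentile travel time.
spread = {to_id: p75 - p25 for to_id, p25, p50, p75 in rows}

print(spread)  # {0: 3, 1: 3, 2: 3, 3: 4, 4: 4}
```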