Haversine Mapping for Spatial Integration in Graph Convolutional Networks

Thomas A. Fink™
4 min read · Oct 22, 2023


Introduction

Suppose you already have traffic data for a temporal graph convolutional network (T-GCN) and want to incorporate additional data, such as measurements collected by separate weather sensors. Using the Haversine formula, we can find the nearest weather sensor for each traffic sensor. The weather readings are then matched to the traffic readings by timestamp, appended as extra columns, and exported to the merged_speed_traffic_and_air_temperature_data.csv file.

merged_speed_traffic_and_air_temperature_data.csv

DATETIMESTAMP,773869_speed,767541_speed,767542_speed, ... ,773869_temp,767541_temp,767542_temp
2012-03-01 00:00:00,64.375,67.625,67.125,61.5,66.875, ... ,nan, 51.8, 51.8

All that remains is to forward-fill the nan values, normalize the data, and feed it into a T-GCN model.
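The forward-fill and normalization step can be sketched as follows. This is a minimal example on a toy frame, not the repository's preprocessing code: the values are made up, min-max scaling is an assumed choice of normalization, and the extra back-fill is an assumption added here because a leading nan (as in the sample row above) has no earlier value to carry forward.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged file; column names follow the sample header above,
# the values are hypothetical.
merged_df = pd.DataFrame({
    "DATETIMESTAMP": pd.date_range("2012-03-01", periods=4, freq="5min"),
    "773869_speed": [64.375, 62.0, 60.5, 63.0],
    "773869_temp": [np.nan, 51.8, np.nan, 52.4],
})

features = merged_df.drop(columns="DATETIMESTAMP")

# Forward-fill gaps, then back-fill any leading nan that has no earlier value
# (the back-fill is an assumption, not something the article prescribes).
features = features.ffill().bfill()

# Min-max scale each column to [0, 1] (assumed normalization).
col_min = features.min()
col_range = (features.max() - col_min).replace(0, 1)  # guard against constant columns
normalized = (features - col_min) / col_range

print(normalized)
```

Note that back-filling copies a later reading into an earlier time step, so for strict forecasting setups it may be preferable to drop the leading rows instead.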

Each traffic sensor is mapped to its nearest weather sensor for the pems-bay and metr-la datasets.

Repo Structure

A brief file structure overview of the repository is provided.

/
    map_spatial_integration_haversine_mapping.py
    spatial_integration_haversine_mapping.py
    - /data/metr-la/
        - /sensors
            metr_la_sensors_traffic.csv
            metr_la_sensors_weather.csv
        - /traffic
            speed.csv
            ...
        - /weather
            air_temp_set_1_fahrenheit.csv
            ...

    - /output/metr-la/
        merged_speed_traffic_and_air_temperature_data.csv
        sensor_map.html

Prerequisites

Before jumping into the code, make sure the following Python version and packages are installed:

Python 3.10.6
pip3 install pandas
pip3 install numpy
pip3 install folium

First, the packages the mapping code needs are imported at the top of spatial_integration_haversine_mapping.py:

import numpy as np
import pandas as pd
import os

Code

The Haversine formula calculates the shortest distance between two points on the surface of a sphere, given their longitudes and latitudes.

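Given two points:

$(\varphi_1, \lambda_1)$

and

$(\varphi_2, \lambda_2)$,

where $\varphi$ is latitude and $\lambda$ is longitude, both in radians, the Haversine formula is:

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

where $d$ is the distance between the two points and $r$ is the radius of the Earth (approximately 6371 kilometers).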

# Haversine formula to calculate the distance between two geographical points
def haversine(lat1, lon1, lat2, lon2):
    # Convert degrees to radians
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    r = 6371  # Radius of Earth in kilometers
    return c * r
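A quick sanity check helps confirm the function behaves as expected. The sketch below repeats the function so it runs on its own and tests two cases whose answers are known in advance: identical points (distance zero) and the equator-to-pole distance, which is a quarter of a great circle. The test coordinates are illustrative and not taken from the datasets.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2):
    # Same formula as above: inputs in degrees, result in kilometers
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a)) * 6371

# A point compared with itself should be 0 km away.
same_point = haversine(34.05, -118.25, 34.05, -118.25)

# Equator to north pole along a meridian is a quarter of a great circle.
pole_to_equator = haversine(0.0, 0.0, 90.0, 0.0)
expected_quarter = np.pi / 2 * 6371

print(same_point, pole_to_equator, expected_quarter)
```

Because the function only uses NumPy ufuncs, it also accepts arrays of coordinates, which is useful for computing many distances at once.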

To find the nearest weather sensor for a given traffic sensor, we calculate the distance between the traffic sensor and each weather sensor using the Haversine formula. The weather sensor with the smallest distance is considered the nearest.

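Given a traffic sensor $t = (\varphi_t, \lambda_t)$ and a set of weather sensors $W = \{w_1, w_2, \ldots, w_n\}$, the nearest weather sensor $w_k$ is

$$w_k = \arg\min_{w_j \in W} d(t, w_j)$$

where $d$ is the Haversine distance.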

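def find_nearest_weather_sensor(traffic_lat, traffic_lon, weather_df):
    # Distance in kilometers from the traffic sensor to every weather sensor
    distances = weather_df.apply(lambda row: haversine(traffic_lat, traffic_lon, row['lat'], row['long']), axis=1)
    # idxmin() returns an index label, so look it up with .loc rather than .iloc
    return weather_df.loc[distances.idxmin(), 'detid']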
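To see the nearest-sensor lookup in action, here is a small self-contained sketch with a toy weather table. The sensor IDs (W1 to W3) and all coordinates are hypothetical. The table deliberately uses a non-default index, and the lookup uses label-based `.loc`, because `idxmin()` returns an index label rather than a row position.

```python
import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2):
    # Inputs in degrees, result in kilometers
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a)) * 6371

def find_nearest_weather_sensor(traffic_lat, traffic_lon, weather_df):
    distances = weather_df.apply(
        lambda row: haversine(traffic_lat, traffic_lon, row["lat"], row["long"]), axis=1
    )
    # Label-based lookup, so it works for any index, not only 0..n-1
    return weather_df.loc[distances.idxmin(), "detid"]

# Hypothetical weather sensors with a non-default index (10, 20, 30)
weather_df = pd.DataFrame(
    {
        "detid": ["W1", "W2", "W3"],
        "lat": [34.00, 34.10, 34.30],
        "long": [-118.20, -118.40, -118.00],
    },
    index=[10, 20, 30],
)

nearest = find_nearest_weather_sensor(34.09, -118.38, weather_df)
print(nearest)  # W2 is the closest of the three toy sensors
```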

Using the above method, each traffic sensor is mapped to its nearest weather sensor. This results in a dictionary where the keys are traffic sensor IDs and the values are the corresponding nearest weather sensor IDs.

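Formally, $M(t) = \arg\min_{w \in W} d(t, w)$, where $M$ is the mapping function from traffic sensors to weather sensors, $W$ is the set of weather sensors, and $d$ is the Haversine distance.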

# Map each traffic sensor to its nearest weather sensor
traffic_sensors_df['nearest_weather_sensor'] = traffic_sensors_df.apply(
    lambda row: find_nearest_weather_sensor(row['lat'], row['long'], weather_sensors_df),
    axis=1
)

# Dictionary mapping of traffic sensor to its nearest weather sensor
sensor_to_weather_mapping = dict(zip(traffic_sensors_df['detid'], traffic_sensors_df['nearest_weather_sensor']))
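For larger sensor sets, the nested row-wise `apply` calls can become slow. One possible alternative, sketched here under the assumption that both sensor tables have `detid`, `lat`, and `long` columns as in the code above, is to compute the full traffic-by-weather distance matrix with NumPy broadcasting and take the row-wise minimum. The two traffic IDs come from the sample header; the coordinates and the weather IDs are hypothetical.

```python
import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2):
    # Inputs in degrees (scalars or arrays), result in kilometers
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a)) * 6371

# Hypothetical coordinates for illustration only
traffic_sensors_df = pd.DataFrame(
    {"detid": [773869, 767541], "lat": [34.15, 34.05], "long": [-118.30, -118.25]}
)
weather_sensors_df = pd.DataFrame(
    {"detid": ["W1", "W2"], "lat": [34.04, 34.16], "long": [-118.24, -118.31]}
)

# (n_traffic, 1) against (n_weather,) broadcasts to an (n_traffic, n_weather) matrix
dist_km = haversine(
    traffic_sensors_df["lat"].to_numpy()[:, None],
    traffic_sensors_df["long"].to_numpy()[:, None],
    weather_sensors_df["lat"].to_numpy(),
    weather_sensors_df["long"].to_numpy(),
)

# Position of the closest weather sensor for each traffic sensor
nearest_pos = dist_km.argmin(axis=1)

sensor_to_weather_mapping = dict(
    zip(
        traffic_sensors_df["detid"].tolist(),
        weather_sensors_df["detid"].to_numpy()[nearest_pos].tolist(),
    )
)
print(sensor_to_weather_mapping)
```

The result has the same shape as the dictionary built above: traffic sensor IDs as keys and nearest weather sensor IDs as values.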

The traffic speed and air temperature data are matched by timestamp in spatial_integration_haversine_mapping.py and saved into a single CSV file.

# Function to merge data based on timestamps
def merge_data_on_timestamps(traffic_speed_df, air_temp_df, sensor_to_weather_mapping):
    # Initialize merged dataframe with DATETIMESTAMP column
    merged_df = pd.DataFrame()
    merged_df["DATETIMESTAMP"] = traffic_speed_df["DATETIMESTAMP"]

    # Iterate through each sensor in the traffic_speed_df
    for sensor in traffic_speed_df.columns[1:]:
        # Copy the speed data
        merged_df[sensor] = traffic_speed_df[sensor]

        # Find corresponding weather sensor
        weather_sensor = sensor_to_weather_mapping[int(sensor)]

        # Find corresponding temperature data
        # Note: reindex_like aligns on the row index, so both dataframes
        # must share the same index for the timestamps to line up
        if f"{weather_sensor}" in air_temp_df.columns:
            merged_df[f"{sensor}_temp"] = air_temp_df[f"{weather_sensor}"].reindex_like(traffic_speed_df)

    return merged_df
...
merged_df.to_csv(merged_file_path, index=False)
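If the traffic and weather files do not cover exactly the same rows, aligning explicitly on the timestamp is a safer option. The following is a minimal sketch of that idea, not the repository's implementation. It assumes the air temperature file also has a `DATETIMESTAMP` column, which the article does not confirm, and it uses made-up values and a hypothetical weather sensor ID `W1`. The `_speed` and `_temp` suffixes follow the sample header shown in the introduction.

```python
import numpy as np
import pandas as pd

# Toy inputs; the weather frame is missing the first timestamp on purpose
traffic_speed_df = pd.DataFrame({
    "DATETIMESTAMP": pd.to_datetime(
        ["2012-03-01 00:00:00", "2012-03-01 00:05:00", "2012-03-01 00:10:00"]
    ),
    "773869": [64.375, 62.0, 60.5],
})
air_temp_df = pd.DataFrame({
    "DATETIMESTAMP": pd.to_datetime(["2012-03-01 00:05:00", "2012-03-01 00:10:00"]),
    "W1": [51.8, 52.0],
})
sensor_to_weather_mapping = {773869: "W1"}

# Index both frames by timestamp so alignment happens on time, not row position
speed = traffic_speed_df.set_index("DATETIMESTAMP")
temp = air_temp_df.set_index("DATETIMESTAMP")

merged = pd.DataFrame(index=speed.index)
for sensor in speed.columns:
    merged[f"{sensor}_speed"] = speed[sensor]
    weather_sensor = str(sensor_to_weather_mapping[int(sensor)])
    if weather_sensor in temp.columns:
        # Timestamps without a weather reading become nan
        merged[f"{sensor}_temp"] = temp[weather_sensor].reindex(speed.index)

merged = merged.reset_index()
print(merged)
```

Timestamps that have no weather reading end up as nan, which matches the nan seen in the sample output row and can be handled by the forward-fill step afterwards.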

The GitHub repository can be found here.
