Spatiotemporal Analysis to Identify Crash-prone Road Sections

1. Introduction

Figure 1: Spatial proximity plays an important role in determining the outcome of a sporting incident - Hotspot technology in Cricket confirms whether the ball has made contact with the bat

I have fond memories of playing the Hot & Cold game as a kid. Perhaps you do, too - hiding a tiny object somewhere in the house and asking siblings and friends to find it within a preset period of time.

Upon being asked by a seeker, the hider has to give a verbal cue such as Hot, Very Hot, Cold and so on to signal the seeker's proximity to the hidden object - Hot implying near and Cold implying far. The joy derived from hiding the object securely or spotting it within the stipulated time frame was immense! The seekers would also protest vehemently if they found the hider's proximity cue to be false or misleading😁.

Figure 2: Adding 'attributes', i.e. information, to spatial positions (coordinates) helps make sense of geopolitical events - this map depicts which entity controlled each provincial territory in Afghanistan during the Taliban offensive of 2021. Source: BBC

In this post, I will demonstrate a real-world application of the Hot & Cold game - one which makes use of statistical cues. To perform Spatiotemporal Analysis of Vehicle Accidents in Brevard County, Florida, USA, I've used Esri's ArcGIS Pro, an advanced Location Analytics (GIS) application.

Many thanks to Lauren Scott Griffith & Lixin Huang for developing the tutorial on this topic on Esri's Learn ArcGIS website.

Hyperlinks to Sections:

Introduction
Setting up the Datasets & Initiating Location Analytics
Performing Advanced Location Analytics using Getis-Ord Gi* Hot Spot method
Modeling the Workflow to Automate the Analysis
Visualizing the Results of Spatiotemporal Analysis in a 3D Scene
About the Firm

I have explained the case workings in detail in this post and highly recommend reading it. If you would rather watch a less detailed demonstration of the technology at work, here is a 16-minute video of the same:

Video 1: Walkthrough of deploying Location Analytics on Vehicle Crash Records

Spatiotemporal Analysis is a powerful way of making sense of Location Datasets: the data can be studied in both its spatial (positional) and temporal (time) dimensions simultaneously. I will explain the implications of this two-dimensional analysis shortly; in case you are not familiar with what each dimension entails, here are two demonstrations that should help - Spatial (Site Suitability Analysis for Wildlife Habitat) and Temporal (Detecting Ships in Suez Canal, Egypt).

For the spatiotemporal analysis of Vehicle Accident records, I will make use of the Space Time Cube, Emerging Hot Spot Analysis and Hot Spot Analysis (Getis-Ord Gi*) geoprocessing tools within ArcGIS Pro software.

The two guides below explain the methodology involved:

Video 2: Space Time Cube geoprocessing tool in ArcGIS Pro

Video 3: Hot Spot Analysis in ArcGIS Pro

Interesting, isn't it? Now, let me begin demonstrating its use...

2. Setting Up the Datasets & Initiating Location Analytics

First, I will load into ArcGIS Pro the dataset containing the locations and details of 100,000+ Vehicle Crash records from 2010 to 2015 (6 years) in Brevard County, Florida, USA. Aside from the positional information, i.e. the exact coordinates of the recorded accident locations, the dataset contains several attributes for each Crash site (refer to Figure 3 and Figure 4), such as the date and time of the crash, the number of fatalities and injuries, the cause of the crash (whether the driver was under the influence of alcohol, was distracted, and so on) and the prevailing weather conditions at the time of the crash, among other factors.

I believe Law Enforcement agencies around the world capture similar attributes, so replicating this workflow on another dataset should be entirely possible and could be highly useful.

Figure 3: Vehicle Crash Records Attributes Part I

Figure 4: Vehicle Crash Records Attributes Part II

As the dataset contains Positional information i.e. the coordinates of each crash site, I can plot them on a standard 2-Dimensional (XY) Map-

Figure 5: Plotting Vehicle Crash Records on a 2D map

Besides the historical crash information, I have another important (and commonly available) dataset: the digitized Road Network of Brevard County:

Figure 6: A cross-section of the digitized Road Network layer of Brevard County

The next step involves restructuring the Crash Records dataset into a Space Time Cube format.

To explain using an analogy: just as we use the Pivot Table tool in Microsoft Excel to convert raw data into a more meaningful and interpretable tabular summary, ArcGIS can restructure and compartmentalize the Location Dataset into distinct spatiotemporal buckets, i.e. Bins, using the Create Space Time Cube by Aggregating Points geoprocessing tool.

Figure 7: Pivot Table tabular summary of raw Sales Data in Microsoft Excel

Figure 8: Setting the parameters in the Create Space Time Cube by Aggregating Points geoprocessing tool

I am instructing the geoprocessing tool that each individual bucket, i.e. Bin, in the Space Time Cube should aggregate the crashes recorded within a 2-mile Bin (spatial) across a 16-week period (temporal).
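To make the Pivot Table analogy concrete, here is a minimal Python sketch of the aggregation idea; it is not Esri's implementation. It assumes crash coordinates are already in a projected system measured in miles, uses a square grid anchored at a hypothetical origin, and counts 16-week time steps forward from a chosen start date (the real tool offers other time-step alignments). The helper names `bin_key` and `aggregate_crashes` are my own.

```python
from collections import Counter
from datetime import date

# Assumed Bin dimensions, mirroring the parameters described above
BIN_SIZE_MILES = 2.0        # spatial extent of one Bin
TIME_STEP_DAYS = 16 * 7     # 16-week time step


def bin_key(x_mi, y_mi, day, origin, start):
    """Return (row, column, time_step) of the space-time Bin holding one crash.

    Hypothetical helper: x_mi/y_mi are projected coordinates in miles,
    origin is the lower-left corner of the grid, start is the first day
    of the first time step.
    """
    col = int((x_mi - origin[0]) // BIN_SIZE_MILES)
    row = int((y_mi - origin[1]) // BIN_SIZE_MILES)
    step = (day - start).days // TIME_STEP_DAYS
    return row, col, step


def aggregate_crashes(crashes, origin, start):
    """Count crashes per space-time Bin, like a pivot table keyed on (row, col, step)."""
    return Counter(bin_key(x, y, d, origin, start) for x, y, d in crashes)


# Made-up crash records: (x in miles, y in miles, date)
crashes = [
    (0.5, 0.5, date(2010, 1, 1)),
    (1.9, 1.0, date(2010, 2, 1)),   # same 2-mile cell and same 16-week step as the first
    (2.1, 0.5, date(2010, 1, 5)),   # neighboring cell to the east
    (0.1, 0.1, date(2010, 5, 1)),   # same cell as the first, but the second time step
]
cube = aggregate_crashes(crashes, origin=(0.0, 0.0), start=date(2010, 1, 1))
print(cube)
```

Each key of `cube` plays the role of one Bin; the ArcGIS tool additionally stores the result as a netCDF cube and computes trend statistics, which this sketch leaves out.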

I have chosen not to render the Space Time Cube output on the map at this stage, as it is not needed yet; you can, however, review the tool's Output Summary in Figure 9 below, which summarizes how the raw location data has been restructured spatiotemporally into distinct Bins.

The Space Time Cube output file is saved directly to my system, and I will use it as an input in the next geoprocessing step. In the last phase of this demonstration, I will also visually render and explain the processed Space Time Cube results in detail.

Figure 9: Output Summary upon running the Space Time Cube geoprocessing tool on Crash Records data

The next step involves performing an Emerging Hot Spot Analysis, also known as Space Time Pattern Mining.

Figure 10: Setting the parameters in the Emerging Hot Spot Analysis Geoprocessing tool

Lest you think otherwise, the Emerging Hot Spot Analysis output is not a depiction of the density of crashes. Rather, it visualizes the trend of crashes at each Bin location, i.e. how crash clustering within each 2-mile Bin changes across successive 16-week time steps.

That being said, the trend derivation also takes into account (i.e. is relative to) the crash counts in the neighboring Bins (spatial as well as temporal neighbors).
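As I understand Esri's documentation, the trend at each location is evaluated with the Mann-Kendall trend test, applied to the location's series of hot spot z-scores (one per time step). The sketch below implements the basic test in plain Python to make the idea of a "trend" less abstract; it omits the correction for tied values, is not Esri's code, and the input series is hypothetical.

```python
import math


def mann_kendall(series):
    """Basic Mann-Kendall trend test (no correction for tied values).

    Returns (S, Z): S > 0 suggests an upward trend, S < 0 a downward one;
    |Z| > 1.96 is significant at roughly the 95% level.
    """
    n = len(series)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = series[j] - series[k]
            s += (diff > 0) - (diff < 0)   # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)     # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Hypothetical Gi* z-scores for one Bin over six consecutive 16-week time steps
intensifying = [1.2, 1.8, 2.1, 2.6, 3.0, 3.4]
s, z = mann_kendall(intensifying)
print(s, round(z, 2))
```

A steadily rising series like this one yields a significant positive Z, which is the kind of evidence that separates an Intensifying Hot Spot from a Persistent one.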

Do refer to the technical note on the Neighborhood Time Step concept in Figure 12.

Each Bin with a statistically significant pattern falls under one of these eight Hot or Cold Spot categories: New, Consecutive, Intensifying, Persistent, Diminishing, Sporadic, Oscillating or Historical (described in the Infographic below):

Figure 11: Description of the 8 types of Hot Spot categories. Source: ArcGIS Pro Geoprocessing Tool Reference

Next, I will run the Emerging Hot Spot Analysis tool on the Count of Total Road Crashes over a single Neighborhood Time Step. Refer to the technical note on Neighborhood Time Step below:

Figure 12: Technical Note on Neighborhood Time Step Statistics. Source: ArcGIS Pro Community

The output generated upon running the Emerging Hot Spot Analysis tool on the Count of Total Road Crashes over a single Neighborhood Time Step is depicted below-

Figure 13: Output generated upon running Emerging Hot Spot Analysis tool on Total Count of Road Crash Records over a single Neighborhood Time Step (the parameter)

Alongside the map-based output, the tool also generates a Summary of Results table (Figure 14), which totals the Hot Spot classifications of the 221 Bins in the Space Time Cube: there are 2 New Hot Spots, 17 Consecutive Hot Spots, 59 Sporadic Hot Spots, 13 Oscillating Hot Spots, 23 Persistent Cold Spots, 18 Diminishing Cold Spots and 3 Sporadic Cold Spots (Figure 11 describes each of these types). These account for 135 Bins in total; the remaining 86 Bins are neither Hot Spots nor Cold Spots, i.e. they show no statistically significant pattern.

Figure 14: Emerging Hot Spot Analysis tool's Summary of Results table
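The totals in the Summary of Results table can be cross-checked with a few lines of Python; the category names and counts below are taken directly from the paragraph above.

```python
# Bin counts per category, as reported in the Summary of Results table
summary = {
    "New Hot Spot": 2,
    "Consecutive Hot Spot": 17,
    "Sporadic Hot Spot": 59,
    "Oscillating Hot Spot": 13,
    "Persistent Cold Spot": 23,
    "Diminishing Cold Spot": 18,
    "Sporadic Cold Spot": 3,
}
TOTAL_BINS = 221

hot = sum(v for k, v in summary.items() if k.endswith("Hot Spot"))
cold = sum(v for k, v in summary.items() if k.endswith("Cold Spot"))
not_significant = TOTAL_BINS - hot - cold

print(f"Hot: {hot}, Cold: {cold}, Not significant: {not_significant}")
# -> Hot: 91, Cold: 44, Not significant: 86
print(f"Sporadic share of Hot Spots: {summary['Sporadic Hot Spot'] / hot:.0%}")
# -> Sporadic share of Hot Spots: 65%
```

Roughly two out of every three Hot Spot Bins are Sporadic, which is why that category dominates the map.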

The most common category in the output is the Sporadic Hot Spot, which implies that the 59 affected Bins switch on and off between being a Hot Spot and not being one. While these Bins have no history of being Cold Spots, they do not show a clearly distinguishable crash trend either: their Hot Spot status, whenever present, is intermittent rather than sustained (refer to the description of the Sporadic Hot Spot in Figure 11 if you'd like to).

As an Analyst who is seeking to identify risky road sections that are prone to crash incidents, it is the New (2), Persistent (0) and Intensifying (0) Hot Spot Bins that would be of primary interest to me.

Those who've been following this post closely would have observed that I have not yet factored Brevard County's Road Network dataset into the Space Time Cube and Emerging Hot Spot Analysis steps. Indeed, the 2-mile Distance parameter that I set in the Emerging Hot Spot tool was Euclidean in nature, i.e. based on the straight-line distance between two points; it did not represent the real-world road distance connecting two Bins.

Therefore, to interpret the pattern of Road Crashes more accurately, I must factor in the digitized Road Network layer, which contains information on actual Road Lengths.

Figure 15: Euclidean coverage from a defined starting point for given drive times (15, 30 and 50 minutes) assumes the form of concentric circles

Figure 16: In comparison, Road Network coverage from a defined starting point for a given drive time is irregular in shape, as roads do not extend uniformly or connect equally well in all directions. While not as elegant as the Euclidean representation, this is an accurate depiction of how far one would get after driving in any direction for a given period of time (45 mins)
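To illustrate why straight-line distance can mislead, here is a toy example; the road layout and mileages are invented. Two crash sites sit 1 mile apart across a river, but the only bridge forces a longer drive. The road distance is computed with Dijkstra's algorithm, the standard shortest-path technique for networks with non-negative edge lengths; ArcGIS's network tools perform their own, more elaborate routing.

```python
import heapq
import math

# Hypothetical node coordinates (miles): A and B face each other across a river,
# C and D are the two ends of the only bridge.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (1.5, 0.0), "D": (1.5, 1.0)}

# Undirected road segments with their lengths in miles
roads = [("A", "C", 1.5), ("C", "D", 1.0), ("D", "B", 1.5)]
graph = {}
for u, v, length in roads:
    graph.setdefault(u, []).append((v, length))
    graph.setdefault(v, []).append((u, length))


def network_distance(graph, source, target):
    """Shortest road distance via Dijkstra's algorithm (inf if unreachable)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, length in graph.get(node, []):
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return math.inf


euclidean = math.dist(coords["A"], coords["B"])
by_road = network_distance(graph, "A", "B")
print(f"Euclidean: {euclidean} mi, by road: {by_road} mi")
# -> Euclidean: 1.0 mi, by road: 4.0 mi
```

A Euclidean neighborhood would treat A and B as close neighbors, while a driver (and a road-based analysis) would see them as 4 miles apart.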

Before I proceed to run the Hot Spot tool with the county's digitized Road Network dataset factored in, I would like to clean the raw Crash records dataset, as it contains some misleading data points.

Observe in Figure 17 below that some of the Crash sites (red dots) are not positioned directly on top of an existing road; rather, they lie beyond the road extent. This could happen for several reasons: the recorded site may be where the vehicle ended up in the aftermath of the accident rather than where the crash initially occurred, or the GPS reading may have been uncalibrated or inaccurate.

Either way, these glitches need to be corrected: a Crash site must lie on top of the Road Network for the Hot Spot tool to be able to factor it into its calculation.

Figure 17: Anomalies in Crash locations - certain spots are outside the Road boundary

Figure 18: The Snap geoprocessing tool will help move the inaccurate records onto the Road Network

To make the adjustment, I will use the Snap geoprocessing tool, instructing the GIS software to move any Road Crash site that is within 0.25 miles of a road onto that road (I have assumed that Crash sites farther than 0.25 miles from any road are faulty records and have excluded them from the analysis).
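Geometrically, snapping a point to a line means projecting it onto the nearest road segment. The sketch below does this in plain Python under stated assumptions: coordinates are projected and measured in miles, roads are straight hypothetical segments, and a point beyond the 0.25-mile tolerance returns `None`, mirroring the decision above to exclude such records (the ArcGIS Snap tool itself simply leaves points beyond the snap distance where they are).

```python
import math

SNAP_TOLERANCE_MILES = 0.25


def closest_point_on_segment(p, a, b):
    """Project point p onto segment a-b, clamped to the segment's ends."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a  # degenerate segment: both ends coincide
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap(point, segments, tolerance=SNAP_TOLERANCE_MILES):
    """Return (segment_id, snapped_point), or None if no road is within tolerance."""
    best_id, best_point, best_dist = None, None, math.inf
    for seg_id, (a, b) in segments.items():
        q = closest_point_on_segment(point, a, b)
        d = math.dist(point, q)
        if d < best_dist:
            best_id, best_point, best_dist = seg_id, q, d
    return (best_id, best_point) if best_dist <= tolerance else None


# Hypothetical road segments, coordinates in miles
segments = {"Main St": ((0.0, 0.0), (10.0, 0.0)), "1st Ave": ((5.0, 0.0), (5.0, 8.0))}
print(snap((3.0, 0.1), segments))   # 0.1 mi from Main St: moved onto it
print(snap((3.0, 0.6), segments))   # 0.6 mi from the nearest road: excluded (None)
```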

The Snap tool corrects the irregularities, as can be observed in the figure below...

Figure 19: All Crash sites on top of Roads after running the Snap tool (compare it with Figure 17)

Figure 20: Setting the parameters in the Spatial Join geoprocessing tool

… which in turn allows me to perform the next step: seamlessly integrating the two layers, the Crash locations and the digitized Road Network, using the Spatial Join geoprocessing tool.
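Conceptually, a one-to-one Spatial Join of the snapped crashes onto the road segments boils down to counting how many crashes fall on each segment; ArcGIS stores this count in the Join_Count field. A minimal sketch, assuming each snapped crash has already been tagged with the id of the road segment it lies on (the segment ids below are hypothetical):

```python
from collections import Counter


def join_count(segment_ids, crash_segment_ids):
    """Number of crashes per road segment; segments with no crashes get 0."""
    counts = Counter(crash_segment_ids)
    return {seg: counts.get(seg, 0) for seg in segment_ids}


# Hypothetical data: the road network's segment ids, and the segment each crash was snapped to
segments = ["S1", "S2", "S3"]
crash_segments = ["S1", "S1", "S3", "S1", "S1"]
print(join_count(segments, crash_segments))
# -> {'S1': 4, 'S2': 0, 'S3': 1}
```

Keeping the zero-crash segments matters: they are part of the baseline against which the Hot Spot statistic judges whether a cluster is unusual.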

Figure 21: Road Crash data 'joined' with the Road Network data as depicted in this pop-up - total 4 crashes occurred at the highlighted intersection

As evident in the popup in Figure 21, the Crash location dataset is now linked to the Road Network dataset. I am now ready to run the Hot Spot Analysis tool again...

Or am I?...

Figure 22: Using the Calculate Field geoprocessing tool, I will compute the Crash Rate per mile per year

Actually, no. There is one more irregularity left to correct. Longer roads in the Road Network will naturally have more crashes assigned to them, so the Hot Spot output would be biased towards them. This would distort the analysis and hamper the quality of interpretation.

To correct this implicit bias, I will compute a new field in the Road Network dataset: Crash Rate per mile, per year.

The newly computed field (Crash_Rate, on the extreme right in Figure 23 below) normalizes the number of Crashes (Join_Count) by the Road Length (Shape_Length) and the number of years, thereby enabling me to perform a more accurate analysis and interpretation of Crash Hot Spots.

Figure 23: Newly-computed Crash Rate per mile per year column (Crash_Rate) added to the Attribute Table
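The Crash_Rate calculation is simple arithmetic. A minimal sketch, assuming Shape_Length is stored in feet (the actual unit depends on the layer's coordinate system, so check it before dividing) and using the six-year 2010-2015 period; the two example roads are hypothetical:

```python
FEET_PER_MILE = 5280
YEARS = 6  # 2010 to 2015 inclusive


def crash_rate(join_count, shape_length_ft, years=YEARS):
    """Crashes per mile per year for one road segment."""
    length_miles = shape_length_ft / FEET_PER_MILE
    return join_count / length_miles / years


# Same number of crashes, very different risk once length is accounted for
short_road = crash_rate(12, 2640)    # 0.5-mile segment
long_road = crash_rate(12, 26400)    # 5-mile segment
print(f"short: {short_road:.2f}, long: {long_road:.2f} crashes per mile per year")
# -> short: 4.00, long: 0.40 crashes per mile per year
```

Without the normalization both roads would look equally dangerous (12 crashes each); with it, the short segment stands out as ten times riskier per mile.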

3. Performing Advanced Location Analytics using Getis-Ord Gi* Hot Spot method

Now that all the irregularities are sorted, I am ready to perform Hot Spot Analysis again. This time, however, instead of the Emerging Hot Spot Analysis tool, I'll use the Hot Spot Analysis (Getis-Ord Gi*) tool on the road segments themselves. Running the Getis-Ord Gi* statistic on the road segments restricts the Hot Spot Analysis to just the Road Network (Figure 26) instead of the entire 2-mile Euclidean coverage of a Bin, as was the case previously (Figure 13).

Prior to running the tool, I will assign weight not just to the exact spot where the Crash occurred but to the entire section of road over which the crash sequence would have unfolded, from the point where the driver would have spotted the approaching obstacle to the point of the crash. This gives a more accurate representation of an Accident-prone Road Section.
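For readers curious about what the tool computes, here is a plain-Python sketch of the standard Getis-Ord Gi* z-score with binary weights: every feature within the cutoff distance of feature i, including i itself (which is what the "*" denotes), gets weight 1 and all others 0. The ArcGIS tool supports other weighting schemes and network-based distances, so treat this as an illustration only; the segment positions and crash rates are made up.

```python
import math


def gi_star_z_scores(values, distances, cutoff):
    """Getis-Ord Gi* z-score per feature, with binary distance-band weights.

    Gi* = (sum_j w_ij x_j - mean * sum_j w_ij)
          / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))

    values[i]       : attribute of feature i (e.g. crash rate)
    distances[i][j] : distance between features i and j (0 on the diagonal)
    cutoff          : features within this distance are neighbors (weight 1)
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(max(0.0, sum(x * x for x in values) / n - mean * mean))
    z_scores = []
    for i in range(n):
        w = [1.0 if distances[i][j] <= cutoff else 0.0 for j in range(n)]
        sum_w = sum(w)
        sum_w_sq = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w_sq - sum_w ** 2) / (n - 1))
        # Undefined when every feature is a neighbor or all values are equal
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


# Ten hypothetical segments, 100 m apart along one road; the first three have high crash rates
positions = [100 * k for k in range(10)]
rates = [9, 8, 9, 1, 1, 1, 1, 1, 1, 1]
dist = [[abs(a - b) for b in positions] for a in positions]
z = gi_star_z_scores(rates, dist, cutoff=110)
print([round(v, 2) for v in z])
```

Segment 1, surrounded by two other high-rate segments, scores a z of about 3, i.e. a Hot Spot at 99% confidence, while the low-rate stretch scores negative values.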

Figure 24: Description of the Getis-Ord Gi* Hot Spot Analysis geoprocessing tool

Figure 25: Getis-Ord Gi* Hot Spot Analysis geoprocessing tool parameters

This length of the crash sequence, set through the Impedance Distance Cutoff parameter and used to derive the spatial weights (the Conceptualization of Spatial Relationships parameter in Figure 25), is 110 meters (roughly the length of an American football field, end zones included), which is the minimum stopping sight distance for a vehicle traveling at 45 miles per hour.
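The 110-meter figure can be reproduced with the stopping sight distance formula from AASHTO's "Green Book" (A Policy on Geometric Design of Highways and Streets), which assumes a 2.5-second brake reaction time and a deceleration of 11.2 ft/s². The post does not name its source, so treat this cross-check as my assumption rather than the tutorial's derivation:

```python
FT_TO_M = 0.3048


def stopping_sight_distance_ft(speed_mph, reaction_s=2.5, decel_ft_s2=11.2):
    """AASHTO design stopping sight distance in feet (US customary form)."""
    brake_reaction = 1.47 * speed_mph * reaction_s       # distance covered before braking starts
    braking = 1.075 * speed_mph ** 2 / decel_ft_s2       # distance covered while braking
    return brake_reaction + braking


ssd_ft = stopping_sight_distance_ft(45)
print(f"{ssd_ft:.0f} ft = {ssd_ft * FT_TO_M:.0f} m")
# -> 360 ft = 110 m
```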

You may read a technical note on how to generate Spatial Weights for a Network Dataset here.

Upon running the Hot Spot Analysis (Getis-Ord Gi*) tool, the Hot Spots are now located on top of the Road Network (a desirable improvement over the Emerging Hot Spot Analysis tool's Euclidean output in Figure 13):

Figure 26: Cross-sectional view of the output upon running Getis-Ord Gi* Hot Spot Analysis Geoprocessing tool

While Figure 26 depicts the Hot Spots derived from statistically analyzing all the Crash sites within Brevard County, let me refine the analysis by running the Hot Spot Analysis (Getis-Ord Gi*) tool on specific Crash types, beginning with Crashes that led to Fatalities (the statistical methodology remains the same; only the data field analyzed changes in the tool parameters).

As expected, there is a distinct change in the Hot Spot output. The GIS software allows me to compare both Hot Spot outputs visually by placing them side-by-side:

Figure 27: Comparing the All Crashes Hot Spot output (left) and Crashes that led to Fatalities Hot Spot output (right) - this is a cross-sectional view and not the entire output

This visual comparison is insightful: Hot Spots have emerged at new locations in the Fatality Hot Spot output (right, Figure 27), and the analyst must pay close attention to them. Running the Hot Spot tool on a specific attribute (Fatality) brought to the fore high-risk road sections that were diluted in the analysis of All Crashes and hence weren't visible in that Hot Spot output.

Similarly, I will repeat the same Hot Spot Analysis on Crashes where the Driver was under the influence of Alcohol and compare the output to the All Crashes Hot Spot output -

Figure 28: Cross-sectional view of Hot Spots of All Crashes (left) and Hot Spots of Crashes where the driver was under the influence of Alcohol (right)

The Alcohol-induced Crashes Hot Spot output shows several Hot Spots on the road running parallel to the river. This could be insightful for Law Enforcement - perhaps an indication of late night riverside partying?

This type of Spatiotemporal Analysis could be of high utility to multiple stakeholders. While the Municipal Corporation could decide to widen the roads at the Fatality-prone zones, the police may want to crack down on drunk-driving originating at riverside pubs.

By cleaning up the data at the outset, I have emphasized the need for high-quality inputs: even powerful geospatial software can only produce an accurate assessment if the data fed into it is accurate.

4. Modeling the Workflow to Automate the Analysis

Modern GIS platforms are highly versatile; next, let me demonstrate how one can analyze the Crash records much faster and in greater detail.

The next question that I'd like to answer is: in which hours of the day do Crashes peak?

Fortunately, I can generate Charts and Tables from within the GIS platform itself - here's a Line Chart of three Attribute fields - Crash Count, Crash Hour of the Day, and Crash Day of the Week-

Figure 29: Line Chart generated by GIS itself capturing Crash count, Hour of the Day, and Day of the Week

Any insights that you can glean?

Let me change the Symbology-

Figure 30: Line Chart with New Symbology depicting Crash count, Hour of the Day and Day of the Week

Try once again. What can you assess?

The Crash incidents peak between 3 pm - 5 pm, particularly on Weekdays (Monday - Friday).
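The same peak can also be found without a chart by tallying crashes per hour. A minimal sketch on made-up records of (day, hour) pairs, restricted to weekdays and searching for the busiest two-hour window; the record layout is an assumption, not the dataset's actual schema:

```python
from collections import Counter

WEEKDAYS = {"Mon", "Tue", "Wed", "Thu", "Fri"}


def hourly_counts(crashes, days=WEEKDAYS):
    """Crash count for each hour 0-23, keeping only the given days."""
    counts = Counter(hour for day, hour in crashes if day in days)
    return [counts.get(h, 0) for h in range(24)]


def peak_window(counts, width=2):
    """(start hour, end hour) of the busiest run of `width` consecutive hours."""
    start = max(range(len(counts) - width + 1), key=lambda h: sum(counts[h:h + width]))
    return start, start + width


# Made-up records: (day of week, hour of day)
crashes = [("Mon", 15)] * 5 + [("Tue", 16)] * 4 + [("Wed", 8)] * 3 + [("Sat", 10)] * 10
print(peak_window(hourly_counts(crashes)))   # -> (15, 17), i.e. 3 pm to 5 pm
```

Filtering by day matters here: with Saturday included, the busy Saturday morning would pull the peak to 9-11 am, which is why the chart separates weekdays from the weekend.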

So now, let me run the next Hot Spot analysis on this newly discovered Peak Crash time frame; it is likely to produce interesting output. However, instead of running the entire workflow one step at a time (as demonstrated previously), I'll use an Analysis Model this time, which allows me to replicate the entire workflow multiple times with different parameters, at the click of a button!

Figure 31: Analysis Model to compute the Hot Spot (Getis-Ord Gi*) during the Peak Crash time frame that I've discovered

This Model (Figure 31 above) may appear complicated at first glance, but it is simply a graphical codification of the methodology that I've demonstrated earlier in this post.

Figure 32: the manual Geoprocessing Tool - Create Day/Time Hot Spot Map which will be automated in the Geo-Analysis model

Recall that I have already demonstrated the four distinct steps that have been codified in the Model:

Step 1: Selecting the Crash Attribute field that one seeks to analyze

Step 2: Snapping the outlier Crash sites onto the Road Network

Step 3: Normalizing the Count of Crashes by road length, i.e. computing the Crash Rate per Mile per Year

Step 4: Creating a Hot-Spot Map using Hot Spot Analysis (Getis-Ord Gi*) tool
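In code, a Model like this is essentially a function whose parameters are the things you want to vary. The hypothetical sketch below chains simplified versions of Steps 1 and 3 in plain Python; it assumes Step 2 has already tagged each crash with the id of the road segment it was snapped to and stops before Step 4. The field names `segment`, `fatal`, `day` and `hour`, the two-year period and the sample records are all invented for illustration.

```python
from collections import Counter

WEEKDAYS = {"Mon", "Tue", "Wed", "Thu", "Fri"}


def run_crash_workflow(crashes, segment_lengths_mi, years, where):
    """Steps 1 and 3 of the workflow as one reusable function.

    crashes            : list of dicts, each already snapped to a road segment (Step 2)
    segment_lengths_mi : {segment_id: length in miles}
    where              : predicate selecting the crashes to analyze (Step 1)
    Returns {segment_id: crashes per mile per year}, ready for a Gi* step (Step 4).
    """
    selected = [c for c in crashes if where(c)]
    counts = Counter(c["segment"] for c in selected)
    return {
        seg: counts.get(seg, 0) / length / years
        for seg, length in segment_lengths_mi.items()
    }


crashes = [
    {"segment": "A", "fatal": True, "day": "Mon", "hour": 15},
    {"segment": "A", "fatal": False, "day": "Sat", "hour": 9},
    {"segment": "B", "fatal": False, "day": "Fri", "hour": 16},
]
lengths = {"A": 1.0, "B": 0.5}

# The same workflow re-run with different parameters
all_rates = run_crash_workflow(crashes, lengths, 2, where=lambda c: True)
fatal_rates = run_crash_workflow(crashes, lengths, 2, where=lambda c: c["fatal"])
peak_rates = run_crash_workflow(
    crashes, lengths, 2, where=lambda c: c["day"] in WEEKDAYS and 15 <= c["hour"] < 17
)
print(all_rates, fatal_rates, peak_rates)
```

Swapping only the `where` predicate is the code equivalent of changing the Model's input parameter: All Crashes, Fatal Crashes and the Peak Crash time frame all reuse the identical workflow.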

Such Models are not overly difficult to configure and may or may not require prior programming knowledge.

Having such ready-to-run Models allows an analyst to replicate important workflows quickly and without errors. It saves precious time and needless effort, enabling one to perform high-end Location Analytics with a single click!

Upon running the model using the Peak Crash time frame, let me compare the generated output to the All Crashes Hot Spot output-

Figure 33: Cross-sectional view of All Crashes Hot Spots (left) compared to the Peak Crashes time frame Hot Spots (right)

Once again, new Hot Spots have emerged, and the analyst should deep-dive into these locations to understand why they are Crash-prone (maybe they are busy commercial thoroughfares) and what can be done to make them safer (installing traffic controllers, speed breakers, signage, road signals, etc.).

5. Visualizing the Results of Spatiotemporal Analysis in a 3D Scene

As I was when I experienced it for the first time, you may be blown away by this final study segment: visualizing Spatiotemporal Analysis in 3D! And no, you do not need to wear 3D glasses to view the demonstration😎.

So far you've witnessed Hot Spots on a two-dimensional (XY) frame - so while you know that a particular site is a Hot Spot (spatial view), you don't really know how the Hot Spot has evolved during the six-year time frame of the Crash records dataset (temporal view). Let me demonstrate it so that you can appreciate the utility of analyzing the dataset through Space and Time simultaneously.

First, let me create Yearly Hot Spot Maps for each of the six years from 2010 to 2015 (the temporal analysis will be done on a year-on-year basis). I will use an Analysis Model (Figure 34) to generate these outputs, which saves me the labor of manually replicating the workflow six times.

Figure 34: This Analysis Model will allow me to create Yearly Crash Hot Spot Maps for each of the six years using any Crash Attribute as the parameter

I'll use another Model (Figure 35) to perform the Hot Spot analysis on particular Hours and Days of my choosing, namely the Peak Crash time frame that I discovered earlier (3 pm to 5 pm on weekdays). The Yearly Hot Spot Maps generated by the previous Model serve as inputs to this Model (see the yellow Yearly Hot Spot Maps input box in the last row):

Figure 35: This Model will allow me to perform Hot Spot Analysis for the Peak Crash time frame using the Yearly Hot Spot Maps generated in the previous Model as inputs

Let me reveal to you the spatiotemporal rendition of the Year-on-Year Hot Spots of All Road Crashes during 3 pm-5 pm on Weekdays on a 3D Map Scene in ArcGIS Pro software-

Figure 36: Spatiotemporal rendition of the Hotspots for All Crashes during Peak time frame for each of the six years on a 3D Map Scene in ArcGIS Pro

The output in Figure 36 above may appear daunting at first, so let me explain what it depicts and how to interpret it. Refer to the highlighted portion in Figure 37 below:

Figure 37: This highlighted stack is representative of a 'Sporadic' Hot Spot

At the highlighted intersection on Prospect Ave above, you are seeing a Sporadic Hot Spot (refer to its description in Figure 11). Recall that this Hot Spot type was the most prevalent in Brevard County when I ran the Emerging Hot Spot Analysis tool earlier.

The first year's (2010) Hot Spot is visualized right at the bottom. Refer to the Map's Legend on the left of the image: the dark red shade represents a highly statistically significant Hot Spot (99% confidence). In the second year (2011), the Hot Spot disappears completely, i.e. it is not statistically significant, as indicated in the Legend. In the third year (2012), the Hot Spot reappears and is statistically significant, albeit weaker (the light red shade represents 95% confidence) than in 2010. In the fourth year (2013), the Hot Spot disappears once again, only to reappear with maximum intensity in the fifth year (2014). Finally, the Hot Spot disappears once again in the sixth year (2015).
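The confidence levels in the Legend correspond to z-score thresholds: the Hot Spot Analysis tool's Gi_Bin field classifies features at 90, 95 and 99 percent confidence for z-scores beyond ±1.65, ±1.96 and ±2.58 (when the optional False Discovery Rate correction is not applied). The sketch below applies those thresholds to illustrative z-scores that I chose to reproduce the Prospect Ave pattern described above; they are not the actual values from the analysis.

```python
def gi_bin(z):
    """Confidence bin for a Gi* z-score: +/-3 = 99%, +/-2 = 95%, +/-1 = 90%, 0 = not significant."""
    magnitude = abs(z)
    if magnitude > 2.58:
        level = 3
    elif magnitude > 1.96:
        level = 2
    elif magnitude > 1.65:
        level = 1
    else:
        return 0
    return level if z > 0 else -level


CONFIDENCE = {3: "99%", 2: "95%", 1: "90%"}

# Illustrative yearly z-scores for one location (not the real values)
yearly_z = {2010: 2.9, 2011: 0.4, 2012: 2.1, 2013: 0.8, 2014: 3.2, 2015: 1.0}

# Print the stack top-down, as it appears in the 3D scene (latest year on top)
for year in sorted(yearly_z, reverse=True):
    b = gi_bin(yearly_z[year])
    if b > 0:
        label = f"Hot Spot ({CONFIDENCE[b]} confidence)"
    elif b < 0:
        label = f"Cold Spot ({CONFIDENCE[-b]} confidence)"
    else:
        label = "Not significant"
    print(year, label)
```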

I hope that you have grasped how to interpret this spatiotemporal Hot Spot rendition and appreciate its utility.

How would you interpret this Hot Spot stack-

Figure 38: This Hot Spot stack represents a Persistent type as it is highly Statistically Significant (99% confidence) across all the six years (2010-2015) - this is a prime example of a Crash-prone Road Section and should be resolved on priority

And this one-

Figure 39: This Hot Spot stack represents a No Pattern Detected type as Hot Spot appears only in the fourth year (2013) where it has a moderately-high Statistical Significance (95% confidence)

Thank you for taking the time to go through this detailed demonstration. I hope that you enjoyed exploring the subject matter. Feel free to share your feedback.

Figure 40: Spatiotemporal rendition of Road Crash Hotspots over a period of six years during Peak time frame. Cross-sectional view of a 3D Scene in ArcGIS Pro.

ABOUT US

Intelloc Mapping Services, Kolkata | Mapmyops.com offers Mapping services that can be integrated with Operations Planning, Design and Audit workflows. These include but are not limited to Drone Services, Subsurface Mapping Services, Location Analytics & App Development, Supply Chain Services, Remote Sensing Services and Wastewater Treatment. The services can be rendered pan-India and can help your organization meet its stated objectives pertaining to Operational Excellence, Sustainability and Growth.

Broadly, the firm's area of expertise can be split into two categories - Geographic Mapping and Operations Mapping. The Infographic below highlights our capabilities-

Mapmyops (Intelloc Mapping Services) - Range of Capabilities and Problem Statements that we can help address

Our Mapping for Operations-themed workflow demonstrations can be accessed from the firm's Website / YouTube Channel and an overview can be obtained from this brochure. Happy to address queries and respond to documented requirements. Custom Demonstration, Training & Trials are facilitated only on a paid-basis. Looking forward to being of service.

Regards,

Arpit Shah