Spatiotemporal Analysis to identify Crash-prone Road sections
- Arpit Shah
- Jan 31, 2022
1. Introduction

I have fond memories of playing the Hot & Cold game as a kid. Perhaps you do, too - hiding a tiny object somewhere in the house and asking siblings and friends to find it within a preset period of time.
Upon being asked by a seeker, the hider has to give a verbal cue such as Hot, Very Hot, Cold and so on to signal the seeker's proximity to the hidden object - Hot implying near and Cold implying far. The joy derived from hiding the object securely or spotting it within the stipulated time frame was immense! The seekers would also protest vehemently if they found the hider's proximity cue to be false or misleading😁.

In this post, I will demonstrate a real-world application of the Hot & Cold game - one which makes use of statistical cues. To perform Spatiotemporal Analysis of Vehicle Accidents in Brevard County in Florida, USA, I've used Esri's ArcGIS Pro - an advanced Location Analytics (GIS) software.
Many thanks to Lauren Scott Griffith & Lixin Huang for developing the tutorial on this topic on the Esri Learn ArcGIS website.
I have explained the case workings in detail in this post and highly recommend that you read it. If you prefer a shorter, less detailed demonstration of the technology at work, here is a 16-minute video of the same-
Spatiotemporal Analysis is a powerful way of making sense of Location Datasets - the data can be studied in both its spatial (positional) and temporal (time) forms simultaneously. While I will begin explaining the implications of this two-dimensional analysis shortly, in case you are not familiar with what each dimension entails, here are two demonstrations that would help - Spatial (Site Suitability Analysis for Wildlife Habitat) and Temporal (Detecting Ships in Suez Canal, Egypt).
For the spatiotemporal analysis of Vehicle Accident records, I will make use of the Space Time Cube, Emerging Hot Spot Analysis and Hot Spot Analysis (Getis-Ord Gi*) geoprocessing tools within ArcGIS Pro software.
You may watch the two guides below to obtain an understanding of the methodology involved-
Interesting, isn't it? Now to begin demonstrating its use...
2. Setting Up the Datasets & Initiating Location Analytics
To begin, I will load the dataset containing the location and details of 100,000+ Vehicle Crash records from 2010 to 2015 (6 years) in Brevard County of Florida, USA onto ArcGIS Pro software. Aside from the positional information i.e. the exact coordinates of the recorded accident locations, several attributes pertaining to each Crash site are also present in the dataset (refer to Figure 3 and Figure 4) - such as the date and time of the crash, number of fatalities, number of injuries, the cause of the crash (whether the driver was under the influence of alcohol, was distracted, and so on), and the prevalent weather conditions at the time of the crash, among other factors.
I believe such attributes are captured by Law Enforcement/Police around the world and so, replicating this workflow on another dataset is entirely possible and could be of high utility.


As the dataset contains Positional information i.e. the coordinates of each crash site, I can plot them on a standard 2-Dimensional (XY) Map-

Besides the historical crash information, I possess another important (commonly available globally) dataset - the digitized Road Network of Brevard County-

The next step involves restructuring the Crash Records dataset into a Space Time Cube format.
To explain using an analogy: just as we use the Pivot Table tool in Microsoft Excel to convert raw data into a more meaningful and interpretable tabular summary, ArcGIS software can restructure and compartmentalize the Location Dataset into distinct spatiotemporal buckets, i.e. Bins, using the Create Space Time Cube by Aggregating Points geoprocessing tool.


I am instructing the geoprocessing tool that each individual bucket i.e. Bin in the Space Time Cube should aggregate information corresponding to 2 miles of territory (spatial) across 16 weeks (temporal).
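To make the idea of a Bin concrete, here is a minimal sketch in plain Python of how point records could be grouped into 2-mile by 16-week buckets. It is only an illustration of the concept, not how ArcGIS Pro builds its cube internally: the coordinates are assumed to be already projected and expressed in miles, the start date and the sample crashes are made up, and the real tool offers grid-shape and time-alignment options that are ignored here.

```python
from collections import Counter
from datetime import date
import math

BIN_SIZE_MILES = 2.0            # spatial size of a Bin, as stated in the post
TIME_STEP_DAYS = 16 * 7         # 16-week time step, as stated in the post
START_DATE = date(2010, 1, 1)   # assumed start of the study period

def bin_key(x_miles, y_miles, crash_date):
    """Return the (column, row, time step) index of the Bin holding one crash."""
    col = math.floor(x_miles / BIN_SIZE_MILES)
    row = math.floor(y_miles / BIN_SIZE_MILES)
    step = (crash_date - START_DATE).days // TIME_STEP_DAYS
    return (col, row, step)

# Made-up crash records: (x in miles, y in miles, date of crash)
crashes = [
    (0.5, 1.2, date(2010, 1, 15)),
    (1.9, 0.1, date(2010, 3, 2)),
    (1.0, 1.0, date(2010, 6, 1)),
    (4.2, 0.3, date(2010, 1, 20)),
]

# Count crashes per Bin, much like a pivot table keyed on place and time
cube = Counter(bin_key(x, y, d) for x, y, d in crashes)
for key, count in sorted(cube.items()):
    print(key, count)
```

The first two crashes fall in the same 2-mile cell and the same 16-week step, so they share a Bin; the third falls in the same cell but in the next time step, and the fourth lands in a different cell.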
I have chosen not to render the output of Space Time Cube on the map as it is not relevant - you can review the tool's Output Summary in Figure 9 below though - it encapsulates how the raw location data has been restructured spatiotemporally into distinct Bins.
The Space Time Cube output file is directly saved in my system and I will use it as an input in the next geoprocessing step. Also, in the last phase of this demonstration, I will visually render and explain the processed Space Time Cube results in detail.

The next step involves performing an Emerging Hot Spot Analysis, a tool in ArcGIS Pro's Space Time Pattern Mining toolbox.

In case you assume so, note that the Emerging Hot Spot Analysis output is not a depiction of the density of crashes. Rather, it is a visualization of the trend of crashes within a Bin, i.e. in a 2-mile cross-section over 16 weeks of time.
That being said, the trend derivation also takes into account (i.e. is relative to) the crash density trend in the neighboring Bins (Spatial as well as Temporal neighbors).
Do refer to the technical note for the Neighborhood Time Step concept in Figure 12.
The derived Trend for a Bin will fall under one of these eight Hot or Cold Spot categories - New, Consecutive, Intensifying, Persistent, Diminishing, Sporadic, Oscillating or Historical - description mentioned in the Infographic below-

Next up, I will deploy the Emerging Hot Spot Analysis tool on the Count of Total Road Crashes over a single Neighborhood Time Step. Refer to the technical note on Neighborhood Time Step below-

The output generated upon running the Emerging Hot Spot Analysis tool on the Count of Total Road Crashes over a single Neighborhood Time Step is depicted below-

Alongside the Map-based output, the Summary of Results table is also generated (Figure 14) which totals the Hot Spot classifications of the 221 Bins in the Space Time Cube - there are 2 New Hot Spots, 17 Consecutive Hot Spots, 59 Sporadic Hot Spots, 13 Oscillating Hot Spots, 23 Persistent Cold Spots, 18 Diminishing Cold Spots and 3 Sporadic Cold Spots (Figure 11 contains the description of each of these Hot Spot types). These account for 135 Bins in total - the remaining 86 Bins are neither Hot Spots nor Cold Spots i.e. these are statistically not significant.
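As a quick sanity check on the tally above, the category counts can be added up in a few lines of Python. The numbers are the ones quoted from the Summary of Results; only the variable names are my own.

```python
TOTAL_BINS = 221  # total Bins reported in the Summary of Results

# Category counts as quoted in the post
category_counts = {
    "New Hot Spot": 2,
    "Consecutive Hot Spot": 17,
    "Sporadic Hot Spot": 59,
    "Oscillating Hot Spot": 13,
    "Persistent Cold Spot": 23,
    "Diminishing Cold Spot": 18,
    "Sporadic Cold Spot": 3,
}

significant = sum(category_counts.values())
not_significant = TOTAL_BINS - significant
print(significant, not_significant)  # 135 86
```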

The category most commonly found in the Hot Spot output is the Sporadic Hot Spot, which implies that the 59 affected Bins frequently switch between being a Hot Spot and not being one. So while we are certain that these Bins have never been Cold Spots, i.e. their crash counts have never been significantly low relative to their neighbors, they do not have a clearly distinguishable crash trend either - the statistical significance of the Hot Spots, whenever existent, is not always very high (refer to the description of the Sporadic Hot Spot again if you'd like to).
As an Analyst who is seeking to identify risky road sections that are prone to crash incidents, it is the New (2), Persistent (0) and Intensifying (0) Hot Spot Bins that would be of primary interest to me.
Those who've been closely following this post would have observed that I have not yet factored in Brevard County's Road Network dataset in the Space Time Cube and Emerging Hot Spot Analysis steps. Yes, you are correct - the 2-mile Distance parameter that I set in the Emerging Hot Spot tool was Euclidean in nature, i.e. based on the straight-line distance between two points - it did not represent the real-world road distance connecting two Bins spatially.
Therefore, as I'd imagine you'll agree, in order to interpret the trend of Road Crashes for each Bin in a more accurate manner, I must factor in the digitized Road Network layer which contains information regarding actual Road Lengths.


Before I proceed to run the Hot Spot tool factoring in the county's digitized Road Network dataset, I would like to clean the raw Crash records dataset as there are some misleading datapoints within.
Observe in Figure 17 below that some of the Crash sites (red dots) are not positioned directly on top of an existing road - rather, they are located beyond the road extent. This could be due to several reasons - the recorded site may be where the vehicle ended up in the aftermath of the accident and not where the crash had initially occurred. Or it could even be a case of an uncalibrated/inaccurate GPS reading.
In any case, these glitches certainly need to be corrected - the Crash sites must correspond to a location on top of the Road Network for the Hot Spot Tool to be able to factor them into its calculation.


To make the adjustment, I will use the Snap geoprocessing tool - I shall instruct the GIS software to move any Road Crash site that is within 0.25 miles of a nearby road onto that Road (I have assumed that Crash spots beyond 0.25 miles are faulty records and shall exclude them from the analysis).
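The geometric idea behind snapping can be sketched in plain Python: project each crash point onto the nearest road segment and move it there only if that segment lies within the tolerance. This is a simplified illustration, not the Snap tool's actual implementation; it assumes planar coordinates in miles and straight road segments, and the function names are hypothetical.

```python
import math

SNAP_TOLERANCE_MILES = 0.25  # tolerance used in the post

def nearest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to point p."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:          # degenerate segment: both ends coincide
        return a
    # Fraction along the segment of the perpendicular projection, clamped to [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def snap_to_roads(p, road_segments, tolerance=SNAP_TOLERANCE_MILES):
    """Return the snapped location, or None if no road lies within the tolerance."""
    best, best_dist = None, tolerance
    for a, b in road_segments:
        q = nearest_point_on_segment(p, a, b)
        d = math.dist(p, q)
        if d <= best_dist:
            best, best_dist = q, d
    return best

roads = [((0.0, 0.0), (8.0, 0.0))]        # one made-up straight road
print(snap_to_roads((4.0, 0.1), roads))   # (4.0, 0.0): within 0.25 miles, so it is moved
print(snap_to_roads((4.0, 0.5), roads))   # None: too far away, treated as a faulty record
```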
The Snap tool corrects the irregularities, as can be observed in the figure below...


… which therefore allows me to perform the next step which is to integrate the two layers - the Crash locations and the digitized Road Network seamlessly, by using the Spatial Join geoprocessing tool.

As evident in the popup in Figure 21, the Crash location dataset is now linked to the Road Network dataset. I am now ready to run the Hot Spot Analysis tool again...
Or am I?...

Actually, no. There is one more irregularity left to correct. Longer roads in the Road Network will naturally have more crashes assigned to them and the Hot Spot output will be biased towards them. This isn't the right approach for analysis and would hamper the quality of interpretation.
In order to rectify this implicit defect, I will compute a new field in the Road Network Dataset - Crash Rate per mile, per year.
The newly computed field (Crash_Rate on the extreme right in Figure 23 below) decouples the number of Crashes (Join_Count) from the Road Length (Shape_Length), thereby enabling me to perform a more accurate analysis and interpretation of Crash Hot Spots.
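The normalization itself is simple arithmetic. Below is a minimal sketch, assuming the road length has already been converted to miles (the post does not state the units of Shape_Length) and that the study period is the six years from 2010 to 2015. The road names and counts are made up.

```python
STUDY_YEARS = 6  # 2010 to 2015, as stated in the post

def crash_rate(join_count, length_miles, years=STUDY_YEARS):
    """Crashes per mile per year for one road section."""
    if length_miles <= 0 or years <= 0:
        raise ValueError("length and years must be positive")
    return join_count / (length_miles * years)

# Made-up road sections: a long road with many crashes and a short one with fewer
roads = [
    {"name": "Long Road", "Join_Count": 60, "length_miles": 5.0},
    {"name": "Short Road", "Join_Count": 30, "length_miles": 0.5},
]

for road in roads:
    road["Crash_Rate"] = crash_rate(road["Join_Count"], road["length_miles"])
    print(road["name"], road["Crash_Rate"])
```

The long road has twice as many crashes, yet its rate works out to 2.0 against 10.0 for the short road, which is exactly the bias that the raw count would have hidden.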

3. Performing Advanced Location Analytics using Getis-Ord Gi* Hot Spot method
Now that all the irregularities are sorted, I am definitely ready to perform Hot Spot Analysis again. Just that, instead of the Emerging Hot Spot Analysis tool, I'll be using the Hot Spot Analysis (Getis-Ord Gi*) tool. The Getis-Ord Gi* statistic would help me to spatially restrict the Hot Spot Analysis to just the Road Network (Figure 26) instead of the entire 2 miles Euclidean coverage of a Bin as was done previously (Figure 13).
Prior to running the tool, I will choose to assign weightage to not just the exact spot where the Crash occurred but also to the entire section of the road where the Crash sequence would have unfolded - right from the spot where the driver would have spotted the nearing obstacle till the spot where he/she would have crashed - this would be the accurate representation of the Accident-prone Road Section.


This length of Crash sequence - technically known as the Impedance Distance Cutoff parameter - which I shall use to derive the spatial weightage (the Conceptualization of Spatial Relationships parameter in Figure 25) is 110 meters (roughly the length of an American football field including its end zones), which is the minimum stopping sight distance for a vehicle traveling at 45 miles per hour.
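The post does not say how the 110-meter figure was derived. As a plausibility check, the sketch below uses the stopping sight distance formula commonly cited from AASHTO highway design guidance (reaction distance plus braking distance, with a 2.5-second reaction time and a deceleration of 11.2 ft/s²). Treat the formula and its default values as my assumption rather than the tutorial's stated method.

```python
FEET_TO_METERS = 0.3048
YARDS_TO_METERS = 0.9144

def stopping_sight_distance_ft(speed_mph, reaction_s=2.5, decel_ft_s2=11.2):
    """Stopping sight distance in feet (assumed AASHTO-style formula)."""
    reaction_distance = 1.47 * speed_mph * reaction_s          # distance covered while reacting
    braking_distance = 1.075 * speed_mph ** 2 / decel_ft_s2    # distance covered while braking
    return reaction_distance + braking_distance

ssd_ft = stopping_sight_distance_ft(45)
ssd_m = ssd_ft * FEET_TO_METERS
field_with_end_zones_m = 120 * YARDS_TO_METERS  # 100-yard field plus two 10-yard end zones

print(f"{ssd_ft:.0f} ft = {ssd_m:.0f} m; football field = {field_with_end_zones_m:.0f} m")
```

Under these assumptions the result comes to about 360 feet, i.e. roughly 110 meters, which agrees with the cutoff used in the post.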
You may read a technical note on how to generate Spatial Weights for a Network Dataset here.
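For readers curious about what the statistic computes, here is a minimal sketch of the Gi* z-score in its standard published form, written in plain Python. It is a simplification and not the ArcGIS implementation: neighbors are defined by a binary weight (1 if within the 110-meter cutoff, else 0) using made-up one-dimensional positions along a single road instead of true network distances, and the crash rates are invented.

```python
import math

CUTOFF_METERS = 110  # distance cutoff used in the post

def binary_weights(positions, cutoff):
    """w[i][j] = 1 if features i and j lie within the cutoff (a feature neighbors itself)."""
    return [[1.0 if abs(pi - pj) <= cutoff else 0.0 for pj in positions]
            for pi in positions]

def gi_star(values, weights_row):
    """Getis-Ord Gi* z-score for one feature, given its row of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    sum_w = sum(weights_row)
    sum_w2 = sum(w * w for w in weights_row)
    numerator = sum(w * v for w, v in zip(weights_row, values)) - mean * sum_w
    denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    if denominator == 0:          # no variation, or every feature is a neighbor
        return float("nan")
    return numerator / denominator

# Made-up road sections: position along the road (meters) and crash rate
positions = [0, 100, 200, 1000, 1100, 1200, 2000, 2100]
rates = [9, 10, 8, 1, 2, 1, 1, 2]

weights = binary_weights(positions, CUTOFF_METERS)
z_scores = [gi_star(rates, row) for row in weights]
for pos, z in zip(positions, z_scores):
    label = "hot" if z > 1.96 else "cold" if z < -1.96 else "not significant"
    print(f"{pos:>5} m  z = {z:+.2f}  {label}")
```

A section is flagged as hot only when it and its neighbors together have a rate well above the overall average; the 1.96 threshold corresponds to 95% confidence under a normal approximation.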
Upon running the Hot Spot Analysis (Getis-Ord Gi*) tool, the Hot Spots are now located on top of the Road Network (a desirable change from the Emerging Hot Spot Analysis tool's Euclidean output seen in Figure 13) -

While Figure 26 depicts the hot spots derived from statistically analyzing all the Crash sites within Brevard County, let me refine the analysis by using the Hot Spot Analysis (Getis-Ord Gi*) tool on specific Crash types, beginning with Crashes which led to Fatalities (the statistical methodology remains the same, just the data field to be analyzed is changed in the Tool parameters).
I expected a distinct change in the Hot Spot output. The GIS software allows me to compare both Hot Spot outputs visually by placing them side by side:

This visual representation appears insightful - Hot Spots have emerged at new locations in the Fatality Hot Spot output on the right in Figure 27 which the analyst must pay close attention to. You would appreciate how running the Hot Spot tool on a specific attribute (Fatality) helped to bring to the fore the high-risk road sections that were diluted in the analysis of All Crashes and hence weren't visible in that Hot Spot output.
Similarly, I will repeat the same Hot Spot Analysis on Crashes where the Driver was under the influence of Alcohol and compare the output to the All Crashes Hot Spot output -

The Alcohol-induced Crashes Hot Spot output shows several Hot Spots on the road running parallel to the river. This could be insightful for Law Enforcement - perhaps an indication of late night riverside partying?
This type of Spatiotemporal Analysis could be of high utility to multiple stakeholders. While the Municipal Corporation could decide to widen the roads at the Fatality-prone zones, the police may want to crack down on drunk-driving originating at riverside pubs.
By cleaning up the data initially, I have emphasized the need to use high-quality data inputs in order to make an accurate assessment of the data output generated by the powerful geospatial software.
4. Modeling the Workflow to automate the Analysis
Modern GIS platforms are very dynamic - next, let me demonstrate how one can analyze the Crash records much faster and in more detail.
The next question that I'd like to analyze is: during which hours of the day do the Crashes peak?
Fortunately, I can generate Charts and Tables from within the GIS platform itself - here's a Line Chart of three Attribute fields - Crash Count, Crash Hour of the Day, and Crash Day of the Week-

Any insights that you can glean?
Let me change the Symbology-

Try once again. What can you assess?
The Crash incidents peak between 3 pm - 5 pm, particularly on Weekdays (Monday - Friday).
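The same question can be asked of any table of crash timestamps outside the GIS platform. Here is a minimal sketch with made-up timestamps that counts crashes per hour of the day, separately for weekdays and weekends. It only illustrates the grouping behind the chart and does not reproduce the Brevard County numbers.

```python
from collections import Counter
from datetime import datetime

# Made-up crash timestamps (3 March 2014 was a Monday, 8 March 2014 a Saturday)
crash_times = [
    datetime(2014, 3, 3, 15, 20),
    datetime(2014, 3, 4, 16, 5),
    datetime(2014, 3, 5, 15, 45),
    datetime(2014, 3, 5, 8, 10),
    datetime(2014, 3, 8, 15, 30),
    datetime(2014, 3, 8, 23, 0),
]

# weekday() returns 0 for Monday through 6 for Sunday
weekday_hours = Counter(t.hour for t in crash_times if t.weekday() < 5)
weekend_hours = Counter(t.hour for t in crash_times if t.weekday() >= 5)

peak_hour, peak_count = weekday_hours.most_common(1)[0]
print(f"Weekday peak: {peak_hour}:00 with {peak_count} crashes")
print("Weekend counts by hour:", dict(weekend_hours))
```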
So now, let me run the next Hot Spot analysis on this newly-discovered Peak Crash time frame - it is sure to throw up interesting output. However, instead of running the entire workflow one-step-at-a-time (as demonstrated previously), I'll utilize an Analysis Model this time - this would allow me to replicate the entire workflow multiple times with different parameters, at the click of a button!

This Model (Figure 31 above) may appear complicated at first glance. However, it is just a Graphical Codification of the methodology that I've demonstrated earlier in this post.

Recollect that I've demonstrated the four distinct steps that have been codified in the Model -
Step 1: Selecting the Crash Attribute field that one seeks to analyze
Step 2: Snapping the outlier Crash sites to on top of the Road Network
Step 3: Standardizing the Count of Crashes to the Road Network by computing Crash Rate per Mile per Year
Step 4: Creating a Hot-Spot Map using Hot Spot Analysis (Getis-Ord Gi*) tool
Such Models are not overly difficult to configure and do not necessarily require prior programming knowledge.
That being said, having such ready-to-run Models allows an analyst to replicate important workflows quickly, and in an error-free manner. It saves precious time as well as needless effort enabling one to perform high-end Location Analytics with a single click!
Upon running the model using the Peak Crash time frame, let me compare the generated output to the All Crashes Hot Spot output-

Once again, new Hot Spots have emerged and the analyst should deep-dive into these locations to understand why they are Crash-prone (maybe, they are busy commercial thoroughfares) and what can be done to make them safer (installing traffic controllers/speed breakers/signage/road signals etc.).
5. Visualizing the Results of Spatiotemporal Analysis in a 3D Scene
Like when I experienced it for the first time, you may be blown away by this final study segment - visualizing Spatiotemporal Analysis in 3D! And no, you do not need to wear 3D glasses to view the demonstration😎.
So far you've witnessed Hot Spots on a two-dimensional (XY) frame - so while you know that a particular site is a Hot Spot (spatial view), you don't really know how the Hot Spot has evolved during the six-year time frame of the Crash records dataset (temporal view). Let me demonstrate it so that you can appreciate the utility of analyzing the dataset through Space and Time simultaneously.
First, let me create Yearly Hot Spot Maps for each of the six years from 2010 to 2015 (the temporal analysis will be done on a y-o-y basis). I will utilize an Analysis Model (Figure 34) to generate these outputs - it will save me the labor of manually replicating the workflow six times.

I'll utilize another Model (Figure 35) to perform the Hot Spot analysis on particular Hours and Days of my choosing - that would be the Peak Crashes time frame that I had discovered earlier (3 pm to 5 pm on weekdays). The Yearly Hot Spot Maps generated in the previous Model will serve as inputs in this Model (see the yellow Yearly Hot Spot Maps input box in the last row)-

Let me reveal to you the spatiotemporal rendition of the Year-on-Year Hot Spots of All Road Crashes during 3 pm-5 pm on Weekdays on a 3D Map Scene in ArcGIS Pro software-

The output in Figure 36 above may appear daunting at first. Let me explain what it depicts and how to interpret it. Refer to the highlighted portion in Figure 37 below-

At the highlighted intersection on Prospect Ave above, you are seeing a Sporadic Hot Spot (refer to its description). Recall that this Hot Spot type was the most prevalent in Brevard County upon running the Emerging Hot Spot Analysis tool earlier.
The first year's (2010) Hot Spot is visualized right at the bottom - refer to the Map's Legend on the left of the image - the dark red shade represents a highly Statistically Significant Hot Spot (99% confidence). In the second year (2011), the Hot Spot disappears completely i.e. is Statistically not Significant as indicated in the Legend. In the third year (2012), the Hot Spot reappears and is Statistically Significant albeit weaker (light red shade represents 95% confidence) than the Hot Spot of 2010. In the fourth year (2013), the Hot Spot disappears once again, only to reappear with maximum intensity in the fifth year (2014). And finally, the Hot Spot disappears once again in the sixth year (2015).
I hope that you have grasped how to interpret this spatiotemporal Hot Spot rendition and appreciate its utility.
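One way to read such a stack systematically is to write down each year's result and count how often the location is hot, how often it is cold, and how often it switches. The sketch below encodes the Prospect Ave stack as described above; the label names and the summary function are hypothetical, and no official classification rule is implied.

```python
# Yearly results for the highlighted stack, bottom (2010) to top (2015), as described above
yearly_stack = {
    2010: "hot_99",
    2011: "not_significant",
    2012: "hot_95",
    2013: "not_significant",
    2014: "hot_99",
    2015: "not_significant",
}

def summarize_stack(stack):
    """Count hot years, cold years and on/off switches in a yearly Hot Spot stack."""
    labels = [stack[year] for year in sorted(stack)]
    is_hot = [label.startswith("hot") for label in labels]
    is_cold = [label.startswith("cold") for label in labels]
    switches = sum(1 for a, b in zip(is_hot, is_hot[1:]) if a != b)
    return {
        "hot_years": sum(is_hot),
        "cold_years": sum(is_cold),
        "switches": switches,
        "hot_share": sum(is_hot) / len(labels),
    }

summary = summarize_stack(yearly_stack)
print(summary)
```

Here the location is hot in three of the six years, never cold, and flips state every single year, which matches the on-again, off-again behavior of a Sporadic Hot Spot described in the post.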
How would you interpret this Hot Spot stack-

And this one-

Thank you for taking the time to go through this elaborate demonstration. I hope that you enjoyed exploring the subject matter. Feel free to share your feedback.
