Leads: Dr. Meghanathan (JSU)
Co-Leads: Drs. Iyengar & Xie, Col. Miller (FIU), Dr. Chi (FAMU), Dr. Nagi Rao (ORNL)
Introduction
A forensic event network visualizes a crime scene through digital forensics information obtained by monitoring and analyzing traffic at the time of the event, logs from the event, legal evidence, or possible intrusions detected in the system. A forensic
event network comprises nodes (forensic events), and an edge connects two nodes if the associated forensic events have something in common, such as the people who committed the crime, the location of the crime, or the date of the events.
The edges are accordingly weighted with a vector of common attributes. In this research, we propose a multi-attribute weighted temporal locality (WTL) graph model, a graph-theoretic model for complex network analysis
that has hitherto been unexplored in the literature, applying forensic model fusion to integrate the dynamics of the crime events (represented as nodes/vertices) captured with respect to five attributes: Location,
Time, Law code, Offenders, and Victims. The coordinates of a node in the WTL graph correspond to the latitude and longitude of the location attribute, and there is an edge between every pair of nodes (i.e., the WTL graph is a complete graph [9]
with respect to the connectivity of the nodes). The distance between two nodes is the Euclidean distance between the coordinates of the nodes. However, the weight of an edge in the graph is quantified using the time attribute,
per one of the following three difference measures [1]: absolute real-time difference measure, daily-cycle measure (absolute difference between the two time instants mod 86400, which is the number of seconds in a day), and weekly-cycle
measure (absolute difference between the two time instants mod 604800, which is the number of seconds in a week). In addition, the nodes (crime events) in the WTL graph are weighted: the weight of a node/crime event is represented
as a vector of the metric spaces considered for the offender(s) and victim(s) attributes.
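As an illustration of the three time-difference measures described above, the following minimal sketch computes each one for a pair of time instants given in seconds. The function and constant names are our own illustrative choices, and a week is taken as 7 x 86400 = 604800 seconds.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY  # 604800 seconds


def real_time_difference(t1: int, t2: int) -> int:
    """Absolute real-time difference between two time instants (seconds)."""
    return abs(t1 - t2)


def daily_cycle_difference(t1: int, t2: int) -> int:
    """Absolute difference between the two instants, mod the seconds in a day."""
    return abs(t1 - t2) % SECONDS_PER_DAY


def weekly_cycle_difference(t1: int, t2: int) -> int:
    """Absolute difference between the two instants, mod the seconds in a week."""
    return abs(t1 - t2) % SECONDS_PER_WEEK


# Two hypothetical events that occur 25 hours apart.
t_a, t_b = 0, 25 * 3600
real = real_time_difference(t_a, t_b)      # 90000 seconds
daily = daily_cycle_difference(t_a, t_b)   # 3600 seconds (one hour apart in the daily cycle)
weekly = weekly_cycle_difference(t_a, t_b) # 90000 seconds (less than one week)
```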
Preliminary Work
We started our research by considering the recommendations of the classical work [1] for the attributes to represent a crime event and the viable metric spaces to represent these attributes. Accordingly, we represent a crime event as a vector of five
attributes [location, time, law code, offender(s), and victim(s)]. Each of these five attributes is represented in a metric space of one or more dimensions. For example, the location attribute is represented in a two-dimensional
metric space of latitude and longitude (corresponding to the GPS coordinates). Time (one-dimensional) is represented as the number of seconds past a cutoff date/time (say, January 1, 1970, 12 AM). Law code is represented as a
two-dimensional attribute (Title # and Section #) corresponding to the US Code [2] under which the crime is booked. Offender(s) and Victim(s) are each represented in a multi-dimensional metric space corresponding to their socio-economic,
demographic, and geographical factors (such as age, sex, annual income, ethnicity, and home address) [1]. Based on the above attributes and metric spaces, we build a multi-attribute weighted temporal locality (WTL) graph of the
crime events booked under a particular Law code. Figure 1 in Appendix 4 presents a sample synthetic data set (demonstration only) comprising 8 crime events under a law code, and the WTL graph displaying the nodes (coordinates),
edges (sample for edge distance and the weights computed using the three time-difference measures: real-time difference, daily-cycle measure, and weekly-cycle measure), and the node weights comprising the metric spaces age, sex
and ethnicity for the offenders and the victims. For clarity we display only node weights for three crime events. We have provided our preliminary data in Appendix 4 explaining how we will further develop the project objectives.
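The representation described above can be sketched in code as follows. This is a minimal illustration under our own assumptions, not the project's implementation: the class name CrimeEvent, the function build_wtl_graph, the dictionary layout of the edges, and all sample values are hypothetical. Each event carries the five attributes, and every pair of events is joined by an edge that records the Euclidean distance and the three time-difference weights.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import hypot


@dataclass
class CrimeEvent:
    event_id: str
    latitude: float
    longitude: float
    time: int                 # seconds past the cutoff date/time
    law_code: tuple           # (Title #, Section #)
    offender: dict = field(default_factory=dict)  # e.g. age, sex, ethnicity
    victim: dict = field(default_factory=dict)


def build_wtl_graph(events):
    """Return a complete graph as {(id_a, id_b): edge attributes}."""
    edges = {}
    for a, b in combinations(events, 2):
        dt = abs(a.time - b.time)
        edges[(a.event_id, b.event_id)] = {
            "distance": hypot(a.latitude - b.latitude, a.longitude - b.longitude),
            "real_time": dt,
            "daily_cycle": dt % 86_400,
            "weekly_cycle": dt % 604_800,
        }
    return edges


# Hypothetical events (illustrative values only).
events = [
    CrimeEvent("e1", 0.0, 0.0, 0, (18, 1030), {"age": 30}, {"age": 41}),
    CrimeEvent("e2", 3.0, 4.0, 90_000, (18, 1030), {"age": 27}, {"age": 35}),
    CrimeEvent("e3", 6.0, 8.0, 700_000, (18, 1030), {"age": 52}, {"age": 19}),
]
graph = build_wtl_graph(events)
```

With n events the graph has n(n-1)/2 edges, so the three events above yield three edges.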
Research Objectives
We propose the following research tasks (involving various network science techniques) using the WTL graph to extract event signatures helpful for investigation.
- Task 1 - Similarity with respect to "Offenders" or "Victims" - We propose to examine and quantify the extent of similarity between any two crime events with respect to two or more of the metric spaces considered for the Offenders
(and likewise for Victims). To this end, we will transform the original WTL graph by retaining an edge between two crime events only if their normalized values for the metric
spaces considered are within a threshold of each other. We will develop a binary search algorithm to determine the minimum value of this threshold (which could be construed as a measure of similarity) that keeps the transformed WTL graph
connected. For example, with the approach to be developed for this task, we will be able to answer a question like: On a scale of 0 to 1 (0 being most similar), how would you rate the similarity of crime events committed by or
targeted at males (females) with respect to age?
- Task 2 - Spatial Cascadability of the Crime Events - We propose to develop a quantitative framework to assess the extent to which crime events (that occurred in a specific region) simply spread in space, starting
from the earliest occurring event (seed event). In this pursuit, we will develop an algorithm for spatial crime cascade based on the assumption that crime events have the potential to spread to locations that are within a threshold
distance (similar to the cascadability of information in social networks [10]) and thereby spread further to the entire area under investigation (referred to as a global cascade). Given the WTL graph of the
crime events, we also propose to develop a binary search algorithm to determine the minimum threshold distance (referred to as the cascadability distance index) that would be needed for the crime events to have simply spread through
spatial cascade starting from the seed crime event and eventually leading to a global cascade. Further, we propose to use Spearman's rank-based correlation and Kendall's concordance-based correlation measures [12-14,
18] to quantify (on a scale of -1 to 1) the extent to which the sequence of crime events predicted through the spatial cascading approach matches their actual sequence of occurrence.
- Task 3 - Spectral Analysis of the Crime Events - Spectral analysis of complex real-world network graphs has been very useful for extracting a large amount of hidden information in these graphs that is otherwise
not easy to identify [15-16]. We propose to conduct spectral analysis of the WTL graph to extract useful quantitative information that is not obvious at the outset, such as: (a) A Relativity of Simultaneity score
for the crime events, which we formulate as a quantitative score in the range [0, 1]: the closer the score is to 0, the closer the time of occurrence of a crime event is to the times of occurrence of the other crime events,
and vice versa. (b) Bipartite clustering [6, 17] of the crime events, wherein the absolute time difference between any two crime events within a cluster is expected to be much smaller than the absolute time difference between
any two crime events across the two clusters.
- Task 4 - Clusterability with respect to "Time" - In prior research [3], we observed the Hopkins Statistic [4] to effectively quantify the clusterability of complex real-world networks with respect to centrality
metrics [5]. Taking a cue from this work, we propose to use the Hopkins Statistic to assess and quantify the clustering tendency of the WTL graph with respect to the daily-cycle and weekly-cycle measures considered for the edge
weight/time attribute. Unlike the conventional approach [7] of assessing the clusterability of data points, our novelty is that we assess the clusterability of pairs of data points (pairs of crime events) on the basis
of their edge weight (the difference in their times of occurrence with respect to a particular difference measure). This would let us answer questions such as: Are any two crime events more likely to happen around the same time of
day or the same time of week (referred to as "burning times" [1] in the criminology literature)?
- Task 5 - Spatio-Temporal Assortative Matching of the Crime Events - We propose to build on our previous work on assortativity analysis [11, 21-23] for complex real-world networks and extract a spatio-temporal
assortative matching of the crime events from the WTL graph so that we can identify pairs of crime events that happened close to each other with respect to both location (space) and time. For each edge, we compute the
product of the Euclidean distance between the end vertices of the edge and the absolute difference in the occurrence times of the two crimes corresponding to the edge’s end vertices. We propose to develop a greedy strategy for
assortative matching that considers edges for inclusion in increasing order of the product of Euclidean distance and time difference.
- Task 6 - Homophily among Crime Events of more than one Law Code - We propose to apply the theory of homophily [20] from social network analysis and explore whether crimes that are booked under different law codes
could exist together as one community [24] in an amalgamated WTL graph. If the fraction of edges connecting the crimes of two different law codes in the amalgamated WTL graph exceeds a threshold (2pq, where p and q are the
fractions of the crime events booked under the two law codes), the crimes that were originally booked under different law codes could now be booked under either of the codes, and further investigation could be conducted accordingly.
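For Task 2, the search for the cascadability distance index could be sketched as below. This is a minimal sketch under our own assumptions rather than the algorithm to be developed: events are reduced to (x, y) coordinates, the seed is given by its index, the cascade is modeled as a breadth-first spread, and the binary search runs over the sorted pairwise distances (the helper names are hypothetical). The search is valid because enlarging the threshold can only add edges, so reachability from the seed is monotone in the threshold. The same search shape could serve Task 1 if a similarity measure replaced the Euclidean distance.

```python
from collections import deque
from itertools import combinations
from math import dist


def reaches_all(points, seed, threshold):
    """Breadth-first spread from the seed along pairs at most `threshold` apart."""
    visited = {seed}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in range(len(points)):
            if v not in visited and dist(points[u], points[v]) <= threshold:
                visited.add(v)
                queue.append(v)
    return len(visited) == len(points)


def min_cascade_threshold(points, seed=0):
    """Smallest pairwise distance that lets the cascade reach every event.

    Assumes at least two points; the largest pairwise distance always
    suffices because the WTL graph is complete.
    """
    candidates = sorted({dist(p, q) for p, q in combinations(points, 2)})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if reaches_all(points, seed, candidates[mid]):
            hi = mid          # feasible: try a smaller threshold
        else:
            lo = mid + 1      # infeasible: a larger threshold is needed
    return candidates[lo]


# Hypothetical event locations; the first one is the seed event.
points = [(0, 0), (1, 0), (3, 0), (3, 4)]
index = min_cascade_threshold(points, seed=0)  # 4.0 for this layout
```

In this example the last event is 4 units from its nearest neighbor, so no smaller threshold produces a global cascade.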
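The greedy strategy outlined in Task 5 could look like the following minimal sketch. It assumes each event is a hypothetical (x, y, t) tuple and that an edge is accepted only if neither of its end vertices is already matched, which is the usual reading of a greedy matching; the function name is our own.

```python
from itertools import combinations
from math import dist


def greedy_spatiotemporal_matching(events):
    """events: list of (x, y, t) tuples. Returns a list of matched index pairs."""
    # Score every edge by (Euclidean distance) x (absolute time difference).
    scored = sorted(
        (dist(a[:2], b[:2]) * abs(a[2] - b[2]), i, j)
        for (i, a), (j, b) in combinations(enumerate(events), 2)
    )
    matched, pairs = set(), []
    for _, i, j in scored:  # increasing order of the product
        if i not in matched and j not in matched:
            pairs.append((i, j))
            matched.update((i, j))
    return pairs


# Hypothetical events: two pairs that are close in both space and time.
events = [(0, 0, 0), (1, 0, 10), (5, 5, 1000), (5, 6, 1020)]
pairs = greedy_spatiotemporal_matching(events)  # [(0, 1), (2, 3)]
```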
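The threshold test in Task 6 can be illustrated with a short sketch. The value 2pq is the fraction of cross-code edges one would expect if edge endpoints were assigned law codes independently with probabilities p and q. The function name, the node labels, and the edge list below are hypothetical, and the edges stand for whatever edges the amalgamated WTL graph retains.

```python
def cross_code_test(labels, edges):
    """labels: {node: law code} with exactly two codes; edges: list of node pairs.

    Returns (observed cross-code edge fraction, 2pq threshold, exceeds threshold?).
    """
    codes = sorted(set(labels.values()))
    if len(codes) != 2:
        raise ValueError("expected exactly two law codes")
    p = sum(1 for code in labels.values() if code == codes[0]) / len(labels)
    q = 1 - p
    cross = sum(1 for u, v in edges if labels[u] != labels[v]) / len(edges)
    threshold = 2 * p * q
    return cross, threshold, cross > threshold


# Hypothetical amalgamated graph: two events under each of two law codes.
labels = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("a", "b")]
cross, threshold, merged = cross_code_test(labels, edges)
```

Here three of the four edges join events of different law codes, giving a cross-code fraction of 0.75 against a threshold of 2 x 0.5 x 0.5 = 0.5.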