Given the lack of ticket numbers in 2022, I am going to limit the analysis to 2017-2021. There is also stunningly little license plate information to work with. Huge portions of the dataset have no driver's license state or license plate state information:
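A minimal sketch of that check, assuming the violations are loaded into a pandas DataFrame. The column names used here (`ISSUE_DATE`, `PLATE_STATE`) are placeholders for illustration and may not match the actual field names in the dataset.

```python
import pandas as pd


def limit_years(df: pd.DataFrame, date_col: str, first: int, last: int) -> pd.DataFrame:
    """Keep rows whose date falls within [first, last]; unparseable dates are dropped."""
    years = pd.to_datetime(df[date_col], errors="coerce").dt.year
    return df[years.between(first, last)]


def missing_share(df: pd.DataFrame, columns: list) -> pd.Series:
    """Fraction of rows in which each column is null or whitespace-only."""
    def is_blank(col: pd.Series) -> pd.Series:
        return col.isna() | col.astype(str).str.strip().eq("")

    return df[columns].apply(is_blank).mean()
```

Something like `missing_share(limit_years(df, "ISSUE_DATE", 2017, 2021), ["PLATE_STATE"])` would then give the share of empty values per column for the years kept.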
I have been thinking about an approach that might yield a usable hypothesis. First, omit the empty values and recompute:
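One way the recomputation could look, as a sketch only. The column name `PLATE_STATE` is an assumed placeholder, and normalizing case and whitespace is my own choice rather than something the dataset requires.

```python
import pandas as pd


def state_shares(df: pd.DataFrame, state_col: str = "PLATE_STATE") -> pd.Series:
    """Share of tickets per state, computed only over rows with a non-empty state."""
    states = df[state_col].dropna().astype(str).str.strip().str.upper()
    states = states[states != ""]
    # normalize=True returns proportions that sum to 1 over the remaining rows
    return states.value_counts(normalize=True)
```

Because the shares are taken over the non-empty rows only, they describe the tickets that carry a state, not the dataset as a whole.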
It seems like a low percentage of Virginia drivers, but it could be representative.
The big problem is that when the dataset is limited to Automated Traffic Enforcement (ATE) cameras, there is a real paucity of data:
There’s simply no way FL is near the top of this list.
I also looked at MAR_ID, and I believe it is also correlated with ATE camera placements. I might be able to use it to better assign the top few ticket-giving ATEs that the text-matching algorithm may have missed.
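A rough sketch of how MAR_ID could be used for that, assuming the text-matching step writes its result into a column that is null when no camera was matched. The name `camera_label` is hypothetical, as are both helper functions.

```python
import pandas as pd


def top_unmatched_locations(
    df: pd.DataFrame,
    id_col: str = "MAR_ID",
    camera_col: str = "camera_label",  # hypothetical output of the text matching
    n: int = 10,
) -> pd.Series:
    """Ticket counts per MAR_ID among rows that have no camera label yet."""
    unmatched = df[df[camera_col].isna() & df[id_col].notna()]
    return unmatched[id_col].value_counts().head(n)


def label_coverage(
    df: pd.DataFrame,
    id_col: str = "MAR_ID",
    camera_col: str = "camera_label",
) -> pd.DataFrame:
    """Per MAR_ID: total tickets and how many of them already carry a label."""
    grouped = df.groupby(id_col)[camera_col]
    return pd.DataFrame({"tickets": grouped.size(), "labeled": grouped.count()})
```

A MAR_ID with many tickets but few labels would be a candidate location to inspect by hand and assign to a camera.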
It is also worth noting that I found some metadata on the open data site: https://www.arcgis.com/sharing/rest/content/items/94455e9d5f42439788da06caeaaf35ac/info/metadata/metadata.xml?format=default&output=html
It defines MAR_ID as ‘Master Address Repository (MAR) Unique Identifier’. The MAR is probably a tool used by the geocoders to render the moving violations on the GIS system. Still, the identifier may match ATE cameras.