Some changes to the previous analysis

Given the lack of ticket numbers in 2022, I am going to limit the analysis to 2017-2021. There is also a stunningly small amount of license plate information to work with. Huge portions of the dataset have no state recorded for either the driver's license or the license plate:

plate_and_lic_combined	count	%
(blank)	6,028,469	81.76
DC	1,130,953	15.34
MD	122,663	1.66
VA	54,572	0.74
FL	18,733	0.25
DE	4,906	0.07
MA	4,798	0.07
MI	3,174	0.04
PA	2,590	0.04
NC	2,137	0.03
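The plate_and_lic_combined column presumably merges the license plate state with the driver's license state. Here is a minimal sketch of that merge and the tally above. It assumes a plate-first, license-second fallback; the rule and the function names are my assumptions, not necessarily how the column was actually built.

```python
from collections import Counter


def combine_state(plate_state, license_state):
    """Hypothetical merge rule: prefer the plate state, fall back to the
    driver's license state, and return "" when both are missing."""
    for value in (plate_state, license_state):
        if value and value.strip():
            return value.strip().upper()
    return ""


def state_shares(rows):
    """Tally combined states (blanks included) and return
    (state, count, percent) tuples sorted by count, largest first."""
    counts = Counter(combine_state(p, l) for p, l in rows)
    total = sum(counts.values())
    return [(state, n, 100.0 * n / total) for state, n in counts.most_common()]


# Toy (plate_state, license_state) pairs, not real data.
rows = [("DC", None), ("", "MD"), (None, None), ("va", "DC"), ("", "")]
for state, n, pct in state_shares(rows):
    print(f"{state or '(blank)'}\t{n:,}\t{pct:.2f}")
```

Keeping the blank value as its own key is what makes the 81.76% share visible in the first place.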

To work toward a usable hypothesis, the first step is to omit the empty values and recompute the percentages, here as shares of the top four states only:

plate_and_lic_combined	count	%
DC	1,130,953	85.23
MD	122,663	9.24
VA	54,572	4.11
FL	18,733	1.41
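A small sketch of the recomputation, using the counts from the first table ("" stands for the blank value). It reproduces the figures above when only the four largest states are kept (top_n=4); widening the denominator to all ten listed states lowers DC's share to about 84.1%. The function name is illustrative.

```python
def shares_excluding_blank(counts, top_n=None):
    """Drop the blank key, optionally keep only the top_n states by count,
    and return each remaining state's percentage of that reduced total."""
    nonblank = sorted(
        ((state, n) for state, n in counts.items() if state),
        key=lambda kv: kv[1],
        reverse=True,
    )
    if top_n is not None:
        nonblank = nonblank[:top_n]
    total = sum(n for _, n in nonblank)
    return {state: round(100.0 * n / total, 2) for state, n in nonblank}


# Counts copied from the first table; "" is the blank value.
counts = {
    "": 6_028_469, "DC": 1_130_953, "MD": 122_663, "VA": 54_572, "FL": 18_733,
    "DE": 4_906, "MA": 4_798, "MI": 3_174, "PA": 2_590, "NC": 2_137,
}
print(shares_excluding_blank(counts, top_n=4))
print(shares_excluding_blank(counts))  # all ten listed states in the denominator
```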

The share of Virginia drivers seems low, but it could be representative.

The big problem is that, when the dataset is limited to Automated Traffic Enforcement (ATE) cameras, there is a real paucity of data:

plate_and_lic_combined	count	%
DC	1,002,118	96.13
FL	16,525	1.59
MD	8,638	0.83
DE	4,381	0.42
MA	4,271	0.41
VA	3,599	0.35
MI	2,888	0.28

There’s simply no way FL is near the top of this list.
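One way to put a number on that doubt is to divide each state's ATE count by its count across all tickets, using the two tables above and assuming both cover the same 2017-2021 window. This is a rough sketch; the function name is illustrative.

```python
# Counts copied from the all-tickets table and the ATE-only table.
all_tickets = {
    "DC": 1_130_953, "MD": 122_663, "VA": 54_572, "FL": 18_733,
    "DE": 4_906, "MA": 4_798, "MI": 3_174,
}
ate_tickets = {
    "DC": 1_002_118, "FL": 16_525, "MD": 8_638, "DE": 4_381,
    "MA": 4_271, "VA": 3_599, "MI": 2_888,
}


def ate_fraction(all_counts, ate_counts):
    """Share of each state's tickets that came from ATE cameras,
    for states present in both tables."""
    return {s: ate_counts[s] / all_counts[s] for s in all_counts if s in ate_counts}


fractions = ate_fraction(all_tickets, ate_tickets)
for state, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}\t{frac:.1%}")
```

By this arithmetic FL, DE, MA and MI all land at roughly 88-91% ATE, in line with DC (about 89%), while MD and VA sit near 7%. That alone doesn't explain the pattern, but it is worth checking before trusting the state field on ATE tickets.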

I also looked at MAR_ID, and I believe it is also correlated with ATE camera placements. I might be able to use it to better assign tickets from the top few ticket-giving ATE cameras that the text-matching algorithm may have missed.
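A minimal sketch of how that backfill could work, assuming each ticket record carries a mar_id and a camera_id that is None when the text match failed (both key names are hypothetical). For each MAR_ID it finds the camera that dominates the already-matched tickets and, only if that camera clears a share threshold, assigns it to the unmatched tickets at the same MAR_ID. The 0.9 share and 20-ticket minimum are arbitrary placeholders, not tuned values.

```python
from collections import Counter, defaultdict


def dominant_camera_by_mar(tickets, min_share=0.9, min_count=20):
    """Map MAR_ID -> camera when one camera accounts for at least
    min_share of that MAR_ID's matched tickets (and there are enough of them)."""
    by_mar = defaultdict(Counter)
    for t in tickets:
        if t["camera_id"] is not None and t["mar_id"] is not None:
            by_mar[t["mar_id"]][t["camera_id"]] += 1
    mapping = {}
    for mar_id, cams in by_mar.items():
        camera, n = cams.most_common(1)[0]
        total = sum(cams.values())
        if total >= min_count and n / total >= min_share:
            mapping[mar_id] = camera
    return mapping


def backfill_cameras(tickets, **kwargs):
    """Return copies of unmatched tickets with a camera filled in from
    their MAR_ID's dominant camera; other tickets pass through unchanged."""
    mapping = dominant_camera_by_mar(tickets, **kwargs)
    filled = []
    for t in tickets:
        if t["camera_id"] is None and t["mar_id"] in mapping:
            t = {**t, "camera_id": mapping[t["mar_id"]], "backfilled": True}
        filled.append(t)
    return filled


# Toy records: MAR_ID 1 is dominated by camera "A"; MAR_ID 2 is split.
tickets = (
    [{"mar_id": 1, "camera_id": "A"} for _ in range(3)]
    + [{"mar_id": 1, "camera_id": None}]
    + [{"mar_id": 2, "camera_id": "B"}, {"mar_id": 2, "camera_id": "C"},
       {"mar_id": 2, "camera_id": None}]
)
for t in backfill_cameras(tickets, min_share=0.9, min_count=2):
    print(t)
```

Requiring a dominant camera keeps MAR_IDs shared by several cameras (or by camera and non-camera tickets) from being guessed at.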

It is also worth noting that I found some metadata on the open data site: https://www.arcgis.com/sharing/rest/content/items/94455e9d5f42439788da06caeaaf35ac/info/metadata/metadata.xml?format=default&output=html

It defines MAR_ID as ‘Master Address Repository (MAR) Unique Identifier’, which probably comes from a tool the geocoders use to place the moving violations on the GIS system. Still, it may match ATE cameras.
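One way to test whether MAR_ID actually lines up with cameras is to look for one-to-one pairs among tickets that are already matched: a camera that only ever appears with one MAR_ID, where that MAR_ID only ever appears with that camera. The sketch assumes (camera_id, mar_id) pairs as input; the names are hypothetical.

```python
from collections import defaultdict


def one_to_one_pairs(pairs):
    """Given (camera_id, mar_id) pairs from camera-matched tickets, return
    camera -> MAR_ID for pairs where each side maps to exactly one of the other."""
    mars_per_camera = defaultdict(set)
    cameras_per_mar = defaultdict(set)
    for camera_id, mar_id in pairs:
        mars_per_camera[camera_id].add(mar_id)
        cameras_per_mar[mar_id].add(camera_id)
    matches = {}
    for camera_id, mars in mars_per_camera.items():
        if len(mars) == 1:
            (mar_id,) = mars
            if len(cameras_per_mar[mar_id]) == 1:
                matches[camera_id] = mar_id
    return matches


# Toy pairs: "A" is clean; "B" spans two MAR_IDs; "C" and "D" share one.
pairs = [("A", 101), ("A", 101), ("B", 202), ("B", 203), ("C", 303), ("D", 303)]
print(one_to_one_pairs(pairs))
```

A strict one-to-one test is brittle: a single stray geocode breaks a pair, so on the real data it would probably need a tolerance, for example ignoring MAR_IDs that account for only a handful of a camera's tickets.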