Numlock Sunday: Dave Gershgorn on the biometric dragnet

By Walt Hickey
Welcome to the Numlock Sunday edition. Each week, I'll sit down with an author or a writer behind one of the stories covered in a previous weekday edition for a casual conversation about what they wrote.
This week, I spoke to Dave Gershgorn, who wrote “This Is How the U.S. Military’s Massive Facial Recognition System Works” for OneZero. Here's what I wrote about it:
Over the past 10 years, the U.S. military invested $345 million in biometric databases, the current result of which is ABIS, the Automated Biometric Information System. It’s a database of 7.4 million identities linked to facial images, DNA data, fingerprints and other biometric data collected from allied soldiers, suspected terrorists and non-U.S. citizens. Individuals of interest may find themselves on the BEWL (Biometrically Enabled Watch List), which will allow them to be identified over surveillance tech on borders, bases and battlefields. In the first half of 2019, 4,467 people on the BEWL were identified,…

and 2,728 of them were in “theater” (areas where American troops are commanded). This military-controlled database on incontrovertible biological information — built by a contractor — has worried privacy advocates because it’s literally the thing they have been the most concerned about since the invention of photography.
This is an outstanding scoop, and it’s a troubling story about the potential for privacy abuses.
We spoke about the system, how it got here, and how one file format incompatibility is the only thing standing between the military and FBI’s system and its full extension to the Department of Homeland Security.
Dave can be found at OneZero and on Twitter.
This interview has been condensed and edited.
Walt Hickey: What is ABIS and why has it got some privacy advocates a little worried?
Dave Gershgorn: ABIS is an Automated Biometric Information System. ABIS is a military term for military biometrics, and there are a few different systems, but the biggest one that I wrote about was the Department of Defense's ABIS. It is a catalog of biometric information — which means iris scans, fingerprints, face images. That could be a photo captured in the wild off of security cameras, or driver's license photos or a passport photo or a photo taken when someone enrolled or trained at a military facility. It's all of this kind of information. It's unclear whether DNA is included in the system. Department of Defense documents that I've read have included DNA in this kind of biometric file. But the DOD reached out to me and said that DNA is not in the ABIS, even though their own reports say otherwise. So I'm still waiting for confirmation on that.

The reason why it's concerning is because the military is building out this capability of collecting and running this biometric information on everyone that it encounters outside of the United States. That is concerning for everyone outside of the United States, but also it should be concerning to people inside the United States because a cornerstone of them having all of this information is them also sharing it with anyone inside the United States — the FBI, the Department of Homeland Security — who will share their information back to the military.
When you talk about having millions and millions and millions of citizens' information able to be scraped up in the same search as someone who might be a suspected terrorist, you get this paradigm where we're relying on AI and trusting AI to not mess up. If it does, some innocent person in Michigan can be labeled as a terrorist because their biometric information was similar in some indiscernible way to biometric data collected days, months or years ago.
It's a dataset of not only enemy combatant data but also allies in the field. It seems like a key historical use of military technology has been the eventual adaptation of that technology into a civilian sense. How is it currently used, and what is the potential there?
In the DOD directives, they say that they kind of want quantity over quality. They want to get as much biometric information as they can because they want to know if they've seen someone before and they want to know who it is right as they see them. It doesn't really matter whether that person is a threat or not.
The existence of this enormous database allows smaller subsections, like what's called a Biometrically Enabled Watchlist. There are six or seven grades to the biometrically enabled watch lists in the U.S. military. That ranges from "don't hire them for a job" to "detain them on sight". As of 2017, there were 214,000 people on the main DOD biometrically enabled watch list, which has basically been compiled over the last 10 years now across individual missions.

For sections of the United States military, there are different versions of these biometrically enabled watch lists, which is wild. That's for two reasons: one, so the software can work faster, as it doesn't have to search against 200,000 people. And second, so they can be deployed to normal devices, so you can actually have your entire watch list on some of the tools that are used to collect biometric information and scan against it. That's how these things work in the field, but the idea is that if they have all this information, then the more information they have, the more people they know, which makes the military feel more secure.
This was really striking: between 2008 and 2017, the DOD added 213,000 individuals to the BEWL, and during that same period, they arrested or killed 1,700 people around the world on the basis of biometric and forensic matches. This is not in the abstract anymore.
No, definitely not. It is not abstract. It is not like something that's a theoretic harm. It is a very specific and real harm that could be perpetrated on people. The line that they say when I've done reporting on the facial recognition and police departments in the United States is they'll say "it's just one factor, it's just one part of the entire system of the investigative process." But it's a very big part. If you have an identity, you can have extenuating circumstances that make it seem like someone is guilty even if they're not. When you're taking the world of possibilities down from 7.4 million people down to a top five list or a top-10 list or a top-20 list, there are some real dangers of innocent people getting scraped up on there.

How is it interfacing with other databases? You wrote something that was really fascinating to me, but like mildly encouraging, in that they can't talk to DHS because their databases are inconsistent for now.
What's incredible is the reason that they can't interface with Department of Homeland Security is because they have the wrong file format.
The FBI and the DOD use EBTS, or Electronic Biometric Transmission Specification. As far as I know right now, it has not been fully upgraded yet, but they're upgrading this to be able to talk to even more databases. It was in version 1.7, and now it's in version 4.1. They made a huge leap to be able to work more information into this EBTS file format.
So the FBI uses EBTS, while DHS uses IXM. And it's just a simple file format error that is basically preventing this enormous panopticon of surveillance, which I love. Still, it's supposed to happen at some point soon.

What we're going to see now is that the procurement cycle for the U.S. military typically only lasts a few years. There is an option to extend or reevaluate these contracts, and we're coming up on a few of them. One of them is for the Department of Defense’s ABIS system. Then in 2021, they have to link up a lot of these to make sure they are congruent across all of the systems for biometric information across the federal government. It's a really important thing that we need to be on the lookout for, because something as simple as a file format can shape the amount of surveillance that American citizens face, on the order of millions and millions and millions of identities being caught in this surveillance dragnet.
You've been on the AI beat a while but I haven't seen you cover the military side of it all that much until recently.
I started covering AI from a research perspective in 2015, when a lot of the field was kind of thinking about how AI could even be applied. It was still very much something in the realm of research.
This was when Facebook had just started its research lab. Google had just poured a bunch of money into its research lab and made it a formal thing. So there was a lot of enthusiasm. There was a lot of emphasis on how do we get AI into a state where it can be used for business purposes or enterprise purposes or whatever.
And that very, very quickly happened. I mean, it's amazing what you can do with a few billion dollars and the scale of Facebook and Google.
But very quickly we started to realize that there were unintentional harms of deploying these systems on such a wide-scale basis. That's where you start to see critics talking about the bias in artificial intelligence algorithms or the bias in facial recognition or the errors in the datasets that American governments, from the local through the federal level, use to train these machine-learning algorithms.

U.S. Army photo by Sgt. Hackbarth/Released
Once these things went into practice, people started saying, "I think we have to start tapping the brakes a little bit and really start thinking about ways to more elegantly design these algorithms." And once I got that idea in my head, I started to look for where the rubber meets the road. What are the surfaces where AI and humans meet? Where are those surfaces, where do the surfaces have the highest stakes?
That led me down this route: now that I understand how these systems work, and they don't seem to be fundamentally changing, how can I take what I know about them and apply it to the reporting that I do right now? It helps to understand what facial recognition can be used for if you understand the difference between face detection and face recognition, because there are technical differences between the two. I've tried to carry that research perspective into the more concrete reporting that I do.
I think the thing that I arrived at is that people interact with the police all the time, and people who interact with the police often are typically either overpoliced or in danger of having their rights eroded or breached. That's a really important area where AI can either help or hurt. Another is in the military, where anything that you uncover about the military and when people are being detained or killed is very high stakes. The fact that the Department of Defense is relying on fairly untested technology to make those decisions is kind of scary. I think that's the way that I'm trying to choose my stories now.
I think these are just things that people should know, and people don't understand AI to begin with. It's really difficult for the general public to grasp; it's a dense subject that you need to understand in order to get the full picture. So I hope that in some way this reporting kind of helps bridge those gaps.
Anything you’d recommend people check out?
I definitely think people should go to DFBA.mil and look at the crazy video that they have. It is really something. People should understand whether their local police use biometrics. And if they do, they should learn how they use biometrics and whether it's embedded in the infrastructure, and think about whether that's something that they want in there. I think that's a decision that we all have to make at a very local level now, as software is infinitely replicable.
You're at OneZero, which is a fairly new publication, right?
We've been around for a few months now and we have been growing rapidly. OneZero is Medium's official technology and science publication. We cover everything from this kind of stuff to, just today, a scoop about Ambrosia, the young blood transfusion startup that the FDA warned consumers about. They're back, and our fantastic biotech reporter Emily Mullin got the scoop on that. So they're going to be selling young blood transfusions again. There's another story on Discord and teen dating, and one tracking the life cycle of an AmazonBasics battery back through the entire supply chain, investigating Amazon's claims about how green it is.
I loved that one.
We're doing some really, really cool stories and I'm excited to see what we do.
If you have anything you’d like to see in this Sunday special, shoot me an email. Comment below! Thanks for reading, and thanks so much for supporting Numlock.
Thank you so much for becoming a paid subscriber!
Send links to me on Twitter at @WaltHickey or email me with numbers, tips, or feedback at walt@numlock.news.