The (possible) Afghan data treasure trove doesn’t just threaten the Taliban’s enemies

Recent events in Afghanistan have prompted discussion among information technology and security professionals about what data may now be exposed.

Taliban fighters in Kabul, Afghanistan, 17 August 2021. By VOA – https://www.youtube.com/watch?v=nAg7egiXClU, Public Domain, https://commons.wikimedia.org/w/index.php?curid=109043891

One August 17 article from the Intercept hit close to home for me:

THE TALIBAN HAVE seized U.S. military biometrics devices that could aid in the identification of Afghans who assisted coalition forces, current and former military officials have told The Intercept.

This post talks about the data the Taliban could POTENTIALLY get from captured biometric devices and other sources, and how that data could conceivably pose a threat to the Taliban’s enemies AND the Taliban itself.

What data could the Taliban get from biometric devices?

The specific device referenced by the Intercept article was HIIDE…and let’s just say that while I don’t know as much about that device as I should, I do know a little bit about it. (It was manufactured by a company that was subsequently acquired by Safran.)

Another source implies that the Taliban may also have acquired a device that the Intercept DIDN’T reference. In addition to live HIIDE devices, the Taliban may have acquired devices from another vendor, called SEEK.

(Yes, folks, these devices are called HIIDE and SEEK.)

At the time that this was revealed, I posted the following comment on LinkedIn:

Possession is not enough. Can the Taliban actually access the data? And how much data is on the devices themselves?

Someone interviewed by the Intercept speculated that even if the Taliban did not have the technological capability to hack the devices, it could turn to Pakistan’s Inter-Services Intelligence to do so. As we’ve learned over the years, Pakistan and the Taliban (and the Taliban’s allies such as al Qaeda) are NOT bitter enemies.

As I said, I don’t know enough about HIIDE and SEEK, so I’m not sure about some key things.

  • For example, I don’t know whether their on-board biometric data is limited to biometric features (rather than images). The devices may have stored biometric images, but images are large, which is a drawback on a portable device. Features derived from the images (which are needed for matching anyway) take up much less storage space. And while biometric images are necessary in some cases (such as forensic latent fingerprint examination), there’s no need for images in devices that make a hit/no-hit decision without human intervention.
  • In addition, I don’t know what textual data is linked to the features (or images) on these devices. Obviously the more textual information that is available, such as a name, the more useful the data can be.
  • Also, the features stored on the devices may or may not be useful. There is no one standard for the specification of biometric features (each vendor has its own proprietary feature specification), and while it may be possible to convert fingerprint features from one vendor system to be used by another vendor’s system, I don’t know if this is possible for face and iris features.
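
To put rough numbers on the storage point in the first bullet, here is a back-of-the-envelope sketch in Python. Every figure in it is an illustrative assumption: a 500 x 500 pixel, 8-bit grayscale, uncompressed fingerprint image; a 500-byte feature template; 4 GB of on-device storage. None of these are HIIDE or SEEK specifications, and a real device might compress its images or use larger templates.

```python
# Back-of-the-envelope comparison of image storage vs. feature (template) storage.
# ALL sizes below are illustrative assumptions, not HIIDE or SEEK specifications.

IMAGE_WIDTH_PX = 500                # assumed image width
IMAGE_HEIGHT_PX = 500               # assumed image height
BYTES_PER_PIXEL = 1                 # assumed 8-bit grayscale, uncompressed
TEMPLATE_BYTES = 500                # assumed size of one feature template
DEVICE_STORAGE_BYTES = 4 * 10**9    # assumed 4 GB of on-device storage


def image_bytes(width_px: int, height_px: int, bytes_per_pixel: int) -> int:
    """Size in bytes of one uncompressed image."""
    return width_px * height_px * bytes_per_pixel


def records_that_fit(storage_bytes: int, record_bytes: int) -> int:
    """How many whole records of a given size fit in the storage."""
    return storage_bytes // record_bytes


def summarize() -> dict:
    """Compare per-record size and device capacity for images vs. templates."""
    img = image_bytes(IMAGE_WIDTH_PX, IMAGE_HEIGHT_PX, BYTES_PER_PIXEL)
    return {
        "image_bytes": img,
        "template_bytes": TEMPLATE_BYTES,
        "size_ratio": img // TEMPLATE_BYTES,
        "images_that_fit": records_that_fit(DEVICE_STORAGE_BYTES, img),
        "templates_that_fit": records_that_fit(DEVICE_STORAGE_BYTES, TEMPLATE_BYTES),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,}")
```

Under these assumed sizes, one image takes 500 times the space of one template, which is why a device that only needs a hit/no-hit answer has little reason to carry images.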

Best-case scenario? Even if the Taliban or its friends can access the data on the devices, the data does not provide enough information for it to be used.

Worst-case scenario? The data DOES provide enough information so that EVERY PERSON whose data is stored on the device can be identified by a Taliban-equivalent device, which would presumably be called FIND (Find Infidels, Neutralize, Destroy).

I’ll return to that “every person” point later in this post.

But biometric data isn’t the only data that might have fallen into the Taliban’s hands.

What data could the Taliban get from non-biometric devices?

Now Politico has come out with its own article that asserts that the Taliban can potentially acquire a lot of other data. And Politico is not as pessimistic as the Intercept about the Taliban’s tech capabilities:

That gives today’s technologically adept Taliban tools to target Afghans who worked with the U.S. or the deposed Afghan government with unprecedented precision, increasing the danger for those who don’t get out on evacuation flights.

Before looking at the data the Taliban may have acquired, it’s useful to divide the data sources into clients and on-premises servers. HIIDE and SEEK, for example, are mobile client devices. (I’m only talking about on-premises servers because any data stored in a U.S. government cloud can hopefully be secured so that the Taliban can’t get it. Hopefully.)

The Politico article, by contrast, focuses on data that is stored on on-premises Afghan government servers. It notes that American IT officials were more likely than Afghan IT officials to scrub their systems before the Taliban takeover.

So what types of data would servers in Afghanistan store?

Telecom companies store reams of records on who Afghan users have called and where they’ve been. Government databases include records of foreign-funded projects and associated personnel records.

More specifics are provided regarding telecom company data:

Take call logs. Telecommunications companies keep a record of nearly every phone call placed and to whom. U.S. State Department officials used the local cell networks to make calls to those who were working with the United States, including interpreters, drivers, cooks and more…

And mobile phone data is even more revealing:

Cell phones and mobile apps share data about users with third-party apps, such as location data, that the Taliban could easily get…

The geolocation issue has been known for years. Remember the brouhaha when military users of a particular fitness app effectively revealed the locations of secret U.S. military facilities?

Helmand province in Afghanistan. Photograph: Strava heatmap. Reproduced at https://www.theguardian.com/world/2018/jan/28/fitness-tracking-app-gives-away-location-of-secret-us-army-bases

In locations like Afghanistan, Djibouti and Syria, the users of Strava seem to be almost exclusively foreign military personnel, meaning that bases stand out brightly. In Helmand province, Afghanistan, for instance, the locations of forward operating bases can be clearly seen, glowing white against the black map.

Now perhaps enemy forces already knew about these locations, but it doesn’t help to broadcast them to everyone.

Back to Afghanistan and other data sources.

Afghan citizens’ ethnicity information can also be found in databases supporting the national ID system and voter registration.

This can be used by digital identity opponents to argue that digital identity, or any identity, is dangerous. I won’t dive into that issue right now.

Politico mentions other sources of data that the Taliban could conceivably access, including registration information (including identity documents) for non-governmental organization workers, tax records, and military commendation records.

So if you add up all of the data from all of the Afghan servers, and if the Taliban or its allies are able to achieve some level of technical expertise, then the data provides enough information so that EVERY PERSON whose data is stored on the servers can be identified by the Taliban.

Before we completely panic…

Of course it takes some effort to actually EMPLOY all of this data. In the ideal world, the Taliban would create a supercomputer system that aggregates the data and creates personal profiles that provide complete pictures of every person. But the world is not ideal, even in technologically advanced countries: remember that even after 9/11, it took years for the U.S. Departments of Justice, Homeland Security, and Defense to get their biometric systems to talk to each other.

Oh, and there’s one more thing.

Remember how I’ve mentioned a couple of times that the Taliban could conceivably get information on EVERY PERSON whose data is stored on these devices and servers?

One thing that’s been left unsaid by all of these commentaries is that this data trove not only reveals information about the enemies of the Taliban, but also reveals information about the Taliban itself.

  • The HIIDE and SEEK devices could include biometric templates of Taliban members (who would be considered “enemies” by these devices and may have been placed on “deny lists”).
  • The telecommunications records could reveal calls placed and received by Taliban members, including calls to Afghan government officials and NATO members that other Taliban members didn’t know about.
  • Mobile phone records could reveal the geolocations of Taliban members at any time, including locations that they didn’t want their fellow Taliban members to know about.
  • In general, the records could reveal Taliban members, including high-ranking Taliban members, who were secretly cooperating with the Taliban’s enemies.

With the knowledge that all of this data is now available, how many Taliban members will assist in decrypting it? And how many will actively block the effort?

Oh, and even if all of the Taliban were completely loyal, any entity (such as Pakistan’s Inter-Services Intelligence) that gets hold of the data will NOT restrict its own data acquisition efforts to American, NATO, and former Afghan government intelligence. No, it will acquire information on the Taliban itself.

After all, this information could help the Pakistanis (or Chinese, or Russians, or whoever) put the, um, finger on Taliban members, should it prove useful to do so in the future.

Then again, Pakistan may want to ensure that its own digital data treasure trove is safe.