You Can Measure Quality, But Is the Measure Meaningful? (OFIQ)

The purpose of measuring quality should not be measurement for its own sake. The purpose should be to help people make useful decisions.

In Germany, the Bundesamt für Sicherheit in der Informationstechnik (Federal Office for Information Security) has developed the Open Source Face Image Quality (OFIQ) framework.

Experienced biometric professionals can’t help but notice that the acronym OFIQ is similar to the acronym NFIQ (used in NFIQ 2), but the latter refers to the NIST FINGERPRINT image quality standard. NFIQ is also open source, with contributions from NIST and the German BSI, among others.

But NFIQ and OFIQ, while analyzing different biometric modalities, serve a similar purpose: to distinguish between good and bad biometric images.
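For a concrete (and purely illustrative) picture of what that purpose looks like in practice, here is a minimal Python sketch of quality gating. It does not use the actual OFIQ library API; the function name, the 0–100 score scale, and the threshold of 50 are assumptions for illustration only.

```python
# Hypothetical sketch: a unified face image quality score (assumed to run
# 0-100, higher is better) drives a simple accept/recapture decision.
# This is NOT the OFIQ library API.

def should_recapture(unified_quality_score: float, threshold: float = 50.0) -> bool:
    """Reject the capture if its quality score falls below the threshold."""
    return unified_quality_score < threshold

# A kiosk keeps asking for a new photo until the score clears the bar.
for attempt, score in enumerate([22.0, 41.0, 67.0], start=1):
    if should_recapture(score):
        print(f"Attempt {attempt}: score {score:.0f} too low; asking the traveler to retry")
    else:
        print(f"Attempt {attempt}: score {score:.0f} accepted; sending to the matcher")
        break
```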

But do these open source algorithms meaningfully measure quality?

The study of OFIQ

Biometric Update alerted readers to the November 2025 study “On the Utility of the Open Source Facial Image Quality Tool for Facial Biometric Recognition in DHS Operations” (PDF).

Note the words “in DHS Operations,” which are crucial.

  • The DHS doesn’t care about how ALL facial recognition algorithms perform.
  • The DHS only cares about the facial recognition algorithms that it may potentially use.
  • DHS doesn’t care about algorithms it would never use, such as Chinese or Russian algorithms.
  • In fact, DHS probably hopes that the Chinese Cloudwalk algorithm performs very badly. (In NIST tests, it doesn’t.)

So which algorithms did DHS evaluate? We don’t know precisely.

“A total of 16 commercial face recognition systems were used in this evaluation. They are labeled in diagrams as COTS1 through COTS16….Each algorithm in this study was voluntarily submitted to the MdTF as part of on-going biometric performance evaluations by its commercial entity.”

Usually MdTF rally participants aren’t disclosed, unless a participant discloses itself, like Paravision did after the 2022 Biometric Technology Rally.

“Paravision’s matching system alias in the test was ‘Miami.’”

Welcome to Miami, bienvenidos a Miami.

So what did DHS find when it used OFIQ to evaluate images submitted to these 16 algorithms?

“We found that the OFIQ unified quality score provides extremely limited utility in the DHS use cases we investigated. At operationally relevant biometric thresholds, biometric matching performance was high and probe samples that were assessed as having very low quality by OFIQ still successfully matched to references using a variety of face recognition algorithms.”

Or in human words:

  • Images that yielded a high OFIQ quality score accurately matched faces using the tested algorithms.
  • Images that yielded a low OFIQ quality score…STILL accurately matched faces using the tested algorithms.
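To make that finding concrete, here is a minimal sketch of the kind of check the report describes: bin probes by their OFIQ unified quality score and see whether the “very low quality” bin actually fails to match at an operational threshold. This is not the DHS team’s code; the toy data, the 0.85 similarity threshold, and the quality cutoff of 40 are all assumptions for illustration.

```python
# Illustrative sketch only: toy data, an assumed similarity threshold, and an
# assumed "low quality" cutoff. Each record pairs an OFIQ unified quality
# score (0-100) with the matcher's similarity score for that probe.
probes = [
    (12.0, 0.91), (18.0, 0.88), (35.0, 0.93),
    (55.0, 0.95), (72.0, 0.97), (90.0, 0.98),
]

OPERATIONAL_THRESHOLD = 0.85   # assumed; real thresholds are matcher-specific
LOW_QUALITY_CUTOFF = 40.0      # assumed cutoff for "very low quality"

def match_rate(records):
    """Fraction of probes whose similarity clears the operational threshold."""
    return sum(sim >= OPERATIONAL_THRESHOLD for _, sim in records) / len(records)

low = [r for r in probes if r[0] < LOW_QUALITY_CUTOFF]
high = [r for r in probes if r[0] >= LOW_QUALITY_CUTOFF]

print(f"Low-quality probes that still match:  {match_rate(low):.0%}")
print(f"High-quality probes that match:       {match_rate(high):.0%}")
# If both rates sit near 100%, the quality score adds little decision value
# for that matcher -- which is the study's core finding.
```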

So, at least in DHS’ case, it makes no sense to use the OFIQ unified quality score.

Your mileage may vary.

If you have questions, consult a biometric product marketing expert.

Or Will Smith. Just don’t make a joke about his wife.
