“[A]dversarial Gait Recognition has arisen as a major challenge in video surveillance systems, as deep learning-based gait recognition algorithms become more sensitive to adversarial attacks.”
So Zeeshan Ali and others are working on IMPROVING gait-based adversarial attacks…the better to counter them.
“Our technique includes two major components: AdvHelper, a surrogate model that simulates the target, and PerturbGen, a latent-space perturbation generator implemented in an encoder-decoder framework. This design guarantees that adversarial samples are both effective and perceptually realistic by utilizing reconstruction and perceptual losses. Experimental results on the benchmark CASIA-gait dataset show that the proposed method achieves a high attack success rate of 94.33%.”
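The paper itself doesn't publish this pipeline's code, but the core idea — perturb in latent space, then keep the adversarial sample realistic with a reconstruction loss plus a feature-space "perceptual" loss — can be sketched in a few lines. Everything below is a toy stand-in: the dimensions, the linear encoder/decoder (for PerturbGen), and the `surrogate_features` function (for AdvHelper) are my own placeholders, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper's actual architecture differs.
D, Z = 64, 16                          # gait-feature dim, latent dim
W_enc = rng.normal(0, 0.1, (Z, D))     # stand-in encoder weights
W_dec = rng.normal(0, 0.1, (D, Z))     # stand-in decoder weights

def encode(x):
    return W_enc @ x

def decode(z):
    return W_dec @ z

def surrogate_features(x):
    # Stand-in for AdvHelper, the surrogate of the target recognizer.
    return np.tanh(x[:8])

def total_loss(x, x_adv, lam_rec=1.0, lam_perc=0.5):
    # Reconstruction loss keeps the perturbed sample close to the original;
    # the "perceptual" term is a feature-space distance via the surrogate.
    rec = np.mean((x_adv - x) ** 2)
    perc = np.mean((surrogate_features(x_adv) - surrogate_features(x)) ** 2)
    return lam_rec * rec + lam_perc * perc

x = rng.normal(size=D)                 # a fake gait-feature vector
z = encode(x)
delta = 0.05 * rng.normal(size=Z)      # perturbation applied in LATENT space
x_adv = decode(z + delta)

print(total_loss(x, x_adv))
```

The point of perturbing in latent space rather than pixel space is that the decoder maps the perturbed code back onto something that still looks like plausible gait data, which is what the combined loss is enforcing.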
Now we need to better detect these adversarial attacks.
As you may know, I’ve often used Grok to convert static images to 6-second videos. But I’ve never tried to do this with an occluded face, because I suspected the attempt would fail. Grok isn’t perfect, after all.
Facia’s 2024 definition of occlusion is “an extraneous object that hinders the view of a face, for example, a beard, a scarf, sunglasses, or a mustache covering lips.” Facia also mentions the COVID practice of wearing masks.
Occlusion limits the data available to facial recognition algorithms, which has an adverse effect on accuracy. At the time, “lower chin and mouth occlusions caused an inaccuracy rate increase of 8.2%.” Occlusion of the eyes naturally caused greater inaccuracies.
So how do we account for occlusions? Facia offers three tactics:
But those acronyms aren’t enough, so we’ll add one more.
At the 2025 Computer Vision and Pattern Recognition conference, a group of researchers led by Pratheba Selvaraju presented a paper entitled “OFER: Occluded Face Expression Reconstruction.” This gives us one more acronym to play around with.
Here’s the abstract of the paper:
“Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions. In addition to fewer available observations, occlusions introduce an extra source of ambiguity where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature. In this paper we introduce OFER, a novel approach for single-image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces, even under strong occlusions. Specifically, we train two diffusion models to generate a shape and expression coefficients of face parametric model, conditioned on the input image. This approach captures the multi-modal nature of the problem, generating a distribution of solutions as output. However, to maintain consistency across diverse expressions, the challenge is to select the best matching shape. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on predicted shape accuracy scores. We evaluate our method using standard benchmarks and introduce CO-545, a new protocol and dataset designed to assess the accuracy of expressive faces under occlusion. Our results show improved performance over occlusion-based methods, while also enabling the generation of diverse expressions for a given image.”
Cool. I was just writing about multimodal biometrics for a client project, but “multi-modal” here carries a different meaning altogether.
In my non-advanced brain, the process of creating multiple options and choosing the one with the “best” fit (however that is defined) seems promising.
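That generate-many-then-rank loop is easy to illustrate. The sketch below is NOT OFER: the `sample_candidates` function fakes the draws that would come from the shape diffusion model, and `predicted_accuracy` fakes the trained ranking network with a simple negative-distance score. It only shows the selection logic: sample several candidate shapes, score each one, keep the best.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_candidates(n, dim=8):
    # Placeholder: in OFER these would be draws from the shape diffusion model.
    return rng.normal(size=(n, dim))

def predicted_accuracy(candidate, target):
    # Placeholder for the learned ranking score: higher is better,
    # here just the negative distance to a pretend "true" shape.
    return -np.linalg.norm(candidate - target)

target = np.zeros(8)                       # pretend ground-truth shape
candidates = sample_candidates(16)         # many plausible hypotheses
scores = [predicted_accuracy(c, target) for c in candidates]
best = candidates[int(np.argmax(scores))]  # keep the highest-ranked shape

print(best.shape)
```

Once a single consistent shape is chosen, diverse expressions can be layered on top of it, which is how the paper squares “generate many hypotheses” with “output one coherent face.”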
Although Grok didn’t do too badly with this one. Not perfect, but pretty good.
This one’s in Schwyz, in Switzerland, which makes reading the original story somewhat difficult. But we can safely say that “Eine unbekannte Täterschaft hat zur Täuschung künstliche Intelligenz eingesetzt und so mehrere Millionen Franken erbeutet” (roughly, “An unknown perpetrator used artificial intelligence to deceive, and thereby made off with several million francs”) is NOT a good thing.
And that’s millions of Swiss francs, not millions of Al Frankens.
“Deploying audio manipulated to sound like a trusted business partner, fraudsters bamboozled an entrepreneur from the canton of Schwyz into transferring ‘several million Swiss francs’ to a bank account in Asia.”
THIS VIDEO IS FAKE. U.S. Government pictures are public domain, so I altered and animated the original picture a bit. Inspired by a picture shared by Mitch Wagner of Kirk and Spock in ugly Christmas sweaters.
David Hentschel: ARP 2500 synthesizer (uncredited)
The video doesn’t match this list. According to the video, Elton played more than the guitar, and Bernie Taupin performed on the track.
So while we didn’t use the term “deepfake” in 1973, this promotional video meets at least some of the criteria of a deepfake.
And before you protest that everybody knew that Elton John didn’t play guitar…undoubtedly some people saw this video and believed that Elton was a guitarist. After all, they saw it with their own eyes.