Recent memory-based approaches operate on a discrete feature space implemented by a nearest-neighbor or attention mechanism, and suffer from (a) poor generalization or (b) an identity shortcut, in which the output simply reproduces the input.
We propose a (c) grid-based continuous feature space, which can be an effective solution to both issues.
When a normal patch is fed, even if there is no exact match among the training patches, a corresponding normal feature can be represented by interpolating nearby normal features (good generalization).
As the grid has never been exposed to abnormal features during training, it is unable to represent abnormal features by interpolating nearby normal features.
This is the core idea of how we can effectively resolve the identity shortcut issue frequently found in the existing methods that aggregate numerous features based on similarities.
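The interpolation idea described above can be illustrated with a minimal sketch of bilinear sampling from a 2D feature grid. This is not the CRAD implementation: the function name `bilinear_sample`, the `(H, W, C)` grid layout, and the normalized `[0, 1]` coordinates are all assumptions made for illustration. The point it shows is that a sampled feature is always a convex combination of the four surrounding grid entries, so a grid that stores only normal features can only produce blends of normal features.

```python
import numpy as np


def bilinear_sample(grid, u, v):
    """Sample a feature at continuous coordinates from a feature grid.

    grid: array of shape (H, W, C) holding learned features (hypothetical layout).
    u, v: normalized row and column coordinates in [0, 1] (assumed convention).
    """
    h, w, _ = grid.shape

    # Map normalized coordinates to continuous grid indices.
    y = float(np.clip(u, 0.0, 1.0)) * (h - 1)
    x = float(np.clip(v, 0.0, 1.0)) * (w - 1)

    # Indices of the four surrounding grid cells, clamped at the border.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)

    # Fractional offsets act as interpolation weights in [0, 1].
    wy, wx = y - y0, x - x0

    # Interpolate along columns first, then along rows.
    top = (1.0 - wx) * grid[y0, x0] + wx * grid[y0, x1]
    bottom = (1.0 - wx) * grid[y1, x0] + wx * grid[y1, x1]
    return (1.0 - wy) * top + wy * bottom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal_grid = rng.normal(size=(4, 4, 8))  # stand-in for trained normal features

    # A coordinate that falls between stored entries still yields a feature.
    feature = bilinear_sample(normal_grid, 0.37, 0.62)
    print(feature.shape)
```

Because the weights are non-negative and sum to one, every channel of the sampled feature lies between the minimum and maximum of its four neighbors. A discrete memory, by contrast, would either return one stored entry (nearest neighbor) or a similarity-weighted aggregate over many entries (attention).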
There have been significant advancements in unsupervised anomaly detection, where only normal images are available for training. Several recent methods aim to detect anomalies based on a memory, comparing the input with directly stored normal features (or with features trained on normal images). However, such memory-based approaches operate on a discrete feature space implemented by a nearest-neighbor or attention mechanism, suffering from poor generalization or an identity shortcut issue (outputting the same as the input), respectively. Furthermore, the majority of existing methods are designed to detect single-class anomalies, resulting in unsatisfactory performance when presented with multiple classes of objects. To tackle all of the above challenges, we propose CRAD, a novel anomaly detection method that represents normal features within a "continuous" memory, enabled by transforming spatial features into coordinates and mapping them to continuous grids. Furthermore, we carefully design the grids for anomaly detection, representing both local and global normal features and fusing them effectively. Our extensive experiments demonstrate that CRAD successfully generalizes the normal features and mitigates the identity shortcut. In addition, CRAD effectively handles diverse classes in a single model thanks to the high-granularity global representation. In an evaluation using the MVTec AD dataset, CRAD significantly outperforms the previous state-of-the-art method, reducing the error by 65.0% for multi-class unified anomaly detection.
(a) The detailed architecture of CRAD and (b) visualization of coordinate jittering. The input x is first transformed into pixel-wise and feature-wise coordinates. After the normal features are sampled from the local and global representations, they are fused by CNN blocks. The final reconstruction is obtained through the proposed feature refinement process.
The sampled features (from the local and global grids) at nearby coordinates share similar characteristics whether the input image is normal or not. Furthermore, the fused normal representations of both normal and abnormal inputs are reconstructed into normal images. This result demonstrates that the continuous feature space effectively tackles the two major challenges of a discrete feature space: weak generalization and the identity shortcut (IS).
CRAD achieves the best performance for unified anomaly detection.
@article{lee2024crad,
author = {Lee, Joo Chan and Kim, Taejune and Park, Eunbyung and Woo, Simon S. and Ko, Jong Hwan},
title = {Continuous Memory Representation for Anomaly Detection},
journal = {arXiv preprint arXiv:2402.18293},
year = {2024},
}