Improved Generative Evaluation: Utilizing the Manifold Hypothesis
Abstract
Reliably diagnosing and evaluating generative models remains an open problem. The most widely used metric for measuring the performance of image-generating models is the Fréchet Inception Distance (FID). FID has been shown to correlate well with human judgement, and it is sensitive to mode collapse. However, recent papers have demonstrated issues with the metric, such as poor performance on certain tasks, long computation times, and failure to match human perception. We propose a novel formulation, Class-Aware Latent Distance (CALD), to alleviate these issues. The proposed score can be applied across domains, is faster to compute, and matches human judgement in cases where FID fails. To demonstrate the capability of this approach, we perform an empirical study on three image generation tasks and compare our metric to FID and Intrinsic Multiscale Distance (IMD). Across these experiments, we show that the proposed metric offers a significant speedup over existing methods, is more reliable, and can be used to diagnose Generative Adversarial Network (GAN) training and to perform early stopping.
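For reference, FID (the baseline that CALD is compared against) is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images: d² = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The sketch below is a minimal NumPy version of that distance. It assumes the features have already been extracted (standard FID uses Inception-v3 pooling features; that network is omitted here), and the function name `frechet_distance` is hypothetical. It illustrates FID only and is not the CALD metric, whose formulation the abstract does not give.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, dim). In FID these would be
    Inception-v3 features of real and generated images; here they are assumed
    to be precomputed (hypothetical inputs for illustration).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Symmetric square root of cov_a via eigendecomposition (cov_a is PSD).
    w, v = np.linalg.eigh(cov_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    # sqrt_a @ cov_b @ sqrt_a is symmetric PSD and has the same eigenvalues as
    # cov_a @ cov_b, so the sum of their square roots is Tr((cov_a cov_b)^{1/2}).
    m = sqrt_a @ cov_b @ sqrt_a
    eig = np.linalg.eigvalsh((m + m.T) / 2.0)
    tr_sqrt = np.sqrt(np.clip(eig, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    print(frechet_distance(real, real))        # close to 0 for identical sets
    print(frechet_distance(real, real + 3.0))  # mean shift only: 8 * 3**2 = 72
```

Computing the trace term through a symmetric eigendecomposition avoids a general (possibly complex-valued) matrix square root; `scipy.linalg.sqrtm` is a common alternative.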