2019.7
Paper: https://arxiv.org/abs/1906.01529v1
Code: https://github.com/sheqi/GAN_Review
Abstract
Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably the most revolutionary techniques are in the area of computer vision, such as plausible image generation, image-to-image translation, facial attribute manipulation and similar domains. Despite the significant success achieved in the computer vision field, applying GANs to real-world problems still poses significant challenges, three of which we focus on here: (1) High quality image generation; (2) Diverse image generation; and (3) Stable training. Through an in-depth review of GAN-related research in the literature, we provide an account of the architecture-variants and loss-variants that have been proposed to handle these three challenges from two perspectives. We classify the most popular GANs into loss-variants and architecture-variants, and discuss the potential improvements by focusing on these two aspects. While several reviews of GANs have been presented to date, none has focused on reviewing GAN-variants based on how they handle the challenges mentioned above. In this paper, we review and critically discuss 7 architecture-variant GANs and 9 loss-variant GANs for remedying those three challenges. The objective of this review is to provide insight into where current GAN research focuses its efforts at performance improvement. Code related to the GAN-variants studied in this work is summarized at https://github.com/sheqi/GAN_Review.
1 INTRODUCTION
Generative adversarial networks (GANs) are attracting growing interest in the deep learning community [1]–[6]. GANs have been applied to various domains such as computer vision [7]–[14], natural language processing [15]–[18], time series synthesis [19]–[23], semantic segmentation [24]–[28], etc. GANs belong to the family of generative models. Compared to other generative models, e.g., variational autoencoders, GANs offer advantages such as the ability to handle sharp estimated density functions, efficiently generating desired samples, eliminating deterministic bias and good compatibility with the internal neural architecture. These properties have allowed GANs to enjoy success especially in the computer vision field, e.g., plausible image generation [29]–[33], image-to-image translation [2], [34]–[40], image super-resolution [26], [41]–[44] and image completion [45]–[49].
However, GANs suffer challenges from two aspects: (1) Hard to train — it is non-trivial for the discriminator and generator to achieve Nash equilibrium during training, and the generator may fail to learn the distribution of the full dataset, a failure known as mode collapse. Much work has been carried out in this area [50]–[53]; and (2) Hard to evaluate — the evaluation of GANs can be considered as an effort to measure the dissimilarity between the real distribution p_r and the generated distribution p_g. Unfortunately, accurate estimation of p_r is not possible. Thus, it is challenging to produce good estimations of the correspondence between p_r and p_g. Previous work has introduced evaluation metrics for GANs [54]–[62]. The first aspect directly concerns the performance of GANs, e.g., image quality, image diversity and stable training. In this work, we study existing GAN-variants that handle this aspect in the area of computer vision, while readers interested in the second aspect can consult [54], [62].
Current GANs research focuses on two directions: (1) Improving the training of GANs; and (2) Deployment of GANs to real-world applications. The former seeks to improve GANs' performance and is therefore a foundation for the latter. Considering the numerous research works in the literature, in this paper we give a brief review of the GAN-variants that focus on improving training. Improving the training process benefits GANs' performance as follows: (1) Improvements in generated image diversity (also known as mode diversity); (2) Increases in generated image quality; and (3) More stable training, such as remedying the vanishing gradient for the generator. In order to achieve these improvements, modifications to GANs can be made on either the architecture side or the loss side. We will study the GAN-variants from both sides that improve the performance of GANs. The rest of the paper is organized as follows: (1) We introduce the search strategy and part of the results (complete results are illustrated in the Supplementary material) for the existing GANs papers in the area of computer vision; (2) We introduce related review work for GANs and illustrate the difference between those reviews and this work; (3) We give a brief introduction to GANs; (4) We review the architecture-variant GANs in the literature; (5) We review the loss-variant GANs in the literature; (6) We summarize the GAN-variants in this study and illustrate their differences and relationships; and (7) We conclude this review and preview likely future research work in the area of GANs.
Many GAN-variants have been proposed in the literature to improve performance. These can be divided into two types: (1) Architecture-variants. The first proposed GAN used fully-connected neural networks [1], so specific types of architecture may be beneficial for specific applications, e.g., convolutional neural networks (CNNs) for images and recurrent neural networks (RNNs) for time series data; and (2) Loss-variants. Here, different variations of the loss function are explored to enable more stable learning of G.
2 SEARCH STRATEGY AND RESULTS
pass
3 RELATED WORK
There have been previous GAN review papers, for example one reviewing GANs' performance [63]. That work focuses on experimental validation across different types of GANs benchmarked on the LSUN-BEDROOM [64], CELEBA-HQ-128 [65] and CIFAR10 [66] image datasets. The results suggest that the original GAN [1] with spectral normalization [67] is a good starting choice when applying GANs to a new dataset. A limitation of that review is that the benchmark datasets do not consider diversity in a significant way. Thus the benchmark results tend to focus more on evaluation of image quality, which may ignore GANs' efficacy in producing diverse images. The work in [68] surveys different GAN architectures and their evaluation metrics. A further comparison of different architecture-variants' performance, applications, complexity and so on remains to be explored. Papers [69]–[71] focus on the investigation of the newest development trends and applications of GANs, and compare GAN-variants through different applications. Compared to the current review literature, we emphasize an introduction to GAN-variants based on their performance, including their ability to produce high-quality and diverse images, stable training, ability to handle the vanishing gradient problem, etc. This is all done from a perspective based on architecture and loss function considerations. This work also provides a comparison and analysis of the pros and cons across the GAN-variants presented in this paper.
4 GENERATIVE ADVERSARIAL NETWORKS
Figure 1 demonstrates the architecture of a typical GAN. The architecture comprises two components, one of which is a discriminator (D) distinguishing between real images and generated images, while the other is a generator (G) creating images to fool the discriminator. Given a distribution z ∼ p_z, G defines a probability distribution p_g as the distribution of the samples G(z). The objective of a GAN is to learn the generator's distribution p_g that approximates the real data distribution p_r. Optimization of a GAN is performed with respect to a joint loss function for D and G:
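min_G max_D E_{x∼p_r}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))],

where D is trained to maximize this objective (assigning high probability to real samples and low probability to generated samples) and G is trained to minimize it. This is the minimax objective from the original GAN paper [1].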
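In practice this objective is typically optimized by alternating gradient steps on D and G. The following is a minimal sketch of one such step, assuming PyTorch, a discriminator with a sigmoid output, and the common non-saturating generator loss (also proposed in [1]); it is illustrative rather than a reference implementation:

import torch
import torch.nn.functional as F

def gan_step(D, G, x_real, z, opt_d, opt_g):
    # Discriminator update: push D(x_real) toward 1 and D(G(z)) toward 0.
    opt_d.zero_grad()
    d_real = D(x_real)
    d_fake = D(G(z).detach())  # detach so this step does not update G
    loss_d = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()

    # Generator update (non-saturating loss): push D(G(z)) toward 1.
    opt_g.zero_grad()
    d_fake = D(G(z))
    loss_g = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()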
GANs, as a member of the deep generative model (DGM) family, have attracted exponentially growing interest in the deep learning community because of some advantages compared to traditional DGMs: (1) GANs are able to produce better output than other DGMs. Compared to the best-known DGM, the variational autoencoder (VAE), GANs are able to produce any type of probability density, while the VAE is not able to generate sharp images; (2) The GAN framework can train any type of generator network. Other DGMs may have pre-requirements for the generator, e.g., that the output layer of the generator is Gaussian; (3) There is no restriction on the size of the latent variable. These advantages have led GANs to achieve state-of-the-art performance in producing synthetic data, especially image data.
5 ARCHITECTURE-VARIANT GANS
There are many types of architecture-variants proposed in the literature (see Fig. 2) [33], [34], [72]–[74]. Architecture-variant GANs are mainly proposed for the purpose of different applications, e.g., image-to-image translation [34], image super-resolution [41], image completion [75], and text-to-image generation [76]. In this section, we provide a review of architecture-variants that help improve the performance of GANs from the three aspects mentioned before, namely improving image diversity, improving image quality and more stable training. A review of architecture-variants for different applications can be found in [68], [70].
5.1 Fully-connected GAN (FCGAN)
The original GAN paper [1] uses fully-connected neural networks for both the generator and the discriminator. This architecture-variant is applied to some simple image datasets, i.e., MNIST [77], CIFAR-10 [66] and the Toronto Face Dataset. It does not demonstrate good generalization performance for more complex image types.
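For concreteness, a minimal sketch of such a fully-connected G and D is given below, assuming PyTorch; the layer widths, activations and the 28 × 28 image size are illustrative choices, not the exact configuration from [1]:

import torch.nn as nn

# Fully-connected generator: latent vector -> flattened image in [-1, 1].
class FCGenerator(nn.Module):
    def __init__(self, z_dim=100, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

# Fully-connected discriminator: flattened image -> probability of "real".
class FCDiscriminator(nn.Module):
    def __init__(self, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)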
5.2 Laplacian Pyramid of Adversarial Networks (LAPGAN)
LAPGAN was proposed for the production of higher-resolution images from a lower-resolution input GAN [78]. Figure 3 demonstrates the up-sampling process of the generator in LAPGAN from right to left. LAPGAN utilizes a cascade of CNNs within a Laplacian pyramid framework [80] to generate high-quality images.
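Concretely, at each level k of the pyramid a generator G_k produces a residual conditioned on the up-sampled image from the coarser level; a sketch of the sampling recursion, following our reading of [78]:

Ĩ_k = u(Ĩ_{k+1}) + G_k(z_k, u(Ĩ_{k+1})),

where u(·) denotes 2× up-sampling, Ĩ_{k+1} is the image generated at the coarser level, and z_k is the noise input at level k. The recursion starts from an image produced by an unconditional GAN at the coarsest level.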
5.3 Deep Convolutional GAN (DCGAN)
DCGAN is the first work that applied a deconvolutional neural network architecture for G [72]. Figure 4 illustrates the proposed architecture for G. Deconvolution was originally proposed to visualize the features of a CNN and has shown good performance for CNN visualization [81]. DCGAN deploys the spatial up-sampling ability of the deconvolution operation in G, which enables the generation of higher-resolution images using GANs.
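The core idea is a stack of transposed (deconvolutional) layers that progressively up-sample the latent vector into an image. A minimal sketch, assuming PyTorch, is below; the channel counts and the 32 × 32 output size are illustrative, not the exact configuration in [72]:

import torch.nn as nn

def dcgan_generator(z_dim=100):
    return nn.Sequential(
        # latent z viewed as (N, z_dim, 1, 1) -> (N, 256, 4, 4)
        nn.ConvTranspose2d(z_dim, 256, 4, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(256), nn.ReLU(True),
        # (N, 256, 4, 4) -> (N, 128, 8, 8)
        nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(128), nn.ReLU(True),
        # (N, 128, 8, 8) -> (N, 64, 16, 16)
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(64), nn.ReLU(True),
        # (N, 64, 16, 16) -> (N, 3, 32, 32), pixel values in [-1, 1]
        nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1, bias=False),
        nn.Tanh(),
    )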
5.4 Boundary Equilibrium GAN (BEGAN)
BEGAN uses an autoencoder architecture for the discriminator, which was first proposed in EBGAN [82] (see Fig. 5). Compared to traditional optimization, BEGAN matches the autoencoder loss distributions using a loss derived from the Wasserstein distance, instead of matching data distributions directly. This modification helps G to generate data that is easy for the autoencoder to reconstruct at the beginning, because the generated data is close to 0 and the real data distribution has not been learned accurately yet, which prevents D from easily winning over G at the early training stage.
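As a sketch of the published formulation, let L(v) = |v − D(v)| denote the pixel-wise autoencoder reconstruction loss (here D is the autoencoder). The BEGAN objectives are then, to our understanding:

L_D = L(x) − k_t · L(G(z)),
L_G = L(G(z)),
k_{t+1} = k_t + λ_k (γ L(x) − L(G(z))),

where k_t is a control variable that balances the two terms (updated with learning rate λ_k) and γ ∈ [0, 1] is a diversity-ratio hyper-parameter.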
5.5 Progressive GAN (PROGAN)
PROGAN involves progressive steps toward the expansion of the network architecture [74]. This architecture uses the idea of progressive neural networks first proposed in [83]. This technology does not suffer from forgetting and can leverage prior knowledge via lateral connections to previously learned features. Consequently it is widely applied to learning complex task sequences. Figure 6 demonstrates the training process of PROGAN. Training starts with a low-resolution 4 × 4 pixel image. Both G and D grow as training progresses. Importantly, all variables remain trainable throughout this growing process. This progressive training strategy enables substantially more stable learning for both networks. By increasing the resolution little by little, the networks are continuously asked a much simpler question compared to the end goal of discovering a mapping from latent vectors to high-resolution images. All current state-of-the-art GANs employ this type of training strategy and it has resulted in impressive, plausible images [29], [74], [84].
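When a new, higher-resolution block is added, its output is faded in smoothly rather than switched on abruptly. A minimal sketch of this blending, assuming PyTorch (the function name and the 2× nearest-neighbour up-sampling are illustrative):

import torch.nn.functional as F

def faded_output(old_rgb, new_rgb, alpha):
    # old_rgb: image from the previous (lower-resolution) stage
    # new_rgb: image from the newly added higher-resolution block
    # alpha:   fade-in coefficient ramped linearly from 0 to 1 during training
    old_up = F.interpolate(old_rgb, scale_factor=2, mode="nearest")
    return alpha * new_rgb + (1.0 - alpha) * old_up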
5.6 Self-attention GAN (SAGAN)
Traditional CNNs can only capture local spatial information, and the receptive field may not cover enough structure, which causes CNN-based GANs to have difficulty in learning multi-class image datasets (e.g., ImageNet); key components of generated images may also shift, e.g., the nose in a generated face may not appear in the right position. A self-attention mechanism has been proposed to ensure a large receptive field without sacrificing computational efficiency for CNNs [85]. SAGAN deploys a self-attention mechanism in the design of the discriminator and generator architectures for GANs [86] (see Fig. 7). Benefiting from the self-attention mechanism, SAGAN is able to learn global, long-range dependencies for generating images. It has achieved great performance on multi-class image generation based on the ImageNet datasets.
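A minimal sketch of such a self-attention block is shown below, assuming PyTorch; the //8 channel reduction is a common choice and details may differ from the official SAGAN implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

# 1x1 convolutions project the feature map into query/key/value spaces,
# attention weights are computed between all spatial positions, and the
# result is added back through a learned scale gamma (initialized to 0).
class SelfAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as identity mapping

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (N, HW, C//8)
        k = self.key(x).flatten(2)                     # (N, C//8, HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)      # (N, HW, HW)
        v = self.value(x).flatten(2)                   # (N, C, HW)
        out = torch.bmm(v, attn.transpose(1, 2)).view(n, c, h, w)
        return self.gamma * out + x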
5.7 BigGAN
BigGAN [84] has also achieved state-of-the-art performance on the ImageNet datasets. Its design is based on SAGAN, and it demonstrates that increases in batch size and model complexity can dramatically improve GAN performance on complex image datasets.
5.8 Summary
We have provided an overview of architecture-variant GANs which aim to improve performance with respect to the three key challenges: (1) Image quality; (2) Mode diversity; and (3) Vanishing gradient. An illustration of relative performance can be found in Fig. 8. All the proposed architecture-variants are able to improve image quality. SAGAN is proposed for improving the capacity of multi-class learning in GANs, the goal of which is to produce more diverse images. Benefiting from the SAGAN architecture, BigGAN is designed to improve both image quality and image diversity. It should be noted that both PROGAN and BigGAN are able to produce high-resolution images. BigGAN realizes this higher resolution by increasing the batch size, and the authors mention that a progressive growing [74] operation is unnecessary when the batch size is large enough (2048 is used in the original paper [84]). However, a progressive growing operation is still needed when GPU memory is limited (a large batch size is hungry for GPU memory). Benefiting from spectral normalization (SN), which will be discussed in the loss-variant GANs part, both SAGAN and BigGAN are effective against the vanishing gradient challenge. These milestone architecture-variants indicate a strong advantage of GANs — compatibility, where a GAN is open to any type of neural architecture. This property enables GANs to be applied to many different applications.
Regarding the improvements achieved by different architecture-variant GANs, we next present an analysis of the interconnections and comparisons between the architecture-variants presented here. Starting with the FCGAN described in the original GAN literature, this architecture-variant can only generate simple image datasets. Such a limitation is caused by the network architecture, where the capacity of FC networks is very limited. Research on improving the performance of GANs thus starts from designing more complex architectures. More complex image datasets (e.g., ImageNet) have higher resolution and diversity compared to simple image datasets (e.g., MNIST) and accordingly need more sophisticated approaches.
In the context of producing higher-resolution images, one obvious approach is to increase the size of the generator. LAPGAN and DCGAN up-sample within the generator based on such a perspective. Benefiting from the concise deconvolutional up-sampling process and the easy generalization of DCGAN, the DCGAN architecture is more widely used in the GANs literature. It should be noted that most GANs in the computer vision area use a deconvolutional neural network as the generator, which was first used in DCGAN. Therefore, DCGAN is one of the classical GAN-variants in the literature.
The ability to produce high-quality images is clearly an important aspect of GANs. This can be improved through judicious choice of architecture. BEGAN and PROGAN demonstrate approaches from this perspective. With the same architecture as used for the generator in DCGAN, BEGAN redesigns the discriminator to include an encoder and a decoder, where the discriminator tries to distinguish the difference between the generated and autoencoded images in pixel space. Image quality is improved in this case. Based on DCGAN, PROGAN demonstrates a progressive approach that incrementally trains an architecture similar to DCGAN. This novel approach can not only improve image quality but also produce higher-resolution images.
Producing diverse images is the most challenging task for GANs, and it is very difficult for GANs to successfully produce images such as those represented in the ImageNet sets. It is difficult for traditional CNNs to learn global and long-range dependencies from images. Thanks to the self-attention mechanism, approaches such as SAGAN integrate self-attention into both the discriminator and the generator, which helps GANs considerably in learning multi-class images. Moreover, BigGAN, which can be considered an extension of SAGAN, introduces a deeper GAN architecture with a very large batch size, produces high-quality and diverse images on ImageNet, and is the current state of the art.
6 LOSS-VARIANT GANS
pass
7 DISCUSSION
We have introduced the most significant problems present in the original GAN design, which are mode collapse and the vanishing gradient for updating G. We have surveyed the significant GAN-variants that remedy these problems through two design considerations: (1) Architecture-variants. This aspect focuses on architectural options for GANs. This approach enables GANs to be successfully applied to different applications; however, it is not able to fully solve the problems mentioned above; (2) Loss-variants. We have provided a detailed explanation of why these problems arise in the original GAN. These problems are essentially caused by the loss function in the original GAN, so modifying this loss function can solve them. It should be noted that the loss function may change for some architecture-variants. However, such a loss function is changed according to the architecture, making it an architecture-specific loss that is not able to generalize to other architectures.
Through a comparison of the different architectural approaches surveyed in this work, it is clear that modification of the GAN architecture has significant impact on the quality and diversity of generated images. Recent research shows that the capacity and performance of GANs are related to the network size and batch size [84], which indicates that a well-designed architecture is critical for good GAN performance. However, modifications to the architecture alone are not able to eliminate all the inherent training problems of GANs. Redesign of the loss function, including regularization and normalization, can help yield more stable training for GANs. This work introduced various approaches to the design of the loss function for GANs. Based on the comparison of each loss-variant, we find that spectral normalization, as first demonstrated in SN-GAN, brings many benefits including ease of implementation, relatively light computational requirements and the ability to work well for almost all GANs. We suggest that researchers who seek to apply GANs to real-world problems include spectral normalization in the discriminator.
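In frameworks such as PyTorch this is a one-line wrapper around each weight layer; the tiny discriminator below is a minimal sketch assuming 32 × 32 RGB inputs, not a recommended architecture:

import torch.nn as nn
from torch.nn.utils import spectral_norm

# Every weight layer is wrapped with spectral_norm, which constrains the
# layer's spectral norm via power iteration during training.
disc = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),    # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),  # 16x16 -> 8x8
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    spectral_norm(nn.Linear(128 * 8 * 8, 1)),
)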
There is no single answer to the question of which GAN is the best. The selection of a specific GAN type depends on the application. For instance, if an application requires the production of natural scene images (which requires the generation of very diverse images), DCGAN with spectral normalization applied, SAGAN and BigGAN can be good choices. BigGAN is able to produce the most realistic images of the three. However, BigGAN is much more computationally intensive, so the choice depends on the computational requirements set by the real-world application.
7.1 Interconnections Between Architecture and Loss
In this paper, we highlight the problems inherent in the original GAN design. In highlighting how subsequent researchers have remedied those problems, we explored the architecture-variants and loss-variants in GAN designs separately. However, it should be noted that there are interconnections between these two types of GAN-variants. As mentioned before, loss functions are easily integrated into different architectures. Benefiting from the improved convergence and stabilization of a redesigned loss function, architecture-variants are able to achieve better performance and solve more difficult problems. For example, BEGAN and PROGAN use the Wasserstein distance instead of the JS divergence. SAGAN and BigGAN deploy spectral normalization, through which they achieve good performance on multi-class image generation. These two types of variants contribute equally to the progress of GANs.
7.2 Future Directions
GANs were originally proposed to produce plausible synthetic images and have achieved exciting performance in the computer vision area. GANs have been applied to some other fields (e.g., time series generation [20], [21], [103] and natural language processing [15], [104]–[106]) with some success. Compared to computer vision, GANs research in other areas is still somewhat limited. The limitation is caused by the different properties inherent in image versus non-image data. For instance, GANs work well for producing continuous-valued data, but natural language is based on discrete values such as words, characters and bytes, so it is hard to apply GANs to natural language applications. Future research is of course being carried out on applying GANs to other areas.
8 CONCLUSION
In this paper, we have reviewed GAN-variants based on the performance improvements offered in terms of higher image quality, more diverse images and more stable training. We reviewed the current state of GAN-related research from an architecture basis and a loss basis. Current state-of-the-art GAN models such as BigGAN and PROGAN are able to produce high-quality and diverse images in the computer vision field. However, research that applies GANs to video is limited. Moreover, GAN-related research in other areas such as time series generation and natural language processing lags behind that for computer vision in terms of performance and capability. We conclude that there are clear opportunities for future research and application in these fields in particular.