Researchers from Carnegie Mellon and MIT have developed a new methodology that allows a user to create custom Generative Adversarial Network (GAN) image-creation systems simply by sketching indicative doodles.
A system of this type could allow an end user to create image-generating systems capable of generating very specific images, such as particular animals, types of building – and even individual people. Currently, most GAN generation systems produce broad and fairly random output, with limited facility to specify particular characteristics, such as animal breed, hair types in people, styles of architecture or actual facial identities.
The approach, outlined in the paper Sketch Your Own GAN, utilizes a novel sketching interface as an effective ‘search’ function to find features and classes in otherwise over-stuffed image databases which may contain thousands of types of object, including many sub-types which are not relevant to the user’s intent. The GAN is then trained on this filtered sub-set of imagery.
By sketching the specific object type with which the user wishes to calibrate the GAN, the framework’s generative capabilities become specialized to that class. For instance, if a user wishes to create a framework that generates a specific type of cat (rather than just any old cat, as can be obtained with This Cat Does Not Exist), their input sketches serve as a filter to rule out non-relevant classes of cat.
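Conceptually, the sketch acts as a filter over a large pool of candidate images. The toy example below illustrates that idea only (the actual system adjusts model weights rather than literally filtering files, and all names and data here are hypothetical): candidates are ranked by how closely their precomputed sketch representations match the user's doodle.

```python
# Conceptual illustration of a sketch acting as a dataset filter.
# All names and data are hypothetical; the paper's method fine-tunes
# GAN weights rather than literally filtering images.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flattened binary sketch rasters."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def filter_by_sketch(user_sketch, candidate_sketches, top_k=2):
    """Rank candidates by how closely their (precomputed) sketch
    representation matches the user's doodle; keep the best top_k."""
    scored = sorted(candidate_sketches.items(),
                    key=lambda kv: cosine_similarity(user_sketch, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

# Toy 4-pixel "sketches": 1 = stroke, 0 = blank
user = [1, 1, 0, 0]
candidates = {
    "sitting_cat_1": [1, 1, 0, 0],   # identical strokes
    "sitting_cat_2": [1, 1, 1, 0],   # close match
    "standing_cat":  [0, 0, 1, 1],   # different pose
}
print(filter_by_sketch(user, candidates))  # the two sitting cats rank first
```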
The research is led by Sheng-Yu Wang of Carnegie Mellon University, together with colleague Jun-Yan Zhu, and David Bau of MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL).
The method itself is dubbed ‘GAN sketching’, and uses the input sketches to directly change the weights of a ‘template’ GAN model to specifically target the identified domain or sub-domain through cross-domain adversarial loss.
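As a rough illustration of the idea (not the authors' implementation), a cross-domain adversarial loss scores real user sketches against generated photos that have first been mapped into the sketch domain. In the sketch below, toy scalar functions stand in for the generator G, the photo-to-sketch network F, and the sketch discriminator D:

```python
# Minimal, hedged sketch of a cross-domain adversarial loss. The real
# method maps generated photos into the sketch domain with a pre-trained
# photo-to-sketch network before scoring them with a sketch
# discriminator; here G, F and D are toy scalar stand-ins.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def G(z, w):            # toy generator: latent -> "photo" value
    return w * z

def F(photo):           # toy photo->sketch mapping (stand-in for PhotoSketch)
    return 0.5 * photo

def D(sketch, d_w):     # toy sketch discriminator
    return sigmoid(d_w * sketch)

def cross_domain_adv_loss(user_sketches, latents, w, d_w):
    """Discriminator objective: real user sketches vs. sketch-mapped fakes."""
    real_term = sum(math.log(D(y, d_w)) for y in user_sketches)
    fake_term = sum(math.log(1.0 - D(F(G(z, w)), d_w)) for z in latents)
    return -(real_term + fake_term) / (len(user_sketches) + len(latents))

loss = cross_domain_adv_loss(user_sketches=[1.0, 0.8],
                             latents=[0.1, -0.2, 0.3],
                             w=1.0, d_w=0.5)
print(round(loss, 4))
```

Minimizing this loss over the discriminator, while the generator weights are pushed in the opposite direction, is what steers the template model toward the sketched sub-domain.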
Different regularization methods were explored to ensure that the model’s output is diverse, while maintaining a high image quality. The researchers created sample applications that are able to interpolate latent space and conduct image editing procedures.
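One common form of such a regularizer, offered here as a hedged illustration rather than the paper's exact formulation, works in image space: it penalizes the fine-tuned generator for drifting too far from the original template generator's outputs on the same latents, which helps preserve diversity and image quality. Toy linear generators stand in for the real networks:

```python
# Hedged illustration of an image-space regularizer: penalize the
# fine-tuned generator for drifting from the original ("template")
# generator on the same latents. Toy linear generators stand in for
# StyleGAN2; all numbers are illustrative.
def G_original(z):
    return [2.0 * v for v in z]

def G_finetuned(z, delta):
    # delta models how far the fine-tuned weights have drifted
    return [(2.0 + delta) * v for v in z]

def image_space_reg(latents, delta):
    """Mean L1 distance between fine-tuned and original outputs."""
    total, count = 0.0, 0
    for z in latents:
        for a, b in zip(G_finetuned(z, delta), G_original(z)):
            total += abs(a - b)
            count += 1
    return total / count

latents = [[0.5, -1.0], [1.5, 0.25]]
print(image_space_reg(latents, delta=0.0))  # zero when weights are identical
print(image_space_reg(latents, delta=0.1))  # grows as the weights drift
```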
This [$class] Does Not Exist
GAN-based image generation systems have become a fad, if not a meme, over the last few years, with a proliferation of projects capable of generating pictures of non-existent things, including people, rental apartments, snacks, feet, horses, politicians and insects, among many others.
GAN-based image synthesis systems are created by compiling or curating extensive datasets containing images from the target domain, such as faces or horses; training models that generalize a range of features across the images in the database; and implementing generator modules that can output random examples based on the learned features.
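That pipeline (a curated dataset, adversarial training that generalizes its features, and a generator that samples from them) can be caricatured in a few dozen lines. The scalar "networks" below are purely illustrative stand-ins for the deep models real systems use:

```python
# Toy, self-contained GAN training loop illustrating the pipeline above:
# a tiny hard-coded "dataset", a generator that learns its features, and
# samples drawn from the result. All values are illustrative stand-ins.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

real_data = [3.5, 4.0, 4.5]      # "curated dataset": values near 4
noise = [-0.1, 0.0, 0.1]         # fixed latent samples

a, b = 0.0, 0.0                  # discriminator: D(x) = sigmoid(a*x + b)
c = 0.0                          # generator: G(z) = z + c
lr = 0.05

for step in range(400):
    fakes = [z + c for z in noise]
    # --- discriminator update: score real data up, fakes down ---
    ga = gb = 0.0
    for x in real_data:
        d = sigmoid(a * x + b)
        ga += -(1.0 - d) * x     # grad of -log D(x)
        gb += -(1.0 - d)
    for x in fakes:
        d = sigmoid(a * x + b)
        ga += d * x              # grad of -log(1 - D(x))
        gb += d
    a -= lr * ga / 6.0
    b -= lr * gb / 6.0
    # --- generator update: move output to fool the discriminator ---
    gc = 0.0
    for x in fakes:
        d = sigmoid(a * x + b)
        gc += -(1.0 - d) * a     # grad of -log D(G(z)) w.r.t. c
    c -= lr * gc / 3.0

print(round(c, 2))  # offset c has grown from 0 toward the real-data range
```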
High-level, coarse features are among the first to be concretized during the training process, and are equivalent to a painter's first broad swatches of color on a canvas. These coarse characteristics will eventually correlate to much more detailed features (e.g. the eye-glint and sharp whiskers of a cat, instead of just a generic beige blob representing the head).
I Know What You Mean…
By mapping the relationship between these earlier seminal shapes and the ultimately detailed interpretations which are obtained much later in the training process, it’s possible to infer relationships between ‘vague’ and ‘specific’ images, allowing users to create complex and photorealistic imagery from crude daubs.
Recently NVIDIA released a desktop version of its long-term GauGAN research into GAN-based landscape generation, which easily demonstrates this principle:
Likewise, multiple systems such as DeepFacePencil have used the same principle to create sketch-induced photoreal image generators for various domains.
The new paper’s GAN Sketching approach seeks to remove the formidable burden of data-gathering and curation that is typically involved in the development of GAN image frameworks, by using user input to define which sub-set of imagery should constitute the training data.
The system has been designed to require only a small number of input sketches in order to calibrate the framework. The system effectively reverses the functionality of PhotoSketch, a joint research initiative from 2019 by researchers from Carnegie Mellon, Adobe, Uber ATG and Argo AI, which is incorporated into the new work. PhotoSketch was designed to create artistic sketches from images, and already contains the effective mapping of vague → specific image creation relationships.
For the generation part of the process, the new method modifies only the weights of a pre-trained StyleGAN2 model. Since the target imagery is a subset of the data the original model was trained on, modifying only the mapping network is enough to obtain desirable results.
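The idea of touching only one sub-network's weights can be sketched as follows. The parameter layout here is hypothetical (real StyleGAN2 weights are tensors inside a deep framework, not small lists): the update step is restricted to keys under the mapping-network prefix, leaving the synthesis weights frozen.

```python
# Hedged sketch of fine-tuning only a subset of a model's weights.
# The parameter names and values are hypothetical stand-ins for
# StyleGAN2's mapping and synthesis networks.
params = {
    "mapping.fc0": [0.5, -0.3],
    "mapping.fc1": [0.1, 0.9],
    "synthesis.conv0": [1.2, 0.4],
    "synthesis.conv1": [-0.7, 0.2],
}

def apply_update(params, grads, lr=0.1, trainable_prefix="mapping."):
    """Gradient step restricted to parameters under `trainable_prefix`;
    everything else is copied through unchanged (frozen)."""
    return {
        name: ([w - lr * g for w, g in zip(weights, grads[name])]
               if name.startswith(trainable_prefix) else list(weights))
        for name, weights in params.items()
    }

grads = {name: [1.0] * len(w) for name, w in params.items()}
new_params = apply_update(params, grads)
print(new_params["mapping.fc0"])      # updated
print(new_params["synthesis.conv0"])  # unchanged (frozen)
```

In a deep-learning framework the same effect is usually achieved by freezing parameters or by passing only the mapping network's parameters to the optimizer.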
The method was evaluated on a number of popular sub-domains, including horses, churches, and cats.
Princeton University’s 2016 LSUN dataset was used as the core material from which to derive target sub-domains. To establish a sketch mapping system that’s robust to the eccentricities of real-world user input, the system is trained on sketches from Google’s QuickDraw dataset.
Though the sketch styles of PhotoSketch and QuickDraw are quite different, the researchers found that their framework straddles them quite easily for relatively simple poses. More complicated poses (such as cats lying down) prove more of a challenge, and very abstract user input (i.e. overly crude drawings) also hinders the quality of the results.
Latent Space and Natural Image Editing
The researchers developed two applications based on the core work: latent space editing and image editing. Latent space editing offers interpretable user controls, established at training time, which allow a wide degree of variation while remaining faithful to the target domain and pleasingly consistent across variations.
The latent space editing component was powered by the 2020 GANSpace project, a joint initiative from Aalto University, Adobe and NVIDIA.
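GANSpace discovers editing directions through principal component analysis of sampled latent codes. The toy example below shows the gist only (simple power iteration on a 2-D covariance matrix, not the project's actual code): find the dominant direction of variation among latents, then edit an image by sliding its latent along that direction.

```python
# Hedged sketch of GANSpace-style latent editing: find a principal
# direction of variation among sampled latent codes via power iteration,
# then slide a latent along it. Toy 2-D numbers only.
import math

samples = [[2.0, 0.5], [-2.0, -0.5], [1.0, 0.3], [-1.0, -0.3]]

# Covariance of the (already roughly zero-mean) samples
n, d = len(samples), len(samples[0])
cov = [[sum(s[i] * s[j] for s in samples) / n for j in range(d)]
       for i in range(d)]

# Power iteration converges to the top eigenvector: the main PCA direction
v = [1.0, 1.0]
for _ in range(50):
    w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

def edit_latent(z, direction, strength):
    """Move a latent code along a discovered direction of variation."""
    return [zi + strength * di for zi, di in zip(z, direction)]

z = [0.2, 0.1]
print(edit_latent(z, v, strength=2.0))  # z pushed along the main direction
```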
A single image can also be fed to the customized model, facilitating natural image editing. In this application, a sole image is projected to the custom GAN, not only enabling direct editing, but also preserving higher-level latent space editing, if this has also been used.
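Projection of this kind is typically framed as an optimization: find the latent code whose generated output best reconstructs the target image. The hedged toy version below uses a linear stand-in generator, so gradient descent on the latent provably converges; real GAN inversion is the same idea with a deep network and perceptual losses.

```python
# Hedged sketch of projecting a single image into a GAN's latent space:
# gradient-descend on the latent z so the generator's output matches the
# target. A fixed linear map stands in for the customized GAN.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-D latent -> 3-"pixel" image

def G(z):
    return [sum(W[i][j] * z[j] for j in range(2)) for i in range(3)]

def project(target, steps=200, lr=0.1):
    """Minimize ||G(z) - target||^2 over z by gradient descent."""
    z = [0.0, 0.0]
    for _ in range(steps):
        residual = [g - t for g, t in zip(G(z), target)]
        # gradient of the squared error: 2 * W^T * residual
        grad = [2 * sum(W[i][j] * residual[i] for i in range(3))
                for j in range(2)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z

target = G([0.7, -0.3])              # an image the generator can represent
z_hat = project(target)
print([round(v, 3) for v in z_hat])  # recovers roughly [0.7, -0.3]
```

Once the latent is recovered, any latent-space edits (such as the GANSpace-style directions above the customized model supports) can be applied to the projected code and re-rendered.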
Though configurable, the system is not designed to work in real-time, at least in terms of training and calibration. Currently GAN Sketching requires 30,000 training iterations. The system also requires access to the original training data for the original model.
In cases where the dataset is open source and its license permits local copying, this could be accommodated by including the source data in a locally installed package, though this would take up considerable disk space. Alternatively, the data could be accessed or processed remotely via a cloud-based approach, which introduces network overheads and, where processing actually occurs in the cloud, possible compute costs.