Questions — Variational Autoencoders (VAE)

Question 1 Multiple Choice

A standard autoencoder trained on face images fails to generate new faces when you sample random points from latent space — the decoder produces noise. Why, and how does a VAE fix this?

A. Standard autoencoders use tanh activations that prevent meaningful generation; VAEs use ReLU, which produces smoother latent spaces

B. Standard autoencoders impose no structure on latent space, so random points fall in uncharted regions; the VAE's KL term regularizes latent codes into a dense, structured distribution

C. Standard autoencoders encode to a single low-dimensional vector, which is too compressed for generation; VAEs use higher-dimensional latent spaces

D. Standard autoencoders overfit to training data; VAEs add dropout regularization to prevent this

Question 2 Multiple Choice

The reparameterization trick in VAE training rewrites the sampling step as z = μ + σ·ε where ε ~ N(0,1). Why is this substitution necessary?

A. It prevents the KL divergence from becoming infinite when σ approaches zero

B. It allows gradients to flow through the sampling step back to the encoder outputs μ and σ, and through them to the encoder's parameters

C. It ensures sampled z values stay within a bounded range to prevent numerical instability

D. It converts the Gaussian distribution to a uniform distribution, which is easier to implement
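
For concreteness, the sampling step written in Question 2 can be sketched in NumPy. This only illustrates the formula z = μ + σ·ε; the function name `reparameterize` and the example values are assumptions made for illustration and are not part of the quiz.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Return z = mu + sigma * eps, with eps drawn elementwise from N(0, 1)."""
    eps = rng.standard_normal(size=np.shape(mu))  # noise independent of mu and sigma
    return mu + sigma * eps

# Hypothetical encoder outputs for a 2-dimensional latent space.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 0.0])
z = reparameterize(mu, sigma, rng)
```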

Question 3 True / False

Removing the KL divergence term from the VAE loss (training with reconstruction loss only) would cause the latent space to become unstructured, degrading generative capability.

T. True

F. False

Question 4 True / False

VAEs typically produce sharper, more detailed image outputs than GANs because the KL regularization enforces a well-organized latent space.

T. True

F. False

Question 5 Short Answer

Why does the KL divergence term in the ELBO loss force the latent space to become structured and usable for generation, rather than just functioning as an arbitrary regularizer?
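
As a reference while answering, here is a minimal sketch of the KL term under a common setup: a diagonal Gaussian encoder q(z|x) = N(μ, σ²) and a standard normal prior N(0, 1). The quiz does not state the prior or the encoder family, so both are assumptions, as are the function name `kl_to_standard_normal` and the log-variance parameterization.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions.

    Uses the closed form 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)

# Hypothetical encoder outputs for a 2-dimensional latent code.
kl_value = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```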

