An autoencoder is trained on 28×28 images with a bottleneck of 2 neurons. During training, the reconstruction loss improves steadily. What has the network necessarily learned in the bottleneck layer?
A. Nothing useful; 2 neurons is too small to encode any meaningful information about images
B. A compressed 2D representation that captures the most important structure in the data
C. The exact pixel values of each image, stored in a lookup table
D. The labels of each image, since reconstruction requires knowing what object is in the image
The correct answer is B. The bottleneck constraint forces the network to discard less important information and retain the structure needed for reconstruction. With only 2 neurons, the encoder must learn the two dimensions of variation that best support reconstruction, which is loosely a nonlinear analogue of PCA (a purely linear autoencoder trained with squared error recovers the PCA subspace). Option A is wrong: if the reconstruction loss keeps improving, the 2 neurons must be encoding something useful. Options C and D misunderstand how autoencoders work: two real-valued activations cannot act as a per-image lookup table, and training uses no labels.
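The PCA comparison can be made concrete in the linear case. The sketch below is not a trained neural network: it uses an SVD to compute the 2-component PCA projection, which is the solution a linear autoencoder with a 2-unit bottleneck would converge to under squared error. The data is synthetic (an assumption standing in for flattened 28×28 images), and names such as `hidden_factors` and `mixing` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened 28x28 images (assumption): 500 samples
# in 784 dimensions that vary mostly along 2 hidden directions, plus noise.
n_samples, n_pixels, n_latent = 500, 28 * 28, 2
hidden_factors = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_pixels))
images = hidden_factors @ mixing + 0.05 * rng.normal(size=(n_samples, n_pixels))

# PCA via SVD of the centered data. The top-2 right singular vectors act
# like tied encoder/decoder weights of a linear autoencoder.
mean = images.mean(axis=0)
centered = images - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:n_latent]                  # shape (2, 784)

codes = centered @ components.T             # "encoder": 784 -> 2
reconstructed = codes @ components + mean   # "decoder": 2 -> 784

mse = np.mean((images - reconstructed) ** 2)
baseline = np.mean(centered ** 2)           # error of always predicting the mean image
print(f"bottleneck shape: {codes.shape}")
print(f"reconstruction MSE: {mse:.4f} (mean-image baseline: {baseline:.4f})")
```

Because the synthetic data really does have two dominant directions of variation, two numbers per image are enough to reconstruct it far better than the mean-image baseline. Real images are not this clean, which is where the nonlinearity of an actual autoencoder matters.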
Question 2 Multiple Choice
A denoising autoencoder is trained by randomly zeroing 50% of input pixels and training the network to reconstruct the original clean image. Compared to a standard autoencoder with the same architecture, the denoising version will...
A. Have higher reconstruction loss because the task is harder
B. Learn a less useful representation because it receives degraded inputs
C. Learn more robust features because it must understand structure rather than memorize pixel patterns
D. Perform identically; corruption during training has no effect on the learned representation
The correct answer is C. The denoising objective forces the network to model the data manifold rather than pixel-level statistics. When 50% of the pixels are missing, the network cannot solve the task by copying or memorizing its input; it must learn the regularities that relate visible pixels to missing ones so that it can fill in the gaps plausibly. This produces representations that capture genuine structure. Option A is misleading: the task is harder, but the loss compares the reconstruction with the clean image, not with the corrupted input, and the question is about the quality of the learned features rather than the raw loss value.
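The corruption step described in the question might look like the following minimal sketch. It assumes each pixel is zeroed independently with probability 0.5, which is one common reading of "randomly zeroing 50% of input pixels"; the function name `mask_pixels` and the random placeholder batch are hypothetical.

```python
import numpy as np

def mask_pixels(clean, drop_prob, rng):
    """Zero each pixel independently with probability drop_prob."""
    keep = rng.random(clean.shape) >= drop_prob
    return clean * keep

rng = np.random.default_rng(0)
clean_batch = rng.random((8, 28 * 28))   # placeholder for 8 flattened images
corrupted_batch = mask_pixels(clean_batch, drop_prob=0.5, rng=rng)

zeroed_fraction = np.mean(corrupted_batch == 0.0)
print(f"fraction of pixels zeroed: {zeroed_fraction:.2f}")
```

A fresh mask is typically drawn every time an image is presented, so the network never sees the same corrupted input twice and cannot simply memorize input-output pairs.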
Question 3 True / False
An autoencoder with a bottleneck dimension equal to the input dimension — with no compression at all — could theoretically achieve zero reconstruction loss without learning any meaningful representation.
T. True
F. False
Answer: True
If the bottleneck has the same dimensionality as the input, the encoder can learn the identity mapping, passing every input through unchanged, and the decoder can do the same. Reconstruction is perfect but nothing has been learned about the data's structure. This is why some constraint is needed: in a plain autoencoder, it is the compression imposed by the bottleneck that forces the network to find compact, meaningful representations rather than copying its input.
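The degenerate solution is easy to write down for a linear autoencoder. In this sketch the "bottleneck" has as many units as there are pixels, and both weight matrices are set by hand to the identity; the random batch is a placeholder, not real image data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 28 * 28
images = rng.random((16, n_pixels))   # placeholder batch of flattened images

# A "bottleneck" as wide as the input admits identity weights as a solution.
encoder_weights = np.eye(n_pixels)
decoder_weights = np.eye(n_pixels)

codes = images @ encoder_weights
reconstructed = codes @ decoder_weights

mse = np.mean((images - reconstructed) ** 2)
print(f"reconstruction MSE: {mse}")
print(f"codes are just the inputs: {np.allclose(codes, images)}")
```

The loss is as low as it can be, yet the codes are the raw pixels, so nothing about the data has been summarized.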
Question 4 True / False
In a denoising autoencoder, the training target (what the network is trained to output) is the corrupted, noisy version of the input.
T. True
F. False
Answer: False
The reconstruction target in a denoising autoencoder is always the original, clean input. The corrupted version is only the input to the encoder — the network receives degraded data and must produce the clean original as output. This asymmetry is what makes denoising autoencoders powerful: the network is rewarded for recovering structure, not for reproducing noise.
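One way to see why the choice of target matters is to score a degenerate "model" that simply copies its input. The sketch below assumes independent 50% pixel masking and uses random placeholder data; `mse` is a small helper defined here, not a library function.

```python
import numpy as np

def mse(prediction, target):
    """Mean squared error over all pixels in the batch."""
    return float(np.mean((prediction - target) ** 2))

rng = np.random.default_rng(0)
clean = rng.random((8, 28 * 28))         # placeholder clean images
keep = rng.random(clean.shape) >= 0.5    # assumed: independent 50% masking
corrupted = clean * keep                 # what the encoder actually sees

# A degenerate "model" that returns its input unchanged.
copied = corrupted

loss_vs_corrupted = mse(copied, corrupted)   # wrong target: copying looks perfect
loss_vs_clean = mse(copied, clean)           # denoising target: copying is penalized
print(f"loss against corrupted input: {loss_vs_corrupted:.4f}")
print(f"loss against clean image:     {loss_vs_clean:.4f}")
```

With the corrupted input as target, copying would be an optimal solution. With the clean image as target, the only way to lower the loss is to restore the pixels that were zeroed.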
Question 5 Short Answer
How can a trained autoencoder be used for anomaly detection, and what property of the learned representation makes this possible?
Model answer: Encode the new data point through the encoder, then reconstruct it with the decoder, and measure the reconstruction error. Normal data points — similar to training data — will reconstruct with low error because the network learned their structure. Anomalous points will reconstruct poorly because their patterns were not present during training, and the bottleneck representation will fail to capture them accurately. This works because the autoencoder learns a compressed model of the data manifold; anything off the manifold reconstructs badly.
The key insight is that the autoencoder's bottleneck represents a model of the training distribution. Data that fits this distribution passes through the bottleneck efficiently and reconstructs well. Data that deviates from the distribution — anomalies — cannot be efficiently encoded into the same low-dimensional space, resulting in high reconstruction error. This doesn't require any labeled anomaly examples, making it genuinely unsupervised.
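The scoring procedure can be sketched end to end under strong simplifying assumptions. Here a 2-component PCA projection stands in for a trained autoencoder (the linear case only), the "normal" data is synthetic and lies near a 2D subspace, and the 99th-percentile threshold is an arbitrary illustrative choice. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_latent = 28 * 28, 2

# Synthetic "normal" data (assumption): points near a 2D subspace of pixel space.
mixing = rng.normal(size=(n_latent, n_pixels))

def sample_normal(n):
    factors = rng.normal(size=(n, n_latent))
    return factors @ mixing + 0.05 * rng.normal(size=(n, n_pixels))

# "Train" the stand-in model: top-2 principal directions of the normal data.
train = sample_normal(500)
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:n_latent]

def reconstruction_error(x):
    """Encode to 2 numbers, decode, and return per-sample MSE."""
    codes = (x - mean) @ components.T
    reconstructed = codes @ components + mean
    return np.mean((x - reconstructed) ** 2, axis=1)

# Flag anything that reconstructs worse than 99% of the training data.
threshold = np.percentile(reconstruction_error(train), 99)

normal_test = sample_normal(100)
anomalies = rng.normal(size=(100, n_pixels))   # points far from the learned subspace

flag_rate_normal = np.mean(reconstruction_error(normal_test) > threshold)
flag_rate_anomaly = np.mean(reconstruction_error(anomalies) > threshold)
print(f"flagged among normal test points: {flag_rate_normal:.2f}")
print(f"flagged among anomalies:          {flag_rate_anomaly:.2f}")
```

No anomaly labels are used anywhere: the threshold comes only from reconstruction errors on the training data. In practice the threshold trades false alarms against missed anomalies and would be tuned for the application.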