Visual ethnography uses photographs, video, and visual analysis as primary methods of data collection and representation. Multimodal ethnography integrates visual, aural, and textual data to capture embodied, sensory social practices. Visual data is central rather than ancillary documentation: it supports the analysis of spatial arrangements, bodily practices, material culture, and the affective dimensions of social life. Video ethnography reveals temporal sequences and patterns of social interaction that fieldnotes alone cannot show.
Advanced ethnography already trained you in thick description, reflexivity, and the craft of sustained field presence. Visual and multimodal ethnography extend this toolkit by treating images, video, and sound not as illustrations of fieldnotes but as irreducible data with their own analytic possibilities. A written fieldnote describing a kitchen might capture the sequence of events and the researcher's interpretations, but a video of the same kitchen records spatial arrangements, gesture, gaze direction, body orientation, and temporal pacing that prose cannot fully reconstruct. The move is epistemological: visual data is not a better snapshot of what text describes — it captures different dimensions of social life altogether.
The central concept is multimodality — the idea that social life is enacted across multiple sensory registers simultaneously. People communicate through words, but also through posture, proximity, touch, objects, and the spatial organization of environments. When you observe a classroom, the seating arrangement (who sits where relative to whom) carries social information that fieldnotes may describe but photographs make structurally visible. When you study a workplace, the rhythm of sound — when silence is expected, when conversation is permitted, what sounds signal authority — is analytically significant but notoriously difficult to represent in text. Multimodal methods capture this layered structure of interaction.
Photography and video become primary data through specific analytic techniques. Photo-elicitation interviews show photographs to participants and use their responses to access meanings, memories, and interpretations that direct questioning might not reach — the image becomes a prompt that surfaces tacit knowledge. Participant-generated imagery asks community members to photograph their own environments, centering their visual categories and priorities rather than the researcher's. Video analysis uses tools from conversation analysis and interaction studies: close examination of turn-taking, bodily alignment, and sequential organization of action in time. These approaches are not simply "documenting" what happens — they are theoretically informed methods for making aspects of social life visible that would otherwise remain implicit.
The ethical and political dimensions of visual methods are distinctive. Unlike fieldnotes, photographs are indexical: they record particular faces, spaces, and bodies that can be identified. Informed consent for visual data must therefore address not just participation in research but also representation: how images will be stored, who will see them, and how subjects are depicted. Positionality matters differently too, because the researcher's gaze is literally materialized in what the camera frames and from what angle. Reflexive visual ethnographers document their own photographic choices as data about their perspective, not as an objective recording of external reality. The decision about what to photograph, when, and from where is an analytic act, not a neutral one.
Multimodal representation — how visual ethnography presents its findings — is itself a methodological choice with consequences. Traditional ethnographic monographs insert photographs as illustrations. But visual ethnographers often argue that findings should be presented through the modalities in which data was collected: films, photo essays, interactive digital archives, or multimedia presentations. The choice to produce a written text about visual data involves a translation that risks losing what made the visual data analytically significant in the first place. This tension between visual data and textual academic convention pushes visual ethnographers toward new forms of representation that blur the boundary between data, analysis, and scholarly argument.