A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Memory-Mapped Files and I/O

College · Depth 98 in the knowledge graph

399 prerequisites beneath it


Core Idea

Memory-mapped files allow a file to be accessed as a region of memory, enabling efficient large-file operations and inter-process data sharing. Reads and writes to the mapped region are transparently managed by the kernel, with the page cache handling I/O. This provides an alternative to explicit read()/write() calls and enables zero-copy data transfer between processes.

Explainer

From your understanding of virtual memory, you know that a process's address space is a collection of virtual pages mapped to physical frames through page tables. Many of these pages are backed by anonymous memory: the stack, the heap, and other regions that exist only in RAM (and swap). Memory-mapped files extend this mechanism: instead of backing a virtual page with anonymous memory, the OS backs it with a specific region of a file on disk. When the process reads from that address range, it is transparently reading the file; when it writes to a shared mapping, it is transparently writing to the file.
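As a minimal sketch, assuming Python's standard `mmap` module as a stand-in for the C `mmap()` call, the snippet below maps a small file read-only and then searches and slices it as if it were an in-memory byte buffer. The helper `first_line_via_mmap` and the file name `demo.txt` are hypothetical, chosen only for illustration.

```python
import mmap
import os
import tempfile


def first_line_via_mmap(path: str) -> bytes:
    """Hypothetical helper: return the first line of a file by treating
    the file's contents as a region of memory."""
    with open(path, "rb") as f:
        # Length 0 means "map the whole file"; ACCESS_READ gives a read-only mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = mm.find(b"\n")      # search the mapped bytes like a buffer
            if end == -1:
                end = len(mm)         # no newline: the whole file is one line
            return mm[:end]           # slicing copies just these bytes out


# Usage: create a small file, then read it through the mapping.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"hello mmap\nsecond line\n")
    line = first_line_via_mmap(path)

print(line)  # b'hello mmap'
```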

The mechanics work through the same page fault machinery you already know. When a process calls `mmap()` to map a file, the kernel records the new mapping for the requested address range (on Linux, as a virtual memory area) but typically does not load any data or fill in page table entries yet. When the process first accesses an address in the mapped region, a page fault occurs. The kernel's fault handler recognizes that this page is backed by a file, finds the corresponding file data in the page cache (reading it from disk first if it is not already cached), and updates the page table to point at that frame. Subsequent accesses to that page hit memory directly with no system call overhead. For shared mappings, the kernel writes modified (dirty) pages back to disk lazily, or synchronously when the process requests it via `msync()`.
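The write path and `msync()` can be sketched the same way. This assumes a Unix system, where CPython's `mmap.flush()` is implemented with `msync()`; the file name `data.bin` is hypothetical. The store into the mapping is an ordinary memory write, and the bytes are then read back through the normal file API.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"." * mmap.PAGESIZE)       # one page of placeholder bytes

    fd = os.open(path, os.O_RDWR)
    try:
        mm = mmap.mmap(fd, 0, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
        # mmap() itself loads nothing; the first access below typically
        # page-faults, and the kernel wires the page-cache frame into our page table.
        mm[0:5] = b"HELLO"                  # an ordinary memory store, no write() call
        mm.flush()                          # on Unix, CPython calls msync(..., MS_SYNC)
        mm.close()
        read_back = os.pread(fd, 5, 0)      # read back via the normal file API
    finally:
        os.close(fd)

print(read_back)  # b'HELLO'
```

On Linux, `os.pread` would see the new bytes even without `flush()`, because both paths go through the same page cache; what `flush()` adds is a synchronous writeback to the storage device.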

This approach has two major advantages over traditional `read()`/`write()` system calls. First, it eliminates a copy: with `read()`, the file data is first brought into the kernel's page cache and then copied by the CPU into the user's buffer, so the same bytes end up in memory twice. With memory mapping, the process's page tables point directly at the page-cache frames, so that second copy disappears; this is the sense in which mapped I/O is "zero-copy". For large files or random-access patterns (like a database looking up entries in an index), this difference is substantial. Second, memory-mapped files enable shared memory between processes. If two processes map the same file with `MAP_SHARED`, they share the same physical pages. A write by one process is visible to the other without any explicit IPC mechanism, because the page cache serves as the shared medium (concurrent writers still need their own synchronization, such as locks). This is how many databases and high-performance servers share data across worker processes.
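To illustrate sharing through `MAP_SHARED`, here is a sketch that assumes a Unix system with `os.fork()`. The child opens and maps the same file independently and stores a few bytes; the parent, which mapped the file before forking, then sees those bytes through its own mapping. Waiting for the child with `os.waitpid` stands in for the synchronization that real concurrent writers would need. The file name `shared.bin` is hypothetical.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared.bin")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"\0" * mmap.PAGESIZE)        # one zeroed page

    with open(path, "r+b") as f:
        parent_mm = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED)

        pid = os.fork()
        if pid == 0:
            # Child: create its own, independent MAP_SHARED mapping of the file.
            code = 1
            try:
                with open(path, "r+b") as cf:
                    with mmap.mmap(cf.fileno(), 0, flags=mmap.MAP_SHARED) as child_mm:
                        child_mm[0:6] = b"child!"   # plain memory store, no write()
                code = 0
            finally:
                os._exit(code)                      # skip the parent's cleanup handlers

        # Parent: waiting for the child is our (coarse) synchronization.
        _, status = os.waitpid(pid, 0)
        seen = parent_mm[0:6]                       # read through the parent's own mapping
        parent_mm.close()

print(seen)  # b'child!'
```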

The tradeoffs are worth understanding. Memory-mapped I/O is not always faster than `read()`/`write()`: for sequential reads of small files, the system call overhead is negligible, the cost of setting up the mapping and taking page faults can dominate, and the simpler interface may be preferable. Mapped regions consume virtual address space, which matters on 32-bit systems, where a process has at most 4 GiB of addresses and a large file may not fit in a single mapping. Error handling is also less intuitive: a disk error during a memory access, or an access to a page that lies beyond the end of a file that has since been truncated, triggers a `SIGBUS` signal rather than returning an error code, which is harder to handle gracefully. And because the kernel controls when dirty pages are flushed, changes made since the last writeback can be lost if the system crashes; calling `msync()` forces the writeback at a point the program chooses. Despite these caveats, memory-mapped files are a foundational technique: they underpin dynamic library loading (shared libraries are memory-mapped into process address spaces), executable loading, and the internals of many database engines.
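The `SIGBUS` behavior can be observed safely by letting a child process take the signal. This is a sketch assuming Linux and Python's `mmap` module; `touch_after_truncate` and `victim.bin` are hypothetical names. The child maps a one-page file, truncates the file to zero bytes underneath the mapping, and then reads the first mapped byte. Instead of an exception or an error code, we would expect the child to be killed by `SIGBUS`, which the parent detects from the wait status.

```python
import mmap
import os
import signal
import tempfile


def touch_after_truncate(path: str) -> int:
    """Hypothetical helper: in a child process, map `path`, shrink the file to
    zero bytes, then read the mapping. Returns the signal that killed the
    child, or 0 if the child exited on its own."""
    pid = os.fork()
    if pid == 0:
        try:
            with open(path, "r+b") as f:
                mm = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED)
                os.ftruncate(f.fileno(), 0)   # the file shrinks underneath the mapping
                _ = mm[0]                     # this page now lies wholly past EOF
        finally:
            os._exit(0)                       # reached only if no signal was delivered
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "victim.bin")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"x" * mmap.PAGESIZE)
    sig = touch_after_truncate(path)

print(signal.Signals(sig).name if sig else "child exited without a signal")
```

Isolating the risky access in a child keeps the parent alive; a single-process program would instead have to install a `SIGBUS` handler, which is considerably more delicate.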


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Boolean Algebra and Fundamental Laws → Logic Gates Fundamentals → Implementing Boolean Functions with Gates → Karnaugh Map Simplification → Combinational Circuit Design → Flip-Flops and Latches → Binary Counters: Design and Analysis → Binary Arithmetic → Fixed-Point Number Representation → Two's Complement Representation → Overflow and Underflow Detection → Binary Adders: Half-Adders and Full-Adders → Full Adder and Carry Propagation → Carry Lookahead Adder Design → Half Adder Circuit Design → Multiplication Circuit Design → Sequential Circuit Design → Registers and Register Files → Instruction Set Architecture (ISA) → Assembly Language Basics → Memory Organization and Addressing → Memory Address Decoding → Memory Bus Architecture and Interconnect → I/O Systems and Buses → Asynchronous I/O (AIO) Operations → Device Drivers and I/O Controllers → Memory-Mapped Files and I/O

Longest path: 99 steps · 399 total prerequisite topics

Prerequisites (6)

Virtual Memory and Demand Paging (hard)
File System Implementation (soft)
Asynchronous I/O (AIO) Operations (soft)
Device Drivers and I/O Controllers (soft)
File Descriptor Tables and I/O Redirection (soft)
I/O Buffering and Kernel Buffer Caches (soft)

Leads To (0)

No topics depend on this one yet.