A file is the OS abstraction for persistent named data — a sequence of bytes stored on a device that survives process termination. Every file has metadata: name, type, size, permissions, timestamps, and a unique identifier. The file system is the OS subsystem responsible for organizing files into a hierarchical namespace, tracking their locations on storage devices, managing free space, and enforcing access control. The OS provides a uniform file interface (open, read, write, seek, close) that abstracts over the physical storage device, whether it is a hard disk, SSD, or network share.
Use `os.stat()` in Python or `stat()` in C to inspect a file's metadata. Then read the ext4 or NTFS Wikipedia article to see how these abstractions are implemented.
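As a minimal sketch of that exercise, the snippet below creates a throwaway file and prints a few of the fields that `os.stat()` reports. The helper name `describe` and the file name `example.txt` are illustrative choices, not part of any standard API, and the exact values (inode number, permissions) will differ between systems.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return selected metadata fields for `path` (hypothetical helper)."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "inode": st.st_ino,                         # unique identifier within the file system
        "type": kind,
        "permissions": stat.filemode(st.st_mode),   # e.g. "-rw-r--r--"
        "size_bytes": st.st_size,
        "link_count": st.st_nlink,                  # number of names pointing at this file
        "modified": time.ctime(st.st_mtime),
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    info = describe(path)
    for key, value in info.items():
        print(f"{key}: {value}")
```

Note that the file's name does not appear anywhere in the result of `os.stat()`; it is supplied by the caller as the path.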
At its heart, a file system answers a deceptively simple question: how do you store named data on a device that only understands numbered blocks? A hard disk or SSD is just a flat array of fixed-size blocks (typically 512 bytes or 4 KB). The file system builds the abstractions of files, directories, names, and permissions on top of this raw storage — much like how an operating system builds the abstraction of processes on top of raw CPU time. If you have worked with basic I/O, you have used the result of this abstraction every time you called `open()`, `read()`, or `write()`.
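To make the "numbered blocks" idea concrete, here is a small sketch of the arithmetic that turns a byte range within a file into logical block indices. It assumes a 4 KB block size, and the function name `blocks_for_range` is hypothetical. A real file system would then map each logical index to a physical block number using the file's metadata.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (4 KB)


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return the logical block indices touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // block_size                 # block holding the first byte
    last = (offset + length - 1) // block_size   # block holding the last byte
    return list(range(first, last + 1))


print(blocks_for_range(0, 4096))     # a full first block
print(blocks_for_range(4095, 2))     # two bytes straddling a block boundary
print(blocks_for_range(10000, 100))  # a small read inside the third block
```

Even a two-byte read can touch two blocks when it straddles a boundary, which is one reason the OS works in whole blocks and hides that detail behind `read()` and `write()`.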
A file is the fundamental unit: a named, persistent sequence of bytes. But a file is more than its data. The OS stores metadata alongside each file: the owner, permissions, timestamps (last access, last modification, and last metadata change, plus creation time on file systems that record it), size, and the physical locations of its data blocks on disk. This metadata is typically stored in a structure called an inode (on Unix-like systems) or a Master File Table entry (on NTFS). The key insight is that the file's name is *not* part of the inode. The name lives in a directory entry, which is simply a mapping from a human-readable name to an inode number. This separation is why hard links work: two different names, in the same directory or in different directories on the same file system, can point to the same inode, and therefore the same data. The inode keeps a link count, and the file's data is released only when that count drops to zero and no process still has the file open.
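The separation between names and inodes can be observed directly. The sketch below assumes a file system that supports hard links (most local Unix file systems and NTFS do); the file names `original.txt` and `alias.txt` are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    alias = os.path.join(tmp, "alias.txt")

    with open(original, "w") as f:
        f.write("shared data")

    # Create a second directory entry that refers to the same file.
    os.link(original, alias)

    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    links_before = os.stat(original).st_nlink

    # Removing one name does not remove the data while another name remains.
    os.remove(original)
    with open(alias) as f:
        surviving = f.read()
    links_after = os.stat(alias).st_nlink

print("same inode:", same_inode)
print("link count before remove:", links_before)
print("data via remaining name:", surviving)
print("link count after remove:", links_after)
```

Deleting `original.txt` only removes one directory entry and decrements the link count; the data remains reachable through `alias.txt`.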
The file system interface that the OS exposes to programs is deliberately uniform. Whether the underlying storage is a magnetic disk, an SSD, a USB drive, or a network share, you use the same operations: open (get a file descriptor), read (copy bytes from the file into memory), write (copy bytes from memory to the file), seek (move the read/write position), and close (release the file descriptor). This abstraction is powerful because application code does not need to know or care about the physical storage technology. The OS translates these logical operations into the appropriate device-specific commands: sector reads and writes on a hard disk, page-level writes on an SSD, or network packets for a remote file system.
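The five operations map closely onto the low-level calls in Python's `os` module, which are thin wrappers over the corresponding system calls. The following is a minimal sketch using a temporary file; the file name `demo.bin` and the sample bytes are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")

    # open: obtain a file descriptor (a small integer handle)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # write: copy bytes from memory to the file; returns the byte count
        written = os.write(fd, b"hello, file system")

        # seek: move the read/write position to byte offset 7
        os.lseek(fd, 7, os.SEEK_SET)

        # read: copy up to 4 bytes from the file into memory
        tail = os.read(fd, 4)
    finally:
        # close: release the file descriptor
        os.close(fd)

print(written, tail)
```

The same code would run unchanged if the path pointed at a USB drive or a mounted network share, which is the point of the uniform interface.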
Directories provide the organizational structure. A directory is itself a special file whose contents are a list of (name, inode number) pairs. Directories can contain other directories, creating the familiar hierarchical tree structure — `/home/user/documents/report.txt` is a path through four directories to reach a file. This hierarchy is a namespace: it allows millions of files to coexist with human-readable, organized names. The file system must also manage free space — tracking which blocks on the device are available for new data — and enforce access control — ensuring that only authorized users can read, write, or execute a file. These concerns are what separate a file system from a simple key-value store, and they become central when you study file system implementation next.
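The view of a directory as a list of (name, inode number) pairs can be inspected with `os.scandir()`. This is a sketch only: the helper `directory_entries` is a hypothetical name, and the small tree it builds in a temporary directory mirrors the `documents/report.txt` example above.

```python
import os
import tempfile


def directory_entries(path):
    """Return the (name, inode number) pairs stored in the directory at `path`."""
    with os.scandir(path) as entries:
        return sorted((entry.name, entry.inode()) for entry in entries)


with tempfile.TemporaryDirectory() as tmp:
    documents = os.path.join(tmp, "documents")
    os.mkdir(documents)
    with open(os.path.join(documents, "report.txt"), "w") as f:
        f.write("draft")

    top = directory_entries(tmp)          # contains the subdirectory "documents"
    inner = directory_entries(documents)  # contains the file "report.txt"

for name, inode in top + inner:
    print(f"{name} -> inode {inode}")
```

Resolving a path such as `documents/report.txt` amounts to repeating this lookup once per component: find the name in the current directory, follow its inode number, and continue from there.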