Processor Affinity and CPU Binding


Core Idea

Processor affinity controls which CPUs a process or thread can execute on, enabling cache optimization and NUMA-aware scheduling. Hard affinity strictly restricts execution to specific CPUs; soft affinity expresses a preference while allowing migration if necessary. Binding processes to CPUs can improve cache hit rates and memory locality on multiprocessor and NUMA systems.

Explainer

From your study of context switching and CPU dispatch, you know that when the OS switches a process off a CPU, it saves the process's register state and loads another process's state onto that core. What you may not have considered is what happens to the data that process left behind in the CPU's cache. On most multicore processors, each core has its own L1 cache (and usually its own L2), filled with the recently accessed memory of whatever ran on it. When a process is dispatched back to the *same* core, those cache lines may still be warm: the data the process needs is already sitting in fast local memory. If the scheduler moves the process to a *different* core, the new core's private caches are cold for that process, and it must re-fetch its working set from the slower shared cache or from main memory. This is the performance problem that processor affinity addresses.
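The warm-versus-cold effect described above can be illustrated with a toy model. This is only a sketch: `CoreCache`, `run_slice`, the capacity of 128 lines, and the 64-address working set are all hypothetical choices made for illustration, and a real cache hierarchy is far more complicated.

```python
from collections import OrderedDict


class CoreCache:
    """Toy private cache for one core: an LRU set of addresses (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, load the line and return False."""
        if addr in self.lines:
            self.lines.move_to_end(addr)      # mark as most recently used
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used line
        return False


def run_slice(cache, working_set):
    """Touch every address once and return the number of misses."""
    return sum(0 if cache.access(addr) else 1 for addr in working_set)


working_set = range(64)                       # addresses the process keeps reusing
core0, core1 = CoreCache(128), CoreCache(128)

first_run = run_slice(core0, working_set)     # first time on core 0: cache is cold
same_core = run_slice(core0, working_set)     # dispatched back to core 0: cache is warm
migrated = run_slice(core1, working_set)      # moved to core 1: cold again

print(first_run, same_core, migrated)         # prints "64 0 64"
```

In this model the second time slice on core 0 costs no misses, while the slice on core 1 pays the full 64 misses again, which is the cost that affinity tries to avoid.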

Soft affinity is the default behavior in most modern schedulers: the OS *prefers* to schedule a process back onto the core it last ran on, but will migrate it to another core if that core is idle and the home core is busy. This is a best-effort optimization — it improves cache hit rates on average without creating load imbalance. Hard affinity, by contrast, is an explicit constraint set by the programmer or administrator. It restricts a process or thread to a specific set of CPUs and the scheduler will never move it outside that set, even if those cores are overloaded and others sit idle.
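As a minimal sketch of setting hard affinity from a program, Python's `os.sched_setaffinity()` and `os.sched_getaffinity()` wrap the Linux calls of the same name. They are not available on every platform, so the example checks for them first. The helper name `pin_to_cpus` is hypothetical, and pinning to the lowest-numbered allowed CPU is an arbitrary choice made for the demonstration.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict `pid` (0 means the caller) to `cpus` and return the resulting set."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)


if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)         # CPUs we may currently run on
    target = {min(allowed)}                   # arbitrary demo choice: one allowed CPU
    pinned = pin_to_cpus(target)
    print("before:", sorted(allowed), "after:", sorted(pinned))
    os.sched_setaffinity(0, allowed)          # restore the original affinity
```

Once the call succeeds, the scheduler keeps the caller inside `target` regardless of load elsewhere, which is exactly the rigidity that distinguishes hard affinity from the default soft behavior.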

Hard affinity becomes critical on NUMA (Non-Uniform Memory Access) systems, where each CPU socket has its own local memory bank. Accessing local memory is fast; accessing a remote socket's memory can take up to two to three times as long, depending on the hardware. If a process's data lives in socket 0's memory but the scheduler moves the process to socket 1, every memory access that misses the cache crosses the interconnect. Binding the process to the cores on socket 0 keeps it next to that data, provided the memory was allocated on socket 0 in the first place: affinity constrains where the process runs, not where its pages live. Database servers, real-time audio processing, and high-frequency trading systems routinely use CPU binding for this reason.
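One way to bind to a socket's cores on Linux is to read the node's CPU list from sysfs and pass it to `os.sched_setaffinity()`. The sketch below assumes that NUMA node 0 corresponds to socket 0 and that the kernel exposes `/sys/devices/system/node/node0/cpulist`; neither holds on every machine, so the code does nothing when the file is missing. The helper names `parse_cpulist` and `cpus_of_node` are hypothetical. Note that this restricts only where the process runs; where its memory is placed is governed separately by the memory allocation policy.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Parse a kernel CPU list such as '0-3,8-11' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpus_of_node(node):
    """Return the CPUs of a NUMA node, or None if sysfs does not expose that node."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text()) if path.exists() else None


node0 = cpus_of_node(0)
if node0 and hasattr(os, "sched_setaffinity"):
    # Only request CPUs we are already permitted to use.
    usable = node0 & os.sched_getaffinity(0)
    if usable:
        os.sched_setaffinity(0, usable)       # bind the caller to node 0's CPUs
```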

The tradeoff is straightforward: affinity improves cache and memory locality at the cost of scheduling flexibility. If you pin four threads to four cores and those cores are busy with other work, the pinned threads must wait even while other cores sit idle. Pinning also does not reserve a core: a fifth, unpinned thread can still be scheduled on those cores unless its own affinity excludes them. On Linux, the `taskset` command and `sched_setaffinity()` system call control hard affinity; on Windows, `SetProcessAffinityMask()` and `SetThreadAffinityMask()` serve the same purpose. The key insight is that processor affinity is not about making the CPU faster; it is about preventing the scheduler from undoing the locality that the hardware cache hierarchy has already built up.
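Several of these interfaces express the allowed CPUs as a bitmask in which bit *n* stands for CPU *n*: `taskset` accepts a hexadecimal mask, and the Windows functions take a mask argument. The sketch below converts between a set of CPU numbers and such a mask; the helper names `cpus_to_mask` and `mask_to_cpus` are hypothetical.

```python
def cpus_to_mask(cpus):
    """Build an affinity bitmask in which bit n is set for every CPU n in `cpus`."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask):
    """Return the set of CPU numbers whose bits are set in `mask`."""
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}


mask = cpus_to_mask({0, 2, 3})
print(hex(mask))                  # prints "0xd" (binary 1101)
print(sorted(mask_to_cpus(mask))) # prints "[0, 2, 3]"
```

For example, the mask `0xd` passed to `taskset` selects CPUs 0, 2, and 3, the same set that `taskset -c 0,2,3` names as a list.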
