Every time a system powers on, DDR4 memory runs through a choreographed startup sequence before it’s ready to accept a single read or write command. A recent deep dive on systemverilog.io walks through the full process in detail, covering initialization, write leveling, DQ training, and voltage reference calibration. It’s a lot of careful work that most software developers never see.
The Initialization Sequence
DDR4 initialization is defined precisely by the JEDEC standard (JESD79-4), and the memory controller must follow it exactly. The sequence starts before any commands are issued. RESET_n is held low for at least 200 microseconds after the power supplies are stable, and CKE (Clock Enable) is pulled low before RESET_n is released. CKE then stays low for at least another 500 microseconds, and the clock must be running and stable before CKE is asserted. After that, the controller programs Mode Registers MR0 through MR6, which configure the DRAM’s behavior: CAS latency, write recovery time, on-die termination values, and more.
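To make the ordering concrete, here is a minimal sketch of the power-up sequence as a step table. It is an illustration, not controller firmware: the `Step` type and `run_init` helper are made up for this example, the 200 and 500 microsecond waits and the mode register load order are my reading of JESD79-4 and should be checked against the standard and the DRAM datasheet, and the clock-cycle-scale waits between commands are left as zero placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    min_wait_us: float  # minimum wait after this action before the next one


# Assumed values: verify against JESD79-4 and the device datasheet.
# Waits of 0.0 are placeholders; the real sequence has short command-to-command
# delays that are omitted here.
MRS_ORDER = ("MR3", "MR6", "MR5", "MR4", "MR2", "MR1", "MR0")

INIT_STEPS = (
    [
        Step("power stable, hold RESET_n low and CKE low", 200.0),
        Step("release RESET_n, keep CKE low", 500.0),
        Step("assert CKE (clock already running and stable)", 0.0),
    ]
    + [Step(f"MRS: load {mr}", 0.0) for mr in MRS_ORDER]
    + [Step("ZQCL: long ZQ calibration", 0.0)]
)


def run_init(steps):
    """Walk the table and return (earliest_time_us, action) pairs."""
    now_us = 0.0
    log = []
    for step in steps:
        log.append((now_us, step.action))
        now_us += step.min_wait_us
    return log
```

Calling `run_init(INIT_STEPS)` shows that CKE cannot rise earlier than 700 microseconds after power is stable under these assumptions, which is already a visible slice of boot time before any training starts.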
ZQ calibration is part of initialization too. A precision 240-ohm resistor connects each DRAM’s ZQ pin to ground, and the DRAM uses it as a reference to calibrate its internal drive strength and termination impedance. Without this step, signal integrity degrades, especially at high frequencies.
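The idea behind ZQ calibration can be sketched with a toy model. Everything here is assumed for illustration: the 32-step leg, the 400-ohm starting resistance, and the `process_scale` knob are invented, and a real DRAM does this in analog hardware. The sketch only shows the principle of stepping an internal leg until the divider formed with the external 240-ohm resistor reaches half of VDDQ.

```python
VDDQ = 1.2      # DDR4 I/O supply voltage in volts
R_EXT = 240.0   # external reference resistor on the ZQ pin, in ohms


def leg_resistance(code, process_scale):
    """Hypothetical tunable pull-up leg: higher codes switch in more
    parallel transistors, so resistance falls as the code rises."""
    unit_ohms = 400.0 * process_scale  # resistance at code 0 (made-up value)
    return unit_ohms / (1.0 + code / 16.0)


def zq_pin_voltage(code, process_scale):
    """Voltage divider: internal leg to VDDQ, external resistor to ground."""
    r_leg = leg_resistance(code, process_scale)
    return VDDQ * R_EXT / (r_leg + R_EXT)


def calibrate(process_scale, codes=32):
    """Step the code up until the comparator sees VDDQ/2 or more,
    which happens once the leg is at or just below 240 ohms."""
    for code in range(codes):
        if zq_pin_voltage(code, process_scale) >= VDDQ / 2:
            return code
    return codes - 1  # leg cannot reach the target; use the strongest setting
```

With these made-up numbers, `calibrate(1.0)` settles on code 11, a leg of roughly 237 ohms. A slower or faster process corner (a different `process_scale`) lands on a different code but a similar resistance, which is the point of calibrating against an external reference.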
Write Leveling
Modern DDR4 DIMMs have multiple memory chips, each with its own DQS (Data Strobe) signal. The clock, command, and address lines are routed in a fly-by topology, daisy-chained from one chip to the next, while each chip’s DQ and DQS lines run point-to-point to the controller. As a result, CK reaches each chip at a different time, and the skew between CK and DQS differs from chip to chip. Write leveling compensates for this skew.
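During write leveling, the controller puts the DRAM into a special mode in which the DRAM samples CK on each rising edge of DQS and returns the sampled value on DQ: 0 if DQS arrived while CK was low, 1 if it arrived while CK was high. The controller sweeps the DQS output delay, observing the feedback, until the value flips from 0 to 1. That transition marks the point where the DQS edge lines up with the rising edge of CK. The controller then stores that delay setting for each byte lane independently.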
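The sweep is easy to picture in code. The following is a rough simulation, not a real controller interface: `dram_feedback` stands in for the DRAM's write-leveling response, the 10 ps tap size is invented, and the arrival times are arbitrary numbers chosen to show two lanes needing different delays.

```python
TCK_PS = 1250  # clock period for DDR4-1600, in picoseconds
TAP_PS = 10    # hypothetical resolution of the DQS delay line


def dram_feedback(dqs_edge_ps, ck_edge_ps):
    """Simulated DRAM: sample CK at the DQS rising edge.
    CK is high during the first half of each period after its rising edge."""
    phase = (dqs_edge_ps - ck_edge_ps) % TCK_PS
    return 1 if phase < TCK_PS // 2 else 0


def write_level(ck_arrival_ps, dqs_flight_ps, max_taps=256):
    """Increase the DQS delay one tap at a time and return the first tap
    where the feedback flips from 0 to 1 (DQS aligned with CK rising)."""
    prev = dram_feedback(dqs_flight_ps, ck_arrival_ps)
    for tap in range(1, max_taps):
        cur = dram_feedback(dqs_flight_ps + tap * TAP_PS, ck_arrival_ps)
        if prev == 0 and cur == 1:
            return tap
        prev = cur
    raise RuntimeError("no 0-to-1 transition found in the sweep range")
```

In this model, a chip where CK arrives at 900 ps and DQS at 300 ps needs `write_level(900, 300)`, which returns 60 taps (600 ps of added delay). A chip where CK arrives at 100 ps starts with DQS already late, so the sweep has to walk to the next clock edge and returns 105 taps.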
DQ and DQS Training
After write leveling, the controller trains the read and write data paths. The goal is to center the DQ signals within the DQS strobe window, for both reads and writes.
The process works by sweeping a programmable delay element and testing whether data transfers correctly. The valid timing window has a left edge and a right edge; the controller finds both and sets its delay to the midpoint. This happens per byte lane and per DQ bit, so on a 64-bit-wide interface the controller is tuning 64 data bits across 8 byte lanes, or 72 bits across 9 lanes with ECC.
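The edge search itself is simple once the pass/fail results are in hand. Here is a minimal sketch, assuming the controller has already run a test pattern at every delay tap and recorded one boolean per tap. The function name and the choice to take the widest passing run (so an isolated lucky pass does not fool the search) are my own, not something the article prescribes.

```python
def center_of_eye(results):
    """Given one pass/fail flag per delay tap, return (left, right, center)
    for the widest contiguous run of passing taps."""
    best = None
    start = None
    # A trailing False closes any run that extends to the last tap.
    for tap, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            run = (start, tap - 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            start = None
    if best is None:
        raise ValueError("no passing tap found")
    left, right = best
    return left, right, (left + right) // 2
```

For a 64-tap sweep where taps 20 through 44 pass, this returns `(20, 44, 32)`. A real controller would repeat the sweep for every DQ bit and store a separate center for each.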
For DDR4, Vref training adds another dimension. The input receiver threshold voltage for DQ signals is programmable via MR6, and the controller sweeps it alongside timing to find the two-dimensional center of the data eye. A larger eye means more margin against noise and temperature variation.
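A two-dimensional sweep can be sketched as a timing sweep repeated at every Vref step. The version below is one simple strategy among several: it picks the Vref setting with the widest timing window and then the middle of that window. The `sample(vref, tap)` callback is hypothetical and stands in for "program this Vref and delay, run a test pattern, report pass or fail".

```python
def passing_run(row):
    """Return (left, right) of the longest run of True values, or None."""
    best, start = None, None
    for i, ok in enumerate(list(row) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - 1 - start) > (best[1] - best[0]):
                best = (start, i - 1)
            start = None
    return best


def train_2d(sample, n_vref, n_taps):
    """Sweep every (vref, tap) pair and return the (vref, tap) at the
    center of the widest timing window. Ties keep the lowest Vref step."""
    best = None  # (width, vref, center_tap)
    for vref in range(n_vref):
        run = passing_run(sample(vref, tap) for tap in range(n_taps))
        if run is None:
            continue
        width = run[1] - run[0] + 1
        if best is None or width > best[0]:
            best = (width, vref, (run[0] + run[1]) // 2)
    if best is None:
        raise RuntimeError("no passing point: the eye is closed")
    return best[1], best[2]


def diamond_eye(vref, tap):
    """Toy data eye for demonstration: a diamond centered at tap 32, Vref 20."""
    return abs(tap - 32) / 12 + abs(vref - 20) / 8 < 1
```

With the toy eye above, `train_2d(diamond_eye, 40, 64)` returns `(20, 32)`, the middle of the diamond. Real eyes are not symmetric, which is why firmware often uses more careful centering than this.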
Why the POST Delay Exists
Memory initialization and training have direct effects on stability and performance. Systems that complete training with poor centering will show marginal behavior at higher speeds or elevated temperatures; this is why XMP profiles sometimes need manual tuning to hold stable on specific boards.
That long pause during POST before the OS starts loading is the controller working through these sequences, measuring results, and writing calibrated values to registers. Some systems cache training results in non-volatile storage to skip the process on subsequent boots, which is why a first boot after clearing CMOS takes longer.
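The caching behavior can be illustrated with a small sketch. This is not how any particular firmware does it: the key fields (a module serial number, the data rate, a firmware version) are assumptions about what a cache would plausibly need to match, and a plain dictionary stands in for non-volatile storage.

```python
import hashlib
import json


def config_key(module_serial, data_rate_mts, fw_version):
    """Hypothetical cache key: any change to the module, the speed,
    or the firmware should invalidate the stored training results."""
    blob = json.dumps([module_serial, data_rate_mts, fw_version]).encode()
    return hashlib.sha256(blob).hexdigest()


def boot(nvram, key, full_training):
    """Return (training_results, path), where path is 'fast' when cached
    results were restored and 'full' when training had to run."""
    cached = nvram.get(key)
    if cached is not None:
        return cached, "fast"
    results = full_training()  # the slow part of POST
    nvram[key] = results
    return results, "full"
```

Emptying the dictionary plays the role of clearing CMOS: the next call to `boot` finds nothing stored and takes the full path again, and a changed module or speed produces a different key with the same effect.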
The systemverilog.io article covers the state machines and signal waveforms governing each phase in more depth than most hardware documentation. If you’re working on memory controller verification, writing firmware that touches DRAM configuration, or trying to understand why your overclock is unstable, it’s a useful reference to have.