from Hacker News

Beating the L1 cache with value speculation (2021)

by shoo on 10/10/25, 9:54 PM with 14 comments

  • by hshdhdhehd on 10/15/25, 2:23 AM

    I am new to this low-level stuff, but am I right in understanding that this works because he uses a linked list that is often contiguous in memory, so you guess that the next element is adjacent? If the guess is right, the branch predictor predicts correctly, which avoids waiting on the load from cache and keeps the pipeline from stalling.

    However, I imagine you'd get the same great performance just using an array?
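
    The trick under discussion can be sketched roughly like this (my own reconstruction, not the article's exact code; `Node`, `sum_plain`, and `sum_speculative` are illustrative names, and a real implementation needs an inline-asm barrier or similar to keep the compiler from collapsing the speculation branch back into a plain load):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical node layout: a linked list whose nodes happen to be
     * allocated contiguously in memory, as in the article's setup. */
    typedef struct Node {
        uint64_t value;
        struct Node *next;
    } Node;

    /* Plain traversal: each iteration's address depends on the previous
     * iteration's load of node->next, so the loop is serialized on the
     * L1 load-to-use latency. */
    uint64_t sum_plain(Node *node) {
        uint64_t total = 0;
        while (node) {
            total += node->value;
            node = node->next;
        }
        return total;
    }

    /* Value speculation: guess that the next node is the adjacent one
     * (node + 1). When the guess is right, the branch predictor lets the
     * CPU continue with `guess` without waiting on the load of
     * node->next, breaking the dependency chain; the real load only
     * confirms the guess after the fact. */
    uint64_t sum_speculative(Node *node) {
        uint64_t total = 0;
        while (node) {
            total += node->value;
            Node *guess = node + 1;   /* speculated next pointer */
            Node *next = node->next;  /* real next pointer */
            node = (next == guess) ? guess : next;
        }
        return total;
    }
    ```

    Both functions return the same sum; the speedup (if any) only shows up in a benchmark, and as written an optimizing compiler may simplify the ternary away, which is exactly why the article resorts to a tiny asm block.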

  • by stinkbeetle on 10/15/25, 3:32 AM

    Data speculation is a CPU technique too, which Apple CPUs are known to implement. Apparently they can do stride detection when predicting address values.

    Someone with an M2 or later might try the code and find no speedup from the "improved" version, because the plain loop is already iterating faster than the L1 load-to-use latency.

  • by rini17 on 10/15/25, 2:03 PM

    Won't it introduce a risk of invalid memory access when the list isn't contiguous? And if it's always contiguous, why not use an array instead? Smells like a contrived example.
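
    For what it's worth, in the article's formulation the guessed address is only ever compared against the real `next` pointer, never dereferenced on its own, so a wrong guess can't fault; it just falls back to the real pointer and loses the speedup. A minimal sketch (my own reconstruction with made-up names, traversing a deliberately non-contiguous list):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct Node {
        uint64_t value;
        struct Node *next;
    } Node;

    /* Speculative traversal: `guess` (node + 1) is computed but only
     * *compared* with the loaded next pointer. If they differ, the code
     * uses the real pointer, so a non-contiguous list is still walked
     * correctly -- just without the speculation win on those hops. */
    uint64_t sum_speculative(Node *node) {
        uint64_t total = 0;
        while (node) {
            total += node->value;
            Node *guess = node + 1;
            Node *next = node->next;
            node = (next == guess) ? guess : next;
        }
        return total;
    }
    ```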
  • by signa11 on 10/11/25, 4:00 PM