To learn more:
M. Milenkovic, A. Milenkovic, J. Kulick, “Microbenchmarks for Determining Branch Predictor Organization,”
Software: Practice and Experience, Vol. 34, Issue 5, April 2004, pp. 465-487.
Abstract. To achieve optimum performance of a given application on a given computer platform, a program developer or compiler must be aware of computer architecture parameters, including those related to branch predictors. Although dynamic branch predictors are designed to adapt automatically to changes in branch behavior during program execution, code optimizations based on information about predictor structure can greatly increase overall program performance. Yet exact predictor implementations are seldom made public, even though processor manuals provide valuable optimization hints.
This paper presents an experiment flow with a series of microbenchmarks that determine the organization and size of a branch predictor using on-chip performance monitoring registers. Such knowledge can be used either for manual code optimization or for the design of new, more architecture-aware compilers. Three examples illustrate how insight into the exact branch predictor organization can be directly applied to code optimization. The proposed experiment flow is illustrated with microbenchmarks tuned for Intel Pentium III and Pentium 4 processors, although they can easily be adapted to other architectures. The described approach can also be used during processor design for performance evaluation of various branch predictor organizations and for testing and validation during implementation.
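The idea behind such a microbenchmark can be illustrated with a minimal sketch. This is not the paper's code: the real microbenchmarks run on hardware and read misprediction counts from performance monitoring registers, whereas here a toy software model of a branch target buffer (BTB) stands in for the processor. The names `ToyBTB`, `miss_rate`, and `estimate_capacity`, the LRU replacement policy, and the parameters (128 sets, 4 ways, index taken from address bits 4 and up) are all assumptions chosen for illustration. The sketch runs a loop of B branches spaced D bytes apart and looks for the largest B that still produces no misses.

```python
from collections import OrderedDict


class ToyBTB:
    """Hypothetical set-associative BTB model with LRU replacement."""

    def __init__(self, num_sets, ways, index_shift=4):
        self.num_sets = num_sets
        self.ways = ways
        self.index_shift = index_shift  # assumed: low address bits are ignored
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, allocate an entry for addr."""
        entries = self.sets[(addr >> self.index_shift) % self.num_sets]
        if addr in entries:
            entries.move_to_end(addr)      # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)    # evict the least recently used entry
        entries[addr] = True
        return False


def miss_rate(btb, num_branches, distance, iterations=20):
    """Execute a loop of num_branches branches placed distance bytes apart."""
    addrs = [i * distance for i in range(num_branches)]
    for addr in addrs:                     # warm-up pass, not counted
        btb.access(addr)
    misses = 0
    for _ in range(iterations):
        for addr in addrs:
            if not btb.access(addr):
                misses += 1
    return misses / (iterations * num_branches)


def estimate_capacity(make_btb, distance, max_branches=4096):
    """Largest power-of-two branch count that runs without any misses."""
    fit = 0
    count = 1
    while count <= max_branches:
        if miss_rate(make_btb(), count, distance) > 0.0:
            break
        fit = count
        count *= 2
    return fit


def make_btb():
    return ToyBTB(num_sets=128, ways=4)    # assumed organization


print(estimate_capacity(make_btb, distance=16))    # 512: total entries
print(estimate_capacity(make_btb, distance=2048))  # 4: ways in a single set
```

With a distance of 16 bytes the branches spread over all sets of the model, so the breakpoint reveals the total number of entries. With a distance of 2048 bytes every branch maps to the same set, so the breakpoint reveals the associativity. On real hardware the miss count would come from a performance counter rather than from the model.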
1. Determining BTB organization
2. Outcome predictor experiment flow