Stream-Based Trace Compression

 


What is Stream-based Compression?

Trace-driven simulations have been widely used in quantitative evaluations of new techniques in computer architecture. To offer a faithful representation of a specific workload, traces are very large, encompassing billions of memory references and/or instructions. To efficiently store and use even a small collection of traces, its size must be reduced beyond traditional compression techniques. 
Stream-based compression (SBC) is a new method for single-pass compression of address traces and various extended trace formats. The SBC algorithm relies on extracting instruction streams. An instruction stream is defined as a sequential run of instructions, from the target of a taken branch to the first taken branch in sequence. A stream table is created during compression, encompassing all relevant information about streams, such as starting address, stream length, instruction words and their types. All instructions from a stream are replaced by its index in the stream table, creating a trace of instruction streams. 
Unlike instruction addresses, data address for a memory referencing instruction stays rarely constant during program execution, but it a can have a regular stride. The SBC trace compression features a very simple and yet efficient on-line algorithm for compression of data address references. The compressed data address trace encompasses the data address stride and the number or repetitions for each memory-referencing instruction in a stream. A change in a data address stride results in another compressed record. Compressed records are ordered by the corresponding stream appearances in the trace.  


How does it work?

The SBC algorithm exploits several inherent characteristics of program execution traces. Instruction traces consist of a fairly limited number of different instruction streams, and the most of memory references exhibit strong spatial and/or temporal locality, for example, a load having a constant address stride across loop iterations. 
Figure 1 illustrates the compression flow assuming Dinero+ input. A Dinero+ trace record has fixed length fields: the header field (0 – data read, 1 – data write, and 2 – instruction read), the address field, and the instruction word field for the instruction read type. The stream-based compression results in three files: Stream Table File (STF), Stream-Based Instruction Trace (SBIT), and Stream-Based Data Trace (SBDT). Figure 2 demonstrates the compression process using a short excerpt of a trace. Figure 3 shows the SBC format for data address references.

Figure 1. SBC Compression Flow: from Dinero+ to SBC.

Figure 2. SBC Compression: An Example

 

Figure 3. SBC Data Trace Format.

 


Programs to work with SBC traces

The utility programs working with the SBC traces have been made available to the research community under the terms of the GNU General Public License.   

 


Learn more ...

  • A. Milenkovic, M. Milenkovic, "Exploiting Streams in Instruction and Data Address Trace Compression," 
    Proceedings of the IEEE 6th Annual Workshop on Workload Characterization, Austin, TX, USA, October 27, 2003, pp. 99-107.
    [PDF] [ps.gz] [PPT slides]


Download SBC Traces for SPEC CPU2000 Benchmarks

Traces are collected for the Alpha architecture running SPEC CPU2000 benchmarks with the reference input set.
Each trace encompasses first 2 billion instructions (F2B) and 2 billion instructions after skipping 50 billion instructions (M2B).
A trace includes instruction (SBIT - T1.*) and data address (SBDT - C3.*) components, and stream table (STF - tblID.*).
SBIT and SBDT components consist of 101 partial traces (each but last encompassing about 20 million instructions).