/*************************************************************************************
 * Multi2Sim Simulator @ LaCASA Laboratory (lacasa.uah.edu)
 * This file shows the Multi2Sim x86 configuration parameters for detailed simulation,
 * including their default values.
 *
 * Authors: Aleksandar Milenkovic, Amrish K. Tewar
 *
 * Email: milenkovic@computer.org; akt0001@uah.edu
 *
 * Date: October 2014
 *************************************************************************************/
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[milenka@eb136i-nsf02 m2s]$ m2s --x86-help

; Multi2Sim 4.2 - A Simulation Framework for CPU-GPU Heterogeneous Computing
; Please use command 'm2s --help' for a list of command-line options.
; Simulation alpha-numeric ID: yY1xK

The x86 CPU configuration file is a plain text INI file, defining the parameters
of the CPU model used for a detailed (architectural) simulation. This configuration
file is passed to Multi2Sim with option '--x86-config <file>', which must be
accompanied by option '--x86-sim detailed'.

The following is a list of the sections allowed in the CPU configuration file,
along with the list of variables for each section.

Section '[ General ]':

  Frequency = <value> (Default = 1000 MHz)
      Frequency in MHz for the x86 CPU. Value between 1 and 10K.
  Cores = <value> (Default = 1)
      Number of cores.
  Threads = <value> (Default = 1)
      Number of hardware threads per core. The total number of computing nodes
      in the CPU model is equal to Cores * Threads.
  FastForward = <value> (Default = 0)
      Number of x86 instructions to run with a fast functional simulation before
      the architectural simulation starts.
  ContextQuantum = <value> (Default = 100k)
      If ContextSwitch is true, maximum number of cycles that a context can occupy
      a CPU hardware thread before it is replaced by another pending context.
  ThreadQuantum = <value> (Default = 1k)
      For multithreaded processors (Threads > 1) configured for coarse-grain
      multithreading (FetchKind = SwitchOnEvent), number of cycles during which
      instructions are fetched from the same thread before switching.
  ThreadSwitchPenalty = <value> (Default = 0)
      For coarse-grain multithreaded processors (FetchKind = SwitchOnEvent),
      number of cycles that the fetch stage stalls after a thread switch.
  RecoverKind = {Writeback|Commit} (Default = Writeback)
      On a branch misprediction, the pipeline stage of the mispredicted branch
      at which processor recovery is triggered.
  RecoverPenalty = <value> (Default = 0)
      Number of cycles that the fetch stage is stalled after a branch misprediction.
  PageSize = <value> (Default = 4kB)
      Memory page size in bytes.
  ProcessPrefetchHints = {t|f} (Default = True)
      If set to false, the CPU ignores any prefetch hints/instructions.
  PrefetchHistorySize = <value> (Default = 10)
      Number of past prefetches to keep track of, so as to avoid issuing
      redundant prefetches from the CPU to the cache module.
  DataCachePerfect = {t|f} (Default = False)
  InstructionCachePerfect = {t|f} (Default = False)
      Set these options to true to simulate perfect data/instruction caches,
      respectively, where every access results in a hit. If set to false, the
      parameters of the caches are given in the memory configuration file.

Section '[ Pipeline ]':

  FetchKind = {Shared|TimeSlice|SwitchOnEvent} (Default = TimeSlice)
      Policy for fetching instructions from different threads. A shared fetch
      stage fetches instructions from different threads in the same cycle; a
      time-slice fetch switches between threads in a round-robin fashion; option
      SwitchOnEvent switches thread fetch on long-latency operations or thread
      quantum expiration.
  DecodeWidth = <value> (Default = 4)
      Number of x86 instructions decoded per cycle.
  DispatchKind = {Shared|TimeSlice} (Default = TimeSlice)
      Policy for dispatching instructions from different threads.
      If shared, instructions from different threads are dispatched in the same
      cycle. Otherwise, instruction dispatching is done in a round-robin fashion
      at a cycle granularity.
  DispatchWidth = <value> (Default = 4)
      Number of microinstructions dispatched per cycle.
  IssueKind = {Shared|TimeSlice} (Default = TimeSlice)
      Policy for issuing instructions from different threads. If shared,
      instructions from different threads are issued in the same cycle;
      otherwise, instruction issue is done round-robin at a cycle granularity.
  IssueWidth = <value> (Default = 4)
      Number of microinstructions issued per cycle.
  CommitKind = {Shared|TimeSlice} (Default = Shared)
      Policy for committing instructions from different threads. If shared,
      instructions from different threads are committed in the same cycle;
      otherwise, they commit in a round-robin fashion.
  CommitWidth = <value> (Default = 4)
      Number of microinstructions committed per cycle.
  OccupancyStats = {t|f} (Default = False)
      Calculate occupancy statistics for the CPU structures. Since this
      computation adds overhead, the option must be enabled explicitly. These
      statistics are attached to the CPU report.

Section '[ Queues ]':

  FetchQueueSize = <value> (Default = 64)
      Size of the fetch queue, in bytes.
  UopQueueSize = <value> (Default = 32)
      Size of the uop queue, in uops.
  RobKind = {Private|Shared} (Default = Private)
      Reorder buffer sharing among hardware threads.
  RobSize = <value> (Default = 64)
      Reorder buffer size in number of microinstructions (if private, per-thread size).
  IqKind = {Private|Shared} (Default = Private)
      Instruction queue sharing among threads.
  IqSize = <value> (Default = 40)
      Instruction queue size in number of uops (if private, per-thread IQ size).
  LsqKind = {Private|Shared} (Default = Private)
      Load-store queue sharing among threads.
  LsqSize = <value> (Default = 20)
      Load-store queue size in number of uops (if private, per-thread LSQ size).
  RfKind = {Private|Shared} (Default = Private)
      Register file sharing among threads.
  RfIntSize = <value> (Default = 80)
      Number of integer physical registers (if private, per-thread).
  RfFpSize = <value> (Default = 40)
      Number of floating-point physical registers (if private, per-thread).
  RfXmmSize = <value> (Default = 40)
      Number of XMM physical registers (if private, per-thread).

Section '[ TraceCache ]':

  Present = {t|f} (Default = False)
      If true, a trace cache is included in the model. If false, the rest of
      the options in this section are ignored.
  Sets = <value> (Default = 64)
      Number of sets in the trace cache.
  Assoc = <value> (Default = 4)
      Associativity of the trace cache. The product Sets * Assoc is the total
      number of traces that can be stored in the trace cache.
  TraceSize = <value> (Default = 16)
      Maximum size of a trace, in uops.
  BranchMax = <value> (Default = 3)
      Maximum number of branches contained in a trace.
  QueueSize = <value> (Default = 32)
      Size of the trace queue, in uops.

Section '[ FunctionalUnits ]':

  The possible variables in this section follow the format
      <func_unit>.<field> = <value>
  where <func_unit> refers to a functional unit type, and <field> refers to a
  property of it. Possible values for <func_unit> are:

      IntAdd          Integer adder
      IntMult         Integer multiplier
      IntDiv          Integer divider
      EffAddr         Operator for effective address computations
      Logic           Operator for logic operations
      FloatSimple     Simple floating-point operations
      FloatAdd        Floating-point adder
      FloatComp       Floating-point comparator
      FloatMult       Floating-point multiplier
      FloatDiv        Floating-point divider
      FloatComplex    Operator for complex floating-point computations
      XMMIntAdd       XMM integer adder
      XMMIntMult      XMM integer multiplier
      XMMIntDiv       XMM integer divider
      XMMLogic        XMM logic operations
      XMMFloatAdd     XMM floating-point adder
      XMMFloatComp    XMM floating-point comparator
      XMMFloatMult    XMM floating-point multiplier
      XMMFloatDiv     XMM floating-point divider
      XMMFloatConv    XMM floating-point converter
      XMMFloatComplex Complex XMM floating-point operations

  Possible values for <field> are:

      Count           Number of functional units of a given kind.
      OpLat           Latency of the operator.
      IssueLat        Number of cycles from the moment an instruction is issued
                      until the functional unit is available for the next use.
                      For pipelined operators, IssueLat is smaller than OpLat.

Section '[ BranchPredictor ]':

  Kind = {Perfect|Taken|NotTaken|Bimodal|TwoLevel|Combined} (Default = TwoLevel)
      Branch predictor type.
  BTB.Sets = <value> (Default = 256)
      Number of sets in the BTB.
  BTB.Assoc = <value> (Default = 4)
      BTB associativity.
  Bimod.Size = <value> (Default = 1024)
      Number of entries of the bimodal branch predictor.
  Choice.Size = <value> (Default = 1024)
      Number of entries for the choice predictor.
  RAS.Size = <value> (Default = 32)
      Number of entries of the return address stack (RAS).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
END_OF_FILE
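The '[ General ]' variables listed above can be used as in the minimal sketch below. It configures 2 cores with 2 hardware threads each, giving 2 * 2 = 4 computing nodes. The file name x86-config.ini, the <program> placeholder and all values are hypothetical. The sketch also assumes that Multi2Sim's INI reader accepts full-line comments starting with ';', so comments are never placed after a value.

```ini
; x86-config.ini (hypothetical name), used together with detailed simulation:
;   m2s --x86-sim detailed --x86-config x86-config.ini <program>
[ General ]
; Clock frequency in MHz (allowed range 1 to 10K)
Frequency = 2000
; 2 cores x 2 threads per core = 4 computing nodes
Cores = 2
Threads = 2
; Run the first million x86 instructions functionally before detailed timing starts
FastForward = 1000000
```

Variables that are left out of the file keep the defaults listed in the help output.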
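Coarse-grain multithreading is configured across two sections. ThreadQuantum and ThreadSwitchPenalty belong to '[ General ]', while the policy itself is FetchKind in '[ Pipeline ]'. The sketch below uses illustrative values only, not tuned recommendations. It keeps ';' comments on their own lines, on the assumption that this is the comment form the parser accepts.

```ini
[ General ]
Threads = 2
; Fetch from one thread for at most 1000 cycles before switching
ThreadQuantum = 1000
; The fetch stage stalls for 5 cycles after each thread switch
ThreadSwitchPenalty = 5

[ Pipeline ]
; Switch threads on long-latency operations or when the quantum expires
FetchKind = SwitchOnEvent
```

Going by the descriptions in the help output, ThreadQuantum and ThreadSwitchPenalty only apply with FetchKind = SwitchOnEvent, not with the default TimeSlice policy.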
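The *Kind variables in '[ Pipeline ]' and '[ Queues ]' decide which stages and structures hardware threads share. The hypothetical fragment below shares every pipeline stage and the reorder buffer, which approximates simultaneous multithreading, but keeps the instruction queue private. For a private structure the size is per thread, so the IQ here holds 2 * 40 = 80 uops in total. Values are illustrative, and ';' full-line comments are assumed to be accepted.

```ini
[ General ]
Threads = 2

[ Pipeline ]
; Fetch, dispatch, issue and commit from both threads in the same cycle
FetchKind = Shared
DispatchKind = Shared
IssueKind = Shared
CommitKind = Shared

[ Queues ]
; One 128-entry reorder buffer shared by both threads
RobKind = Shared
RobSize = 128
; Private instruction queues: 40 uops per thread, 80 uops in total
IqKind = Private
IqSize = 40
```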
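The '[ TraceCache ]' sizing follows directly from its variables. Sets * Assoc gives the number of traces, and each trace holds at most TraceSize uops. The sketch below writes the documented defaults out explicitly and only changes Present from its default. It therefore describes 64 * 4 = 256 traces of up to 16 uops each, or at most 256 * 16 = 4096 uops. Full-line ';' comments are assumed to be valid syntax.

```ini
[ TraceCache ]
; Enable the trace cache (otherwise the rest of this section is ignored)
Present = t
; 64 sets x 4 ways = 256 traces
Sets = 64
Assoc = 4
; Each trace holds up to 16 uops and at most 3 branches
TraceSize = 16
BranchMax = 3
; Trace queue of 32 uops
QueueSize = 32
```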
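Functional units are configured with <func_unit>.<field> pairs. The sketch below uses values chosen only for illustration. It contrasts a pipelined integer adder (IssueLat smaller than OpLat) with a floating-point divider that has IssueLat equal to OpLat. Reading IssueLat = OpLat as "unpipelined" is our interpretation of the IssueLat description, not a documented rule. Full-line ';' comments are assumed to be accepted.

```ini
[ FunctionalUnits ]
; Four pipelined integer adders: 2-cycle latency, each unit accepts a new op every cycle
IntAdd.Count = 4
IntAdd.OpLat = 2
IntAdd.IssueLat = 1
; One floating-point divider that stays busy for the whole 20-cycle operation
FloatDiv.Count = 1
FloatDiv.OpLat = 20
FloatDiv.IssueLat = 20
```

With these values, up to 4 integer additions can start per cycle, but only one floating-point division can start every 20 cycles.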
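A '[ BranchPredictor ]' sketch that selects the Combined predictor and otherwise uses the documented defaults for BTB.Sets, Choice.Size and RAS.Size. Based on the variable descriptions, we assume the choice predictor is only consulted when Kind = Combined, where it arbitrates between the component predictors. As in the other fragments, comments use full-line ';' syntax, which is an assumption about the parser.

```ini
[ BranchPredictor ]
; Combined predictor; a choice table selects between its component predictors
Kind = Combined
; 256-set branch target buffer
BTB.Sets = 256
; 1024-entry choice predictor
Choice.Size = 1024
; 32-entry return address stack
RAS.Size = 32
```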