                    PMCTest.txt                    2014-10-01 Agner Fog

                    Multi-threaded PMC Test program

This program is intended for optimizing a small piece of code written in
C++ or assembly.

The code to test will be executed a fixed number of times, and the
test results will be output for each repetition. This program measures
how many clock cycles the code to test takes in each repetition.
Furthermore, it is possible to set a number of Performance Monitor Counters
(PMC) to count the number of micro-operations (uops), cache misses,
branch mispredictions, etc.
It is possible to run the same code in multiple threads simultaneously in
order to test resource contention between threads.

The setup of the Performance Monitor Counters is microprocessor-specific.
The specifications for the possible performance monitor counters for each
microprocessor family are defined in the table CounterDefinitions at the
bottom of the file PMCTestA.cpp. It is possible to make additions to this
table.

(c) Copyright 2000 - 2014 by Agner Fog. GNU General Public License

System requirements:

Windows 2000 or later, or Linux, 32 or 64 bit.
Microsoft, Gnu or Intel C++ compiler.
MASM, YASM, NASM or JWasm assembler.

PMCTest uses a kernel mode driver to get access to the performance
monitor counters (PMCs). There are three versions of the driver:
* MSRDriver32.sys is used in 32-bit Windows systems.
* MSRDriver64.sys is used in 64-bit Windows systems, even if the test
  program runs in 32-bit mode.
* MSRdrv is used in Linux systems, both 32 and 64 bit.

The test program consists of two modules, A and B. The A module, written
in C++ (PMCTestA.cpp), contains the interface to the driver, thread creation,
locking of each thread to a specific CPU core, a table of PMC definitions for
different microprocessors, and output of the results. All OS-dependent
functions are in the files PMCTestWin.h and PMCTestLinux.h, which are
included conditionally from PMCTestA.cpp.

The B module is where you put in the piece of code to test. The test code
is repeated in a loop that reads the counters before and after each
execution of the test code. A dummy loop without the test code measures
the overhead counts for reading the counters and looping. This overhead
is subtracted from the test counts. The B module must be compiled or
assembled after each modification of the test code. There are several
versions of the B module:
PMCTestB.cpp     C++ language, 32 and 64 bit mode, Windows and Linux
PMCTestB32.asm   Assembly language, MASM syntax, 32 bit Windows
PMCTestB64.asm   Assembly language, MASM syntax, 64 bit Windows
PMCTestB32.nasm  Assembly language, NASM/YASM syntax, 32 bit Linux or Windows
PMCTestB64.nasm  Assembly language, NASM/YASM syntax, 64 bit Linux or Windows
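
The measurement loop described above can be sketched in portable C++. This is a minimal sketch and not the code in PMCTestB.cpp. The hypothetical ReadCounter() uses std::chrono::steady_clock (nanoseconds) as a stand-in for the time stamp counter and the PMCs. The overhead is taken as the minimum over the dummy runs, which is one common choice; PMCTest's exact rule may differ.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for reading a hardware counter: nanoseconds from
// steady_clock. PMCTest itself reads the time stamp counter and PMCs.
inline std::int64_t ReadCounter() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// Runs `body` `repetitions` times and returns one count per repetition,
// with the overhead of an empty (dummy) measurement subtracted.
template <typename Body>
std::vector<std::int64_t> MeasureWithOverhead(Body body, int repetitions,
                                              int overheadRepetitions) {
    // 1. Dummy runs without test code: measure the cost of the counter reads.
    //    Keep the smallest value (assumption) so one slow run does not dominate.
    std::int64_t overhead = 0;
    for (int i = 0; i < overheadRepetitions; ++i) {
        std::int64_t start = ReadCounter();
        std::int64_t stop = ReadCounter();
        std::int64_t d = stop - start;
        if (i == 0 || d < overhead) overhead = d;
    }
    // 2. Test runs: read the counter before and after each execution.
    std::vector<std::int64_t> counts;
    counts.reserve(repetitions);
    for (int i = 0; i < repetitions; ++i) {
        std::int64_t start = ReadCounter();
        body();
        std::int64_t stop = ReadCounter();
        // Signed result: a noisy run can come out slightly below the overhead.
        counts.push_back(stop - start - overhead);
    }
    return counts;
}
```

Reading the counter immediately before and after the test code, rather than around the whole loop, is what lets the output show a separate count for each repetition.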

You need only one of the B files. The B file must be compiled or assembled
and linked together with the compiled A file into an executable file.

Insert the piece of code you want to test in the B file at the place marked
"Test code start". Alternatively, you may put the test code into a separate
third module and call it from the B module.

Find the list of possible performance monitor counters at the bottom of PMCTestA.cpp.
Find the counters you want to use and insert the counter id numbers in the
B file in the table named "CounterTypesDesired". The maximum number of counters
you can have depends on the microprocessor.

Compile for console mode, 32 or 64 bits. Make sure unicode is not enabled.

If you are using Microsoft Visual Studio 2010, you can open one of the following
project files:
A32.vcxproj:  Assembly language, 32-bit mode
A64.vcxproj:  Assembly language, 64-bit mode
C32.vcxproj:  C++ language, 32-bit mode
C64.vcxproj:  C++ language, 64-bit mode
Similar project files are available for Visual Studio 2008.

If you are using any other compiler IDE, then make a project containing
PMCTestA.cpp and one of the B files. Turn off unicode support. You may
need to define a custom build step for the assembly file, e.g.
Command line: ml64 /c /Zi /Fl $(InputFileName)
Output: $(InputName).obj

You may build the test program without an IDE. The following batch or script
files are provided for convenience:
         Linux, 32 bit mode, YASM assembler
         Linux, 64 bit mode, YASM assembler
m32.bat  Windows, 32 bit mode, MASM assembler
m64.bat  Windows, 64 bit mode, MASM assembler
n32.bat  Windows, 32 bit mode, NASM assembler
n64.bat  Windows, 64 bit mode, NASM assembler
y32.bat  Windows, 32 bit mode, YASM assembler
y64.bat  Windows, 64 bit mode, YASM assembler
In Windows, you need to modify the bat files to insert the correct paths for the
compiler, linker, assembler, library files and header files. These paths depend on
the compiler version installed.
In Linux, you need to make the script files executable by:
chmod 744 *.sh
Then execute with e.g.  ./

Installing the driver in Windows:

The driver files MSRDriver32.sys and MSRDriver64.sys must be available in the same
directory as the test program. The driver is installed the first time the test
program runs. If you want to uninstall the driver, use uninstall.exe.

Note: If you are running 64-bit Windows Vista, Windows 7 or later, then you have
to press F8 during system boot and select "Disable Driver Signature Enforcement".

The compiled program must be run with administrator rights.
If running from Visual Studio or any other IDE, then run the IDE as administrator.
If running from a .bat file, then run the .bat file or the command prompt as administrator.

The driver is not needed if you set USE_PERFORMANCE_COUNTERS to 0 in the B file.

Installing the driver in Linux:

Unpack into an empty folder. Then:

chmod 744
sudo ./

See DriverSrcLinux.txt for details. You need to reinstall after reboot.

Some microprocessors have multiple cores, and some processors can run two threads
in each core. The total number of threads that the processor can run simultaneously is
the number of cores times the number of threads per core. The maximum number of threads
to run during the test is limited to 4 or 8 by MAXTHREADS in the file PMCTest.h.
You can run multiple threads in order to test the influence of multithreading on performance.
If you run 3 threads on a processor with multiple cores and two threads per core, then
you will have two threads (proc. 0 and 1) in the first core and one thread (proc. 2) in
the second core. The first two threads are likely to run slower because they are sharing
the same resources. Make sure the threads do not write to the same cache lines.
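
A hypothetical C++ sketch of two points in the paragraph above: how logical processor numbers map to cores in the 3-thread example, and how per-thread results can be padded to a full cache line so that threads never write to the same line. It assumes 64-byte cache lines (PMCTest takes this from CACHELINESIZE) and assumes that consecutive processor numbers share a core, which the operating system does not guarantee. Unlike PMCTestA.cpp, it does not lock threads to cores. It needs C++17 so that std::vector honors the over-aligned element type.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache line size in bytes (PMCTest's CACHELINESIZE option).
constexpr std::size_t kCacheLineSize = 64;

// Per-thread slot aligned and padded to one cache line, so two threads
// never write to the same line (avoids false sharing).
struct alignas(kCacheLineSize) ThreadSlot {
    long long value = 0;
};

// Core that a logical processor number falls on, assuming consecutive
// numbers share a core: with 2 threads per core, procs 0 and 1 -> core 0,
// proc 2 -> core 1.
inline int CoreOfProcessor(int processor, int threadsPerCore) {
    return processor / threadsPerCore;
}

// Starts `numThreads` workers; each one writes only to its own slot.
inline std::vector<ThreadSlot> RunWorkers(int numThreads, long long iterations) {
    std::vector<ThreadSlot> slots(numThreads);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&slots, t, iterations] {
            for (long long i = 0; i < iterations; ++i) slots[t].value += 1;
        });
    }
    for (auto& w : workers) w.join();
    return slots;
}
```

Padding each slot to a whole line costs a little memory. In return, a slowdown seen in the test comes from shared core resources rather than from cache lines bouncing between threads.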

You should use the multithreading feature only when you want to test resource contention
between threads. The results may be misleading or difficult to interpret.
The most consistent and reliable results are obtained by running only a single thread.

Microprocessors supported:

* Intel microprocessors from Pentium 1 through Pentium 4, Pentium M,
  Core Solo/Duo, Core 2, Nehalem, Sandy Bridge, Atom.
* AMD Duron, Athlon, Athlon 64, Opteron, K8, K10, Bobcat, Bulldozer.
* VIA Nano (the PMCs in VIA Nano are undocumented).

These microprocessors all have time stamp counters and performance monitor
counters. The performance monitor counters are microprocessor-specific. The
PMC programs may need modification to work with future microprocessor families.

User options:

The following options can be set by modifying the B file:

REPETITIONS:     The number of times the test code is repeated. The output will show
                 the results for each repetition.

NUM_THREADS:     Number of simultaneous threads. Set to 1 unless you are testing
                 multithreading performance.

USE_PERFORMANCE_COUNTERS: Set to 1 if you are using performance monitor counters.

SUBTRACT_OVERHEAD: Set to 1 to subtract program overhead from clock counts and
                 performance counts. You may set this to 0 if the overhead counts
                 are unstable because of multithreading.

OVERHEAD_REPETITIONS: Number of repetitions to measure the program overhead. This
                 should be more than 1 in order to eliminate cache effects.

CACHELINESIZE:   The size, in bytes, of cache lines in the microprocessor. This value
                 is needed in case of multithreading in order to prevent threads from
                 using the same cache lines.

CounterTypesDesired: Counter id numbers for the counter types you want to use, e.g.
                 id 100 for counting micro-operations. The id numbers are CPU-specific.
                 These are listed in the table CounterDefinitions in PMCTestA.cpp.

User data:       Any static data that your test code may need.

User Initializations: Any initializations that your test code may need before the test loop.

Test code start: Insert the code to test here.
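
A hypothetical sketch of how these options might look at the top of a C++ B file. Only the option names and counter id 100 come from this file. Whether PMCTestB.cpp uses #define or constants, and its default values, are assumptions here. The static_asserts are an added convenience that catch obviously invalid settings at compile time.

```cpp
// Hypothetical option block; names from PMCTest.txt, values are examples.
#define REPETITIONS               8   // times the test code is repeated
#define NUM_THREADS               1   // keep 1 unless testing multithreading
#define USE_PERFORMANCE_COUNTERS  1   // 0 = clock counts only, driver not needed
#define SUBTRACT_OVERHEAD         1   // 0 if overhead is unstable (multithreading)
#define OVERHEAD_REPETITIONS      4   // more than 1 to eliminate cache effects
#define CACHELINESIZE            64   // bytes per cache line

// Counter ids must exist in CounterDefinitions for the CPU in use.
// Id 100 counts micro-operations; more ids can be added up to the CPU's limit.
const int CounterTypesDesired[] = { 100 };

// Compile-time sanity checks on the settings above.
static_assert(REPETITIONS > 0, "REPETITIONS must be positive");
static_assert(NUM_THREADS >= 1, "need at least one thread");
static_assert(OVERHEAD_REPETITIONS > 1, "use more than 1 overhead repetition");
static_assert((CACHELINESIZE & (CACHELINESIZE - 1)) == 0,
              "CACHELINESIZE should be a power of two");

// User data: static data that the test code needs.
static int userData[1000];

// The body of this hypothetical function stands for the code inserted at
// "Test code start"; user initializations would run before the test loop.
inline void TestCode() {
    for (int i = 0; i < 1000; ++i) userData[i] += i;
}
```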

Defining new event counters:

You can add new counter types to the table CounterDefinitions in PMCTestA.cpp according
to the manual for the CPU in question. The fields in the table are:
id:       An arbitrary id number.
scheme:   The counter scheme defines which model-specific registers to use. This is
          specific to a particular brand and family of CPUs. Values may be OR'ed if
          multiple sub-schemes have the same counters.
cpu:      The CPU type for which the counter works. Values may be OR'ed.
CounterFirst, CounterLast: The range of counter registers that can be used.
eventreg: Event register, if applicable.
event:    Identification of the event to count.
mask:     Bit-mask possibly identifying sub-events.
name:     A name to show on the output listing. Max. 9 characters.
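
The table fields above can be pictured as a C++ struct, together with a lookup that honors the OR'ed scheme and cpu values. This is a hypothetical sketch. The field order follows the list above, but the member types, the SCHEME_*/CPU_* constants and FindCounter() are illustrative assumptions and not the declarations in PMCTestA.cpp.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scheme and CPU bits; the real constants are defined
// elsewhere in the PMCTest sources.
constexpr std::uint32_t SCHEME_A = 0x1;
constexpr std::uint32_t SCHEME_B = 0x2;
constexpr std::uint32_t SCHEME_C = 0x4;
constexpr std::uint32_t CPU_X = 0x1;
constexpr std::uint32_t CPU_Y = 0x2;

struct CounterDefinition {
    int id;               // arbitrary id, as used in CounterTypesDesired
    std::uint32_t scheme; // OR of the schemes this entry applies to
    std::uint32_t cpu;    // OR of the CPU types this entry applies to
    int counterFirst;     // first usable counter register
    int counterLast;      // last usable counter register
    int eventreg;         // event register, if applicable
    int event;            // event to count
    int mask;             // bit-mask for sub-events
    char name[10];        // output label: max. 9 characters + terminating NUL
};

// Returns the first entry whose id matches and whose scheme and cpu bits
// overlap the current CPU's, or nullptr if none fits (the situation that
// produces "Cannot make counter").
inline const CounterDefinition* FindCounter(const CounterDefinition* table,
                                            std::size_t count, int id,
                                            std::uint32_t scheme, std::uint32_t cpu) {
    for (std::size_t i = 0; i < count; ++i) {
        const CounterDefinition& d = table[i];
        if (d.id == id && (d.scheme & scheme) != 0 && (d.cpu & cpu) != 0) return &d;
    }
    return nullptr;
}
```

Because the same id can appear in several entries for different schemes, the bitwise test lets one table serve many CPU families. The first matching entry wins.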

Defining new CPUs:

The function CCounters::GetProcessorVendor() identifies the CPU vendor.
The function CCounters::GetProcessorFamily() identifies the CPU family and model.
The function CCounters::GetPMCScheme() finds the appropriate PMC scheme for the
specific CPU vendor, family and model.
The function CCounters::DefineCounter(SCounterDefinition & CDef) specifies how to
set up the necessary model-specific registers for a particular PMC scheme.
You may get a blue screen error if you attempt to use a model-specific register that
doesn't exist in the current CPU.
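
Vendor and family/model identification of the kind done by GetProcessorVendor() and GetProcessorFamily() normally starts from the CPUID instruction. The sketch below only decodes raw register values. It leaves out reading the registers (for example with a __cpuid intrinsic) so that it stays portable, and it is not the code in PMCTestA.cpp. VendorString(), DecodeSignature() and FamilyModel are hypothetical names. The decoding follows the documented x86 rules: the extended family is added when the base family is 0xF, and the extended model is prepended when the base family is 6 or 0xF.

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>

// CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX order,
// little-endian within each register (e.g. "GenuineIntel", "AuthenticAMD").
inline std::string VendorString(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx) {
    std::string s;
    for (std::uint32_t reg : {ebx, edx, ecx})
        for (int i = 0; i < 4; ++i)
            s += static_cast<char>((reg >> (8 * i)) & 0xFF);
    return s;
}

struct FamilyModel {
    unsigned family;
    unsigned model;
};

// Decodes the processor signature returned in EAX by CPUID leaf 1.
inline FamilyModel DecodeSignature(std::uint32_t eax) {
    unsigned baseModel  = (eax >> 4) & 0xF;
    unsigned baseFamily = (eax >> 8) & 0xF;
    unsigned extModel   = (eax >> 16) & 0xF;
    unsigned extFamily  = (eax >> 20) & 0xFF;
    FamilyModel fm{baseFamily, baseModel};
    if (baseFamily == 0xF) fm.family += extFamily;
    if (baseFamily == 0x6 || baseFamily == 0xF) fm.model += extModel << 4;
    return fm;
}
```

For example, signature 0x000206A7 decodes to family 6, model 0x2A (Sandy Bridge), and 0x00600F12 decodes to family 0x15 (Bulldozer), model 1. A PMC scheme would then be chosen from the vendor, family and model together.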

Cannot load driver:
    Make sure the *.sys driver files are available in the path.
    The 64-bit driver MSRDriver64.sys is needed under x64 Windows, even if
    running in 32-bit mode.
    If running under Windows Vista or Windows 7: press F8 during system boot
    and select "Disable Driver Signature Enforcement".
    Run as administrator.
    Ignore the popup message "Windows requires a digitally signed driver".

Error compiling *.cpp file:
    Make sure the character set is ASCII, not unicode. In Visual C++ set
    Project -> Properties -> General -> Character set -> Not Set.
    Make sure you compile for console mode, unmanaged code.
    Do not enable precompiled headers. Comment out "#include <intrin.h>" in
    the .cpp files if the compiler doesn't support intrinsics.
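
As an alternative to editing the include by hand, a hypothetical compile-time guard can catch both problems above. This is an illustration and not part of the PMCTest sources. It stops the build if UNICODE or _UNICODE is defined, and it includes <intrin.h> only when compiling with the Microsoft compiler (_MSC_VER). Treating _MSC_VER as the test for intrinsic support is an assumption.

```cpp
// Hypothetical guard, not part of the PMCTest sources.
#if defined(UNICODE) || defined(_UNICODE)
#error "Set the character set to 'Not Set': PMCTest expects ASCII strings"
#endif

#if defined(_MSC_VER)
#include <intrin.h>           // Microsoft compiler intrinsics
#define HAVE_MS_INTRINSICS 1
#else
#define HAVE_MS_INTRINSICS 0  // other compilers: <intrin.h> is skipped
#endif

// Lets ordinary code branch on the result instead of using the macro directly.
constexpr bool HasMsIntrinsics() { return HAVE_MS_INTRINSICS != 0; }
```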

Unresolved externals:
    Make sure you define a custom build rule for any .asm file if you are
    using MS Visual Studio:
    For PMCTestB32.asm: Right click on PMCTestB32.asm -> Properties ->
    All Configurations -> Custom Build Step -> Command Line:
    ml /c /Cx /Zi /Fl $(InputFileName)
    Outputs: $(InputName).obj
    For PMCTestB64.asm: Same as above. Use ml64 instead of ml.

Cannot make counter:
    The numbers in CounterTypesDesired in the B file should fit the CPU type
    you are running on. Available numbers are listed in PMCTestA.cpp under
    CounterDefinitions.

File list:

PMCTestA.cpp:       Source code for A module, 32 and 64 bit mode.
PMCTestB.cpp:       Source code for B module, 32 and 64 bit mode, C++ code.
PMCTestB32.asm:     Source code for B module, 32 bit, Assembly, MASM syntax
PMCTestB64.asm:     Source code for B module, 64 bit, Assembly, MASM syntax
PMCTestB32.nasm:    Source code for B module, 32 bit, Assembly, NASM/YASM syntax
PMCTestB64.nasm:    Source code for B module, 64 bit, Assembly, NASM/YASM syntax
PMCTest.h:          C++ include file with class definitions, constants, etc.
MSRDriver.h:        C++ include file with driver definitions
MSRdrvL.h:          C++ include file with driver definitions, Linux only
PMCTestWin.h        C++ file with Windows-specific functions
PMCTestLinux.h      C++ file with Linux-specific functions
intrin1.h           C++ include file with intrinsic functions for MS compiler
                    (Shortened version of intrin.h)
                    Project workspace and project files for MS Visual Studio 2008
                    Project workspace and project files for MS Visual Studio 2010
m32.bat, m64.bat    Windows batch program for building test program, MASM assembler, 32 or 64 bit.
n32.bat, n64.bat    Windows batch program for building test program, NASM assembler, 32 or 64 bit.
y32.bat, y64.bat    Windows batch program for building test program, YASM assembler, 32 or 64 bit.
                    Linux shell script for building test program, YASM assembler, 32 or 64 bit.
                    Linux shell script for building test program, C++, 32 or 64 bit.
PMCTestA32.obj      PMCTestA.cpp compiled, 32 bit Windows
PMCTestA64.obj      PMCTestA.cpp compiled, 64 bit Windows
MSRDriver32.sys:    Kernel mode driver for 32 bit Windows.
MSRDriver64.sys:    Kernel mode driver for 64 bit Windows.
uninstall.exe:      Used for uninstalling the driver under Windows
uninstall.cpp:      Source code for uninstall.exe
PMCTest.txt:        This file.
                    Test scripts that I have used for measuring instruction latency
                    and throughput. For Ubuntu 64 bit, YASM assembler.
                    Various other test scripts that I have used. For Ubuntu 64 bit, YASM assembler.
TemplateB32.nasm:   Used by test scripts in
TemplateB64.nasm:   Used by test scripts in and