Optimized GPE Emulation Function Analysis (Windows Embedded CE 6.0)
1/6/2010
After you implement the changes described in Optimizing a GPE Emulation Function as part of How to Profile and Optimize a Display Driver, and obtain a new set of performance profiling data, you must examine the data to determine whether the optimization improved performance.
Monte Carlo Profiling Data
With the optimized code in place, the profiling results should resemble the following output.
Note
For clarity, the tick count, process IDs, and thread IDs have been removed from the following output. Also, the specific timer values vary from run to run because profiling is started and stopped manually.
Kernel Profiler: Gathering MonteCarlo data in buffered mode
ProfileStart() : Allocated 13946 kB for Profiler Buffer (0x48000000)
Starting profile timer at 200 uS rate
ProfApp: Took 3666 ms to perform blts.
Kernel Profiler: Looking up symbols for 42478 hits.
.
.
(Additional lines omitted for clarity.)
.
.
.
Total samples recorded = 42478
Module Hits Percent
------------ ---------- -------
nk.exe 23679 55.7
ddi_flat.dll 18037 42.4
gwes.dll 645 1.5
coredll.dll 87 0.2
ProfApp.exe 16 0.0
fsdmgr.dll 4 0.0
relfsd.dll 2 0.0
kbdmouse.dll 1 0.0
UNKNOWN 7 0.0
Hits Percent Address Module Routine
---------- ------- -------- ------------:---------------------
22683 53.3 802351c2 nk.exe :_IDLE_STATE
14065 33.1 03dbac20 ddi_flat.dll:?EmulatedBltSrcCopy1616Convert
3340 7.8 03db2068 ddi_flat.dll:?CursorOn
300 0.7 03db2268 ddi_flat.dll:?CursorOff
182 0.4 0003d906 gwes.dll :?dwRealizeColor
176 0.4 8025a53b nk.exe :_NE2000_READ_PORT_UCHAR
171 0.4 8025a52f nk.exe :_NE2000_WRITE_PORT_UCHAR
164 0.3 80229a7f nk.exe :_PerfCountSinceTick
89 0.2 03dbb680 ddi_flat.dll:?EmulatedBltFill16
56 0.1 8024855c nk.exe :_ObjectCall
55 0.1 80243470 nk.exe :_ZeroPage
(Additional lines omitted for clarity.)
1 0.0 03db8e90 ddi_flat.dll:?ScanLine
1 0.0 03c4265f kbdmouse.dll:_KeybdDriverVKeyToUnicode
23 0.0 :<UNACCOUNTED FOR>
For more information, see Monte Carlo Profiling.
DispPerf.exe Data
The following tables show the single set of data from the second run of ProfApp.exe broken into several smaller tables.
The following table shows the overall summary of the number of times each raster operation (ROP) was called.
For information about converting the ROP codes reported by DispPerf to the ROP codes listed in Ternary Raster Operations, see Display Driver Performance Profiling.
RopCode | cTotal |
---|---|
0x0000CCCC | 1063 |
0x0000F0F0 | 6337 |
0x00008888 | 59 |
0x00006666 | 52 |
0x0000AAF0 | 4628 |
0x0000EEEE | 7 |
0xFEFEFFF1 | 49 |
0x0000E2E2 | 1 |
0x00005555 | 438 |
The following table shows the profiling results from all the ROPs performed by GPE functions.
RopCode | cGPE | dwGPETime | Avg.GPETime |
---|---|---|---|
0x0000CCCC | 47 | 67082 | 1427 |
0x0000F0F0 | 754 | 23428 | 31 |
0x00008888 | 59 | 51740 | 876 |
0x00006666 | 52 | 28191 | 542 |
0x0000AAF0 | 4056 | 40499 | 9 |
0x0000EEEE | 0 | 0 | 0 |
0xFEFEFFF1 | 49 | 145383 | 2967 |
0x0000E2E2 | 1 | 353 | 353 |
0x00005555 | 0 | 0 | 0 |
The following table shows the profiling results from all the ROPs performed by emulation functions. For more information, see BitBlT Emulation Library Functions.
RopCode | cEmul | dwEmulTime | Avg.EmulTime |
---|---|---|---|
0x0000CCCC | 1016 | 4412339 | 4342 |
0x0000F0F0 | 5583 | 435198 | 77 |
0x00008888 | 0 | 0 | 0 |
0x00006666 | 0 | 0 | 0 |
0x0000AAF0 | 572 | 36957 | 64 |
0x0000EEEE | 7 | 23902 | 3414 |
0xFEFEFFF1 | 0 | 0 | 0 |
0x0000E2E2 | 0 | 0 | 0 |
0x00005555 | 438 | 48659 | 111 |
The DispPerf results do not show profiling results for hardware calls because the settings in How to Profile and Optimize a Display Driver are based on the FLAT driver, which is a general purpose driver that does not make use of advanced hardware capabilities.
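The three DispPerf tables above can be cross-checked against each other. The following Python sketch is not part of DispPerf; it only transcribes the table values and assumes that cTotal is the sum of cGPE and cEmul (no hardware calls on the FLAT driver) and that each average column is the total time divided by the call count, rounded down. The names ROWS and check_row are illustrative.

```python
# Rows transcribed from the DispPerf tables above.
# Layout: rop -> (cTotal, cGPE, dwGPETime, AvgGPE, cEmul, dwEmulTime, AvgEmul)
ROWS = {
    "0x0000CCCC": (1063, 47, 67082, 1427, 1016, 4412339, 4342),
    "0x0000F0F0": (6337, 754, 23428, 31, 5583, 435198, 77),
    "0x00008888": (59, 59, 51740, 876, 0, 0, 0),
    "0x00006666": (52, 52, 28191, 542, 0, 0, 0),
    "0x0000AAF0": (4628, 4056, 40499, 9, 572, 36957, 64),
    "0x0000EEEE": (7, 0, 0, 0, 7, 23902, 3414),
    "0xFEFEFFF1": (49, 49, 145383, 2967, 0, 0, 0),
    "0x0000E2E2": (1, 1, 353, 353, 0, 0, 0),
    "0x00005555": (438, 0, 0, 0, 438, 48659, 111),
}


def average(total_time, count):
    """Integer average; 0 when the path was never taken (assumed convention)."""
    return total_time // count if count else 0


def check_row(row):
    """Return True if the counts add up and both averages match the totals."""
    c_total, c_gpe, t_gpe, avg_gpe, c_emul, t_emul, avg_emul = row
    return (
        c_total == c_gpe + c_emul
        and avg_gpe == average(t_gpe, c_gpe)
        and avg_emul == average(t_emul, c_emul)
    )


# ROP codes whose rows are internally inconsistent under the assumptions above.
mismatches = [rop for rop, row in ROWS.items() if not check_row(row)]
print("inconsistent rows:", mismatches)
```

A check like this is mainly useful for catching transcription mistakes when copying DispPerf output into a report.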
Analysis of the Profiling Results After Optimization
The Monte Carlo results for the optimized driver show that the function MaskedSrcToMaskedDst is no longer a factor in the driver's performance because it is never called. The tasks that were handled by this function are now handled by the new emulated GPE function EmulatedBltSrcCopy1616Convert instead.
When reviewing Monte Carlo profiling results, remember that the results show only the proportion of time spent in each function, not the absolute amount of time.
The second set of profiling results shows that just over 93 percent of the non-idle profiling time was spent in the driver, whereas the original run spent about 98 percent of its non-idle time in the driver.
Do not interpret these results as a 5 percent increase in performance because there is no way to know the absolute times behind these percentages when analyzing Monte Carlo results.
The results from DispPerf.exe do show absolute times spent in the various ROPs. Whereas the original run spent a combined total of nearly 42 seconds for all of the ROPs, the optimized run spent a combined total of just over 5 seconds.
The dramatic difference can be seen in ROP 0x0000CCCC (SRCCOPY), which was the focus of the optimization efforts.
As expected, the amount of time spent for this ROP in the emulation libraries rose significantly, from an original total of 0.09 seconds to 4.4 seconds. This increase was more than offset by the reduction in time spent in GPE calls, which decreased from 40.9 seconds to about 0.07 seconds.
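The combined time for the optimized run can be reproduced from the dwGPETime and dwEmulTime columns. The following Python sketch assumes the DispPerf times are in microseconds, which is consistent with the 4412339 value for ROP 0x0000CCCC being reported as 4.4 seconds; the dictionary and function names are illustrative and are not part of any tool.

```python
# Times transcribed from the DispPerf tables for the optimized run.
# Unit is assumed to be microseconds.
GPE_TIME_US = {
    "0x0000CCCC": 67082,
    "0x0000F0F0": 23428,
    "0x00008888": 51740,
    "0x00006666": 28191,
    "0x0000AAF0": 40499,
    "0x0000EEEE": 0,
    "0xFEFEFFF1": 145383,
    "0x0000E2E2": 353,
    "0x00005555": 0,
}
EMUL_TIME_US = {
    "0x0000CCCC": 4412339,
    "0x0000F0F0": 435198,
    "0x00008888": 0,
    "0x00006666": 0,
    "0x0000AAF0": 36957,
    "0x0000EEEE": 23902,
    "0xFEFEFFF1": 0,
    "0x0000E2E2": 0,
    "0x00005555": 48659,
}


def to_seconds(microseconds):
    """Convert a microsecond count to seconds."""
    return microseconds / 1_000_000


gpe_total_us = sum(GPE_TIME_US.values())
emul_total_us = sum(EMUL_TIME_US.values())
total_us = gpe_total_us + emul_total_us

# Time attributed to SRCCOPY across both code paths.
srccopy_us = GPE_TIME_US["0x0000CCCC"] + EMUL_TIME_US["0x0000CCCC"]

print(f"GPE total:       {to_seconds(gpe_total_us):.2f} s")
print(f"Emulation total: {to_seconds(emul_total_us):.2f} s")
print(f"All ROPs:        {to_seconds(total_us):.2f} s")
print(f"SRCCOPY only:    {to_seconds(srccopy_us):.2f} s")
```

Under the microsecond assumption, the sum over all ROPs comes to about 5.3 seconds, of which roughly 4.5 seconds is SRCCOPY.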
ProfApp.exe is written to report its execution time in the debug output. This allows for an independent check against the results from the profiling tools.
The first run of ProfApp.exe spent 34,297 milliseconds (ms) blitting to the screen.
After optimizing the color format conversion in the driver, the second run of ProfApp.exe spent 3,666 ms blitting to the screen.
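This represents a reduction of approximately 89 percent in total blit time after optimization, which means that the second run completed the same work more than nine times faster.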
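The comparison of the two ProfApp.exe timings can be worked through as follows. This Python sketch uses only the two elapsed times quoted above; the variable and function names are illustrative.

```python
# Elapsed blit times reported by ProfApp.exe in the debug output.
BEFORE_MS = 34297  # first run, before optimization
AFTER_MS = 3666    # second run, after optimization


def percent_reduction(before, after):
    """Percentage by which the elapsed time decreased."""
    return (before - after) / before * 100


def speedup(before, after):
    """How many times faster the optimized run completed."""
    return before / after


reduction = percent_reduction(BEFORE_MS, AFTER_MS)
factor = speedup(BEFORE_MS, AFTER_MS)

print(f"Time saved: {BEFORE_MS - AFTER_MS} ms")
print(f"Reduction:  {reduction:.1f} percent")
print(f"Speedup:    {factor:.1f}x")
```

Reporting both figures avoids ambiguity: a percentage reduction in elapsed time and a speedup factor describe the same improvement but are easy to confuse.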
See Also
Concepts
How to Profile and Optimize a Display Driver
Display Driver Performance