Optimizing a GPE Emulation Function (Windows Embedded CE 6.0)
1/6/2010
This topic describes changes to the implementation of the function EmulatedBlt_Internal that improve the performance of the FLAT display driver described in How to Profile and Optimize a Display Driver.
These changes specifically address the scenario presented in that example, namely RGB555 to RGB565 color conversion.
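To make the conversion concrete, the following minimal sketch shows what it involves for a single pixel and for a row of pixels. It assumes the conventional bit layouts (RGB555 as 0RRRRRGGGGGBBBBB and RGB565 as RRRRRGGGGGGBBBBB). The names Rgb555ToRgb565 and ConvertRow555To565 are hypothetical and are not part of the GPE library; this is illustrative code, not code to add to your driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of GPE.
// Converts one RGB555 pixel (0RRRRRGG GGGBBBBB) to RGB565 (RRRRRGGG GGGBBBBB).
constexpr std::uint16_t Rgb555ToRgb565(std::uint16_t src)
{
    // Blue (bits 0-4) stays in place. Red and green (bits 5-14) move up
    // one bit, which leaves the low bit of the 6-bit green field at zero.
    return static_cast<std::uint16_t>((src & 0x001F) | ((src & 0x7FE0) << 1));
}

// Hypothetical helper: converts `count` pixels one at a time.
inline void ConvertRow555To565(const std::uint16_t* src,
                               std::uint16_t* dst,
                               std::size_t count)
{
    assert(count == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < count; ++i)
    {
        dst[i] = Rgb555ToRgb565(src[i]);
    }
}
```

Because the conversion is only a mask and a shift, most of the cost of a generic implementation lies in per-pixel overhead rather than in the arithmetic, which is why a dedicated code path can help.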
You could apply this optimization to all the drivers and all the OS designs that you usually work with. The tradeoff, however, is a larger and more complex display driver in exchange for optimizing a scenario that is relatively rare in most general applications.
As indicated in Display Driver Performance, the default emulation functions are good candidates for optimization because they can be replaced with more efficient implementations that use specific hardware functionality, specialized knowledge of the usage scenario, or both.
The following steps describe how to add code to your OS design to improve the performance of ProfApp.exe.
To optimize EmulatedBlt_Internal for RGB555 to RGB565 color conversion
In your Emulrotate project, replace line 186 of ebbltsel.cpp with the following code so that this if statement is nested within the existing if statement:

    if (!(pParms->pConvert &&
          pParms->rop4 == 0xCCCC &&
          pParms->pSrc->Format() == gpe16Bpp))
    {
        return S_OK; // This was originally line 186.
    }

This code creates a special case for 16-bit-per-pixel (bpp) format conversions within the function EmulatedBltSelect16.
Insert the following code into ebbltsel.cpp at what was originally line 238 to add another else if clause to the existing if statement. Depending on how you formatted your code changes in the previous step, the correct spot for this code should be approximately line 244.

    else if ((pParms->pSrc->Format() == gpe16Bpp) &&
             (NULL == pParms->pLookup) &&
             (pParms->pConvert))
    {
        pParms->pBlt = FUNCNAME(BltSrcCopy1616Convert);
    }

This creates a special code path to a new function to handle 16-bpp format conversions.
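The selection logic in the two preceding code changes can be modeled in isolation. The following sketch is a simplified, self-contained model and is not code to add to ebbltsel.cpp: the types PixelFormat, BltRequest, and BltPath and the function SelectBltPath are hypothetical stand-ins for the GPE types, and the real EmulatedBltSelect16 function contains additional branches that are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the GPE types; the real definitions differ.
enum class PixelFormat { Bpp8, Bpp16, Bpp32 };

struct BltRequest
{
    PixelFormat   srcFormat;        // models pParms->pSrc->Format()
    bool          hasLookupTable;   // models pParms->pLookup != NULL
    bool          needsConversion;  // models pParms->pConvert != NULL
    std::uint16_t rop4;             // models pParms->rop4
};

enum class BltPath { Generic, SrcCopy1616Convert };

// Returns the specialized path only for a plain source copy (ROP4 0xCCCC)
// from a 16-bpp source that needs color conversion and has no lookup table.
inline BltPath SelectBltPath(const BltRequest* req)
{
    assert(req != nullptr);
    if (req->rop4 == 0xCCCC &&
        req->srcFormat == PixelFormat::Bpp16 &&
        !req->hasLookupTable &&
        req->needsConversion)
    {
        return BltPath::SrcCopy1616Convert;
    }
    return BltPath::Generic;
}
```

Keeping the test narrow means that every other blit continues to use the existing, more general emulation code.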
Insert the following code into public\common\oak\inc\emul.h at line 138:
SCODE EmulatedBltSrcCopy1616Convert( GPEBltParms * );
This code prototypes a new function to perform 16-bpp color format conversions.
In your Emulrotate project, add the following code to ebcopy16.cpp:
    SCODE Emulator::EmulatedBltSrcCopy1616Convert(GPEBltParms* pBltParms)
    {
        // Source-related information.
        PRECTL prcSrc = pBltParms->prclSrc;
        UINT32 iScanStrideSrc = pBltParms->pSrc->Stride()/sizeof(WORD);
        WORD *pwScanLineSrc = (WORD *)pBltParms->pSrc->Buffer() +
                              prcSrc->top * iScanStrideSrc + prcSrc->left;

        // Destination-related information.
        PRECTL prcDst = pBltParms->prclDst;
        UINT32 iScanStrideDst = pBltParms->pDst->Stride()/sizeof(WORD);
        WORD *pwScanLineDst = (WORD *)pBltParms->pDst->Buffer() +
                              prcDst->top * iScanStrideDst + prcDst->left;

        int cRows = prcDst->bottom - prcDst->top;
        int cCols = prcDst->right - prcDst->left;

        // Copy source before overwriting.
        if (!pBltParms->yPositive)
        {
            // Scan from end of memory, and negate stride.
            pwScanLineSrc += iScanStrideSrc * (cRows - 1);
            pwScanLineDst += iScanStrideDst * (cRows - 1);
            iScanStrideSrc = (UINT32)-(INT32)iScanStrideSrc;
            iScanStrideDst = (UINT32)-(INT32)iScanStrideDst;
        }

        if (!pBltParms->xPositive)
        {
            // Copy from right to left.
            for (int row = 0; row < cRows; row++)
            {
                WORD *pwPixelDst = pwScanLineDst + cCols - 1;
                WORD *pwPixelSrc = pwScanLineSrc + cCols - 1;

                while (pwPixelDst >= pwScanLineDst)
                {
                    // RGB555 to RGB565: blue stays in place; red and green
                    // (bits 5-14, mask 0x7fe0) shift up by one bit.
                    *pwPixelDst = (*pwPixelSrc & 0x1f) | ((*pwPixelSrc & 0x7fe0) << 1);
                    pwPixelDst--;
                    pwPixelSrc--;
                }

                pwScanLineSrc += iScanStrideSrc;
                pwScanLineDst += iScanStrideDst;
            }
        }
        else
        {
            // Copy from left to right.
            for (int row = 0; row < cRows; row++)
            {
                WORD *pwPixelDst = pwScanLineDst;
                WORD *pwPixelSrc = pwScanLineSrc;
                WORD *pwLim = pwPixelDst + cCols;
                BOOL bPreWord;
                BOOL bPostWord;
                DWORD * pdwPixelDst;
                DWORD * pdwLim;

                // Convert one leading pixel if the destination is not DWORD-aligned.
                bPreWord = ((DWORD)pwPixelDst & 2) ? TRUE : FALSE;
                if (bPreWord)
                {
                    *pwPixelDst = (*pwPixelSrc & 0x1f) | ((*pwPixelSrc & 0x7fe0) << 1);
                    pwPixelDst++;
                    pwPixelSrc++;
                }

                pdwPixelDst = (DWORD *)pwPixelDst;
                pdwLim = (DWORD *)((DWORD)pwLim & (~3));
                bPostWord = ((DWORD)pwLim & 2) ? TRUE : FALSE;

                // Convert two pixels at a time with aligned DWORD writes.
                while (pdwPixelDst < pdwLim)
                {
                    // On little-endian targets the low word of a DWORD is stored
                    // at the lower address, so the first pixel goes in the low word.
                    DWORD dwSrc = *pwPixelSrc;
                    pwPixelSrc++;
                    dwSrc |= ((DWORD)*pwPixelSrc << 16);
                    pwPixelSrc++;

                    dwSrc = (dwSrc & 0x1F001F) | ((dwSrc & 0x7FE07FE0) << 1);
                    *pdwPixelDst = dwSrc;
                    pdwPixelDst++;
                }

                // Convert one trailing pixel if the row does not end on a DWORD boundary.
                pwPixelDst = (WORD *)pdwPixelDst;
                if (bPostWord)
                {
                    *pwPixelDst = (*pwPixelSrc & 0x1f) | ((*pwPixelSrc & 0x7fe0) << 1);
                    pwPixelDst++;
                    pwPixelSrc++;
                }

                pwScanLineSrc += iScanStrideSrc;
                pwScanLineDst += iScanStrideDst;
            }
        }

        return S_OK;
    }
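
The left-to-right path gains its speed by converting two pixels with a single mask-and-shift on a 32-bit value. The following portable sketch isolates that technique so that it can be checked outside the driver. It assumes the RGB555 layout 0RRRRRGGGGGBBBBB; the names Rgb555PairToRgb565Pair and ConvertRowPacked are hypothetical and are not part of GPE. Unlike the driver code, this sketch unpacks the result with shifts instead of writing a DWORD directly, so it does not depend on alignment or byte order, at the cost of the memory-access savings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts two RGB555 pixels packed in one 32-bit value.
// Bit 15 of each half is masked out before the shift, so nothing moves from
// the low pixel into the high pixel.
constexpr std::uint32_t Rgb555PairToRgb565Pair(std::uint32_t pair)
{
    return (pair & 0x001F001Fu) | ((pair & 0x7FE07FE0u) << 1);
}

// Hypothetical helper: converts a row two pixels at a time.
inline void ConvertRowPacked(const std::uint16_t* src,
                             std::uint16_t* dst,
                             std::size_t count)
{
    assert(count == 0 || (src != nullptr && dst != nullptr));
    std::size_t i = 0;
    for (; i + 1 < count; i += 2)
    {
        // First pixel in the low half, second pixel in the high half.
        std::uint32_t pair = src[i] | (static_cast<std::uint32_t>(src[i + 1]) << 16);
        pair = Rgb555PairToRgb565Pair(pair);
        dst[i]     = static_cast<std::uint16_t>(pair & 0xFFFFu);
        dst[i + 1] = static_cast<std::uint16_t>(pair >> 16);
    }
    if (i < count)
    {
        // Odd trailing pixel: the high half of the pair is simply zero.
        dst[i] = static_cast<std::uint16_t>(Rgb555PairToRgb565Pair(src[i]));
    }
}
```

A row with an odd number of pixels exercises both the paired loop and the trailing single-pixel case, which mirrors the pre-word and post-word handling in the driver function.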