RN: milgpu

-------------------------------------------------------------------------------
                       Matrox Imaging Library (MIL) 10.0
                           Release Notes (milgpu)
                               December, 2013
            (c) Copyright Matrox Electronic Systems Ltd., 1992-2013.
-------------------------------------------------------------------------------

Main table of contents

Section 1 : Differences between MIL 10.0 and MIL 9.0 Update 35
Section 2 : Differences between MIL 9.0 Update 35 and MIL 9.0 Update 30
Section 3 : Differences between MIL 9.0 Update 30 and MIL 9.0 Update 14
Section 4 : Differences between MIL 9.0 Update 14 and MIL 9.0 Update 3
Section 5 : Differences between MIL 9.0 Update 3 and MIL 9.0
Section 6 : MIL 9.0 GPU (Graphics Processing Unit) accelerations 
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Section 1: Differences between MIL 10.0 and MIL 9.0 Update 35

Table of Contents for Section 1

1. Overview

-------------------------------------------------------------------------------

1. Overview

   MIL 10.0 GPU processing includes all the features of MIL 9.0 Update 35.

   MIL GPU processing is generally compatible with GPUs that are themselves 
   compatible with DirectX 9 and up. The specific GPU you are using, as well 
   as its driver implementation, can affect the processing speed, accuracy, 
   and precision of your results. MIL GPU processing has been validated with 
   recent (but not necessarily the latest) GPUs and drivers from AMD and 
   NVIDIA.

   Note that MIL GPU processing is not supported on Intel graphics. This
   includes the Matrox 4Sight GP.

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Section 2: Differences between MIL 9.0 Update 35 and MIL 9.0 Update 30

Table of Contents for Section 2

1. Overview
2. GPU acceleration restrictions
3. New GPU functionalities and improvements
   3.1.  GPU accelerated image processing operations
      3.1.1.  MimHistogram
4. GPU specific examples
   4.1.  MilInteropWithCUDA (Updated)
   4.2.  MilInteropWithDX (Updated)
   4.3.  MilInteropWithOpenCL (New)
5. Is my application running on the GPU?
   5.1.  Deactivate MIL Host processing compensation
   5.2.  Windows SysInternals Process Explorer v15.0 (c) tool
6. Do MIL GPU results and precision change between updates?
   6.1.  MIL GPU algorithms
   6.2.  Graphics card firmware
7. Fixed bugs
   7.1.  All modules (DX9, DX10, DX11)
8. GPU boards
   8.1.  GPU board limitations
-------------------------------------------------------------------------------

1. Overview

   - New DirectX 11 and DirectCompute support (shader model 5.0)
      - Minimal requirement: Microsoft Windows Vista with SP2 (or later), 
                             Windows 7


        Note: Make sure that Windows Update 971512 (Windows Graphics, Imaging,
              and XPS Library) is installed to use DirectX 11 and
              DirectCompute in Windows Vista with SP2.
              (See http://support.microsoft.com/kb/971512)

   - General performance improvements and bug fixes (DirectX 9 and 10)

   - Interoperability support with OpenCL (through DX10/OpenCL 
     interoperability)
      - See MilInteropWithOpenCL specific example

2. GPU acceleration restrictions

   - Before MIL 9.0 Update 35, a monitor had to be connected to a graphics card
     to benefit from GPU acceleration. With MIL 9.0 Update 35, GPU acceleration
     is available on a graphics card whether or not its outputs are connected
     to a monitor.

     Requirements: - DX10 or DX11
                   - Windows Vista with SP2 or later, Windows 7
                   - WDDM 1.1 compatible graphics card driver

     Note: Make sure that Windows Update 971512 (Windows Graphics, Imaging, and
           XPS Library) is installed to have WDDM 1.1 driver model support in
           Windows Vista with SP2.
           (See http://support.microsoft.com/kb/971512)

   - DirectX versions supported by MIL GPU in your system are those supported
     by all detected graphics adapters. If, for example, your system is
     equipped with:

      - 1 Intel HD 2000          (Intel Core-i7 2600 integrated GPU)
      - 1 AMD Radeon HD 6970     (discrete graphics adapter)
      - 1 NVIDIA GeForce GTX 480 (discrete graphics adapter)

     DirectX 9 and 10 will be supported by MIL GPU. To get DirectX 11 support,
     you would have to disable the Intel integrated GPU (through a BIOS option
     in this example).

   - The M_MAPPABLE buffer attribute is no longer supported. This flag was
     previously allowed for GPU buffers allocated in Host memory
     (M_HOST_MEMORY), but it now generates a MIL error.
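
   The adapter rule above amounts to a set intersection. The C sketch below
   models it with bit flags. The flag names, the function, and the capability
   values assigned to each adapter are illustrative assumptions (not MIL
   constants or queries), chosen to mirror the example in which the Intel
   HD 2000 lacks DirectX 11 support.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit flags for DirectX versions (not MIL constants). */
#define GPU_DX9  (1u << 0)
#define GPU_DX10 (1u << 1)
#define GPU_DX11 (1u << 2)

/* Returns the DirectX versions shared by every detected adapter,
   i.e. the versions usable under the rule described above. */
unsigned common_dx_support(const unsigned *adapter_caps, size_t count)
{
    unsigned common = GPU_DX9 | GPU_DX10 | GPU_DX11;

    assert(adapter_caps != NULL || count == 0);
    if (count == 0)
        return 0u;                    /* no adapter: nothing is supported */
    for (size_t i = 0; i < count; ++i)
        common &= adapter_caps[i];    /* keep versions every adapter has */
    return common;
}
```

   With { GPU_DX9 | GPU_DX10 } for the Intel adapter and all three flags for
   the AMD and NVIDIA adapters, the result is GPU_DX9 | GPU_DX10; leaving the
   Intel entry out yields all three versions.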

3. New GPU functionalities and improvements

   3.1.  GPU accelerated image processing operations

      3.1.1.  MimHistogram (DX11)

         - Supports M_MONO8, M_MONO16, floating-point and packed-binary
           source buffers.
         - Includes a special, very fast optimization for 8-bit histograms.
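
      For reference, an 8-bit histogram counts pixels into 256 bins. The
      plain C sketch below computes it on the Host; it is not MIL's GPU
      implementation, only an illustration of the expected result, and it
      can serve as a simple cross-check when comparing GPU and Host results.
      The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side reference 8-bit histogram (256 bins); illustration only. */
void histogram_u8(const uint8_t *pixels, size_t count, uint32_t bins[256])
{
    assert(pixels != NULL || count == 0);
    for (int i = 0; i < 256; ++i)
        bins[i] = 0;                  /* reset every bin */
    for (size_t i = 0; i < count; ++i)
        bins[pixels[i]]++;            /* one count per pixel value */
}
```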

4. GPU specific examples

   Visual Studio solutions including all projects for GPU specific examples
   were removed for compatibility purposes. Go to each specific example
   folder to open individual solutions and projects.

   4.1.  MilInteropWithCUDA

      - Updated from the original version in MIL 9.0 Update 14 to support
        Windows 7 through NVIDIA CUDA Toolkit 4.0
      - Support for Visual Studio 2003 has been removed (NVIDIA CUDA toolkit
        4.0 restriction)
      - Support for Visual Studio 2008 has been added

      This example demonstrates how it is possible to apply custom CUDA
      kernels on MIL buffers when needed, and how MIL handles everything else
      from buffer allocations to onscreen display.

      The second part of the example shows the same processing entirely done
      with MIL.

      Requirements: - An NVIDIA CUDA-compatible GPU
                    - Install latest GPU driver
                    - Install latest Microsoft DirectX SDK
                    - Install NVIDIA CUDA Toolkit 4.0
                            
   4.2.  MilInteropWithDX

      - Updated from the original example MilInteropWithDX9 included in MIL 9.0
        Update 14.

      This example demonstrates how it is possible to apply custom DirectX 9
      and 10, or DirectCompute shaders on MIL buffers when needed, and how
      MIL handles everything else from buffer allocations to onscreen display.

      The second part of the example shows the same processing entirely done
      with MIL.

      Requirements: - Install latest GPU driver
                    - Install latest Microsoft DirectX SDK

      Note: Microsoft Visual Studio 2005 (or later) and a DirectX 10 compatible
            GPU are needed to run the DirectX 10 interoperability
            functionality.

      Note: Microsoft Visual Studio 2008 (or later) and a DirectX 11 compatible
            GPU are needed to run the DirectCompute interoperability
            functionality.

   4.3.  MilInteropWithOpenCL

      This new example demonstrates how it is possible to apply custom OpenCL
      kernels on MIL buffers when needed, and how MIL handles everything else
      from buffer allocations to onscreen display.

      The second part of the example shows the same processing entirely done
      with MIL.

      Requirements: - Windows Vista and later
                    - An OpenCL-compatible GPU (OpenCL 1.1)
                    - Install latest GPU driver
                    - Install latest Microsoft DirectX SDK
                    - Install NVIDIA CUDA Toolkit 4.0 (for NVIDIA GPUs)
                        or
                    - Install AMD APP SDK 2.5 (for AMD GPUs)

5. Is my application running on the GPU?

   MIL currently does not provide an explicit way to know whether a function
   was executed on the GPU or on the Host CPU. This information can only be
   obtained indirectly, in one of two ways: the first uses MIL itself, while
   the second requires a third-party tool.

   5.1.  Deactivate MIL Host processing compensation

      Add this call in your application to disable Host compensation:

      MappControl(M_PROCESSING, M_COMPENSATION_DISABLE);

      This will cause all following processing calls that cannot be performed
      by the GPU to generate a MIL error. Refer to MappControl documentation
      for more information on Host processing compensation.

   5.2.  Windows SysInternals Process Explorer v15.0 (c) tool

      (http://technet.microsoft.com/en-us/sysinternals/bb896653)

      Beginning with version 15.0, the Process Explorer (c) tool includes a
      GPU usage meter similar to the Windows Task Manager performance meters.
      This tool can be used to determine if the GPU usage increases when a
      specific MIL application or function is running. According to the tool
      documentation, the GPU usage meter is supported on Windows Vista and
      later only. Go to the Windows SysInternals website for download and
      documentation.

6. Do MIL GPU results and precision change between updates?

   6.1.  MIL GPU algorithms

      When GPU support for a function is added in a MIL GPU update, its
      results and precision might differ from those of its MIL Host
      counterpart; identical results can hardly be guaranteed across different
      hardware (CPU, GPU, FPGA, ...). However, once GPU support for a function
      is added in a MIL GPU update, the results and precision of this function
      should not change in subsequent updates (unless stated otherwise, due to
      specific optimizations or bug fixes).

   6.2.  Graphics card firmware

      Some GPU functionality relies on firmware installed with your graphics
      card driver. Although this firmware does not necessarily change with
      each driver version, some MIL GPU results can differ between two
      graphics card driver versions.

7. Fixed bugs

   7.1.  All modules (DX9, DX10)

      - Fix: Now returning an explicit error when trying to allocate a remote
             GPU system while the DMIL server is configured as a Windows
             Service.
      - Fix: MbufTransfer(M_CLEAR) could fail if the destination buffer was
             already locked on a system other than the GPU system.
      - Fix: Internal DLLs are now unloaded when freeing the last allocated
             GPU system (MsysFree).
      - Fix: MimResize was compensated on Host if one of M_FAST or M_REGULAR
             flags was added to InterpolationMode parameter.
      - Fix: MimLutMap results were incorrect with a 16-bit monochrome source,
             and color LUT and destination.

8. GPU boards
   
   8.1. GPU board limitations

      - NVIDIA driver family 197.xx has shown instability issues in DirectX 10
        (which is the default acceleration mode in Windows Vista and 
        Windows 7). Revert to driver 196.21 if your MIL GPU application does 
        not behave as expected. This issue does not affect DirectX 9 GPU 
        acceleration.
   
      - Floating-point exceptions can occur with NVIDIA drivers for older cards
        (GeForce 8000 series), in the DirectX 10.0 version of some functions on
        64-bit Windows Vista. Note that this can only happen if floating-point
        exceptions, which are usually disabled by default, are enabled in your
        application.

      - Your graphics card driver might not start if the BIOS option mapping
        PCI resources over 4 GB (64-bit IO mapping) is enabled. Workarounds
        are updating your graphics card driver or disabling this BIOS option.
        Note that this limitation is not restricted to 64-bit operating
        systems.


-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Section 3: Differences between MIL 9.0 Update 30 and MIL 9.0 Update 14

Table of Contents for Section 3

1. Overview
2. GPU acceleration restrictions
3. New GPU functionalities and improvements
   3.1.  Managing buffers using a GPU system
      3.1.1.  MbufBayer
      3.1.2.  MbufClear
      3.1.3.  MbufCopy
      3.1.4.  MbufCopyMask
      3.1.5.  MbufTransfer
   3.2.  GPU accelerated image processing operations
      3.2.1.  MgenLutFunction
      3.2.2.  MgenLutRamp
      3.2.3.  MimArith
      3.2.4.  MimArithMultiple
      3.2.5.  MimBinarize
      3.2.6.  MimConvert
      3.2.7.  MimDilate
      3.2.8.  MimErode
      3.2.9.  MimMorphic
      3.2.10. MimThick
      3.2.11. MimThin
4. Fixed bugs
   4.1.  Both modules (DX9, DX10)
   4.2.  DirectX 10 specific
5. GPU boards
   5.1.  GPU board limitations
-------------------------------------------------------------------------------

1. Overview

   - General performance improvements and bug fixes (DirectX 9 and 10)

   - New packed binary buffers support in DirectX 10
   - New packed YUV16 buffers support (DirectX 9 and 10)
   - Refer to section 3 for specific function improvements and restrictions
   - Refer to section 4 for specific bug fixes

2. GPU acceleration restrictions

   - Kernels and structuring elements cannot be allocated in video memory.
   - Copies and conversions between packed binary buffers and M_BGR32 packed
     buffers are not supported. To convert an M_BGR32 buffer to packed binary,
     first convert it to 8-bit monochrome, then to packed binary.
   - Buffer width (X size) must be a multiple of 32 for packed binary buffers.
   - The X offset of a packed binary child buffer relative to its ancestor
     must be a multiple of 32 and the width of the child must be a multiple
     of 32.
   - Buffer width (X size) must be a multiple of 2 for packed YUV16 buffers.
   - The X offset of a packed YUV16 child buffer relative to its ancestor
     must be a multiple of 2 and the width of the child must be a multiple
     of 2.
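
   The size and offset restrictions above can be checked before allocating a
   child buffer. The C sketch below is a hypothetical helper (not a MIL
   function); the macro names are illustrative, and the alignment values (32
   for packed binary, 2 for packed YUV16) come from the rules listed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Alignment requirements stated above (names are illustrative). */
#define PACKED_BINARY_ALIGN 32
#define PACKED_YUV16_ALIGN   2

static bool is_multiple_of(long value, long align)
{
    assert(align > 0);
    return value % align == 0;
}

/* Checks a child of a packed binary or packed YUV16 ancestor: the ancestor
   width, the child X offset and the child width must all be multiples of
   the alignment required by the format. */
bool packed_child_gpu_ok(long ancestor_width, long child_offset_x,
                         long child_width, long align)
{
    return is_multiple_of(ancestor_width, align)
        && is_multiple_of(child_offset_x, align)
        && is_multiple_of(child_width, align);
}
```

   For example, a 64-pixel-wide packed binary child at X offset 16 fails the
   check, while the same child at X offset 32 passes.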

3. New GPU functionalities and improvements

   3.1.  Managing buffers using a GPU system

      3.1.1.  MbufBayer

         - Now supports YUV16 packed destination buffers (DX9, DX10).

      3.1.2.  MbufClear

         - Now supports packed binary buffers (DX10).

      3.1.3.  MbufCopy

         - Now supports packed binary buffers (DX10).
         - Now supports YUV16 packed buffers, including conversion to and
           from BGR32 packed buffers (DX9, DX10).

      3.1.4.  MbufCopyMask

         - GPU support for M_MONO8, M_MONO16, M_BGR32 packed buffers (DX10).

      3.1.5.  MbufTransfer

         - M_COPY and M_CLEAR modes support packed binary buffers (DX10).
         - M_COPY and M_CLEAR modes support band selection for multiband
           buffers (DX9, DX10).
         - M_COPY supports YUV16 packed buffers, including conversion to and
           from BGR32 packed buffers (DX9, DX10).

   3.2.  GPU accelerated image processing operations

      3.2.1.  MgenLutFunction

         - Now supports all sizes and offsets (DX9, DX10).

      3.2.2.  MgenLutRamp

         - Now supports all sizes and offsets (DX9, DX10).

      3.2.3.  MimArith

         - Now supports packed binary buffers for logical operations (DX10).

      3.2.4.  MimArithMultiple

         - Constants can have any floating-point value (DX9, DX10).
         - M_OFFSET_GAIN mode now supports mixed-sign buffers (DX9, DX10).

      3.2.5.  MimBinarize

         - Now supports packed binary buffers (DX10).

      3.2.6.  MimConvert

         - Now supports conversion between BGR32 packed and YUV16 packed
           buffers (DX9, DX10).

      3.2.7.  MimDilate

         - M_BINARY mode now supports packed binary buffers. Both source and
           destination buffers must be packed binary (DX10).

      3.2.8.  MimErode

         - M_BINARY mode now supports packed binary buffers. Both source and
           destination buffers must be packed binary (DX10).

      3.2.9.  MimMorphic

         - M_BINARY mode now supports packed binary buffers. Both source and
           destination buffers must be packed binary (DX10).

      3.2.10. MimThick

         - Now supports M_BINARY mode (DX9, DX10).
         - M_BINARY mode now supports packed binary buffers. Both source and
           destination buffers must be packed binary (DX10).

      3.2.11. MimThin

         - Now supports M_BINARY mode (DX9, DX10).
         - M_BINARY mode now supports packed binary buffers. Both source and
           destination buffers must be packed binary (DX10).
         - M_BINARY2 mode now supports 8-bit buffers (DX10). Both source and
           destination buffers must be 8-bit.

4. Fixed bugs

   4.1.  Both modules (DX9, DX10)

      - Fix: MimConvolve could produce incorrect results with a separable
             kernel when source and destination buffers had different
             pixel sizes.
      - Fix: Application could crash when calling MimArith with the same
             buffer as all operands (i.e. MimArith(Buf, Buf, Buf, ...)).
      - Fix: Now returning an error when an internal resource allocation fails.

   4.2.  DirectX 10 specific

      - Fix: MbufCopyCond was always compensated on Host.


5. GPU boards

   5.1. GPU board limitations

      - NVIDIA driver family 197.xx has shown instability issues in DirectX 10
        (which is the default acceleration mode in Windows Vista and 
        Windows 7). Revert to driver 196.21 if your MIL GPU application does 
        not behave as expected. This issue does not affect DirectX 9 GPU 
        acceleration.
   
      - Floating-point exceptions can occur in NVIDIA drivers for older cards
        (GeForce 8000 series), in DirectX 10.0 version of some functions on
        64-bit Vista. Note that these occurrences are only possible if
        floating-point exceptions, which are usually disabled by default, are
        enabled in your application.

      - Your graphics card driver might not start if the BIOS option mapping
        PCI resources over 4 GB (64-bit IO mapping) is enabled. Workarounds
        are updating your graphics card driver or disabling this BIOS option.
        Note that this limitation is not restricted to 64-bit operating
        systems.


-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Section 4: Differences between MIL 9.0 Update 14 and MIL 9.0 Update 3

Table of Contents for Section 4

1. Overview
2. New GPU functionalities and improvements
   2.1.  Managing buffers using a GPU system
      2.1.1.  MbufBayer
      2.1.2.  MbufCopy
   2.2.  GPU accelerated image processing operations
      2.2.1.  MimArith
      2.2.2.  MimCountDifference
      2.2.3.  MimShift
   2.3.  GPU system allocation and control
      2.3.1.  MsysAlloc
3. Fixed bugs
   3.1.  DirectX 9 module
4. GPU specific examples
   4.1.  MilInteropWithCUDA
   4.2.  MilInteropWithDX9
5. GPU boards
   5.1.  GPU board limitations
-------------------------------------------------------------------------------

1. Overview

   - New DirectX 10.0 support (pixel and vertex shader models 4.0)
      - Minimal requirement: Microsoft Windows Vista (or later)

      - Significant improvement of transfer rates between Host and Video memory
        versus DirectX 9 (can reach more than 5 GB/s in PCIe 2.0 systems)


   - General performance improvements and bug fixes (DirectX 9)

   - Refer to section 2 for specific function improvements and restrictions
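
   As a rough worked example of what the 5 GB/s figure means, the C sketch
   below estimates the time needed to transfer one image. It assumes
   1 GB = 10^9 bytes, a sustained rate with no per-transfer overhead, and a
   hypothetical 1920x1080 BGR32 image (4 bytes per pixel); actual rates
   depend on the system. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an image with the given geometry. */
uint64_t image_bytes(uint64_t width, uint64_t height, uint64_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Transfer time in milliseconds at a sustained rate given in GB/s
   (assumes 1 GB = 1e9 bytes and no per-transfer overhead). */
double transfer_ms(uint64_t bytes, double gb_per_s)
{
    assert(gb_per_s > 0.0);
    return (double)bytes / (gb_per_s * 1e9) * 1e3;
}
```

   For that image, image_bytes() gives 8,294,400 bytes, which transfers in
   about 1.66 ms at 5 GB/s.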

2. New GPU functionalities and improvements

   2.1.  Managing buffers using a GPU system

      2.1.1.  MbufBayer (DX9)

         - Performance improvement.

      2.1.2.  MbufCopy (DX10)

         - Supported copies :

              Source          ->   Destination
              any type             any type

   2.2.  GPU accelerated image processing operations

      2.2.1.  MimArith (DX10)

         - Supports all operations.

      2.2.2.  MimCountDifference (DX9)

         - Performance improvement.

      2.2.3.  MimShift (DX10)

         - GPU support for M_MONO8, M_MONO16, M_BGR32 packed and floating-point
           buffers.

   2.3.  GPU system allocation and control

      2.3.1.  MsysAlloc (DX9, DX10)

         - M_COMPLETE mode now displays a progress bar showing effects
           compilation status. Use M_COMPLETE+M_SILENT to disable the
           display of the progress bar dialog.


3. Fixed bugs

   3.1.  DirectX 9 module

      - Fix: copy between BGR32 and float was accepted but not handled. It is
        now refused.
      - Fix: convolutions combining some off-centered kernels, mirror overscan,
        and 16-bit buffers were processed by the same code as 8-bit buffers.
      - Fix: some extreme MimPolarTransform operations could be imprecise on
        some ATI GPUs.
      - Fix: white-balance coefficients were not applied in MbufBayer as soon
        as one of them was equal to 1.0.
      - Fix: MimClip now casts condition values to the source type and write
        values to the destination type.
      - Fix: MimCountDifference could return wrong results with tiny buffers on
        old-generation ATI GPUs.
      - Fix: MimPolarTransform could return wrong results with BGR32 buffers.
      - Fix: MappFree while in a lost device state could result in a deadlocked
        application.
      - Fix: a lost device occurring in some specific contexts during a
        function call could result in a crash.

4. GPU specific examples

   4.1.  MilInteropWithCUDA

      This small example demonstrates how it is possible to apply custom CUDA
      kernels on MIL buffers when needed, and how MIL handles everything else
      from buffer allocations to onscreen display.

      The second part of the example shows the same processing entirely done
      with MIL.

      Requirements: - An NVIDIA CUDA-compatible GPU
                    - Install latest GPU driver
                    - Install latest Microsoft DirectX SDK
                    - Install NVIDIA CUDA 2.0 Toolkit
                    - Install NVIDIA CUDA 2.0 SDK
                            

   4.2.  MilInteropWithDX9

      This small example demonstrates how it is possible to apply custom
      DirectX 9 shaders on MIL buffers when needed, and how MIL handles
      everything else from buffer allocations to onscreen display.

      Note that exclusive access to the Direct3D device (also shared by the MIL
      GPU system) must be acquired before executing a custom DirectX effect.

      The second part of the example shows the same processing entirely done
      with MIL.

      Requirements: - Install latest GPU driver
                    - Install latest Microsoft DirectX SDK
                            
5. GPU boards

   5.1. GPU board limitations
   
      - Floating-point exceptions can occur with NVIDIA drivers for older cards
        (GeForce 8000 series), in the DirectX 10.0 version of some functions on
        64-bit Windows Vista. Note that this can only happen if floating-point
        exceptions, which are usually disabled by default, are enabled in your
        application.

      - Your graphics card driver might not start if the BIOS option mapping
        PCI resources over 4 GB (64-bit IO mapping) is enabled. Workarounds
        are updating your graphics card driver or disabling this BIOS option.
        Note that this limitation is not restricted to 64-bit operating
        systems.


-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Section 5: Differences between MIL 9.0 Update 3 and MIL 9.0

Table of Contents for Section 5

1. GPU acceleration restrictions
2. GPU guidelines for optimal performance
3. New GPU functionalities and improvements
   3.1.  Managing buffers using a GPU system
      3.1.1.  MbufCopy
      3.1.2.  MbufCopyClip
      3.1.3.  MbufCopyCond
      3.1.4.  MbufTransfer
   3.2.  New GPU accelerated image processing operations  
      3.2.1.  MgenLutFunction
      3.2.2.  MgenLutRamp
      3.2.3.  MgenWarpParameter
      3.2.4.  MimArith
      3.2.5.  MimConvolve
      3.2.6.  MimDistance
      3.2.7.  MimMorphic
      3.2.8.  MimPolarTransform
      3.2.9.  MimProject
      3.2.10. MimTransform
   3.3.  GPU system allocation
      3.3.1.  MsysAlloc
4. MilConfig GPU configuration tab
5. Effects compilation time
6. GPU lost device state recovery
7. GPU specific examples
   7.1.  MColorWarp
-------------------------------------------------------------------------------

1. GPU acceleration restrictions
   
   Several GPU acceleration restrictions have been removed as follows:
   
   - GPU systems now support the following M_IMAGE buffer types:
      - 1-band unsigned and signed M_MONO8
      - 1-band unsigned and signed M_MONO16
      - Packed unsigned and signed M_BGR32
      - 1-band floating-point
      
   - Source and destination buffers can now differ in type and depth, but
     cannot differ in number of bands (any mono -> any mono, any color -> any
     color), unless stated otherwise.
   
   - All the overscan modes are now supported for these neighborhood 
     operations:
     MimConvolve, MimMorphic, MimPolarTransform, MimResize, MimRotate,
     MimTranslate, MimWarp. However, M_DISABLE and M_ENABLE overscan modes can 
     be much slower than other modes.
                                
   - Refer to the function list in section 3 for specific type restrictions.
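
   The general type rules above can be encoded as a small check, shown here
   as a C sketch. The buf_desc structure and both functions are hypothetical
   (not part of the MIL API). Signedness is omitted because signed and
   unsigned variants are both supported, and the floating-point type is
   assumed to be 32-bit. Individual functions in section 3 add exceptions,
   such as MbufCopy accepting M_MONO8 -> M_BGR32 packed copies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer descriptor; not a MIL structure. */
typedef struct {
    int  bands;         /* 1 = monochrome, 3 = color */
    int  bits;          /* bit depth per band */
    bool is_float;      /* floating-point buffer */
    bool packed_bgr32;  /* color stored as packed M_BGR32 */
} buf_desc;

/* Buffer types listed above for MIL 9.0 Update 3 GPU systems. */
bool gpu_type_supported(buf_desc b)
{
    if (b.bands == 1)
        return b.is_float ? b.bits == 32 : (b.bits == 8 || b.bits == 16);
    return b.bands == 3 && b.bits == 8 && !b.is_float && b.packed_bgr32;
}

/* Source and destination may differ in type and depth, but not in
   number of bands (the general rule, before function exceptions). */
bool gpu_pair_supported(buf_desc src, buf_desc dst)
{
    assert(src.bands > 0 && dst.bands > 0);
    return gpu_type_supported(src) && gpu_type_supported(dst)
        && src.bands == dst.bands;
}
```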

2. GPU guidelines for optimal performance

   The following additional guidelines are recommended for optimal
   performance:
   
   - Note that operations on signed buffers can be slightly slower.

3. New GPU functionalities and improvements

   3.1.  Managing buffers using a GPU system
                    
      3.1.1. MbufCopy

         - GPU support for M_MONO8, M_MONO16, M_BGR32 packed and floating-point
           buffers.

         - Supported copies :

              Source          ->   Destination
                any monochrome       any monochrome
                M_MONO8              M_BGR32 packed
                M_BGR32 packed       M_BGR32 packed
                M_BGR32 packed       M_MONO8

      3.1.2.  MbufCopyClip
      
         - GPU support for M_MONO8, M_MONO16, M_BGR32 packed and floating-point
           buffers.
         - Supports all sizes and offsets. However, when copying monochrome
           buffers, offsets and sizes that are multiples of 4 are copied
           faster. M_BGR32 packed buffers are fully optimized for all sizes
           and offsets.

         - Supported copies :

              Source          ->   Destination
                any monochrome       any monochrome
                M_MONO8              M_BGR32 packed
                M_BGR32 packed       M_BGR32 packed
                M_BGR32 packed       M_MONO8

      3.1.3.  MbufCopyCond
      
         - GPU support for M_MONO8, M_MONO16, M_BGR32 packed and floating-point
           buffers.

         - Supported copies :

              Source          ->   Destination             CondBuf
                any monochrome       any monochrome          any monochrome
                M_MONO8              M_BGR32 packed          M_MONO8
                M_BGR32 packed       M_BGR32 packed          M_BGR32 packed

      3.1.4. MbufTransfer

         - GPU support for M_MONO8, M_MONO16, M_BGR32 packed and floating-point
           buffers.

         - Supported transfers for M_COPY:

              Source          ->   Destination
                any monochrome       any monochrome
                M_MONO8              M_BGR32 packed
                M_BGR32 packed       M_BGR32 packed
                M_BGR32 packed       M_MONO8
                    
         - Supported transfers for M_COPY+M_SCALE: 

              Source          ->   Destination
                any monochrome       any monochrome
                M_BGR32 packed       M_BGR32 packed

         - Supported transfers for M_COMPOSITION:

              Source          ->   Destination
                M_MONO8              M_MONO8
                M_MONO8              M_MONO16
                M_MONO8              M_BGR32 packed
                M_BGR32 packed       M_BGR32 packed
                    

   3.2.  GPU accelerated image processing operations

      3.2.1.  MgenLutFunction

            - LUT will be generated faster when StartIndex and EndIndex are
              multiples of 4.
      
      3.2.2.  MgenLutRamp
      
            - LUT will be generated faster when StartIndex and EndIndex are
              multiples of 4.
      
      3.2.3.  MgenWarpParameter

            - In M_WARP_LUT mode, monochrome LUTs must have a width that is a
              multiple of 4.

      3.2.4.  MimArith
      
            - M_NEG and M_ABS operations are now supported.
      
      3.2.5.  MimConvolve
      
            - All kernel sizes are now supported.
      
      3.2.6.  MimDistance
      
            - GPU support for M_MONO8, M_MONO16, M_BGR32 packed and floating-
              point buffers.
            - Destination is always treated as an unsigned buffer.

      3.2.7.  MimMorphic
      
            - All structuring element sizes are now supported.
      
      3.2.8.  MimPolarTransform

            - GPU support for M_MONO8, M_MONO16, M_BGR32 packed and floating-
              point buffers.
            - Supports M_NEAREST_NEIGHBOR and M_BILINEAR interpolation modes.
            - Supports all overscan modes.

      3.2.9.  MimProject

            - GPU support for M_MONO8, M_MONO16, M_BGR32 packed and floating-
              point source buffers.
            - Result buffer must have been allocated with M_FLOAT option.
            - For a row projection (M_0_DEGREE), the source buffer must have a
              number of lines that is a multiple of 4 if this number is smaller
              than the number of locations in the result buffer.
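
      The MimProject row-projection rule can be read as the following check.
      This is a hypothetical C helper for illustration, not a MIL function;
      it only encodes the condition stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Row projection (M_0_DEGREE) rule stated above: when the source has fewer
   lines than the result buffer has locations, the line count must be a
   multiple of 4. Otherwise the rule does not apply. */
bool row_projection_gpu_ok(long source_lines, long result_locations)
{
    assert(source_lines > 0 && result_locations > 0);
    if (source_lines < result_locations)
        return source_lines % 4 == 0;
    return true;
}
```

      For example, 482 source lines with a 640-location result break the
      rule, while 480 lines satisfy it.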

      3.2.10. MimTransform

            - GPU support for M_MONO8, M_MONO16 and floating-point buffers,
              following function restrictions on sizes and type combinations.
            - Only supports M_FFT transformation type.
            - Transformation is always done in floating-point on the GPU.
            - Does not support M_LOG_SCALE mode.

  
   3.3.  GPU system allocation and control

      3.3.1. MsysAlloc

            - M_PARTIAL is default value for InitFlag on GPU.
            - See section (4. MilConfig GPU configuration tab) for details
              on GPU system allocation options.

4. MilConfig GPU configuration tab

   Effects (GPU binary code for MIL functions) that are selected on this page
   are available for MIL GPU compilation and execution. When you deselect one
   or more effects, MIL GPU does not compile them, and the functions
   corresponding to these effects are compensated on the Host.

   Compilation of an effect depends on the InitFlag parameter when the GPU
   system was allocated:

      - When InitFlag is M_PARTIAL, GPU binary code for a function is compiled
        the first time that function is called. System allocation is faster,
        but there is a runtime penalty on each function's first call.
      - When InitFlag is M_COMPLETE, GPU binary code for all functions is
        compiled at system allocation. System allocation can take
        substantially longer, but there is no runtime penalty.
        WARNING: Using this option with all effects activated can cause the
        graphics driver to become unstable.

   To find out which effects are compiled and used by a MIL GPU application
   when using the M_PARTIAL (or M_DEFAULT) option, add this MsysControl() call
   just before freeing your GPU system:

      MIL_ID GPUSystem;
      /* ... */
      MsysAlloc(M_SYSTEM_GPU, M_DEFAULT, M_DEFAULT, &GPUSystem);
      /* ... GPU processing ... */
      /* Added call: disable the effects this application did not use. */
      MsysControl(GPUSystem, M_GPU_UPDATE_EFFECTS, M_WRITE);
      MsysFree(GPUSystem);

   This call automatically disables the effects that were not used by the
   application. You can then allocate your system with the M_COMPLETE option,
   which compiles only the necessary effects at system allocation and
   eliminates the runtime penalty when a function is called for the first
   time.
   NOTE : Disabled effects are not automatically re-enabled for another
          application that needs them!
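   The update step can be modelled as a bitwise AND: an effect stays enabled
   only if it was both enabled and used. The sketch below is a hypothetical
   model (EffectUsage, record_use and update_effects are illustrative names,
   not MIL API). It also shows why the NOTE above matters: the operation can
   only clear bits, never set them again.

```c
#include <stdint.h>

/* Hypothetical model only: each effect is one bit in a mask. */
typedef struct {
    uint32_t enabled;  /* effects currently enabled in MilConfig */
    uint32_t used;     /* effects this application actually called */
} EffectUsage;

/* Record that the application called the function behind an effect. */
void record_use(EffectUsage *u, unsigned effect)
{
    u->used |= (uint32_t)1u << effect;
}

/* Keep only the effects that were used. Nothing here re-enables a bit,
   so an effect disabled by one application stays disabled for others. */
uint32_t update_effects(EffectUsage *u)
{
    u->enabled &= u->used;
    return u->enabled;
}
```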

5. Effects compilation time

   To run on any DirectX 9.0c compliant GPU, an effect (GPU binary code) must
   be compiled by the graphics card driver. Compilation happens either at
   system allocation or on the first call to the corresponding MIL function,
   depending on the MsysAlloc() InitFlag parameter. For most effects, it takes
   from a few milliseconds to a second; larger and more complex effects can
   take several seconds to compile.

   This behaviour is normal and depends on many parameters, including the GPU
   brand and model, operating system, graphics driver version, and general
   Host performance. Effect compilation can also take longer while you debug
   your application in Visual Studio.

6. GPU lost device state recovery

   The MIL GPU processing module requires the underlying Direct3D devices to
   be in an operational state to work properly. Some events can cause these
   devices to transition to a lost device state. For as long as a device is
   lost, the MIL GPU module cannot use graphics card acceleration.

   To recover gracefully, the MIL GPU module waits for the Direct3D devices
   to return to an operational state before continuing. As a result, a
   thread doing GPU processing blocks on the first call that follows a lost
   device state transition, and resumes when the devices are restored.

   As mentioned by Microsoft (see the Lost Devices reference in the MSDN
   DirectX SDK documentation), the full set of scenarios that can cause a
   Direct3D device to become lost is not specified, but some are known:

      - locking the computer
      - another application assuming full-screen operation (for example,
        a screen saver)
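   The blocking behaviour can be pictured as a loop that does not return
   until the device reports it is usable again. The sketch below is a
   hypothetical model: DeviceReadyFn, SimulatedDevice and wait_for_device are
   illustrative names, not MIL or Direct3D API. In Direct3D 9 itself,
   applications typically detect this condition by polling
   IDirect3DDevice9::TestCooperativeLevel().

```c
#include <stdbool.h>

/* Hypothetical query: returns true when the device is operational. */
typedef bool (*DeviceReadyFn)(void *ctx);

/* Keep polling until the device is operational; returns the number of polls.
   A real implementation would sleep between polls rather than spin. */
int wait_for_device(DeviceReadyFn ready, void *ctx)
{
    int polls = 1;
    while (!ready(ctx))
        polls++;
    return polls;
}

/* Simulated device that stays lost for a fixed number of polls. */
typedef struct { int polls_until_ready; } SimulatedDevice;

bool simulated_ready(void *ctx)
{
    SimulatedDevice *d = ctx;
    if (d->polls_until_ready > 0) {
        d->polls_until_ready--;
        return false;   /* still lost */
    }
    return true;        /* restored */
}
```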

7. GPU specific examples

   7.1. MColorWarp

        This small example demonstrates how to use a GPU system efficiently
        to process images. Key concepts from this example are:

           - Supported buffer types are automatically allocated in video
             memory.
           - Buffer widths are multiples of 4.
           - Use synchronous timers (the GPU system is asynchronous).
           - Avoid in-place processing.
           - Avoid Mgra functions (they trigger transfers between video
             and host memory).
   

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Section 6: MIL 9.0 GPU (Graphics Processing Unit) accelerations 

Table of Contents for Section 6

1. GPU acceleration restrictions
2. GPU guidelines for optimal performance
3. New GPU functionalities and improvements
   3.1.  Managing buffers using a GPU system
      3.1.1.  MbufBayer
      3.1.2.  MbufClear
      3.1.3.  MbufCopy
      3.1.4.  MbufTransfer
   3.2.  GPU accelerated image processing operations  
      3.2.1.  MimArith
      3.2.2.  MimArithMultiple
      3.2.3.  MimBinarize
      3.2.4.  MimClip
      3.2.5.  MimConnectMap
      3.2.6.  MimConvert
      3.2.7.  MimConvolve
      3.2.8.  MimCountDifference
      3.2.9.  MimDilate
      3.2.10. MimEdgeDetect
      3.2.11. MimErode
      3.2.12. MimFlip
      3.2.13. MimLutMap
      3.2.14. MimMorphic
      3.2.15. MimRank
      3.2.16. MimResize
      3.2.17. MimRotate
      3.2.18. MimThick
      3.2.19. MimThin      
      3.2.20. MimTranslate
      3.2.21. MimWarp
      3.2.22. MregTransformImage
4. GPU boards
   4.1. GPU board limitations
-------------------------------------------------------------------------------

1. GPU acceleration restrictions

   - Requirement: DirectX 9.0c support (pixel and vertex shader models 3.0).

   - M_IMAGE buffer types supported on a GPU system are: 
      - 1-band unsigned M_MONO8
      - 1-band unsigned M_MONO16
      - Packed unsigned M_BGR32
                                                            
   - A buffer must be allocated in video memory to be processed on the GPU, 
     unless stated otherwise.
   - Source and destination buffers must be of the same type
     (M_MONO8->M_MONO8, ...), unless stated otherwise.
   - Maximum buffer sizes are determined by the capabilities of your graphics 
     controller(s), which vary between models. A simple rule of thumb: newer 
     graphics controllers tend to support larger buffers.
   - Buffer width (X size) must be a multiple of 4 for monochrome buffers.
   - A buffer cannot be allocated in video memory if any of the following
     attributes or control flags are requested: M_COMPRESS, M_GDI,
     M_MAPPABLE, M_FAST_MEMORY, M_SHARED, M_NON_PAGED, M_HOST_ADDRESS,
     M_MIL_ID, M_HOST_ADDRESS_REMOTE, M_PHYSICAL_ADDRESS,
     M_64BIT_PHYSICAL_ADDRESS.
   - The M_GRAB attribute is not supported.
   - Unless stated otherwise, the X offset of a monochrome child buffer 
     relative to its ancestor must be a multiple of 4 and the width of the 
     child must be a multiple of 4.
   - Refer to the function list in section 3 for specific type restrictions.
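   The multiple-of-4 rules for monochrome buffers and child buffers are easy
   to check before allocation. The helpers below are hypothetical (not MIL
   API); they only encode the arithmetic stated above.

```c
#include <stdbool.h>

/* Round a non-negative width up to the next multiple of 4. */
long round_up_to_4(long width)
{
    return ((width + 3) / 4) * 4;
}

/* A monochrome GPU buffer needs a width (X size) that is a multiple of 4. */
bool mono_width_ok(long size_x)
{
    return size_x > 0 && size_x % 4 == 0;
}

/* A monochrome child buffer needs an X offset (relative to its ancestor)
   and a width that are both multiples of 4. */
bool mono_child_ok(long offset_x, long size_x)
{
    return offset_x >= 0 && offset_x % 4 == 0 && mono_width_ok(size_x);
}
```

   For example, a 637-pixel-wide monochrome image would need a 640-pixel
   buffer to qualify for video memory.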

2. GPU guidelines for optimal performance

   - Install driver updates for your graphics controller(s).
   - Specify M_VIDEO_MEMORY or M_ON_BOARD to force buffer allocation in video
     memory. If none of M_VIDEO_MEMORY, M_ON_BOARD, or M_HOST_MEMORY is
     specified, the buffer is allocated on board when possible.
   - Avoid mixing functions that execute on the GPU with functions that are
     compensated on the Host. This minimizes data transfers between video
     memory and host memory.
   - In-place processing, which includes using two child buffers of the same
     ancestor buffer as source and destination, is very costly: the entire
     ancestor buffer has to be cloned. Therefore, avoid using the same
     ancestor buffer as both source and destination in the same function call.

3. New GPU functionalities and improvements

   3.1.  Managing buffers using a GPU system

      3.1.1. MbufBayer

         - GPU support for unsigned M_MONO8 and M_MONO16 source buffers.
         - The destination buffer can be of same type as the source, or 
           M_BGR32 packed.
         - Buffer size X and Y must each be at least 2 and a multiple of 2.
         - Supports default bilinear and M_AVERAGE_2X2 modes.
         - Does not support M_ADAPTIVE mode.
         - Does not support white-balance coefficient calculation.
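         A Bayer mosaic repeats every 2x2 pixels, which is a plausible reason
         for the even-size rule. The helpers below are hypothetical (not MIL
         API) and only check or adjust sizes according to the rules above.

```c
#include <stdbool.h>

/* X and Y sizes must each be at least 2 and a multiple of 2. */
bool bayer_size_ok(long size_x, long size_y)
{
    return size_x >= 2 && size_y >= 2 && size_x % 2 == 0 && size_y % 2 == 0;
}

/* Largest even size not exceeding n, e.g. for cropping a child buffer. */
long bayer_even_size(long n)
{
    return n - (n % 2);
}
```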

      3.1.2. MbufClear

         - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
           buffers.
         - Supports all sizes and offsets. However, monochrome buffers
           (M_MONOXX) are cleared faster when their offsets and sizes are
           multiples of 4. M_BGR32 packed buffers are fully optimized for all
           sizes and offsets.


      3.1.3. MbufCopy

         - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
           buffers.
         - Supports all sizes and offsets. However, monochrome buffers
           (M_MONOXX) are copied faster when their offsets and sizes are
           multiples of 4. M_BGR32 packed buffers are fully optimized for all
           sizes and offsets.

         - Supported copies :

              Source          ->   Destination
              M_MONO8                M_MONO8
              M_MONO8                M_MONO16
              M_MONO8                M_BGR32 packed
              M_MONO16               M_MONO8
              M_MONO16               M_MONO16
              M_BGR32 packed         M_BGR32 packed
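         The table above can be encoded as a lookup so an application can
         decide up front whether a copy will stay on the GPU. The enum and
         function names are hypothetical (not MIL API); the entries are
         transcribed directly from the table.

```c
#include <stdbool.h>

/* Buffer types from the table above; the enum is illustrative only. */
typedef enum { MONO8, MONO16, BGR32_PACKED } BufType;

/* supported_copy[src][dst], transcribed from the "Supported copies" table. */
static const bool supported_copy[3][3] = {
    /*              MONO8  MONO16 BGR32  (destination) */
    /* MONO8  */ { true,  true,  true  },
    /* MONO16 */ { true,  true,  false },
    /* BGR32  */ { false, false, true  },
};

bool gpu_copy_supported(BufType src, BufType dst)
{
    return supported_copy[src][dst];
}
```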

      3.1.4. MbufTransfer

         - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
           buffers.
         - Buffers with M_HOST_MEMORY attribute are supported.
         - Supports all sizes and offsets. However, monochrome buffers
           (M_MONOXX) are transferred faster when their offsets and sizes are
           multiples of 4. M_BGR32 packed buffer transfers are fully
           optimized for all sizes and offsets.
         - Supported transfer functions are: M_CLEAR, M_COPY and M_COMPOSITION.
         - Supported transfer types are: M_DEFAULT and M_DIRECTX_MODE.
         - Source and destination buffers must be of the same type for an
           M_COMPOSITION transfer.

         - Supported transfers for M_COPY:

              Source          ->   Destination
              M_MONO8                M_MONO8
              M_MONO8                M_MONO16
              M_MONO8                M_BGR32 packed
              M_MONO16               M_MONO8
              M_MONO16               M_MONO16
              M_BGR32 packed         M_BGR32 packed
                    

   3.2.  GPU accelerated image processing operations  

      3.2.1. MimArith

            - M_ADD, M_ADD_CONST, M_SUB, M_SUB_CONST, M_CONST_SUB, M_SUB_ABS,
              M_MULT, M_MULT_CONST:
               . GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 
                 packed buffers.
               . Supports M_SATURATION.

            - M_DIV, M_DIV_CONST, M_CONST_DIV, M_MIN_CONST, M_MAX_CONST, M_MIN,
              M_MAX:
               . GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
                 buffers.
               . Does not support M_SATURATION.

            - M_AND, M_NAND, M_OR, M_XOR, M_NOR, M_XNOR, M_NOT, M_AND_CONST,
              M_NAND_CONST, M_OR_CONST, M_XOR_CONST, M_NOR_CONST, M_XNOR_CONST,
              M_NEG, M_ABS, M_xxx+M_FIXED_POINT:
               . Not supported.
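            The difference M_SATURATION makes shows up when a result
            overflows the destination range. The sketch below contrasts
            clamping with plain modular wrap-around on 8-bit pixels; treating
            wrap-around as the non-saturated behaviour is an assumption of
            this sketch, and the function names are hypothetical.

```c
#include <stdint.h>

/* Add two 8-bit pixels, clamping the result to [0, 255] (saturation). */
uint8_t add_u8_saturate(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > 255u ? 255u : (uint8_t)sum;
}

/* Add two 8-bit pixels, keeping only the low 8 bits (wrap-around). */
uint8_t add_u8_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

            For example, 200 + 100 gives 255 with clamping but 44 with
            wrap-around.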

      3.2.2. MimArithMultiple

            - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
              buffers.

      3.2.3. MimBinarize

            - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
              buffers.
            - Does not support histogram modes (CondLow and CondHigh must not
              be M_DEFAULT).
            - Does not support a DestImageBufId set to M_NULL.
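            Because histogram modes are not available on the GPU, thresholds
            must be passed explicitly, and a separate destination is needed.
            The sketch below shows an in-range binarization with explicit
            bounds; the function name and the use of 255 as the "on" value
            are assumptions of this sketch, not MIL behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Pixels within [low, high] (inclusive) become 255, all others become 0.
   src and dst are separate buffers, mirroring the no-M_NULL restriction. */
void binarize_in_range_u8(const uint8_t *src, uint8_t *dst, size_t n,
                          uint8_t low, uint8_t high)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] >= low && src[i] <= high) ? 255u : 0u;
}
```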

      3.2.4. MimClip

            - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
              buffers.

      3.2.5. MimConnectMap

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.

      3.2.6. MimConvert

            - M_RGB_TO_H, M_RGB_TO_L:
               . GPU support for M_BGR32 packed to unsigned M_MONO8 or M_MONO16 
                 buffers.

            - M_RGB_TO_Y:
               . GPU support for M_BGR32 packed to unsigned M_MONO8 or M_MONO16 
                 buffers.
               . GPU support for M_BGR32 packed to the first band of M_BGR32
                 packed buffers.

            - M_L_TO_RGB:
               . GPU support for unsigned M_MONO8 or M_MONO16 to M_BGR32 packed 
                 buffers.

            - M_RGB_TO_HLS, M_HLS_TO_RGB:
               . GPU support for M_BGR32 packed to M_BGR32 packed buffers.

            - User-supplied matrices:
               . GPU support for M_BGR32 packed to unsigned M_MONO8 or M_MONO16 
                 for 3x1 matrices.
               . GPU support for M_BGR32 packed to M_BGR32 packed for 
                 3x3 matrices.

            - Other conversions:
               . Not supported.
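            A 3x1 user-supplied matrix maps the three color components of a
            BGR32 packed pixel to one monochrome value. The sketch below
            assumes the packed byte order B, G, R, unused, and uses the
            ITU-R BT.601 luma weights purely as example coefficients; neither
            is taken from MIL documentation, and the names are hypothetical.

```c
#include <stdint.h>

/* One BGR32 packed pixel, assumed stored as B, G, R, then an unused byte. */
typedef struct { uint8_t b, g, r, x; } Bgr32;

/* Example 3x1 matrix (weights for R, G, B): ITU-R BT.601 luma weights. */
static const float kBt601Weights[3] = { 0.299f, 0.587f, 0.114f };

/* Apply a 3x1 matrix to one pixel, clamp to [0, 255], round to nearest. */
uint8_t bgr32_to_mono8(Bgr32 p, const float w[3])
{
    float y = w[0] * p.r + w[1] * p.g + w[2] * p.b;
    if (y < 0.0f)
        y = 0.0f;
    if (y > 255.0f)
        y = 255.0f;
    return (uint8_t)(y + 0.5f);
}
```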

      3.2.7. MimConvolve

            - Custom Kernel:
               . Supports kernel sizes: 2x2, 3x3, 5x5, 7x7, 9x9, 11x11, 16x16.
                 To process other sizes smaller than 16x16, allocate a larger
                 supported kernel and pad it with zeros.
               . Separable 9x9 and larger kernels will be processed faster than
                 non-separable kernels.
               . Symmetric kernels (horizontally, vertically, or both) will be
                 processed faster than non-symmetric kernels.
               . GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
                 buffers.
               . M_DEFAULT, M_DISABLE and M_FAST overscans are fully supported.
               . M_REPLACE and M_MIRROR overscans are supported for source 
                 buffers with power-of-two dimensions.
               . Supported filter modes are M_DEFAULT and M_KERNEL.
               . Supported filter types are M_DEFAULT and M_USER_DEFINED.

            - M_EDGE_DETECT, M_EDGE_DETECT2, M_VERT_EDGE, M_HORIZ_EDGE, 
              M_SMOOTH, M_SHARPEN, M_SHARPEN2, M_LAPLACIAN_EDGE, 
              M_LAPLACIAN_EDGE2:
               . GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
                 buffers.
               . M_OVERSCAN_DISABLE and M_OVERSCAN_FAST combination flags are
                 supported.

            - M_DERICHE_FILTER, M_SHEN_FILTER:
               . Not supported.
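            Two of the custom-kernel rules above lend themselves to a quick
            pre-check: finding a supported kernel size to pad to, and testing
            whether source dimensions are powers of two (needed for M_REPLACE
            and M_MIRROR overscans). The helpers are hypothetical (not MIL
            API). Where the original coefficients sit inside the padded kernel
            affects its effective center; centering them here is an
            assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Kernel sizes listed above for GPU custom convolution. */
static const int kSupportedSizes[] = { 2, 3, 5, 7, 9, 11, 16 };

/* Smallest supported size that can hold an n x n kernel, or -1 if none. */
int padded_kernel_size(int n)
{
    for (size_t i = 0; i < sizeof kSupportedSizes / sizeof kSupportedSizes[0]; i++)
        if (kSupportedSizes[i] >= n)
            return kSupportedSizes[i];
    return -1;
}

/* Copy an n x n kernel into a zero-filled m x m kernel (m >= n),
   centering the original coefficients. */
void pad_kernel(const float *src, int n, float *dst, int m)
{
    int off = (m - n) / 2;
    memset(dst, 0, sizeof(float) * (size_t)m * (size_t)m);
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[(y + off) * m + (x + off)] = src[y * n + x];
}

/* True if v is a positive power of two (e.g. 256, 512, 1024). */
bool is_power_of_two(long v)
{
    return v > 0 && (v & (v - 1)) == 0;
}
```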

      3.2.8. MimCountDifference

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.
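            Conceptually, counting differences compares two images pixel by
            pixel and counts the positions that differ. The sketch below is a
            hypothetical host-side equivalent for 8-bit data (not MIL API),
            useful for checking GPU results against a reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Count pixel positions where two 8-bit images of n pixels differ. */
size_t count_difference_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            count++;
    return count;
}
```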

      3.2.9. MimDilate

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.

      3.2.10. MimEdgeDetect

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.
            - M_DISABLE and M_FAST overscans are fully supported.

      3.2.11. MimErode

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.

      3.2.12. MimFlip

            - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
              buffers.

      3.2.13. MimLutMap

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers and LUTs.

      3.2.14. MimMorphic

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.
            - Supports these kernel sizes: 3x3, 5x5, 7x7, 9x9, 11x11.
            - M_DEFAULT and M_FAST overscans are fully supported.

            - M_DILATE, M_ERODE:
               . Supports both M_BINARY and M_GRAYSCALE modes.

            - M_THICK, M_THIN, M_MATCH, M_HIT_OR_MISS:
               . Supports only M_GRAYSCALE mode.

            - M_AREA_CLOSE, M_AREA_OPEN:
               . Not supported.
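            In M_GRAYSCALE mode, dilation with a flat structuring element
            replaces each pixel by the maximum of its neighbourhood. The
            sketch below uses a 3x3 neighbourhood and a separate destination
            buffer; ignoring out-of-image neighbours at the border is this
            sketch's choice and does not model MIL overscan modes. The
            function name is hypothetical.

```c
#include <stdint.h>

/* Grayscale dilation with a flat 3x3 structuring element: each output
   pixel is the maximum of its 3x3 neighbourhood (inside the image). */
void dilate3x3_u8(const uint8_t *src, uint8_t *dst, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            uint8_t m = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w)
                        continue;   /* neighbour outside the image */
                    if (src[yy * w + xx] > m)
                        m = src[yy * w + xx];
                }
            }
            dst[y * w + x] = m;
        }
    }
}
```

            Erosion is the same loop with minimum instead of maximum.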

      3.2.15. MimRank

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.
            - Supports predefined structuring elements only: M_3X3_CROSS,
              M_3X3_RECT, M_5X5_RECT.
            - Supports M_OVERSCAN_FAST overscan only.
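            A rank filter sorts the values under the structuring element and
            picks one by position. The sketch below does this for one
            interior pixel with a 3x3 cross (the center and its 4-connected
            neighbours); the 0-based rank index is this sketch's convention
            and may differ from MIL's. The function name is hypothetical.

```c
#include <stdint.h>

/* Rank filter over a 3x3 cross for one interior pixel (x, y) of an image
   of width w. rank 0 = minimum, 2 = median, 4 = maximum. */
uint8_t rank_cross3x3_u8(const uint8_t *img, int w, int x, int y, int rank)
{
    uint8_t v[5] = {
        img[y * w + x],
        img[(y - 1) * w + x], img[(y + 1) * w + x],
        img[y * w + x - 1],   img[y * w + x + 1],
    };
    /* Insertion sort: five values do not justify anything fancier. */
    for (int i = 1; i < 5; i++) {
        uint8_t t = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > t) {
            v[j + 1] = v[j];
            j--;
        }
        v[j + 1] = t;
    }
    return v[rank];
}
```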

      3.2.16. MimResize

            - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
              buffers.
            - Supports M_NEAREST_NEIGHBOR and M_BILINEAR interpolation modes.
            - M_OVERSCAN_FAST is fully supported.
            - M_OVERSCAN_DISABLE is supported for M_BGR32 packed buffers.
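            M_BILINEAR interpolation, used by MimResize and the other
            geometric functions in this section, blends the four pixels
            around a fractional source position. The sketch below assumes
            integer coordinates fall on pixel centers and clamps positions to
            the image; both are choices of this sketch and do not model MIL
            overscan modes. The function name is hypothetical.

```c
#include <stdint.h>

/* Bilinear sample of an 8-bit w x h image at fractional (fx, fy). */
float bilinear_u8(const uint8_t *img, int w, int h, float fx, float fy)
{
    /* Clamp so that the 2x2 neighbourhood stays inside the image. */
    if (fx < 0.0f) fx = 0.0f;
    if (fy < 0.0f) fy = 0.0f;
    if (fx > (float)(w - 1)) fx = (float)(w - 1);
    if (fy > (float)(h - 1)) fy = (float)(h - 1);

    int x0 = (int)fx, y0 = (int)fy;      /* truncation == floor for fx >= 0 */
    int x1 = x0 + 1 < w ? x0 + 1 : x0;
    int y1 = y0 + 1 < h ? y0 + 1 : y0;
    float ax = fx - (float)x0, ay = fy - (float)y0;

    /* Interpolate horizontally on two rows, then vertically between them. */
    float top = (1.0f - ax) * img[y0 * w + x0] + ax * img[y0 * w + x1];
    float bot = (1.0f - ax) * img[y1 * w + x0] + ax * img[y1 * w + x1];
    return (1.0f - ay) * top + ay * bot;
}
```

            M_NEAREST_NEIGHBOR would instead return the single closest pixel.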

      3.2.17. MimRotate

            - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
              buffers.
            - Supports M_NEAREST_NEIGHBOR and M_BILINEAR interpolation modes.
            - M_OVERSCAN_FAST is fully supported.
            - M_OVERSCAN_DISABLE is supported for M_BGR32 packed buffers.

      3.2.18. MimThick

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.
            - Supports M_GRAYSCALE mode.

      3.2.19. MimThin

            - GPU support for unsigned M_MONO8 and M_MONO16 buffers.
            - Supports M_GRAYSCALE mode.

      3.2.20. MimTranslate

            - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
              buffers.
            - Supports M_BILINEAR interpolation mode.
            - M_OVERSCAN_FAST is fully supported.
            - M_OVERSCAN_DISABLE is supported for M_BGR32 packed buffers.
            
      3.2.21. MimWarp

            - GPU support for unsigned M_MONO8, M_MONO16 and M_BGR32 packed 
              buffers.
            - Supports M_NEAREST_NEIGHBOR and M_BILINEAR interpolation modes.
            - M_OVERSCAN_FAST is fully supported.
            - M_OVERSCAN_DISABLE is supported for M_BGR32 packed buffers.
            - LUT buffers support signed 16-bit integers only.

      3.2.22. MregTransformImage

            - Supports M_NEAREST_NEIGHBOR and M_BILINEAR interpolation modes.
            - M_OVERSCAN_FAST is fully supported.
            - M_OVERSCAN_DISABLE is supported for M_BGR32 packed buffers.
            - Supports M_FIRST_IMAGE and M_LAST_IMAGE as
              M_MOSAIC_COMPOSITION values.

4. GPU boards

   4.1. GPU board limitations

      - Accuracy can vary from one board or driver to another. For example,
        computations on 16-bit buffers can be less precise on NVIDIA GeForce
        7000 series and older GPUs.

      - Performance can also vary from one board or driver to another. For
        example, due to hardware or driver limitations on ATI/AMD GPUs, the
        MimCountDifference function is not fully accelerated.