Lambda function vpat pixel access in Release mode

Hi there,
I tried to make a lambda function for my pixel access. But there is something strange happening…
In debug mode everything works fine, while in release mode there is an access violation:

void ForEachPixel(IMG img, std::function<void(intptr_t pBase, int x, int y, int z)> func)
{
  for (int z = 0; z < ImageDimension(img); ++z)
  {
    intptr_t pBase = 0;
    PVPAT vpat = nullptr;
    assert(GetImageVPA(img, z, reinterpret_cast<void**>(&pBase), &vpat));
    for (int x = 0; x < ImageWidth(img); ++x)
      for (int y = 0; y < ImageHeight(img); ++y)
        func(pBase + vpat[x].XEntry + vpat[y].YEntry, x, y, z);
  }
}

  IMG img1 = NULL;
  Cvb::DataType type1 = Cvb::DataType::Int12BppUnsigned();
  cvbbool_t success1 = CreateGenericImageDT(150, 512, 512, type1.NativeDescriptor(), img1);
  // And this is where everything just goes down
  ForEachPixel(img1, [](intptr_t pBase, int x, int y, int z)
  {
     *(reinterpret_cast<cvbuint16_t*>(pBase)) = 21;
  });

Btw. I’m using VS14 Win10 64bit with CVB 13.1.000

Hi @richard_loewenherz,

I recommend taking a close look at your code, specifically the line assert(GetImageVPA(...)). The reference documentation for assert says:

This macro is disabled if, at the moment of including <assert.h>, a macro with the name NDEBUG has already been defined.

In a Visual Studio Release build, NDEBUG is defined by default. The entire assert line, including the call to GetImageVPA inside it, is therefore dropped in Release (but not in Debug) builds, so pBase and vpat keep their initial values and your nested loops end up working on null pointers…
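
To make the difference concrete, here is a minimal sketch of the pattern and one way around it. AcquireBase is a hypothetical stand-in for a call like GetImageVPA (it is not a CVB function), and throwing on failure is just one possible way to handle the error:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a call such as GetImageVPA: it fills an
// out-parameter (a side effect) and reports success via its return value.
static std::uint16_t g_pixels[4] = {0, 0, 0, 0};

bool AcquireBase(std::intptr_t* pBase) {
  *pBase = reinterpret_cast<std::intptr_t>(g_pixels);
  return true;
}

// Fragile: if NDEBUG is defined, the whole assert expression disappears,
// AcquireBase is never called, and pBase is returned as 0.
std::intptr_t BaseViaAssert() {
  std::intptr_t pBase = 0;
  assert(AcquireBase(&pBase));
  return pBase;
}

// Robust: the call is a normal statement and runs in every build.
// The assert only checks the stored result, and the failure is also
// handled for builds in which assert is disabled.
std::intptr_t BaseChecked() {
  std::intptr_t pBase = 0;
  const bool ok = AcquireBase(&pBase);
  assert(ok);
  if (!ok) {
    throw std::runtime_error("could not acquire image base");
  }
  return pBase;
}
```

Applied to your ForEachPixel, that would mean storing the result of GetImageVPA in a variable first and asserting (or otherwise checking) that variable afterwards.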

On a different note: I have a few recommendations that are likely to improve the performance of your code:

  1. I’d strongly recommend not to query an image’s width, height and dimension on every cycle of the loops but instead query once before the loops and then use the cached values as upper limits for your loops.
  2. I’d recommend to have the outer loop run over the y coordinates and the inner loop over the x coordinates: Unless you have an unusual memory layout of your data, neighboring pixels are more likely to be already available in the CPU’s data cache than those from the next line which is why iterating line by line is often more cache-efficient.
  3. Rather than calculating every pixel’s address from scratch inside the innermost loop, you can spare your CPU a few calculations:
    for (int z = 0; z < imgDim; ++z)
    {
      intptr_t pBase = 0;
      PVPAT vpat = nullptr;
      GetImageVPA(img, z, reinterpret_cast<void**>(&pBase), &vpat);
      for (int y = 0; y < imgHeight; ++y)
      {
        // The line offset only changes once per line, so compute it here.
        intptr_t lineBase = pBase + vpat[y].YEntry;
        for (int x = 0; x < imgWidth; ++x)
          func(lineBase + vpat[x].XEntry, x, y, z);
      }
    }
  4. If linear access is possible (see GetLinearAccess from the CVCUtilities.dll) then writing a code path for linear arrangement usually gives about a factor of 1.5 to 2 in speed over a well-optimized VPAT loop like the one above when iterating the entire image (as you are doing).
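
As a rough illustration of what a linear code path could look like: the sketch below does not call any CVB function. LinearPlane is a hypothetical struct that simply holds a base address plus constant x and y increments in bytes, which is the kind of information a linear access query would give you, and FillDemo fills it by hand from an ordinary buffer so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical description of one linearly arranged image plane.
struct LinearPlane {
  std::intptr_t base;  // address of pixel (0, 0)
  std::intptr_t xInc;  // bytes from one pixel to the next in a line
  std::intptr_t yInc;  // bytes from one line to the next
  int width;
  int height;
};

// Line-by-line iteration: no table lookups, only additions.
void ForEachPixelLinear(
    const LinearPlane& plane, int z,
    const std::function<void(std::intptr_t, int, int, int)>& func) {
  std::intptr_t line = plane.base;
  for (int y = 0; y < plane.height; ++y, line += plane.yInc) {
    std::intptr_t pixel = line;
    for (int x = 0; x < plane.width; ++x, pixel += plane.xInc) {
      func(pixel, x, y, z);
    }
  }
}

// Demo helper: builds a tightly packed 16-bit buffer and writes `value`
// into every pixel through the linear loop above.
std::vector<std::uint16_t> FillDemo(int width, int height,
                                    std::uint16_t value) {
  assert(width > 0 && height > 0);
  std::vector<std::uint16_t> buf(
      static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 0);
  const LinearPlane plane{
      reinterpret_cast<std::intptr_t>(buf.data()),
      static_cast<std::intptr_t>(sizeof(std::uint16_t)),
      static_cast<std::intptr_t>(width * sizeof(std::uint16_t)),
      width, height};
  ForEachPixelLinear(plane, 0,
                     [value](std::intptr_t p, int, int, int) {
                       *reinterpret_cast<std::uint16_t*>(p) = value;
                     });
  return buf;
}
```

Note that the std::function call per pixel has its own cost. If the per-pixel work is as small as a single store, passing the callable as a template parameter instead may matter as much as the addressing scheme, but that is something to measure rather than assume.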