just do IT: use memcmp to compare objects?

Friday, December 11, 2009

use memcmp to compare objects?

I came across a question about C++: is it more efficient to use memcmp to determine whether two objects of the same type are equal?
This is not really a question of efficiency; it is a question of correctness. Comparing two objects with memcmp MAY sometimes give the right answer, but that depends on several factors:

Class alignment
Compiler implementation
Compiler configuration

You may get different results with different types or with different compilers. Even worse, you may get different results between two runs in the same environment. It looks efficient, but it is not reliable.

Many compilers align each member of a class to a boundary suited to its type (often the machine word size) for better performance, because reading or writing memory at unaligned addresses is slower and, on some architectures, not allowed at all. As a result, there may be gaps (unused padding bytes) between fields.
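To see the gaps concretely, here is a minimal sketch that measures them for a struct with the same members as the Foo class in the demo further down. The name PaddedFoo and the three constants are illustrative and not part of the original demo; the actual numbers depend on the compiler and target.

```cpp
#include <cstddef>   // offsetof, std::size_t

// Hypothetical struct with the same members, in the same order, as Foo.
struct PaddedFoo {
    int  a;
    char b;
    int  c;
};

// Bytes taken by the members themselves.
constexpr std::size_t member_bytes = sizeof(int) + sizeof(char) + sizeof(int);

// Bytes the compiler added as padding; 0 if this target packs the struct tightly.
constexpr std::size_t padding_bytes = sizeof(PaddedFoo) - member_bytes;

// Gap between the end of b and the start of c.
constexpr std::size_t gap_after_b =
    offsetof(PaddedFoo, c) - (offsetof(PaddedFoo, b) + sizeof(char));
```

On a typical target with a 4-byte, 4-byte-aligned int, sizeof(PaddedFoo) is 12 while the members need only 9 bytes, so 3 bytes of every object are padding that memcmp would still compare.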

Those gaps are part of every object of the class, but they cannot be reached through the object's members. Their contents are indeterminate: they may hold whatever was left over from the memory's previous use, or they may be cleared or filled by a diligent compiler.
When you use memcmp to compare two objects, these gaps, which may hold arbitrary bits, are compared as well. That is undesired behavior and makes the result unpredictable.
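The reliable alternative is to compare the members themselves, so the padding is never read. The sketch below assumes a hypothetical Record type with the same layout as the demo's Foo; make_record is an illustrative helper that deliberately fills the whole object, padding included, with a chosen byte before setting the members.

```cpp
#include <cstring>   // std::memset

// Hypothetical type with the same member layout as the demo's Foo.
struct Record {
    int  a;
    char b;
    int  c;
};

// Member-wise equality: only the declared members are compared.
inline bool operator==(const Record& lhs, const Record& rhs) {
    return lhs.a == rhs.a && lhs.b == rhs.b && lhs.c == rhs.c;
}

inline bool operator!=(const Record& lhs, const Record& rhs) {
    return !(lhs == rhs);
}

// Illustrative helper: fill every byte of the object (padding included)
// with `fill`, then give the members fixed values.
inline Record make_record(unsigned char fill) {
    Record r;
    std::memset(&r, fill, sizeof r);
    r.a = 1;
    r.b = 'x';
    r.c = 2;
    return r;
}
```

With this operator, make_record(0x00) == make_record(0xFF) is true no matter what the padding bytes contain, which is exactly the guarantee memcmp cannot give.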

So never do this unless you are 100% sure about the memory layout and the compiler's behavior, you really don't care about portability, and you really need the efficiency.
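If you do decide to compare bytes, newer compilers can at least check the "100% sure about the memory layout" part for you. The following is a sketch that assumes a C++17 compiler (much newer than the ones named in this post); bytes_equal and Pair are hypothetical names. The standard trait std::has_unique_object_representations is true only for types where equal values always have identical bytes, so the static_assert rejects types with padding at compile time.

```cpp
#include <cstring>       // std::memcmp
#include <type_traits>   // std::has_unique_object_representations (C++17)

// Byte-wise comparison that only compiles for types without padding or
// other non-unique representations.
template <typename T>
bool bytes_equal(const T& lhs, const T& rhs) {
    static_assert(std::has_unique_object_representations<T>::value,
                  "T may contain padding; compare member-wise instead");
    return std::memcmp(&lhs, &rhs, sizeof(T)) == 0;
}

// Hypothetical type: two ints leave no room for padding on common targets.
struct Pair {
    int x;
    int y;
};
```

Calling bytes_equal on a type laid out like the demo's Foo (int, char, int) would fail to compile on a typical target, which is the point: the mistake is caught at build time instead of showing up as a random comparison result.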

The demo below shows that memcmp does not compare objects correctly with Microsoft's C++ compiler v15.00.30729.01 or with gcc v4.4.1.


#include <string.h>   // memcmp
// ==================================
//        Class:  Foo
//  Description:
// ==================================
class Foo
{
  public:
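      Foo (): a(0), b(0), c(0) {}   // constructor: initializes the members only, not the padding
      int a;
      char b;   // typically followed by padding bytes so that c is int-aligned
      int c;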

}; // -----  end of class Foo  -----

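// Leaves different garbage on the stack where compare() will later place its
// own f1 and f2. This relies on both functions using the same stack layout,
// so build without optimization.
void shuffle_stack()
{
  Foo f1;
  Foo f2;

  // Deliberately overwrite b AND the padding bytes after it. Writing an int
  // through the address of a char is not valid portable C++; it is done here
  // only to make the padding contents differ.
  *((int*)(&f1.b)) = 0x87654321;
  *((int*)(&f2.b)) = 0x12345678;
}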

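int compare()
{
  Foo f1;   // members are all zero; padding is whatever the stack held
  Foo f2;

  // Returns 0 only if every byte, padding included, is identical.
  return memcmp(&f1, &f2, sizeof(Foo));
}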

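int main ( int argc, char *argv[] )
{
  int rc = 0;
  shuffle_stack();
  rc = compare();
  // f1 and f2 are logically equal, so rc should be 0. A non-zero exit code
  // means memcmp saw a difference in the padding bytes.
  return rc != 0;
} // ----------  end of function main  ----------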

2 comments:

Anonymous said...: Thanks, good to know about this pitfall.
Does this happen only when the object is on the stack?; May 14, 2010 at 8:56 AM
Unknown said...: No, it doesn't happen only on the stack.
I used the stack because it's easier to demonstrate than the heap.

Things are trickier for objects on the heap. For example, in the Microsoft debug CRT, the heap manager fills newly allocated memory with 0xCD, but the release CRT does not. So our application may behave differently in debug and release builds, which makes for an extremely difficult bug to track down.

The essential idea is that we'd better not use such tricks. As Herb Sutter advises in C++ Coding Standards: don't optimize prematurely.; May 14, 2010 at 6:44 PM
