Discussion:
How are C++ objects laid out in memory?
WahJava
2005-05-25 17:23:58 UTC
Hi hackers,

I'm investigating how C++ objects can be accessed and invoked by
external code (e.g. C code, an assembly language routine, or
routines written in some other language). I'm using "Microsoft 32-bit C/C++
Optimizing Compiler v. 13.10.3052". How is a C++ class actually
laid out in memory?

My half-correct guess is that it is represented as a structure,
e.g.

class Msg
{
char* msg;
public:
Msg(const char*);
void print();
~Msg();
};

might be represented in C as:

struct MsgStruct
{
char* msg;

void (*construct)(struct MsgStruct*, const char*);
void (*print)(struct MsgStruct*);
void (*destruct)(struct MsgStruct*);
};

But the function pointers declared in the MsgStruct structure above have to
be invoked using the "thiscall" calling convention (documented in MSDN:
the "this" pointer is passed in the ECX register), and "thiscall"
can't be specified explicitly. So a tweak like the following is needed:

/* Invoke a method on a Msg object, not on a MsgStruct structure */

void invoke_print_method(void* p)
{
Msg* m = (Msg*)p; /* Treat the parameter as a Msg object */
void (Msg::*fn)() = &Msg::print; /* For this simple class MSVC stores a plain code address */

__asm {
lea eax, [fn] ; EAX = address of fn, which holds the address of print()
mov ecx, [m] ; ECX = this pointer, as "thiscall" requires
call [eax] ; Call the address stored in fn, i.e. print()
}
}

But some of my thoughts contradict what I've derived above. If we have to
represent C++ member methods as function pointers in a C structure, then
we have to duplicate the function pointers in every object, which wastes
memory and increases the size of each C++ object. But the size of the C++
object remains 4 bytes, whereas the size of a structure instance is 16 bytes
(4 bytes of data, 12 bytes for 3 function pointers).
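The size difference described above is easy to check. A minimal sketch, assuming a mainstream ABI where a function pointer is the same size as a data pointer and no padding is needed between them (the standard guarantees neither); show_sizes is a hypothetical helper:

```cpp
#include <cstdio>

// The class from the question: one data member plus three non-virtual
// member functions (declared only; sizeof does not need their bodies).
class Msg
{
    char* msg;
public:
    Msg(const char*);
    void print();
    ~Msg();
};

// The guessed C layout, with one function pointer per member function.
struct MsgStruct
{
    char* msg;
    void (*construct)(struct MsgStruct*, const char*);
    void (*print)(struct MsgStruct*);
    void (*destruct)(struct MsgStruct*);
};

// On 32-bit x86 this reports 4 and 16 bytes.
void show_sizes()
{
    std::printf("Msg: %zu bytes, MsgStruct: %zu bytes\n",
                sizeof(Msg), sizeof(MsgStruct));
}
```

The C++ object only carries its data; the member functions live once in the code segment, not in each object.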

Suppose I want to expose a C++ object to some C code. That C code can
cast my C++ object to a pointer and change its data, but what about
member methods? Is there any standard that controls this behavior, or
does every compiler do it its own way? How can member methods be invoked?
Is there any table of function pointers which I can locate and then call
through?

And by the way, how does COM do it?

Thanks in advance,

Ashish Shukla alias Wah Java !!
Wah Java !!

-----------------------------------

tsorF treboR - peels I erofeb og ot seliM
Matt
2005-05-26 03:10:00 UTC
Post by WahJava
Hi hackers,
I'm investigating how C++ objects can be accessed and invoked by
external code (e.g. C code, an assembly language routine, or
routines written in some other language). I'm using "Microsoft 32-bit C/C++
Optimizing Compiler v. 13.10.3052". How is a C++ class actually
laid out in memory?
This is a very complicated question since the structure is affected by all
of the following idioms (and probably others which I have forgotten to
enumerate):
* Virtual functions
* Inheritance of virtual functions
* Multiple inheritance
* RTTI
Post by WahJava
My half-correct guess is that it is represented as a structure,
e.g.
class Msg
{
char* msg;
Msg(const char*);
void print();
~Msg();
};
struct MsgStruct
{
char* msg;
void (*construct)(struct MsgStruct*, const char*);
void (*print)(struct MsgStruct*);
void (*destruct)(struct MsgStruct*);
};
No. Functions that are not declared "virtual" are statically bound. Look at
this example:

#include <stdio.h>

class A
{
public:
void foo()
{
printf("I'm in A\n");
}
};

class B : public A
{
public:
void foo()
{
printf("I'm in B\n");
}
};

void strange(A *a)
{
a->foo();
}

int main(void)
{
B b;
strange(&b);
}

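Surprised? It prints "I'm in A": because foo is not virtual, a->foo() is
bound at compile time from the static type A*, not from the type of the
object. It's one of the oddities of C++. Add virtual to both declarations
of "foo" and see how the results change.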
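For comparison, a sketch of the virtual version. It returns the message instead of printing it so the result is easy to check, and the virtual destructor is an addition that is not in the example above:

```cpp
#include <cassert>
#include <string>

// Same hierarchy as the example above, but foo() is declared virtual, so a
// call through an A* is dispatched on the object's dynamic type.
class A
{
public:
    virtual ~A() {}  // added so B objects can safely be deleted through an A*
    virtual std::string foo() { return "I'm in A"; }
};

class B : public A
{
public:
    std::string foo() override { return "I'm in B"; }
};

// Same as strange() above, except it returns the message.
std::string strange(A *a)
{
    return a->foo();  // looked up through the vtable of *a at run time
}
```

With a B object, strange(&b) now yields "I'm in B".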

Also, assuming you have this class:
class A
{
int a;
virtual void foo();
int b;
};

You *should* under MSVC get this structure:
struct A_VTABLE
{
void (__thiscall *foo)();
};

struct A_STRUCT
{
A_VTABLE *vtable;
int a;
int b;
};
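Because __thiscall cannot be written on an ordinary function pointer, a portable emulation of this layout has to pass the object explicitly. A minimal sketch that mimics the layout rather than reproducing what the compiler emits; A_foo and A_init are hypothetical names, and foo returns an int here only so the call can be checked:

```cpp
#include <cassert>

struct A_STRUCT;  // forward declaration so the vtable can refer to it

// One entry per virtual function; "this" is passed explicitly as self.
struct A_VTABLE
{
    int (*foo)(A_STRUCT *self);
};

struct A_STRUCT
{
    const A_VTABLE *vtable;  // hidden pointer placed first, as observed under MSVC
    int a;
    int b;
};

// Hypothetical body of A::foo: returns a + b.
int A_foo(A_STRUCT *self)
{
    return self->a + self->b;
}

// Exactly one table for the type, initialized statically.
const A_VTABLE A_vtable = { A_foo };

// Plays the role of the constructor: install the vtable pointer, then the data.
void A_init(A_STRUCT *obj, int a, int b)
{
    obj->vtable = &A_vtable;
    obj->a = a;
    obj->b = b;
}
```

A virtual call such as x.foo() then corresponds to x.vtable->foo(&x).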

This is what I have observed, at any rate. I don't know if the structure
layout is optimized for certain cases. In the above example, it would make
more sense to include foo directly in A_STRUCT. With RTTI, the vtable
pointer is adjusted so that it looks like this:

struct A_VTABLE
{
void (__thiscall *foo)();
};

struct A_VTABLE_RTTI
{
void *rtti_data;
A_VTABLE vtable;
};

The pointer stored in A_STRUCT remains a pointer to A_VTABLE, not
A_VTABLE_RTTI.
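A small portable sketch of why the RTTI data hangs off the vtable: typeid applied to a polymorphic object reports its dynamic type, so the compiler must be able to reach type information starting from the object itself. In both MSVC and the Itanium C++ ABI that information is found through the slot just before the first virtual function pointer; the exact format is implementation specific. The class names and is_really_b are hypothetical:

```cpp
#include <cassert>
#include <typeinfo>

class A
{
public:
    virtual ~A() {}  // makes A polymorphic, so it gets a vtable and RTTI
};

class B : public A {};

// typeid on a reference to a polymorphic class yields the dynamic type,
// which is looked up at run time via the object's vtable pointer.
bool is_really_b(const A &obj)
{
    return typeid(obj) == typeid(B);
}
```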
Post by WahJava
But the function pointers declared in the MsgStruct structure above have
to be invoked using the "thiscall" calling convention (documented in MSDN:
the "this" pointer is passed in the ECX register), and "thiscall"
[...]

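This is a bit pedantic. It's almost the same as using __fastcall (which
passes the first two arguments in ECX and EDX), except that it is
custom-tailored to C++: only the "this" pointer goes in a register (ECX),
and the remaining arguments go on the stack.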
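There is usually no need to load ECX by hand: a pointer-to-member call lets the compiler apply whatever convention it uses. A sketch of a portable counterpart to the inline-assembly invoke_print_method from the question, assuming print() is changed to return its text instead of printing it:

```cpp
#include <cassert>
#include <string>

class Msg
{
    std::string msg;
public:
    explicit Msg(const char *text) : msg(text) {}
    std::string print() const { return msg; }  // returns instead of printing
};

// The compiler sets up "this" itself (in ECX for __thiscall on 32-bit MSVC),
// so no register has to be loaded manually.
std::string invoke_print_method(void *p)
{
    Msg *m = static_cast<Msg *>(p);
    std::string (Msg::*fn)() const = &Msg::print;
    return (m->*fn)();
}
```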
Post by WahJava
But some of my thoughts contradict what I've derived above. If we have to
represent C++ member methods as function pointers in a C structure, then
we have to duplicate the function pointers in every object, which wastes
memory and increases the size of each C++ object. But the size of the C++
object remains 4 bytes, whereas the size of a structure instance is 16 bytes
(4 bytes of data, 12 bytes for 3 function pointers).
No. See my comments above. There is one copy of the vtable per type, and it
is statically initialized. This is more efficient with frequent object
creation, but it is less efficient with frequent dynamically-bound (virtual)
method calls.
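The "one copy of the vtable per type" point can be observed directly. This sketch relies on implementation behavior, not the standard: on MSVC and on Itanium-ABI compilers (GCC, Clang) the first pointer-sized word of a polymorphic class without virtual bases is its vtable pointer. Base, Derived and first_word are hypothetical names:

```cpp
#include <cassert>
#include <cstring>

struct Base
{
    int a = 0;
    virtual ~Base() {}
    virtual int foo() { return 1; }
};

struct Derived : Base
{
    int foo() override { return 2; }
};

// Copies the first pointer-sized word out of an object's storage.
const void *first_word(const void *obj)
{
    const void *word;
    std::memcpy(&word, obj, sizeof word);
    return word;
}
```

Two Base objects then hold the same pointer, while a Derived object holds a different one.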
Post by WahJava
Suppose I want to expose a C++ object to some C code. That C code can
cast my C++ object to a pointer and change its data, but what about
member methods? Is there any standard that controls this behavior, or
does every compiler do it its own way? How can member methods be invoked?
Is there any table of function pointers which I can locate and then call
through?
[...]

This is a question for the C++ newsgroup. I don't know what the standard
does or does not require, but I would *presume* that the limitations on
class layout are lax enough that you can't rely on a specific
representation.

There are 2 portable ways I know of to expose C++ objects to C code:
1) Write your code in C (that is, without classes, not *necessarily* in C
proper) and create C++ wrapper classes for it
2) Write your code in C++ and create C wrapper functions
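A minimal sketch of option 2, using hypothetical names (MsgHandle, msg_create, msg_text, msg_destroy). C code would see only an incomplete struct type and these three functions, so it never depends on the class layout:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// The C++ class being exposed (a simplified Msg).
class Msg
{
    std::string msg;
public:
    explicit Msg(const char *text) : msg(text) {}
    const char *text() const { return msg.c_str(); }
};

extern "C" {

struct MsgHandle;  // opaque to C; really a Msg underneath

MsgHandle *msg_create(const char *text)
{
    try { return reinterpret_cast<MsgHandle *>(new Msg(text)); }
    catch (...) { return nullptr; }  // exceptions must not propagate into C
}

const char *msg_text(const MsgHandle *h)
{
    return reinterpret_cast<const Msg *>(h)->text();
}

void msg_destroy(MsgHandle *h)
{
    delete reinterpret_cast<Msg *>(h);
}

}  // extern "C"
```

The matching C header would declare "struct MsgHandle;" and the three prototypes, with extern "C" guarded by #ifdef __cplusplus.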

As to C programs being able to cast your object pointer and access data
directly, that is true, but C++ programs can do it just as easily. Also,
this sort of thinking is a bit futile since nothing would stop me from doing
this:
#include <stdlib.h>

void russian_roulette(void)
{
char *p = (char *)(size_t) rand();
(*p)++;
}

At some point you have to accept that, if the user wants to do something
stupid, you have no way to stop them. It is good to insulate the user as
best you can, but going to the extreme is pointless.

-Matt
Tim Roberts
2005-05-27 04:15:43 UTC
Post by WahJava
I'm investigating how C++ objects can be accessed and invoked by
external code (e.g. C code, an assembly language routine, or
routines written in some other language). I'm using "Microsoft 32-bit C/C++
Optimizing Compiler v. 13.10.3052". How is a C++ class actually
laid out in memory?
Matt already gave you very good answers to most of these. I'm only going
to add a few additional comments.

If a C++ class or struct contains no virtual methods, its layout is exactly
identical to the same struct written in C.
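This can be checked at compile time. The guarantee the standard actually gives is for "standard-layout" classes (roughly: no virtual functions, no virtual bases, all data members with the same access control, and restrictions on base classes); the sketch below, with hypothetical MsgData and MsgClass types, stays within that case:

```cpp
#include <cstddef>
#include <type_traits>

// Plain C-style struct.
struct MsgData
{
    char *msg;
    int length;
};

// Same data members, plus non-virtual member functions.
class MsgClass
{
public:
    char *msg;
    int length;
    int get_length() const { return length; }
    void clear() { msg = nullptr; length = 0; }
};

// The member functions change neither the size nor the member offsets.
static_assert(std::is_standard_layout<MsgClass>::value, "MsgClass is standard-layout");
static_assert(sizeof(MsgClass) == sizeof(MsgData), "same size");
static_assert(offsetof(MsgClass, msg) == offsetof(MsgData, msg), "same offset for msg");
static_assert(offsetof(MsgClass, length) == offsetof(MsgData, length), "same offset for length");
```

Adding a virtual function to MsgClass would make the first two checks fail, because the object then gains a hidden vtable pointer.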
Post by WahJava
My half-correct guess is that it is represented as a structure,
e.g.
class Msg
{
char* msg;
Msg(const char*);
void print();
~Msg();
};
struct MsgStruct
{
char* msg;
void (*construct)(struct MsgStruct*, const char*);
void (*print)(struct MsgStruct*);
void (*destruct)(struct MsgStruct*);
};
Nope. The data structure will be:

struct MsgStruct
{
char* msg;
};

And the compiler will generate ordinary functions for the members:

Msg::Msg( const char * );
void Msg::print( );
Msg::~Msg( );

There are no function pointers. Calls to xxx->print() can all be linked
statically.
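Conceptually, a non-virtual member function behaves like a free function that receives the object as a hidden first argument. A sketch of that equivalence, with a hypothetical Msg_print_length standing in for the compiled member (using a length so the two calls can be compared):

```cpp
#include <cassert>
#include <cstring>

// C view of the data: just the pointer, no function pointers.
struct MsgStruct
{
    const char *msg;
};

// Hypothetical free-function equivalent of a member: the object is an
// ordinary first parameter instead of a hidden "this".
std::size_t Msg_print_length(const MsgStruct *self)
{
    return std::strlen(self->msg);
}

class Msg
{
    const char *msg;
public:
    explicit Msg(const char *text) : msg(text) {}
    // Non-virtual: called directly by name, with "this" passed in
    // (in ECX on 32-bit MSVC); nothing is looked up at run time.
    std::size_t print_length() const { return std::strlen(msg); }
};
```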
Post by WahJava
Suppose I want to expose a C++ object to some C code. That C code can
cast my C++ object to a pointer and change its data, but what about
member methods? Is there any standard that controls this behavior?
Yes and no. The ISO C++ Standard requires certain behavior that makes a
particular layout most natural, and most compilers do it the same way, but
a compiler can do it any way it wants, as long as the program behaves the same.
Post by WahJava
Is there any table of function
pointers which I can locate and then call through?
If the class has virtual methods, then the first dword of the struct is a
pointer to a table of function pointers. For non-virtual methods, the C
code can just use the (decorated) name.
Post by WahJava
And by the way, how does COM do it?
Remember that a COM interface has no data members, and all of its methods are
virtual. That means a COM interface object consists of exactly 4 bytes (on
32-bit Windows), which contain a pointer to a function table.

Do you have access to Visual C++? Go look at any include file generated by
the IDL compiler. OBJIDL.H is one example. It contains both C++ and C
code to access the methods of its COM objects. That will show you how it
is done.
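A stripped-down sketch of the C-side pattern those headers use: the object begins with lpVtbl, and every method takes the object ("This") as its first argument. IGreeter, Greeter and GetAnswer are hypothetical, and a real COM interface would also start with the IUnknown methods (QueryInterface, AddRef, Release) and use the STDMETHODCALLTYPE calling convention, both omitted here:

```cpp
#include <cassert>

struct IGreeter;

// Table of function pointers shared by every object of one implementation.
struct IGreeterVtbl
{
    int (*GetAnswer)(IGreeter *This);
};

// The interface itself: nothing but the vtable pointer.
struct IGreeter
{
    const IGreeterVtbl *lpVtbl;
};

// Hypothetical implementation object; the interface comes first so that an
// IGreeter* and a Greeter* refer to the same address.
struct Greeter
{
    IGreeter iface;
    int answer;
};

int Greeter_GetAnswer(IGreeter *This)
{
    return reinterpret_cast<Greeter *>(This)->answer;
}

const IGreeterVtbl Greeter_vtbl = { Greeter_GetAnswer };

void Greeter_init(Greeter *g, int answer)
{
    g->iface.lpVtbl = &Greeter_vtbl;
    g->answer = answer;
}
```

C code then calls a method as p->lpVtbl->GetAnswer(p), which is the shape of the C macros in the IDL-generated headers.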
--
- Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
Jonathan Bartlett
2005-05-27 15:27:43 UTC
This may interest you:

http://www.codesourcery.com/cxx-abi/

Jon
----
Learn to program using Linux assembly language
http://www.cafeshops.com/bartlettpublish.8640017
pacman128@gmail.com
2005-05-28 18:57:00 UTC
You might be interested in my online assembly tutorial. It has a
chapter on C++.

http://www.drpaulcarter.com/pcasm

--
Paul Carter
