Re: C++ RTTI/dynamic_cast across shared module boundaries?
- Subject: Re: C++ RTTI/dynamic_cast across shared module boundaries?
- From: Zachary Pincus <email@hidden>
- Date: Sat, 4 Mar 2006 23:23:50 -0800
Hi folks,
I'm still trying to resolve my problem with typeinfo object
resolution across DSO boundaries. I thought that I had fixed it, but
now it's broken again. Worse, I still can't reproduce the problem
with simpler test cases.
To recap, I have some shared objects (python extensions) which all
use a particular template class. For dynamic_cast to work, the
typeinfo symbols need to be exported from the modules and resolved
globally by dyld. Somehow, I can't get this working.
My newest theory is that different libraries are seeing different
definitions of the template class due to some macro issues (as
suggested by Howard below). Is there any way to tell by looking at
the symbols that nm -m produces whether two symbols will resolve to
the same thing at load time?
e.g.:
zpincus% nm -m _ITKBasicFiltersAPython.so | c++filt | grep_for_stuff
00ba2280 (__DATA,__const_coal) weak external typeinfo for
itk::Image<float, 2u>
zpincus% nm -m _ITKCommonAPython.so | c++filt | grep grep_for_stuff
005b3b60 (__DATA,__const_coal) weak external typeinfo for
itk::Image<float, 2u>
Is there any way I can tell whether the loader will resolve these two
symbols (or others) as the same (if RTLD_GLOBAL is passed to dlopen),
or if they will be resolved differently because one library saw a
slightly different definition of the class than the other?
Thanks everyone for the help,
Zach Pincus
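One way to approach the question above at run time, rather than from the nm output, is to have each module report the address of the type_info object it is actually bound to. The sketch below is only an illustration: typeinfo_address and typeinfo_report are hypothetical helper names, not part of ITK or of any library mentioned in this thread. The assumption is that each module compiles this code and exposes the result for the same T.

```cpp
#include <sstream>
#include <string>
#include <typeinfo>

// Address of the type_info object that the calling code is bound to for T.
// (Hypothetical helper, introduced only for this illustration.)
template <typename T>
const void* typeinfo_address() {
    return static_cast<const void*>(&typeid(T));
}

// Formats "<mangled name> @ <address>" so that reports from different
// modules can be compared side by side.
template <typename T>
std::string typeinfo_report() {
    std::ostringstream out;
    out << typeid(T).name() << " @ " << typeinfo_address<T>();
    return out.str();
}
```

If two modules print the same name but different addresses for the same T, each is using its own copy of the typeinfo object, which is the situation in which an address-based comparison would fail.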
On Feb 18, 2006, at 9:41 AM, Howard Hinnant wrote:
Templates (like the type_info in your example) should be implicitly
declared weak, meaning they will be unique'd across DSO
boundaries. So as long as the template class has the same
definition, you should be ok with respect to the ODR. If different
DSO's saw different definitions of your templates (say via
different macro flags or whatever), only then would you run afoul
of the ODR.
Know that we (Apple) are concerned about your problem and no need
to apologize for bringing this up on the Xcode list, no matter
where the problem ultimately lies.
-Howard
On Feb 18, 2006, at 12:14 PM, Zachary Pincus wrote:
Steve,
Thanks for your discussion of this problem.
One question: How does the One Definition Rule interact with
templated classes? The issue that I'm running into is that the
master "image" class that the image filters in my different
modules all need to interact with is a template. So any module
that needs to create a new image will implicitly implement that
image class. I just can't see any way to *not* violate the One
Definition Rule when you need to share templated classes across
DSO boundaries.
Is this correct? I ask because if there is no good way to not
violate the One Definition Rule with templated classes, that seems
like a good argument for why the current GCC RTTI implementation
is wrong.
Technically, I guess that I could ensure that no module ever
actually constructs any instances of templated classes. Instead I
could have an object factory, defined only in one place, that
handles the construction. Though the One Definition Rule would
still be violated in this case, it wouldn't matter, because
everything would get the same typeinfo object. This is
exceptionally nasty though, and would definitely void out any
performance increases due to not having to do string comparisons
in dynamic_cast operations (etc).
In reference to my specific problem, I'll try to verify whether
python on OS X is (a) using dlopen() to load the modules (I rather
think that it is, but never hurts to check) and (b) that the
dlopen flags are getting set right. Maybe the best approach will
be to write a little module that does run dlopen() with the right
flags. If loading that in python before loading the rest of the
modules fixes things, then it's a python problem and I'll have to
apologize for bugging everyone on the Xcode list!
Zach
On Feb 18, 2006, at 10:44 AM, Steve Baxter wrote:
Hi Zach,
The big problem here is the way that GCC implements RTTI - it is
different to pretty much every other implementation. The GCC
runtime compares type_info by pointer rather than by name().
This means that a class implemented in two different DSOs
(dylibs) will not be considered the same by the RTTI in GCC, but
will be considered the same by the RTTI on almost every other
platform.
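The difference between the two comparison strategies described above can be sketched as follows. This is a minimal illustration, not the code of the GCC runtime or of any other compiler's runtime; same_type_by_address and same_type_by_name are hypothetical names.

```cpp
#include <cstring>
#include <typeinfo>

// Identity comparison: true only if both references denote the very same
// type_info object in memory.
inline bool same_type_by_address(const std::type_info& a,
                                 const std::type_info& b) {
    return &a == &b;
}

// Name comparison: true if the implementation-defined name strings match,
// even when the two type_info objects live at different addresses.
inline bool same_type_by_name(const std::type_info& a,
                              const std::type_info& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}
```

Within a single image the two functions normally agree. They can disagree when two DSOs each carry their own copy of the typeinfo for the same class: the names match but the addresses do not.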
Personally I feel this is a mistake on the part of the GCC
designers. Having the same class implemented in two different
libs is technically a bug in your application, it violates the
One Definition Rule:
http://en.wikipedia.org/wiki/One_Definition_Rule
However, in practice the one definition rule is very difficult
(and sometimes impossible if you are a third-party plugin being
loaded by an app over which you have no control) to get right.
GCC requires it to be true or RTTI will fail. VC++ and
Codewarrior do not require this to be true.
The first thing I would do is file a bug on radar against the GCC
RTTI implementation. If Apple compiled libstdc++ with
__GXX_MERGED_TYPEINFO_NAMES=0, type_info would be compared by
name() not address and all your problems will just go away without
any more work. I did file a bug and got it returned as "behaves
correctly". If lots of people file a bug against this we can
maybe change Apple's mind (or at least get them to provide an
alternative version of libstdc++ that does have this option
switched on). See <typeinfo> for information about this compile
switch. My bug was 4424486 - please feel free to reference it.
Failing this, you can work around this problem. Here are the
requirements:
(1) You must export all the symbols in your dylibs. This will
prevent dead code stripping from working and increase the size of
your plugins, but disk space is cheap, right? (Right, but internet
bandwidth is not.)
(2) You must load the dylibs by calling dlopen() with
RTLD_GLOBAL. You may also need to pass RTLD_NOW (the
documentation says otherwise, but I have a feeling I couldn't
make it work without this).
I found that CFBundle does not pass RTLD_GLOBAL to dlopen() - if
you are using C++ and RTTI, you cannot use CFBundle to open your
plugins (or rather you can, but you need to use dlopen() as well
before you call CFBundleLoadExecutable()).
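Requirement (2) can be sketched as a small wrapper around dlopen(). The function name load_plugin_global is hypothetical and the path is supplied by the caller; only dlopen(), dlerror(), RTLD_NOW and RTLD_GLOBAL come from <dlfcn.h>. This is a minimal sketch, not a complete plugin loader.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Loads a shared module so that its exported symbols (including typeinfo
// symbols) are visible to modules loaded afterwards.
// Returns the dlopen() handle, or nullptr after printing the dlerror() text.
inline void* load_plugin_global(const char* path) {
    void* handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        const char* why = dlerror();
        std::fprintf(stderr, "dlopen(%s) failed: %s\n",
                     path, why ? why : "unknown error");
    }
    return handle;
}
```

For the CFBundle case described above, the same call would be made on the bundle's executable before CFBundleLoadExecutable(). On some Linux systems the program may need to be linked with -ldl.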
I have to say though that you *seem* to be jumping through all
the hoops correctly. Are you sure that Python is definitely
passing your flags on to dlopen()?
I wrote a much longer post about all of this a couple of weeks ago:
http://lists.apple.com/archives/xcode-users/2006/Feb/msg00234.html
Cheers,
Steve.
On 18 Feb 2006, at 12:52, Zachary Pincus wrote:
I've not tried this on OS X (my problems were on Linux, IRIX,
and Solaris). However, I assume (maybe incorrectly) the problem
exists in all GCC implementations. I also have not tried this
in GCC 4, we were using GCC 3. Maybe the problem identified in
the FAQ was fixed in GCC 4? If so, that would be fantastic.
The problem we had is that we were using dynamic_cast as a
method for implementing a plug-in architecture. The
dynamic_cast was used to access data types provided by each
plug-in. The problem we ran into was that some of our plug-ins
were also libraries that others linked to. We had tons of
duplicate symbol errors when we exported all symbols.
This is definitely the same species of difficulty that I'm
having: dynamic_cast used for data types across plugins. I'm
sure there's some little OS X-specific twist with how gcc works
that I'm just not understanding. Arrg.
Zach
On Feb 17, 2006, at 9:33 PM, Zachary Pincus wrote:
Michael,
Thanks for this information! That's exactly what I was looking
for.
I assume that you're saying linking with "-Wl,-E" (as
specified on the web page you referred) isn't a good solution
because it exports all global symbols. Out of curiosity, what
about exporting all the global symbols is bad? Just that it
increases the potential for symbol-name collisions?
Zach
On Feb 17, 2006, at 6:55 PM, Michael Rice wrote:
It sounds like you are running into the C++ ABI described in
the GCC FAQ (http://gcc.gnu.org/faq.html#dso). I ran into
this problem long ago and have still to find a good, generic
solution for this problem (i.e., not having to export every
symbol in the library). My best solution so far has been to
implement my own, less efficient, RTTI system.
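A hand-rolled RTTI scheme of the kind mentioned above might look like the following. This is only a guess at the general shape, not Michael's actual implementation: PluginObject, FloatImage2D, LabelImage2D and name_cast are all hypothetical names, and the cast only matches the exact type (it does not walk base classes).

```cpp
#include <cstring>

// Common base class for objects passed between modules.
class PluginObject {
public:
    virtual ~PluginObject() {}
    // Each concrete class returns its own unique name string.
    virtual const char* type_name() const = 0;
};

class FloatImage2D : public PluginObject {
public:
    static const char* static_type_name() { return "FloatImage2D"; }
    const char* type_name() const override { return static_type_name(); }
};

class LabelImage2D : public PluginObject {
public:
    static const char* static_type_name() { return "LabelImage2D"; }
    const char* type_name() const override { return static_type_name(); }
};

// Name-based stand-in for dynamic_cast: compares strings instead of
// type_info addresses, so duplicate typeinfo objects do not matter.
template <typename T>
T* name_cast(PluginObject* obj) {
    if (obj != nullptr &&
        std::strcmp(obj->type_name(), T::static_type_name()) == 0) {
        return static_cast<T*>(obj);
    }
    return nullptr;
}
```

The string comparison on every cast is where the "less efficient" cost comes from, but the scheme does not depend on typeinfo symbols being exported or merged by the loader.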
On Feb 17, 2006, at 8:43 AM, Zachary Pincus wrote:
Thanks Howard.
In the Code Generation build settings of all targets,
uncheck "Symbols Hidden by Default".
Right now, I'm not actually using Xcode (part of my
debugging was to remove Xcode from the mix and do all of the
building and linking directly on the command line, so I
could easily fix problem flags). There are absolutely no
'-fvisibility=hidden' flags on the link or compile command
lines I have been using, so I don't think symbols are being
hidden. (Given that the man page for g++ says that the
default is for public visibility.)
Is there any way I can verify this with, say, otool?
Also, a correction: telling Python to load with *either*
dyld flags of RTLD_LAZY|RTLD_GLOBAL *or*
RTLD_NOW|RTLD_GLOBAL doesn't help.
Zach
On Feb 17, 2006, at 8:24 AM, Howard Hinnant wrote:
On Feb 17, 2006, at 9:01 AM, Zachary Pincus wrote:
Hi folks,
I've been trying for a while to get c++ RTTI and dynamic
casting to work across the boundaries of several "bundle"
shared modules. I've spent a day looking at man pages and
online, to no avail.
In my case, instances of particular classes can be created
in various modules, but need to work (and dynamically cast
properly) when passed to other modules. (Before you ask:
it's an image processing library, where different image
filter types are defined in different modules, but they
all need to be able to send and receive the same image
types.)
I've linked the modules as follows:
/usr/bin/c++ -bundle -o [output].so [object files] -L[link
paths] -l[link libs]
Now, how do I need to set up my environment to get RTTI
and dynamic_cast working across several such modules?
Right now, the module loader is Python, which I think uses
dlopen to load the modules. I've set the dlopen flags (in
python, sys.setdlopenflags()) to 0x9, which is
RTLD_LAZY|RTLD_GLOBAL (as they are defined in /usr/include/dlfcn.h),
but that really doesn't help. (Other permutations on the
dlopen flags don't help.)
Is there anything else I need to do? Is there anything
else I can try? Is this a hopeless project?
In the Code Generation build settings of all targets,
uncheck "Symbols Hidden by Default".
-Howard
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Xcode-users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
40stanford.edu
This email sent to email@hidden
Steve Baxter
Software Development Manager
Improvision
+44-2476-692229