Template and Inheritance
As Meyers noted in Item 24 of Effective C++,the inability to inline a virtual function is its biggest performace penalty.
Virtual functions seems to inflict a performace cost in several ways:
[1] The vptr must be initialized in the constructor.
[2] A virtual function is invoked via pointer indirection. We must fetch the pointer to the function table and then access the correct function offset.
[3] Inlining is a compile-time decision. The compiler cannot inline virtual functions whose resolution takes place at run-time.
The true cost of virtual functions then boils down to the third item only.
-------------------------------------------------------------------------------------------
Virtual function calls that can be resolved only at rum-time will inhibit inling. At times, that may pose a performace problem that we must solve. Dynamic binding of?a function call is a consequence of inheritance. One way to eliminate dyanamic binding is to replace inheritance with a template-based design. Templates are more performance-friendly in the sense that they push the resolution step from run-time to compile-time. Compile-time, as far as we are concerned, is free.
The desing space for inheritance and templates has some overlap. We will discuss one such example.
Suppose you wanted to develop a thread-safe string class that may be manipulated safely by concurrent threads in a Win32 environment. In that environment you have a choice of multiple synchronization schemes such ascriticalsection, mutex, and semanphores, just to name a few. You would like your thread-safe string to offer the flexibility to use any of those schemes, and at different times you may have a reason to prefer one scheme over another. Inheritance would be a reasonable choice to capture the commonality among synchronization mechanisms.
The Locker abstract base class will declare the common interface:

?2

?3

?4

?5

?6

?7

?8

?9

10

11

12


13

14

15

16


17

[1] Hard coding. You could derive three distinct classes from string::CriticalSectionString, MutexString, and SemaphoreString, each class implementing its implied synchronization mechanism.
[2] Inheritance. You could derive a single ThreadSafeString class that contains a pointer to a Locker object. Use polynorphism to select the particular synchronization mechanism at run-time.
[3] Templates. Create a template-based string class parameterized by the Locker type.
////////////////////////////////////////////////////////////////////////////////////////////
Here we only talk about the Template implementation.
The templates-based design combines the best of both worlds-reuse and efficiency. The ThreadSafeString is implemented as a?template parameterized by the Locker template argument:

?2

?3

?4

?5

?6

?7


?8

?9

10

11

12


?2

?3

?4



?5

?6

?7

?8

?9

10

{
?? ThreadSafeString<CriticalSectionLock> csString = "Hello";
?? ...
}
or you may go with a mutex:
{
?? ThreadSafeString<MutexLock> mtxString = "Hello";
?? ...
}
This design also provides a relief from the virtual function calls to lock() and unlock(). The declaration of a ThreadSafeString selects a particular type of synchronization upon template instantiation time. Just like hard coding, this enables the compiler to resolve the virtual calls and inline them.
As you?can see, templates can make a positive performace contribution by pushing computations out of the excution-time and into compile-time, enabling inling in the process.
posted on 2006-11-13 13:37 Zero Lee 閱讀(284) 評論(0) 編輯 收藏 引用 所屬分類: C++ Performance