Is the “this” pointer just a compile time thing?
Question (score 28)
I asked myself whether the this pointer could be overused, since I usually use it every single time I refer to a member variable or function. I wondered if it could have a performance impact, since there must be a pointer which needs to be dereferenced every time. So I wrote some test code:
    struct A {
        int x;
        A(int X) {
            x = X; /* And a second time with this->x = X; */
        }
    };
    int main() {
        A a(8);
        return 0;
    }
and surprisingly, even with -O0, both versions produce exactly the same assembly code.
Also, if I call a member function from another member function, I see the same behavior. So is the this pointer just a compile-time thing and not an actual pointer? Or are there cases where this is actually translated and dereferenced? I use GCC 4.4.3, by the way.
c++ gcc this this-pointer
6
Possible duplicate of Is there overhead using this-> in c++?
– underscore_d
Nov 12 at 19:04
Comments are not for extended discussion; this conversation has been moved to chat.
– Samuel Liew♦
2 days ago
asked Nov 12 at 15:01 by Yastanub, edited Nov 12 at 16:42 by sds
12 Answers
Answer (score 65, accepted)
So is the this pointer just a compile time thing and not an actual pointer?
It very much is a run-time thing. It refers to the object on which the member function is invoked; naturally, that object can exist at run time.
What is a compile-time thing is how name lookup works. When a compiler encounters x = X, it must figure out which x is being assigned. So it looks it up and finds the member variable. Since this->x and x refer to the same thing, naturally you get the same assembly output.
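A minimal sketch of that distinction, using a hypothetical struct Counter that is not part of the question's code. Both setters resolve x to the same member at compile time, while the value of this depends on which object the call is made on and so is only known at run time.

```cpp
#include <cassert>

// Hypothetical type for illustration; not part of the question's code.
struct Counter {
    int x = 0;
    void set_implicit(int v) { x = v; }           // name lookup resolves x to the member
    void set_explicit(int v) { this->x = v; }     // the same member, spelled out
    const Counter* self() const { return this; }  // exposes the run-time value of this
};

// Returns true if this matches the address of whichever object was called.
bool this_tracks_object() {
    Counter a, b;
    a.set_implicit(1);
    b.set_explicit(2);
    assert(a.x == 1 && b.x == 2);  // both spellings wrote to the member
    return a.self() == &a && b.self() == &b && a.self() != b.self();
}
```

Calling this_tracks_object() from main should return true with either spelling of the setters.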
Answer (score 23)
It is an actual pointer, as the standard specifies it (§12.2.2.1):
In the body of a non-static (12.2.1) member function, the keyword this is a prvalue expression whose value is the address of the object for which the function is called. The type of this in a member function of a class X is X*.
this is actually implicit every time you reference a non-static member variable or member function within a class's own code. It is also needed (whether implicit or explicit) because the compiler needs to tie the function or the variable back to an actual object at runtime.
Using it explicitly is rarely useful, unless you need, for example, to disambiguate between a parameter and a member variable within a member function. Otherwise, without it, the parameter shadows the member variable (See it live on Coliru).
6
You also need to explicitly write this-> when accessing a member of a dependent base type from a template member. Not often needed, and a good compiler will diagnose exactly when you forget it, but worth mentioning.
– Toby Speight
Nov 12 at 18:58
1
It can also be very useful to write "this->" when developing with an IDE, because the IDE can then provide a list of members to select from. (Personally, I tend not to use an IDE, but if one chooses to, taking advantage of it seems sensible.)
– Martin Bonner
2 days ago
3
"Using it explicitly is rarely useful", from the compiler perspective, true; From a human perspective, some teams will enforce this as a style rule to prevent human-error introduced bugs.
– Tezra
2 days ago
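The template case raised in the comments can be sketched as follows. The names Base, Derived and doubled_of are hypothetical, chosen only for illustration. Inside Derived, the base class depends on T, so an unqualified value would not be looked up in it; writing this->value defers the lookup until the template is instantiated.

```cpp
#include <cassert>

// Hypothetical class templates for illustration.
template <typename T>
struct Base {
    T value{};
    T get() const { return value; }
};

template <typename T>
struct Derived : Base<T> {
    T doubled() const {
        // Plain `value` or `get()` would not be found here, because Base<T>
        // is a dependent base; `this->` makes the names dependent as well.
        return this->value + this->get();
    }
};

int doubled_of(int v) {
    Derived<int> d;
    d.value = v;
    int result = d.doubled();
    assert(result == v + v);
    return result;
}
```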
Answer (score 14)
this always has to exist when you are in a non-static method. Whether you explicitly use it or not, you have to have a reference to the current instance, and this is what this gives you.
In both cases, you are going to access memory through the this pointer. It's just that you can omit it in some cases.
Essentially, syntactic sugar (whether by inclusion or omission, it's a shortcut).
– Draco18s
Nov 12 at 16:55
Answer (score 13)
This is almost a duplicate of How do objects work in x86 at the assembly level?, where I comment on the asm output of some examples, including showing which register the this pointer was passed in.
In asm, this works exactly like a hidden first arg, so both the member function foo::add(int) and the non-member add, which takes an explicit foo* first arg, compile to exactly the same asm.
    struct foo {
        int m;
        void add(int a); // not inline so we get a stand-alone definition emitted
    };
    void foo::add(int a) {
        this->m += a;
    }
    void add(foo *obj, int a) {
        obj->m += a;
    }
On the Godbolt compiler explorer, compiling for x86-64 with the System V ABI (first arg in RDI, second in RSI), we get:
    # gcc8.2 -O3
    foo::add(int):
        add DWORD PTR [rdi], esi   # memory-destination add
        ret
    add(foo*, int):
        add DWORD PTR [rdi], esi
        ret
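The same equivalence can be seen from the C++ side, without reading asm. This is a small sketch that reuses the foo and add names from above. The helper add_three_ways is hypothetical. std::mem_fn from the standard <functional> header wraps a member function so that the object is passed as an ordinary first argument.

```cpp
#include <cassert>
#include <functional>

struct foo {
    int m;
    void add(int a) { this->m += a; }
};

void add(foo* obj, int a) { obj->m += a; }

// Hypothetical helper: performs the same update through three call forms.
int add_three_ways(int start) {
    foo f{start};
    f.add(1);                               // object passed implicitly as this
    add(&f, 1);                             // object passed explicitly
    auto as_free = std::mem_fn(&foo::add);  // member function as a callable
    as_free(&f, 1);                         // object is now an explicit first argument
    assert(f.m == start + 3);
    return f.m;
}
```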
I use GCC 4.4.3
That was released in January 2010, so it's missing nearly a decade of improvements to the optimizer, and to error messages. The gcc7 series has been out and stable for a while. Expect missed optimizations with such an old compiler, especially for modern instruction sets like AVX.
Answer (score 9)
After compilation, every symbol is just an address, so whether or not you write this-> can't be a run-time issue.
Any member symbol is compiled to an offset in the current class anyway, even if you didn't use this.
When name is used in C++, it can be one of the following:
- In the global namespace (like ::name), or in the current namespace, or in a used namespace (when using namespace ... has been used)
- In the current class
- Local definition, in an enclosing block
- Local definition, in the current block
Therefore, when you write code, the compiler has to search each of these scopes for the symbol name, starting from the current block and working up to the global namespace.
Using this->name helps the compiler narrow the search for name to the current class scope only, meaning it skips local definitions and, if the name is not found in class scope, does not look for it in the global scope.
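The lookup order described above can be sketched with one identifier declared at three levels. Widget and lookup_demo are hypothetical names used only for this illustration.

```cpp
#include <cassert>

int name = 1;  // global namespace

// Hypothetical type for illustration.
struct Widget {
    int name = 2;  // class member

    int lookup_demo() const {
        int name = 3;                  // local; hides the member and the global
        int unqualified = name;        // innermost scope wins: the local
        int via_this    = this->name;  // class member lookup only: the member
        int via_global  = ::name;      // global namespace only: the global
        assert(unqualified == 3 && via_this == 2 && via_global == 1);
        // Pack the three results into one number: 3, 2, 1 becomes 321.
        return unqualified * 100 + via_this * 10 + via_global;
    }
};
```

All three lookups are settled at compile time. Only the read through this->name goes through the object's address at run time.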
Answer (score 5)
Here is a simple example of how "this" could be useful during runtime:
    #include <vector>
    #include <string>
    #include <iostream>
    class A;
    typedef std::vector<A*> News;
    class A
    {
    public:
        A(const char* n): name(n){}
        std::string name;
        // Register this particular object in the subscriber list.
        void subscribe(News& n)
        {
            n.push_back(this);
        }
    };
    int main()
    {
        A a1("Alex"), a2("Bob"), a3("Chris");
        News news;
        a1.subscribe(news);
        a3.subscribe(news);
        std::cout << "Subscriber:";
        for(auto& a: news)
        {
            std::cout << " " << a->name;
        }
        return 0;
    }
Answer (score 4)
Your machine does not know anything about class methods; they are normal functions under the hood.
Hence methods have to be implemented by always passing a pointer to the current object. It's just implicit in C++, i.e. T Class::method(...) is just syntactic sugar for T Class_Method(Class* this, ...).
Other languages like Python or Lua choose to make it explicit, and modern object-oriented C APIs like Vulkan (unlike OpenGL) use a similar pattern.
Answer (score 4)
since I usually use it every single time I refer to a member variable or function.
You always use this when you refer to a member variable or function. There is simply no other way to reach members. The only choice is implicit vs explicit notation.
Let's go back and see how it was done before this, to understand what this is.
Without OOP:
    struct A {
        int x;
    };
    void foo(A* that) {
        bar(that->x);
    }
With OOP but writing this explicitly:
    struct A {
        int x;
        void foo(void) {
            bar(this->x);
        }
    };
Using shorter notation:
    struct A {
        int x;
        void foo(void) {
            bar(x);
        }
    };
But the difference is only in the source code. All are compiled to the same thing. If you create a member function, the compiler will create a pointer argument for you and name it "this". If you omit this-> when referring to a member, the compiler is just clever enough to insert it for you most of the time. That's it. The only difference is six fewer characters in the source.
Writing this explicitly makes sense when there is an ambiguity, namely another variable named just like your member variable:
    struct A {
        int x;
        A(int x) {
            this->x = x;
        }
    };
There are some instances, like __thiscall, where OO and non-OO code may end up a bit different in asm, but whether the pointer is passed on the stack and then optimized into a register, or is in ECX from the very beginning, doesn't make it "not a pointer".
Answer (score 2)
"this" can also safeguard against shadowing by a function parameter, for example:
    class Vector {
    public:
        double x,y,z;
        void SetLocation(double x, double y, double z);
    };
    void Vector::SetLocation(double x, double y, double z) {
        this->x = x; //Passed parameter assigned to member variable
        this->y = y;
        this->z = z;
    }
(Obviously, writing such code is discouraged.)
1
Usually shadowing comes up as an issue when the member variable is being shadowed by an introduced local variable (where you normally aren't thinking of what is in the global scope), so use of this->x is encouraged to prevent such modification bugs.
– Tezra
2 days ago
Yeah unfortunately -Wshadow is not enabled with -Wall. gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
– Trass3r
2 days ago
Answer (score 2)
If the compiler inlines a member function that is called with static rather than dynamic binding, it might be able to optimize away the this pointer. Take this simple example:
    #include <iostream>
    using std::cout;
    using std::endl;
    class example {
    public:
        int foo() const { return x; }
        int foo(const int i) { return (x = i); }
    private:
        int x;
    };
    int main(void)
    {
        example e;
        e.foo(10);
        cout << e.foo() << endl;
    }
GCC 7.3.0 with the -march=x86-64 -O -S flags is able to compile cout << e.foo() to three instructions:
    movl $10, %esi
    leaq _ZSt4cout(%rip), %rdi
    call _ZNSolsEi@PLT
This is a call to std::ostream::operator<<. Remember that cout << e.foo(); is syntactic sugar for std::ostream::operator<< (cout, e.foo());. And operator<<(int) could be written two ways: static operator<< (ostream&, int), as a non-member function, where the operand on the left is an explicit parameter, or operator<<(int), as a member function, where it’s implicitly this.
The compiler was able to deduce that e.foo() will always be the constant 10. Since the 64-bit x86 calling convention is to pass function arguments in registers, that compiles down to the single movl instruction, which sets the second function parameter to 10. The leaq instruction sets the first argument (which might be an explicit ostream& or the implicit this) to &cout. Then the program makes a call to the function.
In more complex cases, though, such as if you have a function taking an example& as a parameter, the compiler needs to look up this, as this is what tells the program which instance it’s working with, and therefore, which instance’s x data member to look up.
Consider this example:
    class example {
    public:
        int foo() const { return x; }
        int foo(const int i) { return (x = i); }
    private:
        int x;
    };
    int bar( const example& e )
    {
        return e.foo();
    }
The function bar() gets compiled to a bit of boilerplate and the instruction:
    movl (%rdi), %eax
    ret
You remember from the previous example that %rdi on x86-64 is the first function argument, the implicit this pointer for the call to e.foo(). Putting it in parentheses, (%rdi), means look up the variable at that location. (Since the only data in an example instance is x, &e.x happens to be the same as &e in this case.) Moving the contents to %eax sets the return value.
In this case, the compiler needed the implicit this argument to foo(/* example* this */) to be able to find &e and therefore &e.x. In fact, inside a member function (that isn’t static), x, this->x and (*this).x all mean the same thing.
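Both observations can be checked from inside C++ with a short sketch. It reuses the class name example from above. The constructor and the helpers same_member, member_address and first_member_shares_address are additions for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <type_traits>

class example {
public:
    explicit example(int v) : x(v) {}
    bool same_member() const {
        // All three spellings name the same object, so their addresses match.
        return &x == &this->x && &x == &(*this).x;
    }
    const void* member_address() const { return &x; }
private:
    int x;
};

// The address equality below relies on example being standard-layout.
static_assert(std::is_standard_layout<example>::value,
              "example is expected to be standard-layout");

bool first_member_shares_address() {
    example e(10);
    assert(e.same_member());
    // For a standard-layout class, the first data member sits at the object's address.
    return e.member_address() == static_cast<const void*>(&e);
}
```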
Answer (score 1)
this is a pointer. It's like an implicit parameter that's part of every method. You could imagine using plain C functions and writing code like:
    Socket *makeSocket(int port) { ... }
    void send(Socket *this, Value v) { ... }
    Value receive(Socket *this) { ... }
    Socket *mySocket = makeSocket(1234);
    send(mySocket, someValue); // The subject, `mySocket`, is passed in as a param called "this", explicitly
    Value newData = receive(mySocket);
In C++, similar code might look like:
mySocket.send(someValue); // The subject, `mySocket`, is passed in as a param called "this"
Value newData = mySocket.receive();
Answer (score 1)
this is indeed a runtime pointer (albeit one implicitly supplied by the compiler), as has been reiterated in most answers. It is used to indicate which instance of a class a given member function is to operate on when called; for any given instance c of class C, when any member function cf() is called, c.cf() will be supplied a this pointer equal to &c (this naturally also applies to any struct s of type S, when calling member function s.sf(), as shall be used for cleaner demonstrations). It can even be cv-qualified just as any other pointer, with the same effects (but, unfortunately, not the same syntax due to being special); this is commonly used for const correctness, and much less frequently for volatile correctness.
    #include <cstdint>   // uintptr_t
    #include <iostream>
    #include <string>
    template<typename T>
    uintptr_t addr_out(T* ptr) { return reinterpret_cast<uintptr_t>(ptr); }
    struct S {
        int i;
        uintptr_t address() const { return addr_out(this); }
    };
    // Format a given numerical value into a hex value for easy display.
    // Implementation omitted for brevity.
    template<typename T>
    std::string hex_out_s(T val, bool disp0X = true);
    // ...
    S s[2];
    std::cout << "Control example: Two distinct instances of simple class.\n";
    std::cout << "s[0] address:\t\t\t\t" << hex_out_s(addr_out(&s[0]))
              << "\n* s[0] this pointer:\t\t\t" << hex_out_s(s[0].address())
              << "\n\n";
    std::cout << "s[1] address:\t\t\t\t" << hex_out_s(addr_out(&s[1]))
              << "\n* s[1] this pointer:\t\t\t" << hex_out_s(s[1].address())
              << "\n\n";
Sample output:
Control example: Two distinct instances of simple class.
s[0] address: 0x0000003836e8fb40
* s[0] this pointer: 0x0000003836e8fb40
s[1] address: 0x0000003836e8fb44
* s[1] this pointer: 0x0000003836e8fb44
These values aren't guaranteed, and can easily change from one execution to the next; this can most easily be observed while creating and testing a program, through the use of build tools.
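The cv-qualification mentioned earlier can be observed directly with decltype(this). This is a minimal sketch. The member function names plain, as_const and as_volatile, and the helper this_is_cv_qualified, are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

struct S {
    int i = 0;
    // decltype(this) reports the pointer type seen inside each member function.
    bool plain()                { return std::is_same<decltype(this), S*>::value; }
    bool as_const() const       { return std::is_same<decltype(this), const S*>::value; }
    bool as_volatile() volatile { return std::is_same<decltype(this), volatile S*>::value; }
};

bool this_is_cv_qualified() {
    S s;
    bool ok = s.plain() && s.as_const() && s.as_volatile();
    assert(ok);
    return ok;
}
```

The qualifier written after the parameter list is what ends up on the pointee type of this, which is why a const member function cannot modify non-mutable members.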
Mechanically, it's similar to a hidden parameter added to the start of each member function's argument list; x.f() cv can be seen as a special variant of f(cv X* this), albeit with a different format for linguistic reasons. In fact, there were recent proposals by both Stroustrup and Sutter to unify the call syntax of x.f(y) and f(x, y), which would've made this implicit behaviour an explicit linguistic rule. It unfortunately was met with concerns that it may cause a few unwanted surprises for library developers, and thus has not yet been implemented; to my knowledge, the most recent proposal is a joint proposal, for f(x,y) to be able to fall back on x.f(y) if no f(x,y) is found, similar to the interaction between, e.g., std::begin(x) and member function x.begin().
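The std::begin interaction mentioned above already exists in the standard library and can be sketched in a few lines. The function first_via_both is a hypothetical helper. For containers, std::begin(c) is specified to return c.begin(), so the free-function form forwards to the member form.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: reads the first element through the free-function form.
// Assumes v is not empty.
int first_via_both(const std::vector<int>& v) {
    // Free-function call syntax and member call syntax reach the same iterator.
    assert(std::begin(v) == v.begin());
    return *std::begin(v);
}
```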
In this case, this would be more akin to a normal pointer, and the programmer would be able to specify it manually. If a solution is found to allow the more robust form without violating the principle of least astonishment (or bringing any other concerns to pass), then an equivalent to this would also be able to be implicitly generated as a normal pointer for non-member functions, as well.
Relatedly, one important thing to note is that this is the instance's address, as seen by that instance; while the pointer itself is a runtime thing, it doesn't always have the value you'd think it has. This becomes relevant when looking at classes with more complex inheritance hierarchies. Specifically, when looking at cases where one or more base classes that contain member functions don't have the same address as the derived class itself. Three cases in particular come to mind:
Note that these are demonstrated using MSVC, with class layouts output via the undocumented -d1reportSingleClassLayout compiler parameter, due to me finding it more easily readable than GCC or Clang equivalents.
Non-standard layout: When a class is standard layout, the address of an instance's first data member is exactly identical to the address of the instance itself; thus, this can be said to be equivalent to the first data member's address. This will hold true even if said data member is a member of a base class, as long as the derived class continues to follow standard layout rules. Conversely, this also means that if the derived class isn't standard layout, then this is no longer guaranteed.
    struct StandardBase {
        int i;
        uintptr_t address() const { return addr_out(this); }
    };
    struct NonStandardDerived : StandardBase {
        virtual void f() {}
        uintptr_t address() const { return addr_out(this); }
    };
    static_assert(std::is_standard_layout<StandardBase>::value, "Nyeh.");
    static_assert(!std::is_standard_layout<NonStandardDerived>::value, ".heyN");
    // ...
    NonStandardDerived n;
    std::cout << "Derived class with non-standard layout:"
              << "\n* n address:\t\t\t\t\t" << hex_out_s(addr_out(&n))
              << "\n* n this pointer:\t\t\t\t" << hex_out_s(n.address())
              << "\n* n this pointer (as StandardBase):\t\t" << hex_out_s(n.StandardBase::address())
              << "\n* n this pointer (as NonStandardDerived):\t" << hex_out_s(n.NonStandardDerived::address())
              << "\n\n";
Sample output:
Derived class with non-standard layout:
* n address: 0x00000061e86cf3c0
* n this pointer: 0x00000061e86cf3c0
* n this pointer (as StandardBase): 0x00000061e86cf3c8
* n this pointer (as NonStandardDerived): 0x00000061e86cf3c0
Note that StandardBase::address() is supplied with a different this pointer than NonStandardDerived::address(), even when called on the same instance. This is because the latter's use of a vtable caused the compiler to insert a hidden member.
class StandardBase size(4):
+---
0 | i
+---
class NonStandardDerived size(16):
+---
0 | {vfptr}
| +--- (base class StandardBase)
8 | | i
| +---
| <alignment member> (size=4)
+---
NonStandardDerived::$vftable@:
| &NonStandardDerived_meta
| 0
0 | &NonStandardDerived::f
NonStandardDerived::f this adjustor: 0
Virtual base classes: Due to virtual bases trailing after the most-derived class, the this pointer supplied to a member function inherited from a virtual base will be different than the one provided to members of the derived class itself.
    struct VBase {
        uintptr_t address() const { return addr_out(this); }
    };
    struct VDerived : virtual VBase {
        uintptr_t address() const { return addr_out(this); }
    };
    // ...
    VDerived v;
    std::cout << "Derived class with virtual base:"
              << "\n* v address:\t\t\t\t\t" << hex_out_s(addr_out(&v))
              << "\n* v this pointer:\t\t\t\t" << hex_out_s(v.address())
              << "\n* this pointer (as VBase):\t\t\t" << hex_out_s(v.VBase::address())
              << "\n* this pointer (as VDerived):\t\t\t" << hex_out_s(v.VDerived::address())
              << "\n\n";
Sample output:
Derived class with virtual base:
* v address: 0x0000008f8314f8b0
* v this pointer: 0x0000008f8314f8b0
* this pointer (as VBase): 0x0000008f8314f8b8
* this pointer (as VDerived): 0x0000008f8314f8b0
Once again, the base class' member function is supplied with a different this pointer, due to VDerived's inherited VBase having a different starting address than VDerived itself.
class VDerived size(8):
+---
0 | {vbptr}
+---
+--- (virtual base VBase)
+---
VDerived::$vbtable@:
0 | 0
1 | 8 (VDerivedd(VDerived+0)VBase)
vbi: class offset o.vbptr o.vbte fVtorDisp
VBase 8 0 4 0
Multiple inheritance: As can be expected, multiple inheritance can easily lead to cases where the this pointer passed to one member function is different than the this pointer passed to a different member function, even if both functions are called with the same instance. This can come up for member functions of any base class other than the first, similarly to when working with non-standard layout classes (where all base classes after the first start at a different address than the derived class itself)... but it can be especially surprising in the case of virtual functions, when multiple bases supply virtual functions with the same signature.
    struct Base1 {
        int i;
        virtual uintptr_t address() const { return addr_out(this); }
        uintptr_t raw_address() { return addr_out(this); }
    };
    struct Base2 {
        short s;
        virtual uintptr_t address() const { return addr_out(this); }
        uintptr_t raw_address() { return addr_out(this); }
    };
    struct Derived : Base1, Base2 {
        bool b;
        uintptr_t address() const override { return addr_out(this); }
        uintptr_t raw_address() { return addr_out(this); }
    };
    // ...
    Derived d;
    std::cout << "Derived class with multiple inheritance:"
              << "\n (Calling address() through a static_cast reference, then the appropriate raw_address().)"
              << "\n* d address:\t\t\t\t\t" << hex_out_s(addr_out(&d))
              << "\n* d this pointer:\t\t\t\t" << hex_out_s(d.address()) << " (" << hex_out_s(d.raw_address()) << ")"
              << "\n* d this pointer (as Base1):\t\t\t" << hex_out_s(static_cast<Base1&>((d)).address()) << " (" << hex_out_s(d.Base1::raw_address()) << ")"
              << "\n* d this pointer (as Base2):\t\t\t" << hex_out_s(static_cast<Base2&>((d)).address()) << " (" << hex_out_s(d.Base2::raw_address()) << ")"
              << "\n* d this pointer (as Derived):\t\t\t" << hex_out_s(static_cast<Derived&>((d)).address()) << " (" << hex_out_s(d.Derived::raw_address()) << ")"
              << "\n\n";
Sample output:
Derived class with multiple inheritance:
(Calling address() through a static_cast reference, then the appropriate raw_address().)
* d address: 0x00000056911ef530
* d this pointer: 0x00000056911ef530 (0x00000056911ef530)
* d this pointer (as Base1): 0x00000056911ef530 (0x00000056911ef530)
* d this pointer (as Base2): 0x00000056911ef530 (0x00000056911ef540)
* d this pointer (as Derived): 0x00000056911ef530 (0x00000056911ef530)
We would expect each raw_address() to follow the same rules, since each is explicitly a separate function, and thus that Base2::raw_address() will return a different value than Derived::raw_address(). But since we know virtual functions will always call the most-derived form, how is address() correct when called from a reference to Base2? This is due to a little compiler trickery called an "adjustor thunk", which is a helper that takes a base class instance's this pointer and adjusts it to point to the most-derived class instead, when necessary.
class Derived size(40):
+---
| +--- (base class Base1)
0 | | {vfptr}
8 | | i
| | <alignment member> (size=4)
| +---
| +--- (base class Base2)
16 | | {vfptr}
24 | | s
| | <alignment member> (size=6)
| +---
32 | b
| <alignment member> (size=7)
+---
Derived::$vftable@Base1@:
| &Derived_meta
| 0
0 | &Derived::address
Derived::$vftable@Base2@:
| -16
0 | &thunk: this-=16; goto Derived::address
Derived::address this adjustor: 0
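The adjustment can also be checked without compiler-specific layout dumps. This sketch mirrors the Base1/Base2/Derived shape above, but the virtual function self and the helper base_pointers_are_adjusted are hypothetical names, and virtual destructors have been added. It relies only on comparisons the language guarantees: two non-empty base subobjects cannot share an address, and dynamic_cast to a void pointer yields the address of the most-derived object.

```cpp
#include <cassert>

struct Base1 {
    int i = 0;
    virtual ~Base1() = default;
    virtual const void* self() const { return this; }
};
struct Base2 {
    short s = 0;
    virtual ~Base2() = default;
    virtual const void* self() const { return this; }
};
struct Derived : Base1, Base2 {
    // Overrides both base versions; this is a const Derived* in here.
    const void* self() const override { return this; }
};

bool base_pointers_are_adjusted() {
    Derived d;
    const Base1* b1 = &d;  // the conversion may adjust the address
    const Base2* b2 = &d;
    const void* whole = static_cast<const void*>(&d);
    // The two base subobjects live at different addresses.
    bool bases_differ = static_cast<const void*>(b1) != static_cast<const void*>(b2);
    // A virtual call through either base reaches Derived::self with this == &d.
    bool virtual_sees_derived = b1->self() == whole && b2->self() == whole;
    // dynamic_cast to const void* recovers the most-derived object's address.
    bool recovered = dynamic_cast<const void*>(b2) == whole;
    assert(bases_differ && virtual_sees_derived && recovered);
    return bases_differ && virtual_sees_derived && recovered;
}
```

How the implementation restores the pointer (for example the thunk shown in the MSVC vftable above) is an ABI detail. The sketch only observes the result.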
If you're curious, feel free to tinker around with this little program, to take a look at how the addresses change if you run it multiple times, or at cases where it might have a different value than you may expect.
Accepted answer: answered Nov 12 at 15:05 by StoryTeller
Comments are not for extended discussion; this conversation has been moved to chat.
– Samuel Liew♦
2 days ago
add a comment |
Comments are not for extended discussion; this conversation has been moved to chat.
– Samuel Liew♦
2 days ago
Comments are not for extended discussion; this conversation has been moved to chat.
– Samuel Liew♦
2 days ago
Comments are not for extended discussion; this conversation has been moved to chat.
– Samuel Liew♦
2 days ago
add a comment |
up vote
23
down vote
It is an actual pointer, as the standard specifies it (§12.2.2.1):
In the body of a non-static (12.2.1) member function, the keyword
this
is a prvalue expression whose value is the address of the object for which the function is called. The type ofthis
in a member function of a classX
isX*
.
this
is actually implicit every time you reference a non-static member variable or member function within a class own code. It is also needed (either when implicit or explicit) because the compiler needs to tie back the function or the variable to an actual object at runtime.
Using it explicitly is rarely useful, unless you need, for example, to disambiguate between a parameter and a member variable within a member function. Otherwise, without it the compiler will shadow the member variable with the parameter (See it live on Coliru).
6
You also need to explicitly writethis->
when accessing a member of a non-dependent base type from a template member. Not often needed, and a good compiler will diagnose exactly when you forget it, but worth mentioning.
– Toby Speight
Nov 12 at 18:58
1
It can also be very useful to write "this->" when developing with an IDE, because the IDE can then provide a list of members to select from. (Personally, I tend not to use an IDE, but if one chooses to, taking advantage of it seems sensible.)
– Martin Bonner
2 days ago
3
"Using it explicitly is rarely useful", from the compiler perspective, true; From a human perspective, some teams will enforce this as a style rule to prevent human-error introduced bugs.
– Tezra
2 days ago
It is an actual pointer, as the standard specifies it (§12.2.2.1):
"In the body of a non-static (12.2.1) member function, the keyword this is a prvalue expression whose value is the address of the object for which the function is called. The type of this in a member function of a class X is X*."
The this pointer is implicit every time you reference a non-static member variable or member function within a class's own code. It is also needed (whether implicit or explicit) because the compiler has to tie the function or the variable back to an actual object at runtime.
Using it explicitly is rarely useful, unless you need, for example, to disambiguate between a parameter and a member variable within a member function. Otherwise, without it, the parameter shadows the member variable (See it live on Coliru).
edited Nov 12 at 18:58
Toby Speight
answered Nov 12 at 15:20
JBL
You also need to explicitly write this-> when accessing a member of a dependent base class from a template member. Not often needed, and a good compiler will diagnose exactly when you forget it, but worth mentioning.
– Toby Speight
Nov 12 at 18:58
It can also be very useful to write "this->" when developing with an IDE, because the IDE can then provide a list of members to select from. (Personally, I tend not to use an IDE, but if one chooses to, taking advantage of it seems sensible.)
– Martin Bonner
2 days ago
"Using it explicitly is rarely useful": from the compiler's perspective, true; from a human perspective, some teams will enforce this as a style rule to prevent bugs introduced by human error.
– Tezra
2 days ago
The this pointer always has to exist when you are in a non-static method. Whether you use it explicitly or not, you have to have a reference to the current instance, and that is what this gives you.
In both cases, you are going to access memory through the this pointer. It's just that you can omit it in some cases.
edited Nov 12 at 17:01
answered Nov 12 at 15:05
Matthieu Brucher
Essentially, syntactical sugar (whether by inclusion or omission, it's a shortcut).
– Draco18s
Nov 12 at 16:55
This is almost a duplicate of How do objects work in x86 at the assembly level?, where I comment on the asm output of some examples, including showing which register the this pointer was passed in.
In asm, this works exactly like a hidden first arg, so both the member function foo::add(int) and the non-member add which takes an explicit foo* first arg compile to exactly the same asm.
struct foo {
    int m;
    void add(int a);  // not inline so we get a stand-alone definition emitted
};
void foo::add(int a) {
    this->m += a;
}
void add(foo *obj, int a) {
    obj->m += a;
}
On the Godbolt compiler explorer, compiling for x86-64 with the System V ABI (first arg in RDI, second in RSI), we get:
# gcc8.2 -O3
foo::add(int):
    add DWORD PTR [rdi], esi   # memory-destination add
    ret
add(foo*, int):
    add DWORD PTR [rdi], esi
    ret
"I use GCC 4.4.3"
That was released in January 2010, so it's missing nearly a decade of improvements to the optimizer and to error messages. The gcc7 series has been out and stable for a while. Expect missed optimizations with such an old compiler, especially for modern instruction sets like AVX.
answered Nov 12 at 16:03
Peter Cordes
After compilation, names are gone and every symbol is just an address, so writing this-> explicitly cannot cost anything at run time.
Any member access is compiled to an offset from the address of the current object anyway, even if you didn't write this.
When name is used in C++, it can refer to one of the following:
- A declaration in the global namespace (like ::name), in the current namespace, or in a used namespace (when using namespace ... has been used)
- A member of the current class
- A local definition in an enclosing block
- A local definition in the current block
Therefore the compiler searches these scopes in order, from the current block outward to the global namespace, until it finds the name.
Writing this->name narrows the search for name to the current class scope (including its base classes): it skips local definitions, and if the name is not found in the class, the compiler does not fall back to the global scope.
edited Nov 13 at 5:14
Peter Mortensen
answered Nov 12 at 15:17
SHR
Here is a simple example of how "this" can be useful at runtime:
#include <vector>
#include <string>
#include <iostream>
class A;
typedef std::vector<A*> News;
class A
{
public:
    A(const char* n): name(n){}
    std::string name;
    void subscribe(News& n)
    {
        n.push_back(this);  // store the address of the object subscribe() was called on
    }
};
int main()
{
    A a1("Alex"), a2("Bob"), a3("Chris");
    News news;
    a1.subscribe(news);
    a3.subscribe(news);
    std::cout << "Subscriber:";
    for(auto& a: news)
    {
        std::cout << " " << a->name;
    }
    return 0;
}
answered Nov 12 at 16:00
Helmut Zeisel
Your machine does not know anything about class methods; they are normal functions under the hood.
Hence methods have to be implemented by always passing a pointer to the current object. It's just implicit in C++, i.e. T Class::method(...) is just syntactic sugar for T Class_Method(Class* this, ...).
Other languages like Python or Lua choose to make it explicit, and modern object-oriented C APIs like Vulkan (unlike OpenGL) use a similar pattern.
answered Nov 13 at 5:49
Trass3r
"since I usually use it every single time I refer to a member variable or function."
You always use this when you refer to a member variable or function. There is simply no other way to reach members. The only choice is implicit vs explicit notation.
Let's go back to see how it was done before this to understand what this is.
Without OOP:
struct A {
    int x;
};
void bar(int);  // some function defined elsewhere
void foo(A* that) {
    bar(that->x);
}
With OOP but writing this explicitly:
struct A {
    int x;
    void foo(void) {
        bar(this->x);
    }
};
Using the shorter notation:
struct A {
    int x;
    void foo(void) {
        bar(x);
    }
};
But the difference is only in the source code. All are compiled to the same thing. If you create a member function, the compiler will create a pointer argument for you and name it "this". If you omit this-> when referring to a member, the compiler is clever enough to insert it for you most of the time. That's it. The only difference is six fewer characters in the source.
Writing this explicitly makes sense when there is an ambiguity, namely another variable named just like your member variable:
struct A {
    int x;
    A(int x) {
        this->x = x;
    }
};
There are some instances, like __thiscall, where OO and non-OO code may end up a bit different in asm, but whether the pointer is passed on the stack and then optimized to a register, or is in ECX from the very beginning, doesn't make it "not a pointer".
answered Nov 13 at 12:26
Agent_L
"this" can also safeguard against shadowing by a function parameter, for example:
class Vector {
public:
    double x, y, z;
    void SetLocation(double x, double y, double z);
};
void Vector::SetLocation(double x, double y, double z) {
    this->x = x;  // Passed parameter assigned to member variable
    this->y = y;
    this->z = z;
}
(Obviously, writing such code is discouraged.)
answered 2 days ago
Szak1
Usually shadowing comes up as an issue when the member variable is being shadowed by an introduced local variable (where you normally aren't thinking of what is in the global scope), so use of this->x is encouraged to prevent such modification bugs.
– Tezra
2 days ago
Yeah unfortunately -Wshadow is not enabled with -Wall. gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
– Trass3r
2 days ago
If the compiler inlines a member function that is called with static rather than dynamic binding, it might be able to optimize away the this pointer. Take this simple example:
#include <iostream>
using std::cout;
using std::endl;
class example {
public:
    int foo() const { return x; }
    int foo(const int i) { return (x = i); }
private:
    int x;
};
int main(void)
{
    example e;
    e.foo(10);
    cout << e.foo() << endl;
}
GCC 7.3.0 with the -march=x86-64 -O -S flags is able to compile cout << e.foo() to three instructions:
    movl $10, %esi
    leaq _ZSt4cout(%rip), %rdi
    call _ZNSolsEi@PLT
This is a call to std::ostream::operator<<(int), which is what the mangled name _ZNSolsEi encodes. Remember that cout << e.foo(); is syntactic sugar for cout.operator<<(e.foo());. In general, such an operator could be written two ways: operator<<(ostream&, int), as a non-member function, where the operand on the left is an explicit parameter, or operator<<(int), as a member function, where it's implicitly this.
The compiler was able to deduce that e.foo() will always be the constant 10. Since the 64-bit x86 calling convention is to pass function arguments in registers, that compiles down to the single movl instruction, which sets the second function parameter to 10. The leaq instruction sets the first argument (which might be an explicit ostream& or the implicit this) to &cout. Then the program makes a call to the function.
In more complex cases, though—such as if you have a function taking an example&
as a parameter—the compiler needs to look up this
, as this
is what tells the program which instance it’s working with, and therefore, which instance’s x
data member to look up.
Consider this example:
class example {
public:
int foo() const { return x; }
int foo(const int i) { return (x = i); }
private:
int x;
};
int bar( const example& e )
{
return e.foo();
}
The function bar()
gets compiled to a bit of boilerplate and the instruction:
movl (%rdi), %eax
ret
You remember from the previous example that %rdi
on x86-64 is the first function argument, the implicit this
pointer for the call to e.foo()
. Putting it in parentheses, (%rdi)
, means look up the variable at that location. (Since the only data in an example
instance is x
, &e.x
happens to be the same as &e
in this case.) Moving the contents to %eax
sets the return value.
In this case, the compiler needed the implicit this
argument to foo(/* example* this */)
to be able to find &e
and therefore &e.x
. In fact, inside a member function (that isn’t static
), x
, this->x
and (*this).x
all mean the same thing.
add a comment |
up vote
2
down vote
up vote
2
down vote
if the compiler inlines a member function that is called with static rather than dynamic binding, it might be able to optimize away the this
pointer. Take this simple example:
#include <iostream>
using std::cout;
using std::endl;
class example {
public:
int foo() const { return x; }
int foo(const int i) { return (x = i); }
private:
int x;
};
int main(void)
{
example e;
e.foo(10);
cout << e.foo() << endl;
}
GCC 7.3.0 with the -march=x86-64 -O -S
flag is able to compile cout << e.foo()
to three instructions:
movl $10, %esi
leaq _ZSt4cout(%rip), %rdi
call _ZNSolsEi@PLT
This is a call to std::ostream::operator<<
. Remember that cout << e.foo();
is syntactic sugar for std::ostream::operator<< (cout, e.foo());
. And operator<<(int)
could be written two ways: static operator<< (ostream&, int)
, as a non-member function, where the operand on the left is an explicit parameter, or operator<<(int)
, as a member function, where it’s implicitly this
.
The compiler was able to deduce that e.foo()
will always be the constant 10
. Since the 64-bit x86 calling convention is to pass function arguments in registers, that compiles down to the single movl
instruction, which sets the second function parameter to 10
. The leaq
instruction sets the first argument (which might be an explicit ostream&
or the implicit this
) to &cout
. Then the program makes a call
to the function.
In more complex cases, though—such as if you have a function taking an example&
as a parameter—the compiler needs to look up this
, as this
is what tells the program which instance it’s working with, and therefore, which instance’s x
data member to look up.
Consider this example:
class example {
public:
int foo() const { return x; }
int foo(const int i) { return (x = i); }
private:
int x;
};
int bar( const example& e )
{
return e.foo();
}
The function bar()
gets compiled to a bit of boilerplate and the instruction:
movl (%rdi), %eax
ret
You remember from the previous example that %rdi
on x86-64 is the first function argument, the implicit this
pointer for the call to e.foo()
. Putting it in parentheses, (%rdi)
, means look up the variable at that location. (Since the only data in an example
instance is x
, &e.x
happens to be the same as &e
in this case.) Moving the contents to %eax
sets the return value.
In this case, the compiler needed the implicit this
argument to foo(/* example* this */)
to be able to find &e
and therefore &e.x
. In fact, inside a member function (that isn’t static
), x
, this->x
and (*this).x
all mean the same thing.
if the compiler inlines a member function that is called with static rather than dynamic binding, it might be able to optimize away the this
pointer. Take this simple example:
#include <iostream>
using std::cout;
using std::endl;
class example {
public:
int foo() const { return x; }
int foo(const int i) { return (x = i); }
private:
int x;
};
int main(void)
{
example e;
e.foo(10);
cout << e.foo() << endl;
}
GCC 7.3.0 with the -march=x86-64 -O -S
flag is able to compile cout << e.foo()
to three instructions:
movl $10, %esi
leaq _ZSt4cout(%rip), %rdi
call _ZNSolsEi@PLT
This is a call to std::ostream::operator<<
. Remember that cout << e.foo();
is syntactic sugar for std::ostream::operator<< (cout, e.foo());
. And operator<<(int)
could be written two ways: static operator<< (ostream&, int)
, as a non-member function, where the operand on the left is an explicit parameter, or operator<<(int)
, as a member function, where it’s implicitly this
.
The compiler was able to deduce that e.foo()
will always be the constant 10
. Since the 64-bit x86 calling convention is to pass function arguments in registers, that compiles down to the single movl
instruction, which sets the second function parameter to 10
. The leaq
instruction sets the first argument (which might be an explicit ostream&
or the implicit this
) to &cout
. Then the program makes a call
to the function.
In more complex cases, though—such as if you have a function taking an example&
as a parameter—the compiler needs to look up this
, as this
is what tells the program which instance it’s working with, and therefore, which instance’s x
data member to look up.
Consider this example:
class example {
public:
int foo() const { return x; }
int foo(const int i) { return (x = i); }
private:
int x;
};
int bar( const example& e )
{
return e.foo();
}
The function bar()
gets compiled to a bit of boilerplate and the instruction:
movl (%rdi), %eax
ret
You remember from the previous example that %rdi
on x86-64 is the first function argument, the implicit this
pointer for the call to e.foo()
. Putting it in parentheses, (%rdi)
, means look up the variable at that location. (Since the only data in an example
instance is x
, &e.x
happens to be the same as &e
in this case.) Moving the contents to %eax
sets the return value.
In this case, the compiler needed the implicit this
argument to foo(/* example* this */)
to be able to find &e
and therefore &e.x
. In fact, inside a member function (that isn’t static
), x
, this->x
and (*this).x
all mean the same thing.
edited 2 days ago
answered Nov 13 at 5:25
Davislor
this is a pointer. It's like an implicit parameter that's part of every (non-static) method. You could imagine using plain C functions and writing code like:
Socket *makeSocket(int port) { ... }
void send(Socket *this, Value v) { ... }
Value receive(Socket *this) { ... }

Socket *mySocket = makeSocket(1234);
send(mySocket, someValue); // The subject, `mySocket`, is passed in as a param called "this", explicitly
Value newData = receive(mySocket); // Same subject again, passed explicitly
In C++, similar code might look like:
mySocket.send(someValue); // The address of the subject, `&mySocket`, is passed implicitly as "this"
Value newData = mySocket.receive();
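The same idea can be sketched as compilable C++. The Mailbox type and the function names here are hypothetical, chosen only for illustration; std::mem_fn is the standard wrapper that turns a member function into a callable taking the object as a visible first argument.

```cpp
#include <functional>

// Hypothetical type for illustration only.
struct Mailbox {
    int last = 0;

    // Member form: the object arrives through the implicit `this`.
    const Mailbox* push(int value) {
        last = value;
        return this;
    }
};

// Free-function form: the same logic with the object passed explicitly.
const Mailbox* push_explicit(Mailbox* self, int value) {
    self->last = value;
    return self;
}

// std::mem_fn makes the object an ordinary, visible first argument.
const Mailbox* push_via_mem_fn(Mailbox& box, int value) {
    auto call = std::mem_fn(&Mailbox::push);
    return call(box, value);
}
```

All three forms return the address of the same Mailbox, which is the sense in which the subject of a method call is just another argument.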
answered 2 days ago
Alexander
this is indeed a runtime pointer (albeit one implicitly supplied by the compiler), as has been reiterated in most answers. It is used to indicate which instance of a class a given member function is to operate on when called; for any given instance c of class C, when any member function cf() is called, c.cf() will be supplied a this pointer equal to &c (this naturally also applies to any struct s of type S, when calling member function s.sf(), as shall be used for cleaner demonstrations). It can even be cv-qualified just as any other pointer, with the same effects (but, unfortunately, not the same syntax, due to being special); this is commonly used for const correctness, and much less frequently for volatile correctness.
#include <cstdint>
#include <iostream>
#include <string>

template<typename T>
uintptr_t addr_out(T* ptr) { return reinterpret_cast<uintptr_t>(ptr); }

struct S {
    int i;
    uintptr_t address() const { return addr_out(this); }
};

// Format a given numerical value into a hex value for easy display.
// Implementation omitted for brevity.
template<typename T>
std::string hex_out_s(T val, bool disp0X = true);

// ...

S s[2];
std::cout << "Control example: Two distinct instances of simple class.\n";
std::cout << "s[0] address:\t\t\t\t" << hex_out_s(addr_out(&s[0]))
          << "\n* s[0] this pointer:\t\t\t" << hex_out_s(s[0].address())
          << "\n\n";
std::cout << "s[1] address:\t\t\t\t" << hex_out_s(addr_out(&s[1]))
          << "\n* s[1] this pointer:\t\t\t" << hex_out_s(s[1].address())
          << "\n\n";
Sample output:
Control example: Two distinct instances of simple class.
s[0] address: 0x0000003836e8fb40
* s[0] this pointer: 0x0000003836e8fb40
s[1] address: 0x0000003836e8fb44
* s[1] this pointer: 0x0000003836e8fb44
These addresses aren't guaranteed, and can easily change from one execution to the next (for example, because of address space layout randomization); this is easiest to observe by rebuilding and rerunning the program while testing it.
Mechanically, it's similar to a hidden parameter added to the start of each member function's argument list; a member function X::f() cv can be seen as a special variant of f(cv X* this), albeit with a different format for linguistic reasons. In fact, there were recent proposals by both Stroustrup and Sutter to unify the call syntax of x.f(y) and f(x, y), which would've made this implicit behaviour an explicit linguistic rule. It unfortunately was met with concerns that it may cause a few unwanted surprises for library developers, and thus has not yet been implemented; to my knowledge, the most recent proposal is a joint proposal, for f(x,y) to be able to fall back on x.f(y) if no f(x,y) is found, similar to the interaction between, e.g., std::begin(x) and member function x.begin().
In this case, this would be more akin to a normal pointer, and the programmer would be able to specify it manually. If a solution is found to allow the more robust form without violating the principle of least astonishment (or bringing any other concerns to pass), then an equivalent to this would also be able to be implicitly generated as a normal pointer for non-member functions, as well.
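The cv-qualification of this mentioned above is visible in its type. The following is a small sketch with a hypothetical Probe type: each member function reports whether decltype(this) matches the pointer type implied by its own cv-qualifiers.

```cpp
#include <type_traits>

// Hypothetical type; each member checks the type of its own `this`.
struct Probe {
    bool plain() {
        return std::is_same<decltype(this), Probe*>::value;
    }
    bool as_const() const {
        return std::is_same<decltype(this), const Probe*>::value;
    }
    bool as_volatile() volatile {
        return std::is_same<decltype(this), volatile Probe*>::value;
    }
    bool as_const_volatile() const volatile {
        return std::is_same<decltype(this), const volatile Probe*>::value;
    }
};
```

Each function returns true, which matches the reading of X::f() cv as f(cv X* this): the qualifiers written after the parameter list apply to the object that this points to.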
Relatedly, one important thing to note is that this is the instance's address, as seen by that instance; while the pointer itself is a runtime thing, it doesn't always have the value you'd think it has. This becomes relevant when looking at classes with more complex inheritance hierarchies. Specifically, when looking at cases where one or more base classes that contain member functions don't have the same address as the derived class itself. Three cases in particular come to mind:
Note that these are demonstrated using MSVC, with class layouts output via the undocumented -d1reportSingleClassLayout compiler parameter, due to me finding it more easily readable than GCC or Clang equivalents.
Non-standard layout: When a class is standard layout, the address of an instance's first data member is exactly identical to the address of the instance itself; thus, this can be said to be equivalent to the first data member's address. This will hold true even if said data member is a member of a base class, as long as the derived class continues to follow standard layout rules. ...Conversely, this also means that if the derived class isn't standard layout, then this is no longer guaranteed.
#include <type_traits> // std::is_standard_layout

struct StandardBase {
    int i;
    uintptr_t address() const { return addr_out(this); }
};
struct NonStandardDerived : StandardBase {
    virtual void f() {}
    uintptr_t address() const { return addr_out(this); }
};
static_assert(std::is_standard_layout<StandardBase>::value, "Nyeh.");
static_assert(!std::is_standard_layout<NonStandardDerived>::value, ".heyN");

// ...

NonStandardDerived n;
std::cout << "Derived class with non-standard layout:"
          << "\n* n address:\t\t\t\t\t" << hex_out_s(addr_out(&n))
          << "\n* n this pointer:\t\t\t\t" << hex_out_s(n.address())
          << "\n* n this pointer (as StandardBase):\t\t" << hex_out_s(n.StandardBase::address())
          << "\n* n this pointer (as NonStandardDerived):\t" << hex_out_s(n.NonStandardDerived::address())
          << "\n\n";
Sample output:
Derived class with non-standard layout:
* n address: 0x00000061e86cf3c0
* n this pointer: 0x00000061e86cf3c0
* n this pointer (as StandardBase): 0x00000061e86cf3c8
* n this pointer (as NonStandardDerived): 0x00000061e86cf3c0
Note that StandardBase::address() is supplied with a different this pointer than NonStandardDerived::address(), even when called on the same instance. This is because the latter's use of a vtable caused the compiler to insert a hidden member.
class StandardBase size(4):
+---
0 | i
+---
class NonStandardDerived size(16):
+---
0 | {vfptr}
| +--- (base class StandardBase)
8 | | i
| +---
| <alignment member> (size=4)
+---
NonStandardDerived::$vftable@:
| &NonStandardDerived_meta
| 0
0 | &NonStandardDerived::f
NonStandardDerived::f this adjustor: 0
Virtual base classes: Due to virtual bases trailing after the most-derived class's own members in the object layout, the this pointer supplied to a member function inherited from a virtual base can be different than the one provided to members of the derived class itself (as it is in the MSVC layout shown here).
struct VBase {
    uintptr_t address() const { return addr_out(this); }
};
struct VDerived : virtual VBase {
    uintptr_t address() const { return addr_out(this); }
};

// ...

VDerived v;
std::cout << "Derived class with virtual base:"
          << "\n* v address:\t\t\t\t\t" << hex_out_s(addr_out(&v))
          << "\n* v this pointer:\t\t\t\t" << hex_out_s(v.address())
          << "\n* this pointer (as VBase):\t\t\t" << hex_out_s(v.VBase::address())
          << "\n* this pointer (as VDerived):\t\t\t" << hex_out_s(v.VDerived::address())
          << "\n\n";
Sample output:
Derived class with virtual base:
* v address: 0x0000008f8314f8b0
* v this pointer: 0x0000008f8314f8b0
* this pointer (as VBase): 0x0000008f8314f8b8
* this pointer (as VDerived): 0x0000008f8314f8b0
Once again, the base class' member function is supplied with a different this pointer, due to VDerived's inherited VBase having a different starting address than VDerived itself.
class VDerived size(8):
+---
0 | {vbptr}
+---
+--- (virtual base VBase)
+---
VDerived::$vbtable@:
0 | 0
1 | 8 (VDerivedd(VDerived+0)VBase)
vbi: class offset o.vbptr o.vbte fVtorDisp
VBase 8 0 4 0
Multiple inheritance: As can be expected, multiple inheritance can easily lead to cases where the this pointer passed to one member function is different than the this pointer passed to a different member function, even if both functions are called with the same instance. This can come up for member functions of any base class other than the first, similarly to when working with non-standard layout classes (where all base classes after the first start at a different address than the derived class itself)... but it can be especially surprising in the case of virtual functions, when multiple base classes supply virtual functions with the same signature.
struct Base1 {
    int i;
    virtual uintptr_t address() const { return addr_out(this); }
    uintptr_t raw_address() { return addr_out(this); }
};
struct Base2 {
    short s;
    virtual uintptr_t address() const { return addr_out(this); }
    uintptr_t raw_address() { return addr_out(this); }
};
struct Derived : Base1, Base2 {
    bool b;
    uintptr_t address() const override { return addr_out(this); }
    uintptr_t raw_address() { return addr_out(this); }
};

// ...

Derived d;
std::cout << "Derived class with multiple inheritance:"
          << "\n (Calling address() through a static_cast reference, then the appropriate raw_address().)"
          << "\n* d address:\t\t\t\t\t" << hex_out_s(addr_out(&d))
          << "\n* d this pointer:\t\t\t\t" << hex_out_s(d.address()) << " (" << hex_out_s(d.raw_address()) << ")"
          << "\n* d this pointer (as Base1):\t\t\t" << hex_out_s(static_cast<Base1&>(d).address()) << " (" << hex_out_s(d.Base1::raw_address()) << ")"
          << "\n* d this pointer (as Base2):\t\t\t" << hex_out_s(static_cast<Base2&>(d).address()) << " (" << hex_out_s(d.Base2::raw_address()) << ")"
          << "\n* d this pointer (as Derived):\t\t\t" << hex_out_s(static_cast<Derived&>(d).address()) << " (" << hex_out_s(d.Derived::raw_address()) << ")"
          << "\n\n";
Sample output:
Derived class with multiple inheritance:
(Calling address() through a static_cast reference, then the appropriate raw_address().)
* d address: 0x00000056911ef530
* d this pointer: 0x00000056911ef530 (0x00000056911ef530)
* d this pointer (as Base1): 0x00000056911ef530 (0x00000056911ef530)
* d this pointer (as Base2): 0x00000056911ef530 (0x00000056911ef540)
* d this pointer (as Derived): 0x00000056911ef530 (0x00000056911ef530)
We would expect each raw_address() to follow the same rules as before, since each is explicitly a separate function, and thus that Base2::raw_address() will return a different value than Derived::raw_address(). But since we know a virtual call will always dispatch to the most-derived override, how is address() correct when called through a reference to Base2? This is due to a little compiler trickery called an "adjustor thunk", which is a helper that takes a base class instance's this pointer and adjusts it to point to the most-derived class instead, when necessary.
class Derived size(40):
+---
| +--- (base class Base1)
0 | | {vfptr}
8 | | i
| | <alignment member> (size=4)
| +---
| +--- (base class Base2)
16 | | {vfptr}
24 | | s
| | <alignment member> (size=6)
| +---
32 | b
| <alignment member> (size=7)
+---
Derived::$vftable@Base1@:
| &Derived_meta
| 0
0 | &Derived::address
Derived::$vftable@Base2@:
| -16
0 | &thunk: this-=16; goto Derived::address
Derived::address this adjustor: 0
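The observable effect of the adjustment can be reproduced without any compiler-specific layout dump. This is a minimal sketch with hypothetical names (Left, Right, Both) that mirror Base1, Base2 and Derived; it does not show the thunk itself, only that a virtual call through the second base still runs with this pointing at the complete object.

```cpp
#include <cstdint>

// Hypothetical hierarchy, shaped like Base1 / Base2 / Derived above.
struct Left {
    int i = 0;
    virtual std::uintptr_t address() const { return reinterpret_cast<std::uintptr_t>(this); }
};

struct Right {
    short s = 0;
    virtual std::uintptr_t address() const { return reinterpret_cast<std::uintptr_t>(this); }
};

struct Both : Left, Right {
    bool b = false;
    // Overrides both Left::address and Right::address.
    std::uintptr_t address() const override { return reinterpret_cast<std::uintptr_t>(this); }
};

// Address of the Left subobject inside d.
std::uintptr_t left_subobject(const Both& d) {
    return reinterpret_cast<std::uintptr_t>(static_cast<const Left*>(&d));
}

// Address of the Right subobject, i.e. what a non-virtual Right member sees as `this`.
std::uintptr_t right_subobject(const Both& d) {
    return reinterpret_cast<std::uintptr_t>(static_cast<const Right*>(&d));
}

// Virtual call through a Right&: the override runs with `this` adjusted back to the Both object.
std::uintptr_t virtual_via_right(const Both& d) {
    const Right& r = d;
    return r.address();
}
```

The two base subobjects live at different addresses, yet virtual_via_right(d) reports the address of d itself, which is the adjustment the thunk performs.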
If you're curious, feel free to tinker around with this little program, to take a look at how the addresses change if you run it multiple times, or at cases where it might have a different value than you may expect.
up vote
1
down vote
up vote
1
down vote
answered yesterday
Justin Time