Posted on Jan 5, 2021

Assignment vs. Initialization in C++

This is going to be my first post so I chose a rather simple concept: Assignment vs Initialization in C++. I will try to keep the post as practical as possible and share keywords in case the reader wants to do in-depth research. So buckle up and enjoy the ride folks!

```cpp
int x;     // Define x
x = 3;     // Assign 3 to x
int y{3};  // Define and initialize y with 3
```

After these statements both x and y hold the value 3, which leads to the common misconception that assignment and initialization are identical.

Let's wear the ISO C++ Standards Committee hat
According to the C++20 standard, which was published recently, initialization is covered in the "9.4 Initializers" section, whereas assignment is covered in the "11.4.5 Assignment operator" section. How dare you call them identical, you peasant!

That was a little bit harsh. Perhaps we should wear the C++ compiler implementer hat
GCC 10.2 produces identical output for the two functions below, with or without optimizations.

```cpp
int getX() {
    int x{3};
    return x;
}

int getY() {
    int y;
    y = 3;
    return y;
}
```

```asm
get():
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 3
        mov     eax, DWORD PTR [rbp-4]
        pop     rbp
        ret
```

That was a bit anticlimactic, I guess. My guess is that when the data type is scalar, the compiler can assign the value directly, as cppreference.com suggests, but I couldn't find the relevant section (direct assignment) in the C++ standard. Perhaps we should try a non-scalar data type, for example std::string.

```cpp
#include <string>

std::string getX() {
    std::string x{"bugra"};
    return x;
}

std::string getY() {
    std::string x;
    x = "bugra";
    return x;
}
```

Let's see what GCC 10.2 produces for getX and getY with optimizations enabled.

```asm
getX[abi:cxx11]():
        lea     rdx, [rdi+16]
        mov     BYTE PTR [rdi+20], 97
        mov     rax, rdi
        mov     QWORD PTR [rdi], rdx
        mov     DWORD PTR [rdi+16], 1919382882
        mov     QWORD PTR [rdi+8], 5
        mov     BYTE PTR [rdi+21], 0
        ret
```

```asm
.LC0:
        .string "bugra"
getY[abi:cxx11]():
        push    r12
        mov     r8d, 5
        mov     ecx, OFFSET FLAT:.LC0
        xor     edx, edx
        push    rbp
        xor     esi, esi
        mov     r12, rdi
        push    rbx
        lea     rbx, [rdi+16]
        mov     QWORD PTR [rdi], rbx
        mov     QWORD PTR [rdi+8], 0
        mov     BYTE PTR [rdi+16], 0
        call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long)
        mov     rax, r12
        pop     rbx
        pop     rbp
        pop     r12
        ret
        mov     rbp, rax
        jmp     .L2
getY[abi:cxx11]() [clone .cold]:
```

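I am no expert, but getX (the initializing version) looks a lot better than getY (the assigning version): getX writes the characters straight into the string's internal buffer without any call, while getY first sets up an empty string and then calls _M_replace. Since we have established that assignment and initialization can produce different code, let's try to understand the difference between them. We need to wear the formal hat again.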

C++20 standard "6.7.7 Temporary objects" section states that the expression a = f() requires a temporary object for the result of f(), which is materialized so that the reference parameter of X::operator=(const X&) can bind to it.
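To make the temporary visible, here is a minimal sketch with a hypothetical tracer type X and two made-up counters (none of these names come from the standard). It only counts how many X objects come into existence on each path, assuming a C++17 or later compiler.

```cpp
#include <string>
#include <utility>

int constructed = 0;  // hypothetical counter: objects created by any X constructor
int assigned = 0;     // hypothetical counter: calls to X::operator=

struct X {
    std::string value;
    X() { ++constructed; }
    explicit X(std::string v) : value(std::move(v)) { ++constructed; }
    X(const X& other) : value(other.value) { ++constructed; }
    X& operator=(const X& other) {
        value = other.value;
        ++assigned;
        return *this;
    }
};

X f() { return X{"bugra"}; }

int objects_when_assigning() {
    constructed = 0;
    X a;      // first object: default-constructed
    a = f();  // second object: the temporary that operator=(const X&) binds to
    return constructed;
}

int objects_when_initializing() {
    constructed = 0;
    X b = f();  // one object: the result of f() is built directly in b
    return constructed;
}
```

Calling objects_when_assigning() should report two objects and objects_when_initializing() one, which is the extra work the quoted paragraph describes.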

The C++ Core Guidelines also advise preferring initialization to assignment (rule C.49, "Prefer initialization to assignment in constructors").

Let's finish with a regular developer hat

TLDR: Prefer initialization to assignment!

```cpp
#include <string>
using std::string;

class A { // Good
    string s1;
public:
    A(string p) : s1{p} { } // GOOD: directly construct
};

class B { // BAD
    string s1;
public:
    B(const char* p) { s1 = p; } // BAD: default constructor followed by assignment
};
```
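Beyond performance, some members can only be initialized and never assigned. The sketch below uses a hypothetical Label class (not from the guidelines) with a const member and a reference member; both have to be set in the member initializer list.

```cpp
#include <string>

// Hypothetical class for illustration: id is const and name is a
// reference, so neither can be assigned in the constructor body.
class Label {
    const int id;
    const std::string& name;
public:
    Label(int i, const std::string& n) : id{i}, name{n} {}  // must initialize here
    // Label(int i, const std::string& n) { id = i; name = n; }  // would not compile
    int get_id() const { return id; }
    const std::string& get_name() const { return name; }
};
```

A Label built from 7 and a std::string holding "bugra" reports exactly those values back. Because name is a reference, the string has to outlive the Label.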

Keywords: copy assignment, Builtin direct assignment for scalar types, copy initialization, direct initialization, list-initialization, temporary objects

ISO C++20 Standard

Top comments (3)

Sandor Dargo • Jan 6 '21 • Edited

Just to add to this part:

C++20 standard "6.7.7 Temporary objects" section states that the expression a = f() requires a temporary object for the result of f(), which is materialized so that the reference parameter of X::operator=(const X&) can bind to it.

That's not the only problem we face in getY for the string example.

```cpp
std::string x;
x = "bugra";
```

In fact, we have to understand how local variables are initialized. When you create a local integer (int i;), memory is reserved for the new variable, but for primitive data types default initialization performs no initialization at all. The variable holds an indeterminate value, in practice whatever happens to be in the allocated memory (on the stack), and reading it before assigning to it is undefined behavior.
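A small sketch of the two situations for a local int, using made-up function names. Empty braces request value-initialization, which zero-initializes the int, while a bare declaration leaves it indeterminate until something is assigned.

```cpp
int value_initialized() {
    int i{};  // value-initialized: guaranteed to be 0
    return i;
}

int assigned_later() {
    int i;    // default-initialized: indeterminate, must not be read yet
    i = 3;    // the first well-defined value arrives through assignment
    return i;
}
```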

On the other hand, std::string is not a primitive data type, and class objects are default-initialized through their default constructor. If they don't have one, well, such code won't compile.

```cpp
class MyClass {
public:
    explicit MyClass(int num) : m_num(num) {}
private:
    int m_num;
};

int main() {
    MyClass mc; // ERROR: no matching function for call to 'MyClass::MyClass()'
}
```

So, getting back to getY for the string example.
The line std::string x; creates a variable x which is initialized to an empty string. Then on the next line, x = "bugra";, you assign a new value to x. x is given a value twice: once by initialization and once by assignment! (The integer i was given a value only once!)

It's yet another problem that "bugra" is not a std::string. It's a character array that decays to a const char*, so the assignment has to go through std::string's const char* overload of operator= (the _M_replace call in the listing), and you pay for it. Hence the immense difference in the ASM code. If we want to avoid that cost, and we have access to C++14, we should use a std::string literal (the s suffix, as in "bugra"s).
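The C++14 suggestion presumably refers to the s suffix from std::string_literals. The sketch below is a guess at what such a getY could look like, not the original snippet, and it makes no claim about the assembly a particular compiler emits for it.

```cpp
#include <string>
#include <type_traits>

using namespace std::string_literals;  // enables the s suffix (C++14)

// "bugra" is a char array, while "bugra"s is already a std::string.
static_assert(std::is_same<decltype("bugra"), const char (&)[6]>::value,
              "a plain literal is an array of 6 const char");
static_assert(std::is_same<decltype("bugra"s), std::string>::value,
              "the s suffix yields a std::string");

std::string getY() {
    std::string x;
    x = "bugra"s;  // the right-hand side is a std::string temporary
    return x;
}
```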

Then the generated ASM code becomes similar.

But even with every optimization turned on, there is no reason in circumstances like these to split the declaration from the initialization. For one thing, splitting them prevents you from declaring your variables const.
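As a sketch of the const point: when the initial value needs some branching, an immediately invoked lambda keeps it a single initialization, so the variable can still be const. The function name greeting and its strings are invented for illustration.

```cpp
#include <string>

std::string greeting(bool formal) {
    // The lambda runs once, and its result initializes word directly.
    const std::string word = [&]() -> std::string {
        if (formal) return "Good day";
        return "Hi";
    }();
    // word = "Hello";  // would not compile: word is const
    return word + ", bugra";
}
```

With a separate declaration followed by assignments in each branch, word could not have been const.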

Here is a great talk on this

Buğra Hasbek • Jan 6 '21

Thanks for the great feedback Sandor. I really appreciate it :)

Maresia • Jan 6 '21

Cool, before I got involved with move semantics I had no idea of the difference between assignment and initialization. I believe that many people don't even try to understand :3
constructor -> initialization
operator= -> assignment

[google translator]
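The constructor/operator= mapping from the last comment can be checked with a small hypothetical Tracer type that records which special member ran last. Note that the = sign in a declaration still means initialization.

```cpp
#include <string>

// Hypothetical tracer: remembers which special member produced its state.
struct Tracer {
    std::string last = "default constructor";
    Tracer() = default;
    Tracer(const Tracer&) : last("copy constructor") {}
    Tracer& operator=(const Tracer&) {
        last = "copy assignment";
        return *this;
    }
};

std::string via_declaration() {
    Tracer a;
    Tracer b = a;  // looks like assignment, but is copy-initialization
    return b.last;
}

std::string via_assignment() {
    Tracer a;
    Tracer b;
    b = a;         // b already exists, so operator= runs
    return b.last;
}
```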