Optimization and multithreading in B.Stroustrup's new book

Please refer to section 41.2.2 Instruction Reordering of "TCPL" 4th edition by B.Stroustrup, which I transcribe below:

To gain performance, compilers, optimizers, and hardware reorder instructions. Consider:

// thread 1:
int x;
bool x_init;
void init()
{
    x = initialize(); // no use of x_init in initialize()
    x_init = true;
    // ...
}

For this piece of code there is no stated reason to assign to x before assigning to x_init. The optimizer (or the hardware instruction scheduler) may decide to speed up the program by executing x_init = true first. We probably meant for x_init to indicate whether x had been initialized by initializer() or not. However, we did not say that, so the hardware, the compiler, and the optimizer do not know that.

Add another thread to the program:

// thread 2:
extern int x;
extern bool x_init;
void f2()
{
    int y;
    while (!x_init) // if necessary, wait for initialization to complete
    this_thread::sleep_for(milliseconds{10});
    y = x;
    // ...
}

Now we have a problem: thread 2 may never wait and thus will assign an uninitialized x to y. Even if thread 1 did not set x_init and x in ''the wrong order,'' we still may have a problem. In thread 2, there are no assignments to x_init, so an optimizer may decide to lift the evaluation of !x_init out of the loop, so that thread 2 either never sleeps or sleeps forever.

  • Does the Standard allow the reordering in thread 1? (some quote from the Standard would be forthcoming) Why would that speed up the program?
  • Both answers in this discussion on SO seem to indicate that no such optimization occurs when there are global variables in the code, as x_init above.
  • What does the author mean by "to lift the evaluation of !x_init out of the loop"? Is this something like this?

    if( !x_init ) while(true) this_thread::sleep_for(milliseconds{10});
    
    y = x;
    

  • This is not so much a issue of the C++ compiler/standard, but that of modern CPUs. Have a look here. The compiler isn't going to emit memory barrier instructions between the assignments of x and x_init unless you tell it to.

    For what it is worth, prior to C++11, the standard had no notion of multi threading in it's abstract machine model. Things are a bit nicer these days.


  • The C++11 standard does not "allow" or "prevent" reordering. It specifies some way to force a specific "barrier" that, it turns, prevent the compiler to move instructions before/after them. A compiler, in this example might reorder the assignment because it might be more efficient on a CPU with multiple computing unit (ALU/Hyperthreading/etc...) even with a single core. Typically, if your CPU has 2 ALU that can work in parallel, there is no reason the compiler would not try to feed them with as much work as it can. I'm not speaking of out-of-order reordering of the CPU instructions that's done internally in Intel CPU (for example), but compile time ordering to ensure all the computing resources are busy doing some work.

  • I think it depends on the compiler compilation flags. Typically unless you tell it to, the compiler must assume that another compilation unit (say B.cpp, which is not visible at compile time) can have a "extern bool x_init", and can change it at any time. Then, the re-ordering optimization would break with the expected behavior (B can define the initialize() function). This example is trivial and unlikely to break. The linked SO answer are not related to this "optimization", but simply that, in their case, the compiler can not make the assumption that the global array is not modified externally, and as such can not make the optimization. This is not like your example.

  • Yes. It's a very common optimization trick, instead of:

  • // test is a bool

    for (int i = 0; i < 345; i++) {
       if (test) do_something();
    }
    

    The compiler might do:

    if (test) for(int i = 0; i < 345; i++) { do_something(); }
    

    And save 344 useless tests.

    链接地址: http://www.djcxy.com/p/77498.html

    上一篇: 使用dimple.js,rCharts情节不会出现在闪亮的应用中

    下一篇: B.Stroustrup新书中的优化和多线程