byte aligned and equal to cache line size

Possible Duplicate:
Is there any guarantee of alignment of address returned by C++'s new operation?

In this program, I print each address returned by new for an unsigned char, then delete the allocations in reverse order at the end.

#include "stdafx.h"
#include<stdlib.h>
void func();

int main()
{
    int i=10;
    while(i-->0)printf("loaded %i n", (new unsigned char));
    getchar();
    unsigned char *p=new unsigned char;printf("last pointer loaded %i n", p);
    i=10;
    while(i-->0)delete (p-=64);
    getchar();
    p+=640;
    delete p;//nearly forgot to delete this ^^
    return 0;
}

output:

As you can see, each new returns 64-byte aligned data.
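(A more direct way to check alignment, rather than inferring it from the spacing between consecutive addresses, is to test each address modulo 64. A minimal sketch, not from the original post:

// Sketch: test whether addresses returned by new are 64-byte aligned
// by checking the address modulo 64 (illustrative only).
#include <stdio.h>
#include <stdint.h>

int main()
{
    for (int i = 0; i < 10; ++i)
    {
        unsigned char *q = new unsigned char;
        // If the address modulo 64 is zero, the block starts on a
        // 64-byte boundary.
        printf("%p -> 64-byte aligned: %s\n",
               (void*)q, ((uintptr_t)q % 64 == 0) ? "yes" : "no");
        delete q;
    }
    return 0;
}
)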

Question: Is this 64-byte granularity equal to the cache-line size, or is it just a compiler/runtime thing?

Question: Should I keep my structures at most 64 bytes long?

Question: Will this be different if I change my CPU, RAM, OS, or compiler?

Pentium M, VC++ 2010 Express, Windows XP

Thanks.


The implementation choices for a heap manager make a lot more sense when you consider what happens after a large number of allocations and deallocations.

A call to malloc() needs to locate an unused block of sufficient size. The block it finds may be bigger than requested, in which case the allocator can either split off the difference as a new free block or waste it. The naive strategy of finding the closest-sized free block is called best fit. If it goes on to create new free blocks from the remainder, you could alternatively call it worst leave.
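As a rough illustration of the idea (not how any real CRT implements it; FreeBlock and free_list_head are hypothetical names), a best-fit search over a singly linked free list might look like this:

#include <cstddef>

// Hypothetical free-list node: each free block records its usable size.
struct FreeBlock
{
    size_t     size;
    FreeBlock *next;
};

FreeBlock *free_list_head = 0;

// Best fit: return the smallest free block that can still satisfy the
// request, or null if none fits. Every allocation walks the whole list,
// so the cost is O(n) in the number of free blocks.
FreeBlock *best_fit(size_t request)
{
    FreeBlock *best = 0;
    for (FreeBlock *b = free_list_head; b != 0; b = b->next)
    {
        if (b->size >= request && (best == 0 || b->size < best->size))
            best = b;
    }
    return best;
}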

Over time, the best-fit approach results in a large amount of fragmentation: it leaves behind small blocks that are unlikely to ever be allocated again, and the cost of searching the free list becomes high.

Consequently, high-performance heap managers don't work like this. Instead, they operate as pool allocators for a set of fixed block sizes. Schemes in which the block sizes are powers of 2 (e.g. 64, 128, 256, 512...) are the norm, although throwing in some intermediate sizes is probably worthwhile too (e.g. 48, 96, 192...). In this scheme, malloc() and free() are both O(1) operations, and the critical sections in allocation are minimal - potentially one per pool - which becomes important in a multi-threaded environment.
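A minimal sketch of that size-class idea (the names kClassSizes and PoolAllocator are made up for the example, and a real allocator would carve blocks out of larger slabs rather than call malloc per block): requests are rounded up to a fixed class size, and each class keeps its own free list, so allocation and deallocation are simple list pops and pushes.

#include <cstddef>
#include <cstdlib>

static const size_t kClassSizes[] = { 64, 96, 128, 192, 256, 512 };
static const int    kNumClasses   = sizeof(kClassSizes) / sizeof(kClassSizes[0]);

struct FreeNode { FreeNode *next; };

struct PoolAllocator
{
    FreeNode *free_lists[6];   // one free list per size class

    PoolAllocator()
    {
        for (int i = 0; i < kNumClasses; ++i) free_lists[i] = 0;
    }

    // Map a request to the smallest class that can hold it.
    static int class_for(size_t n)
    {
        for (int i = 0; i < kNumClasses; ++i)
            if (n <= kClassSizes[i]) return i;
        return -1;   // too big to pool; defer to the general-purpose heap
    }

    void *allocate(size_t n)
    {
        int c = class_for(n);
        if (c < 0) return malloc(n);     // large request: not pooled
        if (free_lists[c] != 0)          // O(1): pop the head of the list
        {
            FreeNode *node = free_lists[c];
            free_lists[c] = node->next;
            return node;
        }
        return malloc(kClassSizes[c]);   // list empty: get a fresh block
    }

    void deallocate(void *p, size_t n)
    {
        int c = class_for(n);
        if (c < 0) { free(p); return; }
        FreeNode *node = static_cast<FreeNode*>(p);
        node->next = free_lists[c];      // O(1): push onto the list
        free_lists[c] = node;
    }
};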

Wasting some memory in small allocations is a much lesser evil than fragmentation, O(n) alloc/dealloc complexity, and poor multi-threaded performance.

Choosing the minimum block size relative to the cache-line size is one of those classic engineering trade-offs, and it's a safe bet that Microsoft did quite a bit of experimentation to arrive at 64 as their minimum. FWIW, I'm pretty sure you'll find the cache-line size of some modern CPUs is bigger than that.
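If you'd rather measure than guess, Windows reports the cache-line size through GetLogicalProcessorInformation (available since XP SP3). A sketch, with error handling omitted:

#include <windows.h>
#include <vector>
#include <stdio.h>

int main()
{
    // First call with a null buffer just reports the required size.
    DWORD len = 0;
    GetLogicalProcessorInformation(NULL, &len);

    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    GetLogicalProcessorInformation(&info[0], &len);

    for (size_t i = 0; i < info.size(); ++i)
    {
        if (info[i].Relationship == RelationCache && info[i].Cache.Level == 1)
        {
            printf("L1 cache line size: %u bytes\n",
                   (unsigned)info[i].Cache.LineSize);
            break;
        }
    }
    return 0;
}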
