Why Go's stacks live on the heap
Go's memory model and garbage collector are closely tied to its high-concurrency design.
What goroutine stacks are for:
- recording the goroutine's execution path (the chain of call frames)
- local variables: a variable declared inside a function and used only there is stored on the goroutine stack
- passing arguments into function calls
- passing function return values back
Goroutine stacks live in Go's heap memory, and that stack memory is released by the GC as well.
Go's heap in turn lives in the virtual memory the operating system provides (the OS gives each process its own virtual address space).
What happens when a goroutine stack runs out of space
A goroutine stack starts at roughly 2-4 KB. Two things exhaust it: local variables that are too large, and too many stack frames (deep call chains).
Escape analysis moves some variables that would otherwise sit on the goroutine stack onto the heap. Not every variable can stay on the stack:
- variables that must remain usable after their stack frame is reclaimed
- local variables too large to fit in a frame
There are three kinds of escape (see the sketch after this list):
- Pointer escape: a function returns a pointer to one of its local objects
- Empty-interface escape: when a parameter is typed interface{} (functions taking interface{} tend to use reflection), the empty interface by itself does not make the argument escape, but using reflection on it inside the function does
- Large-variable escape: on a 64-bit machine, variables larger than roughly 64 KB generally escape
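A hedged illustration of all three cases (user and newUser are invented names for demonstration); the go build -gcflags="-m" command listed in the tools section at the end prints the compiler's actual decisions, whose exact wording varies by version:

```go
package main

import "fmt"

// user is a made-up type for this demonstration.
type user struct{ name string }

// Pointer escape: u's address is returned, so u must outlive the
// frame and the compiler moves it to the heap.
func newUser() *user {
	u := user{name: "gopher"}
	return &u
}

func main() {
	u := newUser()

	// Empty-interface escape: fmt's parameters are ...interface{} and
	// are inspected with reflection, so the argument escapes.
	fmt.Println(u.name)

	// Large-variable escape: the backing array is far too big for a
	// stack frame, so it is heap-allocated.
	big := make([]byte, 1<<20)
	_ = big
}
```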
Too many stack frames exhausting the stack
Before each function call, the goroutine checks whether there is still enough stack space (the morestack prologue check); if not, the stack must grow.
Stacks grow in one of two ways:
- Segmented stacks (before Go 1.3): when space runs out, a new segment is allocated. But if the callee finishes and execution returns to a frame on the old segment (freeing the new one), and then another function is called, yet another segment must be allocated, only to be freed again when that call returns. A call site sitting right at a segment boundary pays this cost on every call.
  - Pros: no wasted space
  - Cons: the stack pointer bounces back and forth between discontiguous regions
- Contiguous stacks (Go 1.3 and later): when space runs out, the stack grows to twice its size; when less than 1/4 of the stack is in use, it shrinks to 1/2. On growth, a new block is allocated and the old stack's entire contents are copied over.
  - Pros: the stack is always contiguous
  - Cons: growing and shrinking (copying) is relatively expensive
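The copying behavior of contiguous stacks can be made visible: hold a pointer to a variable on the initial stack, force growth with deep recursion, and print the pointer before and after. This is a speculative demo, not guaranteed output - whether x stays on the stack and when growth fires depend on the compiler version; the uintptr conversions are used precisely to keep x from escaping to the heap:

```go
package main

import (
	"fmt"
	"unsafe"
)

//go:noinline
func grow(n int, p *int) {
	var pad [256]byte // fatten each frame so the small initial stack overflows quickly
	_ = pad
	if n > 0 {
		grow(n-1, p)
		return
	}
	// By now the runtime has copied the stack to a larger block and
	// adjusted p, so this usually prints a different address than
	// the one printed in main.
	fmt.Printf("after growth:  %#x\n", uintptr(unsafe.Pointer(p)))
}

func main() {
	x := 42
	// Going through uintptr hides &x from escape analysis, so x
	// really lives on this goroutine's (initial) stack.
	fmt.Printf("before growth: %#x\n", uintptr(unsafe.Pointer(&x)))
	grow(500, &x)
}
```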
Go's heap memory structure
Go's heap architecture is modeled after TCMalloc.
The operating system gives every application a virtual address space (not the same thing as Windows's "virtual memory" swap file): processes are not allowed to touch physical memory directly, so the OS interposes virtual memory. On Linux this space is managed with mmap and madvise.
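A minimal, Linux-only sketch of such a reservation using the stdlib syscall wrappers (the runtime does this internally with its own machinery):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	const size = 64 << 20 // 64 MB, one heapArena's worth
	// An anonymous, private mapping hands back a range of virtual
	// address space; no physical pages are committed until touched.
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(mem)
	fmt.Printf("reserved %d bytes of virtual memory at %p\n", len(mem), &mem[0])
}
```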
Go acquires virtual memory in heapArena units of 64 MB each, and can request at most 4194304 (2^22) such units. These memory units are called heapArenas, and together all heapArenas make up mheap, Go's heap. A heapArena describes one 64 MB stretch of virtual address space:
```go
// A heapArena stores metadata for a heap arena. heapArenas are stored
// outside of the Go heap and accessed via the mheap_.arenas index.
type heapArena struct {
	// bitmap stores the pointer/scalar bitmap for the words in
	// this arena. See mbitmap.go for a description.
	// This array uses 1 bit per word of heap, or 1.6% of the heap size (for 64-bit).
	bitmap [heapArenaBitmapWords]uintptr
}
```
All of the heapArenas together form mheap:
```go
// Main malloc heap.
// The heap itself is the "free" and "scav" treaps,
// but all the other global data is here too.
//
// mheap must not be heap-allocated because it contains mSpanLists,
// which must not be heap-allocated.
type mheap struct {
	// arenas is the heap arena map. It points to the metadata for
	// the heap for every arena frame of the entire usable virtual
	// address space.
	//
	// Use arenaIndex to compute indexes into this array.
	//
	// For regions of the address space that are not backed by the
	// Go heap, the arena map contains nil.
	//
	// Modifications are protected by mheap_.lock. Reads can be
	// performed without locking; however, a given entry can
	// transition from nil to non-nil at any time when the lock
	// isn't held. (Entries never transitions back to nil.)
	//
	// In general, this is a two-level mapping consisting of an L1
	// map and possibly many L2 maps. This saves space when there
	// are a huge number of arena frames. However, on many
	// platforms (even 64-bit), arenaL1Bits is 0, making this
	// effectively a single-level map. In this case, arenas[0]
	// will never be nil.
	arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

	// central free lists for small size classes.
	// the padding makes sure that the mcentrals are
	// spaced CacheLinePadSize bytes apart, so that each mcentral.lock
	// gets its own cache line.
	// central is indexed by spanClass.
	central [numSpanClasses]struct {
		mcentral mcentral
		pad      [(cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]byte
	}
}
```
heapArena memory allocation: linear allocation
Linear allocation places objects and values one after another in address order. When garbage collection frees some of them, gaps open up, but the next allocation still continues from wherever the previous one ended, so the gaps stay unused. The gaps could be tracked in a free-list-like structure so that small objects fit into them, but a larger object still has to go at the end. Allocation and reclamation therefore fragment memory into holes that only small objects can fill.
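A toy bump allocator makes the limitation concrete - purely illustrative, not how the runtime is written:

```go
// A bump (linear) allocator only ever moves a cursor forward, so
// holes freed in the middle are never reused, and a large request
// eventually fails even though enough total space is free.
type bumpAllocator struct {
	buf []byte
	off int
}

func (a *bumpAllocator) alloc(n int) []byte {
	if a.off+n > len(a.buf) {
		return nil // out of space, despite possible holes behind the cursor
	}
	p := a.buf[a.off : a.off+n]
	a.off += n
	return p
}
```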
To solve this, Go uses a tiered allocation scheme: it cuts each heapArena into small blocks and gives every object the smallest block it fits in.
A group of identically sized blocks is what Go calls an mspan: each mspan consists of n equally sized slots.
Go defines 67 mspan size classes. The smallest class uses 8-byte slots, 1024 of them per span, for an 8 KB span, and can waste at most 87.50% of its space.
The full mapping is shown below; counting class 0 there are 68 classes in total. (A worked derivation of the max waste column follows the table.)
class | bytes/obj | bytes/span | objects | tail waste | max waste | min align |
---|---|---|---|---|---|---|
1 | 8 | 8192 | 1024 | 0 | 87.50% | 8 |
2 | 16 | 8192 | 512 | 0 | 43.75% | 16 |
3 | 24 | 8192 | 341 | 0 | 29.24% | 8 |
4 | 32 | 8192 | 256 | 0 | 21.88% | 32 |
5 | 48 | 8192 | 170 | 32 | 31.52% | 16 |
6 | 64 | 8192 | 128 | 0 | 23.44% | 64 |
7 | 80 | 8192 | 102 | 32 | 19.07% | 16 |
8 | 96 | 8192 | 85 | 32 | 15.95% | 32 |
9 | 112 | 8192 | 73 | 16 | 13.56% | 16 |
10 | 128 | 8192 | 64 | 0 | 11.72% | 128 |
11 | 144 | 8192 | 56 | 128 | 11.82% | 16 |
12 | 160 | 8192 | 51 | 32 | 9.73% | 32 |
13 | 176 | 8192 | 46 | 96 | 9.59% | 16 |
14 | 192 | 8192 | 42 | 128 | 9.25% | 64 |
15 | 208 | 8192 | 39 | 80 | 8.12% | 16 |
16 | 224 | 8192 | 36 | 128 | 8.15% | 32 |
17 | 240 | 8192 | 34 | 32 | 6.62% | 16 |
18 | 256 | 8192 | 32 | 0 | 5.86% | 256 |
19 | 288 | 8192 | 28 | 128 | 12.16% | 32 |
20 | 320 | 8192 | 25 | 192 | 11.80% | 64 |
21 | 352 | 8192 | 23 | 96 | 9.88% | 32 |
22 | 384 | 8192 | 21 | 128 | 9.51% | 128 |
23 | 416 | 8192 | 19 | 288 | 10.71% | 32 |
24 | 448 | 8192 | 18 | 128 | 8.37% | 64 |
25 | 480 | 8192 | 17 | 32 | 6.82% | 32 |
26 | 512 | 8192 | 16 | 0 | 6.05% | 512 |
27 | 576 | 8192 | 14 | 128 | 12.33% | 64 |
28 | 640 | 8192 | 12 | 512 | 15.48% | 128 |
29 | 704 | 8192 | 11 | 448 | 13.93% | 64 |
30 | 768 | 8192 | 10 | 512 | 13.94% | 256 |
31 | 896 | 8192 | 9 | 128 | 15.52% | 128 |
32 | 1024 | 8192 | 8 | 0 | 12.40% | 1024 |
33 | 1152 | 8192 | 7 | 128 | 12.41% | 128 |
34 | 1280 | 8192 | 6 | 512 | 15.55% | 256 |
35 | 1408 | 16384 | 11 | 896 | 14.00% | 128 |
36 | 1536 | 8192 | 5 | 512 | 14.00% | 512 |
37 | 1792 | 16384 | 9 | 256 | 15.57% | 256 |
38 | 2048 | 8192 | 4 | 0 | 12.45% | 2048 |
39 | 2304 | 16384 | 7 | 256 | 12.46% | 256 |
40 | 2688 | 8192 | 3 | 128 | 15.59% | 128 |
41 | 3072 | 24576 | 8 | 0 | 12.47% | 1024 |
42 | 3200 | 16384 | 5 | 384 | 6.22% | 128 |
43 | 3456 | 24576 | 7 | 384 | 8.83% | 128 |
44 | 4096 | 8192 | 2 | 0 | 15.60% | 4096 |
45 | 4864 | 24576 | 5 | 256 | 16.65% | 256 |
46 | 5376 | 16384 | 3 | 256 | 10.92% | 256 |
47 | 6144 | 24576 | 4 | 0 | 12.48% | 2048 |
48 | 6528 | 32768 | 5 | 128 | 6.23% | 128 |
49 | 6784 | 40960 | 6 | 256 | 4.36% | 128 |
50 | 6912 | 49152 | 7 | 768 | 3.37% | 256 |
51 | 8192 | 8192 | 1 | 0 | 15.61% | 8192 |
52 | 9472 | 57344 | 6 | 512 | 14.28% | 256 |
53 | 9728 | 49152 | 5 | 512 | 3.64% | 512 |
54 | 10240 | 40960 | 4 | 0 | 4.99% | 2048 |
55 | 10880 | 32768 | 3 | 128 | 6.24% | 128 |
56 | 12288 | 24576 | 2 | 0 | 11.45% | 4096 |
57 | 13568 | 40960 | 3 | 256 | 9.99% | 256 |
58 | 14336 | 57344 | 4 | 0 | 5.35% | 2048 |
59 | 16384 | 16384 | 1 | 0 | 12.49% | 8192 |
60 | 18432 | 73728 | 4 | 0 | 11.11% | 2048 |
61 | 19072 | 57344 | 3 | 128 | 3.57% | 128 |
62 | 20480 | 40960 | 2 | 0 | 6.87% | 4096 |
63 | 21760 | 65536 | 3 | 256 | 6.25% | 256 |
64 | 24576 | 24576 | 1 | 0 | 11.45% | 8192 |
65 | 27264 | 81920 | 3 | 128 | 10.00% | 128 |
66 | 28672 | 57344 | 2 | 0 | 4.91% | 4096 |
67 | 32768 | 32768 | 1 | 0 | 12.50% | 8192 |
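The max waste column can be reproduced from the other columns: the worst case is a span filled entirely with the smallest objects routed to this class (one byte larger than the previous class's object size), plus the span's unusable tail bytes. A sketch of the arithmetic (maxWaste is an invented helper):

```go
// maxWaste reproduces the table's "max waste" column: fill the span
// with the smallest objects this class accepts and add the tail bytes.
func maxWaste(prevObjBytes, objBytes, spanBytes, tailWaste int) float64 {
	smallest := prevObjBytes + 1 // smallest size routed to this class
	objects := spanBytes / objBytes
	wasted := objects*(objBytes-smallest) + tailWaste
	return float64(wasted) / float64(spanBytes)
}

// maxWaste(0, 8, 8192, 0)    -> 0.8750  (class 1: 87.50%)
// maxWaste(8, 16, 8192, 0)   -> 0.4375  (class 2: 43.75%)
// maxWaste(32, 48, 8192, 32) -> 0.3152  (class 5: 31.52%)
```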
mspan is a doubly linked list node: next holds the address of the next mspan and prev the address of the previous one. Each heapArena keeps a spans array that maps its pages back to mspans:
```go
type mspan struct {
	next *mspan // next span in list, or nil if none
	prev *mspan // previous span in list, or nil if none
}

type heapArena struct {
	// spans maps from virtual address page ID within this arena to *mspan.
	// For allocated spans, their pages map to the span itself.
	// For free spans, only the lowest and highest pages map to the span itself.
	// Internal pages map to an arbitrary span.
	// For pages that have never been allocated, spans entries are nil.
	//
	// Modifications are protected by mheap.lock. Reads can be
	// performed without locking, but ONLY from indexes that are
	// known to contain in-use or stack spans. This means there
	// must not be a safe-point between establishing that an
	// address is live and looking it up in the spans array.
	spans [pagesPerArena]*mspan
}
```
No single heapArena holds every kind of mspan; spans are created on demand as objects need them, so a central index is required: mcentral. There are 136 mcentral structures: 68 for mspans whose objects need GC scanning and 68 for mspans whose objects do not. An mcentral strings together all the spans of one size class from the heapArenas into linked sets, with spans that need GC scanning and spans that do not kept in separate mcentrals:
```go
// Central list of free objects of a given size.
type mcentral struct {
	_         sys.NotInHeap
	spanclass spanClass

	// partial and full contain two mspan sets: one of swept in-use
	// spans, and one of unswept in-use spans. These two trade
	// roles on each GC cycle. The unswept set is drained either by
	// allocation or by the background sweeper in every GC cycle,
	// so only two roles are necessary.
	//
	// sweepgen is increased by 2 on each GC cycle, so the swept
	// spans are in partial[sweepgen/2%2] and the unswept spans are in
	// partial[1-sweepgen/2%2]. Sweeping pops spans from the
	// unswept set and pushes spans that are still in-use on the
	// swept set. Likewise, allocating an in-use span pushes it
	// on the swept set.
	//
	// Some parts of the sweeper can sweep arbitrary spans, and hence
	// can't remove them from the unswept set, but will add the span
	// to the appropriate swept list. As a result, the parts of the
	// sweeper and mcentral that do consume from the unswept list may
	// encounter swept spans, and these should be ignored.
	partial [2]spanSet // list of spans with a free object
	full    [2]spanSet // list of spans with no free objects
}
```
mcentral works, but it has a performance problem: modifying any span means taking the mcentral lock, and that means lock contention. Recall from the scheduler discussion that each P has a local run queue; Go gives each P its own mcache allocation cache, which eliminates most of that contention. The mcache structure:
```go
// Per-thread (in Go, per-P) cache for small objects.
// This includes a small object cache and local allocation stats.
// No locking needed because it is per-thread (per-P).
//
// mcaches are allocated from non-GC'd memory, so any heap pointers
// must be specially handled.
type mcache struct {
	_ sys.NotInHeap

	// The following members are accessed on every malloc,
	// so they are grouped here for better caching.
	nextSample uintptr // trigger heap sample after allocating this many bytes
	scanAlloc  uintptr // bytes of scannable heap allocated

	// Allocator cache for tiny objects w/o pointers.
	// See "Tiny allocator" comment in malloc.go.

	// tiny points to the beginning of the current tiny block, or
	// nil if there is no current tiny block.
	//
	// tiny is a heap pointer. Since mcache is in non-GC'd memory,
	// we handle it by clearing it in releaseAll during mark
	// termination.
	//
	// tinyAllocs is the number of tiny allocations performed
	// by the P that owns this mcache.
	tiny       uintptr
	tinyoffset uintptr
	tinyAllocs uintptr

	// The rest is not accessed on every malloc.

	alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass

	stackcache [_NumStackOrders]stackfreelist

	// flushGen indicates the sweepgen during which this mcache
	// was last flushed. If flushGen != mheap_.sweepgen, the spans
	// in this mcache are stale and need to the flushed so they
	// can be swept. This is done in acquirep.
	flushGen atomic.Uint32
}
```
The p struct has an mcache field:
```go
type p struct {
	mcache *mcache
}
```
How does Go allocate heap memory?
Objects are graded by size:
- Tiny: micro objects in (0, 16B), containing no pointers
- Small: small objects in [16B, 32KB]
- Large: large objects in (32KB, +∞)
Tiny and small objects are allocated from the ordinary mspans (classes 1-67); for a large object a class-0 mspan is made to measure. A simplified sketch of this size test follows.
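A minimal sketch mirroring the three-way test at the top of runtime.mallocgc; the thresholds are the runtime's maxTinySize (16) and maxSmallSize (32 KB), while the function name classify is invented for illustration:

```go
// classify reports which allocation path a request of the given size
// would take. Simplified: the real code path lives in runtime.mallocgc.
func classify(size uintptr, noscan bool) string {
	const (
		maxTinySize  = 16
		maxSmallSize = 32 << 10
	)
	switch {
	case noscan && size < maxTinySize:
		return "tiny" // packed with others into a shared 16-byte slot
	case size <= maxSmallSize:
		return "small" // rounded up to one of the 67 size classes
	default:
		return "large" // gets a made-to-measure class-0 span
	}
}
```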
Tiny object allocation: take the class-2 mspan from mcache (a class-2 slot is 16 bytes) and pack several tiny objects together into one 16-byte slot:
```go
// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	delayedZeroing := false
	if size <= maxSmallSize {
		if noscan && size < maxTinySize {
			// Tiny allocator.
			//
			// Tiny allocator combines several tiny allocation requests
			// into a single memory block. The resulting memory block
			// is freed when all subobjects are unreachable. The subobjects
			// must be noscan (don't have pointers), this ensures that
			// the amount of potentially wasted memory is bounded.
			//
			// Size of the memory block used for combining (maxTinySize) is tunable.
			// Current setting is 16 bytes, which relates to 2x worst case memory
			// wastage (when all but one subobjects are unreachable).
			// 8 bytes would result in no wastage at all, but provides less
			// opportunities for combining.
			// 32 bytes provides more opportunities for combining,
			// but can lead to 4x worst case wastage.
			// The best case winning is 8x regardless of block size.
			//
			// Objects obtained from tiny allocator must not be freed explicitly.
			// So when an object will be freed explicitly, we ensure that
			// its size >= maxTinySize.
			//
			// SetFinalizer has a special case for objects potentially coming
			// from tiny allocator, it such case it allows to set finalizers
			// for an inner byte of a memory block.
			//
			// The main targets of tiny allocator are small strings and
			// standalone escaping variables. On a json benchmark
			// the allocator reduces number of allocations by ~12% and
			// reduces heap size by ~20%.
			off := c.tinyoffset
			// Align tiny pointer for required (conservative) alignment.
			if size&7 == 0 {
				off = alignUp(off, 8)
			} else if goarch.PtrSize == 4 && size == 12 {
				// Conservatively align 12-byte objects to 8 bytes on 32-bit
				// systems so that objects whose first field is a 64-bit
				// value is aligned to 8 bytes and does not cause a fault on
				// atomic access. See issue 37262.
				// TODO(mknyszek): Remove this workaround if/when issue 36606
				// is resolved.
				off = alignUp(off, 8)
			} else if size&3 == 0 {
				off = alignUp(off, 4)
			} else if size&1 == 0 {
				off = alignUp(off, 2)
			}
			if off+size <= maxTinySize && c.tiny != 0 {
				// The object fits into existing tiny block.
				x = unsafe.Pointer(c.tiny + off)
				c.tinyoffset = off + size
				c.tinyAllocs++
				mp.mallocing = 0
				releasem(mp)
				return x
			}
			// Allocate a new maxTinySize block.
			span = c.alloc[tinySpanClass]
			v := nextFreeFast(span)
			if v == 0 {
				v, span, shouldhelpgc = c.nextFree(tinySpanClass)
			}
			x = unsafe.Pointer(v)
			(*[2]uint64)(x)[0] = 0
			(*[2]uint64)(x)[1] = 0
			// See if we need to replace the existing tiny block with the new one
			// based on amount of remaining free space.
			if !raceenabled && (size < c.tinyoffset || c.tiny == 0) {
				// Note: disabled when race detector is on, see comment near end of this function.
				c.tiny = uintptr(x)
				c.tinyoffset = size
			}
			size = maxTinySize
		}
	}
	return x
}
```
Small object allocation
```go
// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	delayedZeroing := false
	if size <= maxSmallSize {
		if !(noscan && size < maxTinySize) {
			var sizeclass uint8
			if size <= smallSizeMax-8 {
				sizeclass = size_to_class8[divRoundUp(size, smallSizeDiv)]
			} else {
				sizeclass = size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]
			}
			size = uintptr(class_to_size[sizeclass])
			spc := makeSpanClass(sizeclass, noscan)
			span = c.alloc[spc]
			v := nextFreeFast(span)
			if v == 0 {
				v, span, shouldhelpgc = c.nextFree(spc)
			}
			x = unsafe.Pointer(v)
			if needzero && span.needzero != 0 {
				memclrNoHeapPointers(x, size)
			}
		}
	}
	return x
}
```
Replacing an mcache span: mcache holds exactly one mspan per size class; when that span fills up, a fresh one is swapped in from mcentral.
Growing mcentral: mcentral holds only a limited number of mspans; when it runs short, new mspans are carved out of a heapArena.
refill fetches a replacement mspan from mcentral; if mcentral itself has no span with free space, it grows as described above:
```go
// refill acquires a new span of span class spc for c. This span will
// have at least one free object. The current span in c must be full.
//
// Must run in a non-preemptible context since otherwise the owner of
// c could change.
func (c *mcache) refill(spc spanClass) {
	// Return the current cached span to the central lists.
	s := c.alloc[spc]
	if uintptr(s.allocCount) != s.nelems {
		throw("refill of span with free space remaining")
	}
	if s != &emptymspan {
		// Mark this span as no longer cached.
		if s.sweepgen != mheap_.sweepgen+3 {
			throw("bad sweepgen in refill")
		}
		mheap_.central[spc].mcentral.uncacheSpan(s)

		// Count up how many slots were used and record it.
		stats := memstats.heapStats.acquire()
		slotsUsed := int64(s.allocCount) - int64(s.allocCountBeforeCache)
		atomic.Xadd64(&stats.smallAllocCount[spc.sizeclass()], slotsUsed)

		// Flush tinyAllocs.
		if spc == tinySpanClass {
			atomic.Xadd64(&stats.tinyAllocCount, int64(c.tinyAllocs))
			c.tinyAllocs = 0
		}
		memstats.heapStats.release()

		// Count the allocs in inconsistent, internal stats.
		bytesAllocated := slotsUsed * int64(s.elemsize)
		gcController.totalAlloc.Add(bytesAllocated)

		// Clear the second allocCount just to be safe.
		s.allocCountBeforeCache = 0
	}

	// Get a new cached span from the central lists.
	s = mheap_.central[spc].mcentral.cacheSpan()
	if s == nil {
		throw("out of memory")
	}
	if uintptr(s.allocCount) == s.nelems {
		throw("span has no free space")
	}

	// Indicate that this span is cached and prevent asynchronous
	// sweeping in the next sweep phase.
	s.sweepgen = mheap_.sweepgen + 3

	// Store the current alloc count for accounting later.
	s.allocCountBeforeCache = s.allocCount

	// Update heapLive and flush scanAlloc.
	//
	// We have not yet allocated anything new into the span, but we
	// assume that all of its slots will get used, so this makes
	// heapLive an overestimate.
	//
	// When the span gets uncached, we'll fix up this overestimate
	// if necessary (see releaseAll).
	//
	// We pick an overestimate here because an underestimate leads
	// the pacer to believe that it's in better shape than it is,
	// which appears to lead to more memory used. See #53738 for
	// more details.
	usedBytes := uintptr(s.allocCount) * s.elemsize
	gcController.update(int64(s.npages*pageSize)-int64(usedBytes), int64(c.scanAlloc))
	c.scanAlloc = 0

	c.alloc[spc] = s
}
```
Large object allocation: a class-0 mspan is carved directly out of a heapArena; class-0 mspans are custom-built for their large object:
```go
func mallocgc() {
	if size > maxSmallSize {
		shouldhelpgc = true
		// For large allocations, keep track of zeroed state so that
		// bulk zeroing can be happen later in a preemptible context.
		span = c.allocLarge(size, noscan)
		span.freeindex = 1
		span.allocCount = 1
		size = span.elemsize
		x = unsafe.Pointer(span.base())
		if needzero && span.needzero != 0 {
			if noscan {
				delayedZeroing = true
			} else {
				memclrNoHeapPointers(x, size)
				// We've in theory cleared almost the whole span here,
				// and could take the extra step of actually clearing
				// the whole thing. However, don't. Any GC bits for the
				// uncleared parts will be zero, and it's just going to
				// be needzero = 1 once freed anyway.
			}
		}
	}
}
```
Garbage collection
There are three classic approaches to garbage collection:
- Mark-sweep: starting from the roots, mark every reachable object, then sweep away everything unmarked. Drawback: leaves memory fragmented.
- Mark-compact: mark-sweep plus a compaction step that slides the surviving objects together. Drawback: compaction is expensive.
- Copying: after marking, the surviving objects are compacted by copying them into a second memory region. Drawback: half of memory sits idle.
Thanks to the particular structure of its heap (size-classed spans keep fragmentation in check), Go chose the simplest scheme, mark-sweep: find every object that is still referenced; whatever remains is unreferenced.
Where does the search start?
- objects referenced by pointers on goroutine stacks
- objects referenced by global-variable pointers
- objects referenced by pointers in registers (the variables being operated on right now)
These starting points are called the Root Set (GC roots). The search proceeds breadth-first (BFS); this is known as reachability analysis.
Tri-color marking (at the start of each scan, every object is reset to white; see the sketch after this list):
- Black: useful and fully scanned - the object is reachable and all of its internal fields have been examined
- Gray: useful but not fully scanned - the object has been reached, but its internal fields have not been examined yet
- White: presumed useless (what the sweep will reclaim) - either not reached yet, or the scan finished without reaching it
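A toy version of the mark phase with the three colors made explicit - conceptual only, since the runtime marks real heap words via bitmaps, not structs in a graph:

```go
// Everything starts white; the roots are shaded gray; marking drains
// the gray set with a BFS, blackening each object once its children
// have been scanned. Whatever is still white at the end is garbage.
type color int

const (
	white color = iota
	gray
	black
)

type object struct {
	color    color
	children []*object
}

func mark(roots []*object) {
	queue := append([]*object(nil), roots...)
	for _, r := range roots {
		r.color = gray
	}
	for len(queue) > 0 {
		obj := queue[0]
		queue = queue[1:]
		for _, c := range obj.children {
			if c.color == white { // first time reached: shade gray
				c.color = gray
				queue = append(queue, c)
			}
		}
		obj.color = black // all fields examined
	}
}
```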
Serial GC steps:
1. Stop The World: pause every other goroutine
2. Find the unused heap memory via reachability analysis
3. Free that heap memory
The hard part of concurrent garbage collection is the mark phase, because the program keeps mutating the object graph while it is being scanned:
- If an already-scanned node drops one of its internal references and the dropped object is simultaneously picked up by another already-scanned node, the object can never be scanned and would be freed while still live. The deletion barrier defends against this: any node whose reference is removed is forced to gray.
- If, through ordinary program activity, a new reference is installed in an already-scanned node, the new object can never be scanned either and would be freed. The insertion barrier defends against this: any newly referenced node is forced to gray.
A sketch of an insertion-style barrier follows.
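Reusing the object and color types from the marking sketch above, a conceptual Dijkstra-style insertion barrier might look like this; the real barrier is emitted by the compiler and operates on heap words, and since Go 1.8 the runtime actually uses a hybrid barrier combining both ideas:

```go
// writePointer is what every pointer store conceptually becomes while
// the collector is marking: before installing ptr into slot, shade
// ptr gray so a black holder can never point at an unscanned white
// object.
func writePointer(slot **object, ptr *object, marking bool) {
	if marking && ptr != nil && ptr.color == white {
		ptr.color = gray // the marker will still visit ptr
	}
	*slot = ptr
}
```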
When GC is triggered:
- Periodically by the runtime: sysmon, a loop running in the background of the runtime, checks on a timer and forces a GC if none has run within 2 minutes
- Explicitly by the user: calling runtime.GC
- On memory allocation, once the heap has grown past the trigger threshold
The explicit trigger is easy to observe (see below).
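A small demonstration of the explicit trigger using runtime.ReadMemStats:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC() // the explicit, user-requested trigger
	runtime.ReadMemStats(&after)
	fmt.Printf("GC cycles: %d -> %d\n", before.NumGC, after.NumGC)
}
```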
GC optimization:
- Produce as little garbage on the heap as possible:
  - the empty struct points at a fixed address and occupies no space; for example, pass empty structs over a channel instead of real payloads
  - escapes push objects that would have stayed on the stack into the heap: the fmt package (its interface{} parameters), and returning pointers instead of copies
  - cache-like objects, such as channel buffers that are created and destroyed frequently
- Pool memory (see the sync.Pool sketch below)
- Reduce escapes
- Use empty structs
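Two of these ideas in miniature - bufPool, handle, and signal are illustrative names, not library APIs:

```go
package main

import "sync"

// bufPool recycles cache-like buffers instead of letting each one
// become garbage: the "memory pooling" advice above.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 4096) },
}

func handle(process func([]byte)) {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf[:0]) // keep the capacity, drop the contents
	process(buf)
}

// signal shows the empty-struct advice: struct{} occupies no memory,
// so signaling over a chan struct{} carries no payload allocation.
func signal() chan struct{} {
	done := make(chan struct{})
	go func() {
		// ... do the work ...
		close(done)
	}()
	return done
}

func main() {
	handle(func(b []byte) { _ = b })
	<-signal()
}
```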
GC analysis tools:
- go tool pprof
- go tool trace
- go build -gcflags="-m" (prints escape-analysis decisions)
- GODEBUG="gctrace=1" (prints a line of GC statistics per cycle)