October 2018

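My Arch Linux machine recently froze twice because memory usage ran too high. The first time I assumed I had too many Chrome windows open and thought nothing of it. The second time I managed to kill the process before the machine locked up, so I took a look at free while I was at it, and the numbers were not what I expected.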

free
              total        used        free      shared  buff/cache   available
Mem:           7666        5616        1305         228         743        1561
Swap:           958         448         509

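With Chrome already killed, there was no way more than 5 GB should still be in use, so something had to be wrong.
htop showed that all processes together used less than a few hundred MB, and dropping caches through /proc/sys/vm/drop_caches did not help. That made me suspect slab after all, so I checked /proc/meminfo: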

cat /proc/meminfo |grep ^S
SwapCached:          500 kB
SwapTotal:        981276 kB
SwapFree:         513248 kB
Shmem:            227892 kB
Slab:            5145568 kB
SReclaimable:      44640 kB
SUnreclaim:      5100928 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB

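As shown above, SUnreclaim alone takes up about 5 GB, which makes no sense. Since the problem appeared only recently and has now happened twice in a row, I suspect the kernel version I recently upgraded to has a memory leak bug.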
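To make that reading of /proc/meminfo quicker to repeat, here is a minimal sketch that pulls out the three slab lines and reports how much of the slab is unreclaimable. The helper name slab_summary is hypothetical, and the file argument exists only so it can be pointed at a saved copy of meminfo.

```shell
# slab_summary [FILE]: summarize slab usage from a meminfo-style file.
# Hypothetical helper, not an existing tool; FILE defaults to /proc/meminfo.
slab_summary() {
    awk '
        $1 == "Slab:"         { slab = $2 }    # total slab, in kB
        $1 == "SReclaimable:" { rec = $2 }     # slab the kernel can free under pressure
        $1 == "SUnreclaim:"   { unrec = $2 }   # slab the kernel cannot free
        END {
            if (slab == 0) { print "no Slab line found"; exit 1 }
            printf "Slab:         %d MiB\n", slab / 1024
            printf "SReclaimable: %d MiB\n", rec / 1024
            printf "SUnreclaim:   %d MiB (%d%% of slab)\n", unrec / 1024, unrec * 100 / slab
        }
    ' "${1:-/proc/meminfo}"
}

# Usage: slab_summary              (reads /proc/meminfo)
#        slab_summary saved.txt    (reads a saved copy)
```

With the figures shown above, the last line of output is "SUnreclaim:   4981 MiB (99% of slab)", so almost none of the slab can be reclaimed, which fits drop_caches having no effect.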
My version is:

Linux NAS-Arch 4.18.12-arch1-1-ARCH #1 SMP PREEMPT Thu Oct 4 01:01:27 UTC 2018 x86_64 GNU/Linux

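A quick search suggests that it really is a memory leak:
Huge memory leak on linux kernel 4.18
The Kernel Memory Leak Detector (kmemleak) can be used to check for it,
and there are also some guesses about the cause: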

I have updated the gist with the output of kmemleak after clearing and scanning (done after about 45 mins of runtime)
https://gist.github.com/coolsidd/d8a1d5addafd6a2367b68e6a6b243dc4/revisions

As for the amdgpu I don't believe that it is the cause (of atleast the major part) of the leak. It was the first module I removed while checking so I can confirm that the leak persists after removing amdgpu.

As for the rtl8723be (my network driver) , the lts version is very different doesn't properly work (it does not have the antenna select option). However I have been using this module since a year and it is also present of 4.17.x. Were there any changes in this version (their github page does not show any major changes since last 6 months). I will build for 4.17.x tomorrow to confirm whether the leak is due to the rtl drivers.
--
There were a few commits for rtlwifi between 4.17.14 and 4.18.6
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/net/wireless/realtek/rtlwifi?h=v4.17.1&qt=range&q=v4.17.14..v4.18.6
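The "clearing and scanning" mentioned in the quoted post refers to kmemleak's debugfs control file. Below is a rough sketch of that workflow, assuming a kernel built with CONFIG_DEBUG_KMEMLEAK (a stock distribution kernel may not have it), debugfs mounted at /sys/kernel/debug, and root privileges. The wrapper name kmemleak_ctl and the KMEMLEAK override variable are hypothetical; only the clear and scan commands and the file path come from the kernel's kmemleak interface.

```shell
# Path to the kmemleak control file; overridable for experimenting on a scratch file.
KMEMLEAK="${KMEMLEAK:-/sys/kernel/debug/kmemleak}"

# kmemleak_ctl clear|scan|report: hypothetical thin wrapper around the control file.
kmemleak_ctl() {
    if [ ! -w "$KMEMLEAK" ]; then
        echo "kmemleak not writable at $KMEMLEAK" >&2
        return 1
    fi
    case "$1" in
        clear|scan) echo "$1" > "$KMEMLEAK" ;;   # send the command to kmemleak
        report)     cat "$KMEMLEAK" ;;           # read suspected leaks and backtraces
        *)          echo "usage: kmemleak_ctl clear|scan|report" >&2; return 2 ;;
    esac
}

# Typical sequence (as root):
#   kmemleak_ctl clear     # discard what has been reported so far
#   ...let the machine run for a while...
#   kmemleak_ctl scan      # trigger a scan now
#   kmemleak_ctl report    # show what the scan found
```

Clearing first and scanning later limits the report to allocations made in between, which is what the post above did after about 45 minutes of runtime.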

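I haven't looked through these commits closely. I'll upgrade to the latest kernel first, and if the problem comes back I'll roll back to 4.17.11.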