My Arch Linux machine has recently frozen twice from runaway memory usage. The first time I blamed Chrome having too many windows open and thought no more of it. The second time I managed to kill the processes in time, so I took a look at free along the way, and noticed something that didn't match my expectations.

free
              total        used        free      shared  buff/cache   available
Mem:           7666        5616        1305         228         743        1561
Swap:           958         448         509

With Chrome already killed, there was no way the machine should be using more than 5 GB of memory. Something had to be wrong somewhere...
htop showed all processes together using no more than a few hundred MB, and flushing via /proc/sys/vm/drop_caches brought no improvement either. So I suspected a slab problem and checked /proc/meminfo:

cat /proc/meminfo | grep ^S
SwapCached:          500 kB
SwapTotal:        981276 kB
SwapFree:         513248 kB
Shmem:            227892 kB
Slab:            5145568 kB
SReclaimable:      44640 kB
SUnreclaim:      5100928 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
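These counters can also be cross-checked programmatically. A read-only sketch (field names as in the standard /proc/meminfo format; Slab should equal SReclaimable plus SUnreclaim):

```shell
# Sum up slab usage straight from /proc/meminfo. Values are in kB;
# SUnreclaim is the portion that drop_caches cannot release.
awk '/^(Slab|SReclaimable|SUnreclaim):/ { v[$1] = $2 }
     END { printf "Slab=%d kB (reclaimable=%d, unreclaimable=%d)\n",
                  v["Slab:"], v["SReclaimable:"], v["SUnreclaim:"] }' /proc/meminfo
```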

As shown above, SUnreclaim somehow occupied about 5 GB, which makes no sense. Given that the problem only appeared recently, and twice in a row, I suspected a memory-leak bug in the freshly upgraded kernel.
My version:

Linux NAS-Arch 4.18.12-arch1-1-ARCH #1 SMP PREEMPT Thu Oct 4 01:01:27 UTC 2018 x86_64 GNU/Linux

A quick search suggests it really is a memory leak:
Huge memory leak on linux kernel 4.18
It can be hunted down with the Kernel Memory Leak Detector (kmemleak), and the thread also contains some guesses at the cause:

I have updated the gist with the output of kmemleak after clearing and scanning (done after about 45 mins of runtime)
https://gist.github.com/coolsidd/d8a1d5addafd6a2367b68e6a6b243dc4/revisions

As for the amdgpu, I don't believe that it is the cause (of at least the major part) of the leak. It was the first module I removed while checking, so I can confirm that the leak persists after removing amdgpu.

As for the rtl8723be (my network driver), the lts version is very different and doesn't work properly (it does not have the antenna select option). However I have been using this module for a year and it is also present on 4.17.x. Were there any changes in this version (their github page does not show any major changes in the last 6 months)? I will build for 4.17.x tomorrow to confirm whether the leak is due to the rtl drivers.
--
There were a few commits for rtlwifi between 4.17.14 and 4.18.6
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/net/wireless/realtek/rtlwifi?h=v4.17.1&qt=range&q=v4.17.14..v4.18.6

I haven't looked into it closely. For now I'll upgrade to the latest kernel; if the problem shows up again, I'll roll back to 4.17.11.
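For the record, the kmemleak workflow mentioned above goes roughly like this. This is a sketch that assumes a kernel built with CONFIG_DEBUG_KMEMLEAK and debugfs mounted; on any other kernel it just reports that kmemleak is unavailable:

```shell
#!/bin/sh
# kmemleak is exposed through debugfs, and only on kernels built with
# CONFIG_DEBUG_KMEMLEAK that were not booted with kmemleak=off.
KML=/sys/kernel/debug/kmemleak
if [ -w "$KML" ]; then
    echo clear > "$KML"   # drop previously reported leaks
    sleep 60              # let the suspect workload run for a while
    echo scan > "$KML"    # trigger an immediate scan
    cat "$KML"            # dump suspected leaks with stack traces
else
    echo "kmemleak not available on this kernel"
fi
```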

A while ago I noticed that SQL queries often didn't follow the expected indexes, and the worst case happened on a freshly restarted MySQL machine.
The almighty Google later revealed it to be a TokuDB bug. Although Percona's documentation says newer versions have fixed it, the latest MariaDB still doesn't look fully fixed.
The concrete symptom: show index from [table_name] returns a Cardinality of 0, or an abnormally low value.
I've forgotten the exact repair steps; reconstructed from memory they are roughly:

INSTALL PLUGIN tokudb_background_job_status SONAME 'ha_tokudb.so';
SELECT @@tokudb_version;

SET SESSION TOKUDB_ANALYZE_IN_BACKGROUND=1;
SET SESSION TOKUDB_ANALYZE_MODE=TOKUDB_ANALYZE_RECOUNT_ROWS;
ANALYZE TABLE feed;
select * from INFORMATION_SCHEMA.TOKUDB_BACKGROUND_JOB_STATUS;

Then wait for the background ANALYZE to finish, after which the cardinality values recover.
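To avoid repeating this after every restart, the two session variables can presumably also be set server-wide in my.cnf. A sketch (variable names as in Percona's TokuDB documentation, not verified against MariaDB's build):

```ini
[mysqld]
# recount rows in the background so cardinality stays sane after restarts
tokudb_analyze_in_background = 1
tokudb_analyze_mode = TOKUDB_ANALYZE_RECOUNT_ROWS
```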

Download the kernel source,
https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.9.tar.xz
unpack it, then run make in the corresponding directory:

$ cd /opt/dev/workspace/kernel/linux-4.16.9/drivers/gpu/drm/i915
$ make -C /usr/lib/modules/`uname -r`/build/ M=`pwd` modules -j3

This fails with an error:

In file included from /opt/dev/workspace/kernel/linux-4.16.9/drivers/gpu/drm/i915/i915_trace.h:1006:0,
                 from /opt/dev/workspace/kernel/linux-4.16.9/drivers/gpu/drm/i915/i915_trace_points.c:13:
./include/trace/define_trace.h:89:42: fatal error: ../../drivers/gpu/drm/i915/i915_trace.h: No such file or directory
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
                                          ^
compilation terminated.

Experience says adjusting TRACE_INCLUDE_PATH should be enough.
But even after switching it to an absolute path it still failed, and I was stuck for quite a while.
Then I finally compared the two carefully.

The configuration:
#define TRACE_INCLUDE_PATH /opt/dev/workspace/kernel/linux-4.16.9/drivers/gpu/drm/i915

The error:
In file included from /opt/dev/workspace/kernel/linux-4.16.9/drivers/gpu/drm/i915/i915_trace.h:1006:0,
                 from /opt/dev/workspace/kernel/linux-4.16.9/drivers/gpu/drm/i915/i915_trace_points.c:13:
./include/trace/define_trace.h:89:42: fatal error: /opt/dev/workspace/kernel/1-4.16.9/drivers/gpu/drm/i915/i915_trace.h: No such file or directory
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
                                          ^

The giveaway: linux-4.16.9 had become 1-4.16.9. The token linux in the path was replaced by 1, because in its default GNU mode GCC predefines linux as a macro expanding to 1, and the TRACE_INCLUDE_PATH value goes through the preprocessor.

Sneaky...

I renamed the directory and the build went through.

After slipping several times, it finally shipped. That really can't have been easy:
ACME v2 and Wildcard Certificate Support is Live
This should give HTTPS adoption on smaller sites a serious push.

A look at the intermediate CA contents (during the test phase the issuer used to be CN=Fake LE Root X1):

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            0a:01:41:42:00:00:01:53:85:73:6a:0b:85:ec:a7:08
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: O=Digital Signature Trust Co., CN=DST Root CA X3
        Validity
            Not Before: Mar 17 16:40:46 2016 GMT
            Not After : Mar 17 16:40:46 2021 GMT
        Subject: C=US, O=Let's Encrypt, CN=Let's Encrypt Authority X3
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:9c:d3:0c:f0:5a:e5:2e:47:b7:72:5d:37:83:b3:
                    68:63:30:ea:d7:35:26:19:25:e1:bd:be:35:f1:70:
                    92:2f:b7:b8:4b:41:05:ab:a9:9e:35:08:58:ec:b1:
                    2a:c4:68:87:0b:a3:e3:75:e4:e6:f3:a7:62:71:ba:
                    79:81:60:1f:d7:91:9a:9f:f3:d0:78:67:71:c8:69:
                    0e:95:91:cf:fe:e6:99:e9:60:3c:48:cc:7e:ca:4d:
                    77:12:24:9d:47:1b:5a:eb:b9:ec:1e:37:00:1c:9c:
                    ac:7b:a7:05:ea:ce:4a:eb:bd:41:e5:36:98:b9:cb:
                    fd:6d:3c:96:68:df:23:2a:42:90:0c:86:74:67:c8:
                    7f:a5:9a:b8:52:61:14:13:3f:65:e9:82:87:cb:db:
                    fa:0e:56:f6:86:89:f3:85:3f:97:86:af:b0:dc:1a:
                    ef:6b:0d:95:16:7d:c4:2b:a0:65:b2:99:04:36:75:
                    80:6b:ac:4a:f3:1b:90:49:78:2f:a2:96:4f:2a:20:
                    25:29:04:c6:74:c0:d0:31:cd:8f:31:38:95:16:ba:
                    a8:33:b8:43:f1:b1:1f:c3:30:7f:a2:79:31:13:3d:
                    2d:36:f8:e3:fc:f2:33:6a:b9:39:31:c5:af:c4:8d:
                    0d:1d:64:16:33:aa:fa:84:29:b6:d4:0b:c0:d8:7d:
                    c3:93
                Exponent: 65537 (0x10001)

After updating my own certificate:
openssl s_client -connect kexiao8.com:443

depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
verify return:1
depth=0 CN = kexiao8.com
verify return:1
---
Certificate chain
 0 s:/CN=kexiao8.com
   i:/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
 1 s:/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
   i:/O=Digital Signature Trust Co./CN=DST Root CA X3
---
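The dumps above come from openssl's text output. As a self-contained illustration (using a throwaway self-signed certificate rather than the real Let's Encrypt chain), any PEM certificate can be inspected the same way:

```shell
# Make a throwaway self-signed cert, then dump it the same way the
# intermediate above was dumped (x509 -text prints the full body):
openssl req -x509 -newkey rsa:2048 -nodes -subj '/CN=demo.example' \
    -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem -days 1 2>/dev/null
openssl x509 -in /tmp/demo-cert.pem -noout -subject -issuer
openssl x509 -in /tmp/demo-cert.pem -noout -text | head -n 12
```

Feeding the downloaded intermediate PEM to the same x509 -text invocation reproduces the dump above.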

I've run into MySQL's index_merge optimization before: it optimized according to its own "guesses" and performance dropped; back then forcing an index with force index(...) disabled the optimization and solved it.
Recently I found another "smart" optimization, this time on a composite index, which makes the same SQL run with completely different efficiency before and after version 5.5.

The table structure:

CREATE TABLE `feed` (
  `feedid` bigint(11) NOT NULL AUTO_INCREMENT,
  `userid` int(11) NOT NULL,
  `typeid` int(11) NOT NULL,
  `dataid` int(11) NOT NULL,
  `invalid` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`feedid`),
  UNIQUE KEY `dataid_typeid` (`dataid`,`typeid`),
  KEY `userid_t_i` (`userid`,`typeid`,`invalid`)
) ENGINE=TokuDB AUTO_INCREMENT=1516204596 DEFAULT CHARSET=utf8 `compression`='tokudb_zlib'

The SQL statement:

select feedid,userid,dataid,typeid from feed where userid = '25057158'   and typeid in (0,2,6)  and (invalid = 0 or invalid = 11) order by feedid desc limit 0,30 ;

In theory this matches the composite index perfectly and should execute quickly, but in practice newer MySQL versions scan the whole table.
EXPLAIN shows:

mysql> explain select feedid,userid,dataid,typeid from feed where userid = '25057158'   and typeid in (0,2,6)  and (invalid = 0 or invalid = 11) order by feedid desc limit 0,30 ;
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | feed  | index | userid_t_i    | PRIMARY | 8       | NULL |   30 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.01 sec)

You can see it actually used PRIMARY for the query, and real execution behaves the same way, completely ignoring userid_t_i from possible_keys.
My guess: MySQL sees several IN/OR conditions in the SQL, decides this index would be extremely inefficient, and "smartly" swaps in the primary key, which it considers faster.

So with a forced index, or without the IN/OR conditions, it skips this optimization. EXPLAIN verifies this:

mysql> explain select feedid,userid,dataid,typeid from feed force index (userid_t_i) where userid = '178804206'   and typeid in (0,2,6)  and (invalid = 0 or invalid = 11) order by feedid desc limit 0,30;
+------+-------------+-------+-------+---------------+------------+---------+------+------+-----------------------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                       |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-----------------------------+
|    1 | SIMPLE      | feed  | range | userid_t_i    | userid_t_i | 12      | NULL | 2076 | Using where; Using filesort |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-----------------------------+

mysql> explain select feedid,userid,dataid,typeid from feed where userid = '25057158'   and typeid = 0  and invalid = 0  order by feedid desc limit 0,30 ;
+------+-------------+-------+------+---------------+------------+---------+-------------------+------+-------------+
| id   | select_type | table | type | possible_keys | key        | key_len | ref               | rows | Extra       |
+------+-------------+-------+------+---------------+------------+---------+-------------------+------+-------------+
|    1 | SIMPLE      | feed  | ref  | userid_t_i    | userid_t_i | 12      | const,const,const |  585 | Using where |
+------+-------------+-------+------+---------------+------------+---------+-------------------+------+-------------+

Besides, this optimization is apparently triggered by the order by at the end of the SQL; without the sort there is no full-table scan:

mysql> explain select feedid,userid,dataid,typeid from feed where userid = '25057158'   and typeid in (0,2,6)  and (invalid = 0 or invalid = 11)  limit 0,30 ;
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|    1 | SIMPLE      | feed  | range | userid_t_i    | userid_t_i | 12      | NULL |  591 | Using where |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)

And after actually executing the SQL with order by feedid desc removed once, re-running EXPLAIN (with caches now warm) surprisingly no longer suggests the PRIMARY key. Is this some kind of adaptive-learning process? The real execution then also used only userid_t_i as the index.

mysql> explain select feedid,userid,dataid,typeid from feed where userid = '25057158'   and typeid in (0,2,6)  and (invalid = 0 or invalid = 11)  limit 0,30 ;
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|    1 | SIMPLE      | feed  | range | userid_t_i    | userid_t_i | 12      | NULL |    7 | Using where |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------+

In short, my guess is that different MySQL versions tune optimizer_switch and related parameters differently; the answer can probably be dug out of the MySQL documentation, for example:
https://dev.mysql.com/doc/refman/5.7/en/index-merge-optimization.html
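If the optimizer_switch theory is right, the behaviour should be controllable per flag. A my.cnf sketch (flag names from the MySQL documentation; note that prefer_ordering_index only exists in much later releases, 5.7.33+ / 8.0.21+, and none of this is verified against the TokuDB/MariaDB build above):

```ini
[mysqld]
# keep the optimizer from silently switching to the PK for "ORDER BY pk LIMIT n"
optimizer_switch=prefer_ordering_index=off
# index-merge strategies can be toggled the same way if they misbehave:
# optimizer_switch=index_merge_intersection=off
```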