2021-04-15

How to Effectively Generate Dumps for the Many Ways a .NET Program Can Die (Part 1)

1: Background

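Most people know you can capture a dump through Task Manager. It is quick and dirty, but it cannot cover the countless ways a program can die, for example:

  • Memory balloons and the program blows up

  • CPU spikes and the program works itself to death

  • The application stops responding and the user is driven up the wall

  • The process exits unexpectedly, just like life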

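Since doing it by hand is so feeble, what better tools are there? Besides adplus, this post recommends a real gem, procdump. Download it from https://docs.microsoft.com/zh-cn/sysinternals/downloads/procdump (it even supports Linux 😘😘😘). I won't go into the installation details.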

2: Memory balloons, the program blows up

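I believe many of you have run into memory bloat. The most common case I have seen is a small static cache that someone, intentionally or not, forgets to release, so entries pile up without bound until the process finally blows up. How do you catch that with procdump?

To make the demo easy, I'll first write an example that allocates memory without end.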

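  using System;
  using System.Collections.Generic;
  using System.Linq;

  class Program
  {
      static void Main(string[] args)
      {
          // The list keeps every string alive, so memory grows until the process dies.
          List<string> list = new List<string>();
          for (int i = 0; i < int.MaxValue; i++)
          {
              // Each element is "0,1,2,...,9999".
              list.Add(string.Join(",", Enumerable.Range(0, 10000)));
          }
          Console.ReadLine();
      }
  }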

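Once the program is running, tell procdump to capture a full-memory dump automatically when memory exceeds 1 GB. Here -m 1024 sets the commit threshold in MB and -ma asks for a full dump: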

C:\Windows\system32>procdump ConsoleApp2 -m 1024 -ma E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               ConsoleApp2.exe (24112)
Process image:         E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe
CPU threshold:         n/a
Performance counter:   n/a
Commit threshold:      >= 1024 MB
Threshold seconds:     10
Hung window check:     Disabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       1
Dump folder:           E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled

Press Ctrl-C to end monitoring without terminating the process.

[21:23:43] Commit: 1087Mb
[21:23:43] Dump 1 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_212343.dmp
[21:23:43] Dump 1 writing: Estimated dump file size is 1179 MB.
[21:23:44] Dump 1 complete: 1179 MB written in 1.3 seconds
[21:23:44] Dump count reached.

The last five lines show that a dump file was generated automatically once committed memory reached 1087 MB. Next, let's take a look with windbg.

  • Check the memory usage of the current process with !address -summary
0:000> !address -summary

Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                     63          b30b4000 (   2.798 GB)           69.94%
<unknown>                               228          48547000 (   1.130 GB)  93.99%   28.25%
Image                                   210           4115000 (  65.082 MB)   5.29%    1.59%
Stack                                    21            700000 (   7.000 MB)   0.57%    0.17%
Heap                                     12            170000 (   1.438 MB)   0.12%    0.04%
Other                                     7             5a000 ( 360.000 kB)   0.03%    0.01%
TEB                                       7             13000 (  76.000 kB)   0.01%    0.00%
PEB                                       1              3000 (  12.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                             250          47121000 (   1.110 GB)  92.36%   27.76%
MEM_IMAGE                               217           411e000 (  65.117 MB)   5.29%    1.59%
MEM_MAPPED                               19           1cfd000 (  28.988 MB)   2.35%    0.71%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                 63          b30b4000 (   2.798 GB)           69.94%
MEM_COMMIT                              357          47f12000 (   1.124 GB)  93.49%   28.10%
MEM_RESERVE                             129           502a000 (  80.164 MB)   6.51%    1.96%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                          177          437d5000 (   1.055 GB)  87.70%   26.36%
PAGE_EXECUTE_READ                        35           33c7000 (  51.777 MB)   4.21%    1.26%
PAGE_READONLY                            90            c41000 (  12.254 MB)   1.00%    0.30%
PAGE_WRITECOPY                           34            70b000 (   7.043 MB)   0.57%    0.17%
PAGE_READWRITE|PAGE_GUARD                14             23000 ( 140.000 kB)   0.01%    0.00%
PAGE_EXECUTE_READWRITE                    7              7000 (  28.000 kB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free                                        80010000          7f130000 (   1.986 GB)
<unknown>                                   438e1000           200f000 (  32.059 MB)
Image                                       660e0000            f55000 (  15.332 MB)
Stack                                         e00000             fd000 (1012.000 kB)
Heap                                          c97000             98000 ( 608.000 kB)
Other                                       ff2c0000             33000 ( 204.000 kB)
TEB                                           990000              3000 (  12.000 kB)
PEB                                           98d000              3000 (  12.000 kB)

See the (1.055 GB) on the PAGE_READWRITE row above? That is about 1080 MB, right in line with the 1087 MB procdump printed on the console. It checks out.
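The comparison is easier to verify once the hexadecimal size is converted to decimal. Below is a minimal C# sketch of that arithmetic; the class and method names are made up for illustration, and the hex value is the PAGE_READWRITE total size taken from the output above.

```csharp
using System;

static class RegionSize
{
    // Parses a hex byte count as printed by !address -summary (no 0x prefix).
    public static long ParseHexBytes(string hex) => Convert.ToInt64(hex, 16);

    public static double ToMB(long bytes) => bytes / (1024.0 * 1024.0);

    public static double ToGB(long bytes) => bytes / (1024.0 * 1024.0 * 1024.0);

    static void Main()
    {
        long bytes = ParseHexBytes("437d5000"); // PAGE_READWRITE total size
        Console.WriteLine($"{bytes} bytes = {ToMB(bytes):F1} MB = {ToGB(bytes):F3} GB");
    }
}
```

The value works out to 1132285952 bytes, roughly 1080 MB, so the two tools agree to within a few megabytes even though they do not measure exactly the same thing.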

  • Look for large objects on the managed heap with !dumpheap -stat -min 1024
||0:0:000> !dumpheap -stat -min 1024
Statistics:
      MT    Count    TotalSize Class Name
65d42788        2        13044 System.Object[]
65d42d74        2        98328 System.String[]
65d42c60       73      1082988 System.Char[]
65d424e4    11452   1119913984 System.String

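The last line of the output shows more than 10,000 System.String instances (11452 of them). Next, add the -type option to narrow the listing to strings larger than 10 KB.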

0:000> !dumpheap -type System.String -min 10240
 Address       MT     Size
03c75568 65d424e4    97792
03c8d378 65d424e4    97792
4a855060 65d424e4    97792

Statistics:
      MT    Count    TotalSize Class Name
65d424e4    11452   1119913984 System.String
Total 11452 objects

0:000> !gcroot 4a855060
Thread 36e4:
*** WARNING: Unable to verify checksum for ConsoleApp2.exe
    00b3f358 012108d1 ConsoleApp2.Program.Main(System.String[]) [E:\net5\ConsoleApp1\ConsoleApp2\Program.cs @ 18]
        ebp+18: 00b3f370
            ->  02c71fd8 System.Collections.Generic.List`1[[System.String, mscorlib]]
            ->  02cce2ec System.String[]
            ->  4a855060 System.String

Found 1 unique roots (run '!GCRoot -all' to see all roots).

The final !gcroot output confirms that the string is held by the List at line 18 of Program.cs. Case closed.
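As a sanity check, the 97792-byte Size column can be tied back to the demo code. The sketch below rebuilds the same string and estimates its heap size. It assumes the 32-bit CLR string layout (14 bytes of fixed overhead plus 2 bytes per character), which fits the 8-digit addresses in the dump but is my assumption, not something the dump states; the class and method names are illustrative.

```csharp
using System;
using System.Linq;

static class StringSizeCheck
{
    // Same expression as the demo program: "0,1,2,...,9999".
    public static string BuildPayload() => string.Join(",", Enumerable.Range(0, 10000));

    // ASSUMPTION: 32-bit CLR string layout, i.e. 14 bytes of overhead
    // (object header, method table pointer, length, null terminator)
    // plus 2 bytes per UTF-16 character.
    public static long EstimatedSize32(int length) => 14 + 2L * length;

    static void Main()
    {
        int length = BuildPayload().Length;
        long size = EstimatedSize32(length);
        Console.WriteLine($"chars={length}, bytes per string={size}, total for 11452 strings={size * 11452}");
    }
}
```

Under that assumption each string has 48889 characters and occupies 97792 bytes, and 11452 of them add up to 1119913984 bytes, which matches the TotalSize column exactly. Each string is also above the 85,000-byte large object threshold, so these allocations would land on the large object heap.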

3: CPU spikes, the program works itself to death

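In the high-CPU cases I have seen, the cause is more often on the unmanaged side, such as GC collections or lock contention; few people are careless enough to burn the CPU in managed code.

By the way, there is a small trick for analyzing high CPU: capture several dump snapshots in a row and compare what the threads are doing in each one. procdump is a great fit for this. First, the test code.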

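  using System;
  using System.Threading.Tasks;

  class Program
  {
      static void Main(string[] args)
      {
          // Every worker spins forever, so the CPU stays pegged.
          Parallel.For(0, int.MaxValue, (i) =>
          {
              while (true)
              {
              }
          });
          Console.ReadLine();
      }
  }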

Now I tell procdump to capture a dump whenever CPU stays above 70% for 5 consecutive seconds (-c 70 -s 5), and to stop after 2 dumps (-n 2).

C:\Windows\system32>procdump ConsoleApp2 -s 5 -n 2 -c 70 E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               ConsoleApp2.exe (22152)
Process image:         E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe
CPU threshold:         >= 70% of system
Performance counter:   n/a
Commit threshold:      n/a
Threshold seconds:     5
Hung window check:     Disabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       2
Dump folder:           E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled

Press Ctrl-C to end monitoring without terminating the process.

[22:25:47] CPU: 95% 1s
[22:25:48] CPU: 100% 2s
[22:25:50] CPU: 96% 3s
[22:25:51] CPU: 98% 4s
[22:25:52] CPU: 99% 5s (Trigger)
[22:25:53] Dump 1 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_222553.dmp
[22:25:54] Dump 1 complete: 5 MB written in 0.3 seconds
[22:25:56] CPU: 88% 1s
[22:25:58] CPU: 93% 2s
[22:26:00] CPU: 89% 3s
[22:26:02] CPU: 89% 4s
[22:26:04] CPU: 95% 5s (Trigger)
[22:26:05] Dump 2 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_222605.dmp
[22:26:06] Dump 2 complete: 5 MB written in 0.4 seconds
[22:26:07] Dump count reached.

The tail of the output shows that each time CPU stayed above 70% for 5 consecutive seconds a dump was captured, 2 in total.
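The counting behaviour visible in the log (1s, 2s, ... 5s, Trigger, then starting over) can be modelled in a few lines. This is only a sketch of the semantics as the log suggests them, not procdump's actual implementation; CpuTrigger and FindTriggers are hypothetical names, and the sample values are the CPU readings copied from the log above.

```csharp
using System;
using System.Collections.Generic;

static class CpuTrigger
{
    // Returns the zero-based index of each sample at which a dump would fire:
    // the reading has stayed at or above thresholdPercent for `seconds`
    // consecutive samples. The streak restarts after every trigger, and
    // scanning stops once maxDumps triggers have been recorded.
    public static List<int> FindTriggers(IReadOnlyList<int> samples, int thresholdPercent, int seconds, int maxDumps)
    {
        var triggers = new List<int>();
        int streak = 0;
        for (int i = 0; i < samples.Count && triggers.Count < maxDumps; i++)
        {
            streak = samples[i] >= thresholdPercent ? streak + 1 : 0;
            if (streak == seconds)
            {
                triggers.Add(i);
                streak = 0;
            }
        }
        return triggers;
    }

    static void Main()
    {
        // CPU readings copied from the procdump log.
        int[] samples = { 95, 100, 96, 98, 99, 88, 93, 89, 89, 95 };
        Console.WriteLine(string.Join(",", FindTriggers(samples, 70, 5, 2))); // prints "4,9"
    }
}
```

A single reading below the threshold resets the streak to zero, which is why a short CPU burst would not produce a dump under this model.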

Now that we have the dumps, open them in two windbg instances and check when each dump was generated, as shown in the figure below:

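The figure shows that the two dumps were generated 12 s apart, and !runaway reveals that the following threads:

  • 14:2cb8
  • 19:3f8c
  • ...

each ran for as long as 10 s, most of that 12 s gap. What does that tell us? These two threads are most likely stuck in an infinite loop somewhere... right?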

Switch to thread 14 and look at its call stack with !clrstack. The threads are all stuck in ConsoleApp2.Program+<>c.<Main>b__0_0(Int32) and never come out...
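The two-dump comparison boils down to subtracting each thread's cumulative CPU time in the first dump from its value in the second and comparing the difference with the wall-clock gap. Here is a minimal sketch of that idea. The CPU times in Main are HYPOTHETICAL placeholders (only the two thread labels come from the case above), and the 0.8 ratio is an arbitrary cut-off chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RunawayDiff
{
    // Given per-thread cumulative CPU seconds from two dumps, returns the threads
    // whose CPU time grew by at least minBusyRatio of the wall-clock gap.
    public static List<string> FindSpinningThreads(
        IReadOnlyDictionary<string, double> first,
        IReadOnlyDictionary<string, double> second,
        double wallSeconds,
        double minBusyRatio)
    {
        return second
            .Where(kv => first.ContainsKey(kv.Key))
            .Where(kv => (kv.Value - first[kv.Key]) / wallSeconds >= minBusyRatio)
            .Select(kv => kv.Key)
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
    }

    static void Main()
    {
        // HYPOTHETICAL !runaway values in seconds; "0:1a2b" is an invented idle thread.
        var dump1 = new Dictionary<string, double> { ["14:2cb8"] = 5.0, ["19:3f8c"] = 5.0, ["0:1a2b"] = 0.1 };
        var dump2 = new Dictionary<string, double> { ["14:2cb8"] = 15.0, ["19:3f8c"] = 15.0, ["0:1a2b"] = 0.1 };

        foreach (string id in FindSpinningThreads(dump1, dump2, wallSeconds: 12, minBusyRatio: 0.8))
        {
            Console.WriteLine(id);
        }
    }
}
```

A thread that burned about 10 s of CPU during a 12 s gap was busy over 80% of the time, which is the signature of a tight loop rather than a thread that happened to be running when the snapshot was taken.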

4: Summary

This post is getting a bit long, so I'll stop here. If you're interested, grab procdump and play around with it 🤭.

More high-quality content: see my GitHub: dotnetfly

Reposted from: http://www.shaoqun.com/a/688739.html
