Ali The Expert


How to enable Kdump on RHEL 7 and CentOS 7


Kdump is a kernel feature used to capture crash dumps when the system or kernel crashes. To enable kdump, we have to reserve a portion of physical RAM, which is used to run the kdump kernel in the event of a kernel panic or crash.

When a kernel crash or kernel panic occurs, the running kernel uses kexec to boot the dump kernel from the reserved memory. The dump kernel then copies the contents of memory to a vmcore file, either on a local disk or on a remote disk, and finally reboots the box.

By analyzing the crash dumps we can find the root cause of a system failure. If you have OS support, you can share the crash dumps with the vendor for analysis.
In this article we will demonstrate how to enable kdump on RHEL 7 and CentOS 7.

Step:1 Install ‘kexec-tools’ using the yum command

Use the yum command below to install the ‘kexec-tools’ package if it is not already installed.
[root@cloud ~]# yum install kexec-tools
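The install step can be made safe to repeat by asking rpm first. This is only a sketch: the function name ensure_pkg is made up for illustration, and it assumes rpm and yum are on the PATH and that it is run as root.

```shell
# Install a package only when rpm reports it as missing.
# ensure_pkg is a hypothetical helper name, not part of any tool.
ensure_pkg() {
    pkg=$1
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "$pkg is already installed"
    else
        yum install -y "$pkg"
    fi
}

# Usage (as root): ensure_pkg kexec-tools
```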

Step:2 Update the GRUB2 file to Reserve Memory for the Kdump kernel

Edit the GRUB2 defaults file (/etc/default/grub) and add the parameter ‘crashkernel=‘ to the line beginning with ‘GRUB_CMDLINE_LINUX‘
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/swap vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=128M vconsole.keymap=us rhgb quiet"
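If you prefer not to edit the file by hand, the same change can be scripted. The sketch below is an assumption-laden example, not the official way: add_crashkernel is a hypothetical helper, it expects the GRUB_CMDLINE_LINUX value to sit on one line in double quotes, and it keeps a copy of the original file with a .bak suffix.

```shell
# Set crashkernel=<size> on the GRUB_CMDLINE_LINUX line of a grub defaults file.
# add_crashkernel is a hypothetical helper; a backup is kept as <file>.bak.
add_crashkernel() {
    grub_file=$1
    size=$2
    cp "$grub_file" "$grub_file.bak"
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$grub_file.bak"; then
        # A crashkernel= value is already present: replace it.
        sed "s/^\(GRUB_CMDLINE_LINUX=.*\)crashkernel=[^ \"]*/\1crashkernel=$size/" \
            "$grub_file.bak" > "$grub_file"
    else
        # No crashkernel= yet: append it before the closing quote.
        sed "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 crashkernel=$size\"/" \
            "$grub_file.bak" > "$grub_file"
    fi
}

# Usage (as root): add_crashkernel /etc/default/grub 128M
```

Running it twice with different sizes only changes the value, it does not add a second crashkernel= entry.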
Run the command below to regenerate the GRUB2 configuration.
[root@cloud ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
On systems with UEFI firmware, use the command below instead (on CentOS the directory is /boot/efi/EFI/centos rather than /boot/efi/EFI/redhat)
[root@cloud ~]# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
The regenerated configuration tells the kernel to reserve 128 MB of RAM for the kdump kernel at the next boot.
Reboot the box now using the command below:
[root@cloud ~]# shutdown -r now
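After the reboot it is worth confirming that the memory was really reserved. The sketch below reads /proc/cmdline and /sys/kernel/kexec_crash_size (the reserved size in bytes); the function name crashkernel_reserved is hypothetical, and both paths can be overridden, which is mainly useful for trying it out on sample files.

```shell
# Report whether crashkernel= is on the kernel command line and how much
# memory is reserved. crashkernel_reserved is a hypothetical helper name.
crashkernel_reserved() {
    cmdline_file=${1:-/proc/cmdline}
    size_file=${2:-/sys/kernel/kexec_crash_size}
    if ! grep -q 'crashkernel=' "$cmdline_file"; then
        echo "crashkernel= is not on the kernel command line"
        return 1
    fi
    bytes=$(cat "$size_file")
    if [ "$bytes" -le 0 ]; then
        echo "no memory is reserved for the crash kernel"
        return 1
    fi
    echo "reserved $((bytes / 1024 / 1024)) MB for the crash kernel"
}

# Usage: crashkernel_reserved
```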

Step:3 Update the dump location & default action in the file (/etc/kdump.conf)

To store the crash dump (vmcore file) on a local file system, edit the file ‘/etc/kdump.conf‘ and specify the location as per your setup. In my case I am using a separate local file system (/var/crash). It is recommended that the file system has free space at least equal to the size of your system’s RAM. Kdump can compress the dump data using the ‘core_collector’ option (core_collector makedumpfile -c), where -c enables compression.
If kdump fails to store the dump file at the specified location, the action named in the ‘default’ directive is performed. In my case the default action is reboot.
Update the three directives below in the kdump.conf file.
[root@cloud ~]# vi /etc/kdump.conf

path /var/crash
core_collector makedumpfile -c
default reboot
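The free-space recommendation above can be checked with a few lines of shell. This is a rough sketch under stated assumptions: has_room_for_dump is a hypothetical helper, it compares MemTotal from /proc/meminfo with the available space that df reports for the dump directory, and it ignores the savings from compression.

```shell
# Succeed when the file system holding the dump directory has at least
# as much free space as the machine has RAM.
# has_room_for_dump is a hypothetical helper name.
has_room_for_dump() {
    dump_dir=$1
    meminfo=${2:-/proc/meminfo}
    ram_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    free_kb=$(df -Pk "$dump_dir" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -ge "$ram_kb" ]; then
        echo "ok: $free_kb kB free, $ram_kb kB RAM"
    else
        echo "too small: $free_kb kB free, $ram_kb kB RAM"
        return 1
    fi
}

# Usage: has_room_for_dump /var/crash
```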
[Figure: different options for storing the dump]

Step:4 Start and enable kdump service

[root@cloud ~]# systemctl start kdump.service
[root@cloud ~]# systemctl enable kdump.service
[root@cloud ~]#

Step:5 Now Test Kdump by manually crashing the system

Before crashing your system, verify that the kdump service is running using the commands below.
[root@cloud crash]# systemctl is-active kdump.service
[root@cloud crash]# service kdump status
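As an extra check beyond the service status, the kernel itself reports whether a crash kernel is loaded through /sys/kernel/kexec_crash_loaded, which reads 1 when one is in place. A minimal sketch, with crash_kernel_loaded as a hypothetical helper name and the path overridable for experimenting:

```shell
# Succeed only when the kernel reports that a crash kernel is loaded.
# crash_kernel_loaded is a hypothetical helper name.
crash_kernel_loaded() {
    state_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ "$(cat "$state_file" 2>/dev/null)" = "1" ]
}

# Usage: crash_kernel_loaded && echo "ready to capture a dump"
```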
To test our kdump configuration, we will manually crash the system with the commands below.
[root@cloud ~]# echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger

This will create a crash dump file (vmcore) under the ‘/var/crash‘ file system.

[root@cloud ~]# ls -lR /var/crash
/var/crash:
total 0
drwxr-xr-x. 2 root root 42 Mar 4 03:02 127.0.0.1-2016-03-04-03:02:17

/var/crash/127.0.0.1-2016-03-04-03:02:17:
total 135924
-rw-------. 1 root root 139147524 Mar 4 03:02 vmcore
-rw-r--r--. 1 root root 35640 Mar 4 03:02 vmcore-dmesg.txt
[root@cloud ~]#
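When several crashes have been captured, a small helper can pick the most recent vmcore. This sketch assumes the layout shown above, one subdirectory per crash directly under the dump path; latest_vmcore is a hypothetical name, and it relies on ls -t ordering by modification time.

```shell
# Print the path of the most recently written vmcore under a dump directory.
# latest_vmcore is a hypothetical helper name; it expects <dir>/<crash>/vmcore.
latest_vmcore() {
    crash_dir=${1:-/var/crash}
    ls -t "$crash_dir"/*/vmcore 2>/dev/null | head -n 1
}

# Usage: latest_vmcore /var/crash
```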

Step:6 Use ‘crash’ command to analyze and debug crash dumps

Crash is the utility used to debug and analyze a crash dump (vmcore file).

To use crash, make sure two packages are installed: ‘crash‘ and ‘kernel-debuginfo‘

[root@cloud ~]# yum install crash
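Before installing kernel-debuginfo, enable the debuginfo repository: in its repo file (on CentOS 7, /etc/yum.repos.d/CentOS-Debuginfo.repo), change ‘enabled=0’ to ‘enabled=1’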
[root@cloud ~]# yum install kernel-debuginfo
Once kernel-debuginfo is installed, run the crash command below. It opens a crash prompt where we can run commands to inspect the process list, the open files and other system state at the time of the crash.
[root@cloud ~]# crash /var/crash/127.0.0.1-2016-03-04-14\:20\:06/vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux
crash>

Type the ‘ps‘ command to list the processes that were running when the system crashed.

crash> ps
[Figure: process list at the time of the crash]
To view the files that were open when the system crashed, type the ‘files’ command at the crash prompt.
crash> files
PID: 5577 TASK: ffff88007b44f300 CPU: 0 COMMAND: "bash"
ROOT: / CWD: /root
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff880036b85000 ffff8800796fa540 ffff88007966f4d0 CHR  /dev/pts/0
  1 ffff880036b73900 ffff880068c409c0 ffff8800794a8d10 REG  /proc/sysrq-trigger
  2 ffff880036b85000 ffff8800796fa540 ffff88007966f4d0 CHR  /dev/pts/0
 10 ffff880036b85000 ffff8800796fa540 ffff88007966f4d0 CHR  /dev/pts/0
255 ffff880036b85000 ffff8800796fa540 ffff88007966f4d0 CHR  /dev/pts/0
crash>

Type the ‘sys’ command to show the system information at the time of the crash.

crash> sys
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.10.1.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2016-03-04-14:20:06/vmcore
        CPUS: 1
        DATE: Fri Mar 4 14:20:01 2016
      UPTIME: 00:02:00
LOAD AVERAGE: 0.75, 0.48, 0.19
       TASKS: 115
    NODENAME: cloud.linuxtechi.com
     RELEASE: 3.10.0-327.10.1.el7.x86_64
     VERSION: #1 SMP Tue Feb 16 17:03:50 UTC 2016
     MACHINE: x86_64 (2388 Mhz)
      MEMORY: 2 GB
       PANIC: "SysRq : Trigger a crash"
crash>
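The same three commands can also be collected into a file and run in one go, which is handy when the output has to be attached to a support case. The sketch below only writes the command file; write_crash_script is a hypothetical helper, and it relies on my understanding that crash accepts an input file through its -i option and stops when it reads exit.

```shell
# Write a small crash command file that prints system info, the process
# list and the open files, then leaves the crash prompt.
# write_crash_script is a hypothetical helper name.
write_crash_script() {
    out=$1
    printf '%s\n' sys ps files exit > "$out"
}

# Usage:
#   write_crash_script /tmp/triage.cmds
#   crash -i /tmp/triage.cmds /var/crash/<dump-dir>/vmcore \
#       /usr/lib/debug/lib/modules/`uname -r`/vmlinux > /tmp/triage.txt
```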

To get help on any command at the crash prompt, type ‘help <command>‘, for example ‘help ps‘.