* Need clarification
From: Lakshmi Narasimhan Sundararajan @ 2021-11-10 13:11 UTC
  To: lvm-devel

Hi LVM Team!
A very good day to you.

I have the following observation, and I need your inputs to understand behavior.

1/ create a volume group on a single block device.
2/ create a logical volume on the volume group.
3/ pump IO to the dm device
4/ while IOs are active, force kernel crash through the sysrq interface.

This results in a kernel hang, possibly because of IOs waiting to be
serviced still.
This behavior is seen over thin pool, thin device as well.

1/ Is this behavior known or understood well as to why the kernel does
not complete a shutdown?
2/ Is there any configuration with the lvm/dm layer that can allow the
kernel to proceed to complete shutdown and reboot failing those
incomplete IOs?

Please advise.

Thanks
LN
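The four reproduction steps in the message above can be sketched as a shell function that only prints the commands rather than running them. This is a rough sketch under assumptions: the device path /dev/sdX, the names vg_test and lv_test, the 1G size, and dd as the IO generator are all hypothetical placeholders and do not come from the report. The printed commands destroy data on the device and crash the kernel, so they would only make sense on a disposable test machine.

```shell
#!/bin/sh
# Dry-run sketch: prints one command per reproduction step, executes nothing.
#   $1 = block device to use (hypothetical, e.g. /dev/sdX)
#   $2 = volume group name   (hypothetical default: vg_test)
#   $3 = logical volume name (hypothetical default: lv_test)
print_repro_steps() {
    dev=$1
    vg=${2:-vg_test}
    lv=${3:-lv_test}
    # The heredoc expands the variables; '>' and '&' are printed literally.
    cat <<EOF
vgcreate $vg $dev
lvcreate -n $lv -L 1G $vg
dd if=/dev/zero of=/dev/$vg/$lv bs=1M oflag=direct &
echo c > /proc/sysrq-trigger
EOF
}
```

Calling `print_repro_steps /dev/sdX` prints the four lines for review; nothing touches the system until someone pastes them into a root shell.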
* Need clarification
From: Zdenek Kabelac @ 2021-11-11 12:23 UTC
  To: lvm-devel

On 10. 11. 21 at 14:11, Lakshmi Narasimhan Sundararajan wrote:
> Hi LVM Team!
> A very good day to you.
>
> I have the following observation, and I need your inputs to understand behavior.
>
> 1/ create a volume group on a single block device.
> 2/ create a logical volume on the volume group.
> 3/ pump IO to the dm device
> 4/ while IOs are active, force kernel crash through the sysrq interface.
>
> This results in a kernel hang, possibly because of IOs waiting to be
> serviced still.
> This behavior is seen over thin pool, thin device as well.
>
> 1/ Is this behavior known or understood well as to why the kernel does
> not complete a shutdown?
> 2/ Is there any configuration with the lvm/dm layer that can allow the
> kernel to proceed to complete shutdown and reboot failing those
> incomplete IOs?
>
> Please advise.

I'm pretty sure 'echo b > /proc/sysrq-trigger' will reboot your kernel.

Other than that, I'm not sure I'm getting your point here.

You crash your kernel and then you wonder why it hangs?

When the kernel crashes (Oopses), the system has to be rebooted: the kernel
cannot recover from such a crash because its internal data structures can no
longer be trusted (it's much like a user-space app that core dumps: you don't
expect to continue running your text editor after it divides by 0).

Maybe you are not describing well what you are actually testing?

My guess: maybe you are actually checking what happens on disk failure?

Also, you've completely forgotten to mention your kernel version, so there
are simply way too many unknowns to give any sensible advice here...

Zdenek
* Need clarification
From: Bryn M. Reeves @ 2021-11-11 16:22 UTC
  To: lvm-devel

On Thu, Nov 11, 2021 at 01:23:03PM +0100, Zdenek Kabelac wrote:
> On 10. 11. 21 at 14:11, Lakshmi Narasimhan Sundararajan wrote:
> > Hi LVM Team!
> > A very good day to you.
> >
> > I have the following observation, and I need your inputs to understand behavior.
> >
> > 1/ create a volume group on a single block device.
> > 2/ create a logical volume on the volume group.
> > 3/ pump IO to the dm device
> > 4/ while IOs are active, force kernel crash through the sysrq interface.
> >
> > This results in a kernel hang, possibly because of IOs waiting to be
> > serviced still.
> > This behavior is seen over thin pool, thin device as well.
> >
> > 1/ Is this behavior known or understood well as to why the kernel does
> > not complete a shutdown?
> > 2/ Is there any configuration with the lvm/dm layer that can allow the
> > kernel to proceed to complete shutdown and reboot failing those
> > incomplete IOs?
> >
> > Please advise.
>
> I'm pretty sure 'echo b > /proc/sysrq-trigger' will reboot your kernel.

It sounds like they are using the 'c' sysrq command; this invokes a
crash (used to be just dereferencing a NULL pointer) and is commonly
used to test configurations like kdump, or automatic reboot on panic.

From what the reporter is saying it sounds like that is not happening in
the described scenario, although presumably it has been configured?

After the 'c' it's not possible to issue further sysrq commands via the
/proc interface, so the 'b' would have to be invoked via sysrq keyboard
commands.

> Other than that, I'm not sure I'm getting your point here.
>
> You crash your kernel and then you wonder why it hangs?

If you have kdump enabled, or /proc/sys/kernel/panic is set to a value
that is > 0, then this is definitely not expected behaviour.

In this case it's often useful to enable keyboard sysrq (since those can
still be issued after the 'echo c > /proc/sysrq-trigger'), and to carry
out a thread dump (sysrq-t) to print all stacks to the console. To be
useful this generally needs a serial console or some other mechanism to
capture the large volume of messages for analysis.

Regards,
Bryn.
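The distinction drawn in the reply above (kdump enabled, or /proc/sys/kernel/panic non-zero, versus neither) can be sketched as a small shell helper. This is a minimal sketch and not something from the thread: the function name panic_action and its three labels are made up for illustration, and it assumes /sys/kernel/kexec_crash_loaded reads 1 when a crash kernel is loaded. It also treats a negative kernel.panic value as a reboot, which follows the kernel sysctl documentation and goes slightly beyond the "> 0" case mentioned in the reply.

```shell
#!/bin/sh
# Classify what the kernel is expected to do after a panic.
#   $1 = contents of /proc/sys/kernel/panic
#        (0 = loop forever, >0 = reboot after N seconds, <0 = reboot at once)
#   $2 = contents of /sys/kernel/kexec_crash_loaded (1 = crash kernel loaded)
# Prints one of the illustrative labels: kdump, reboot, hang.
panic_action() {
    panic_timeout=$1
    crash_loaded=$2
    if [ "$crash_loaded" = "1" ]; then
        # A loaded crash kernel is entered before any panic timeout applies.
        echo "kdump"
    elif [ "$panic_timeout" -ne 0 ] 2>/dev/null; then
        echo "reboot"
    else
        # Zero (or an unreadable, non-numeric value) is treated as "no reboot".
        echo "hang"
    fi
}

# Usage on a live system (both reads are harmless):
#   panic_action "$(cat /proc/sys/kernel/panic)" \
#                "$(cat /sys/kernel/kexec_crash_loaded)"
```

If the helper prints "hang", the behaviour in the original report would be the expected one; "kdump" or "reboot" combined with an actual hang is the case where a sysrq-t dump over a serial console becomes useful. Keyboard sysrq can be enabled beforehand with `echo 1 > /proc/sys/kernel/sysrq`.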
* Need clarification
From: Lakshmi Narasimhan Sundararajan @ 2021-11-12 8:16 UTC
  To: lvm-devel

I am testing recovery of the node after a kernel crash for any reason,
when dm devices are used as storage drives.
My expectation was that the system would be able to sync and unmount
filesystems during a kernel crash forced through
(echo c > /proc/sysrq-trigger).
But the system did not complete shutdown and hung.

It is generally true that if there is a kernel crash then the internal
structures may become inconsistent, but with a forced crash through
sysrq-trigger and option 'c', it is guaranteed none of the internal
structures are inconsistent, so I was surprised that sync/unmount over
dm devices could not complete.

Based on your inputs, I was able to root cause the issue to an incorrect
kdump configuration on the kernel.
It looks like my system has hit this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1510654

Below are the background details.

1/ linux kernel version
[root@ip-10-13-11-45 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@ip-10-13-11-45 ~]# uname -r
5.7.12-1.el7.elrepo.x86_64
[root@ip-10-13-11-45 ~]#

2/ kdump is configured with 'auto'
The Linux kernel boot entry is defined as below, with the
crashkernel=auto option.
menuentry 'CentOS Linux (5.7.12-1.el7.elrepo.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.7.12-1.el7.elrepo.x86_64-advanced-adf91006-76ef-4113-850a-090c2be36f20' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1' 691ac0a0-e4a2-40e3-b3ad-a720d630f49b
        else
          search --no-floppy --fs-uuid --set=root 691ac0a0-e4a2-40e3-b3ad-a720d630f49b
        fi
        linux16 /vmlinuz-5.7.12-1.el7.elrepo.x86_64 root=UUID=adf91006-76ef-4113-850a-090c2be36f20 ro crashkernel=auto rhgb quiet biosdevname=0 net.ifnames=0
        initrd16 /initramfs-5.7.12-1.el7.elrepo.x86_64.img
}

Many thanks for your inputs.

Regards
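The boot entry in the message above passes crashkernel=auto on the kernel command line, and the message attributes the hang to the kdump configuration. One hedged way to check whether such a parameter actually resulted in a memory reservation is sketched below. The function names crashkernel_arg and check_reservation are hypothetical, and the sketch assumes that /sys/kernel/kexec_crash_size reports the reserved size in bytes and reads 0 when the running kernel reserved nothing for a crash kernel; neither the sysfs file nor this check appears in the thread.

```shell
#!/bin/sh
# Print the value of every crashkernel= parameter found in a kernel
# command line ($1), one per line; prints nothing if there is none.
crashkernel_arg() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Report whether a crash-kernel reservation appears to exist.
#   $1 = kernel command line (e.g. contents of /proc/cmdline)
#   $2 = contents of /sys/kernel/kexec_crash_size (bytes; assumed 0 = none)
check_reservation() {
    arg=$(crashkernel_arg "$1")
    if [ -z "$arg" ]; then
        echo "no crashkernel= parameter"
    elif [ "$2" = "0" ]; then
        echo "crashkernel=$arg given but nothing reserved"
    else
        echo "crashkernel=$arg reserved $2 bytes"
    fi
}

# Usage on a live system (read-only):
#   check_reservation "$(cat /proc/cmdline)" \
#                     "$(cat /sys/kernel/kexec_crash_size)"
```

A result of "given but nothing reserved" would be consistent with a kdump setup that cannot capture the crash, which is one situation where a panic may end in a hang rather than a dump and reboot.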