* Need clarification
From: Lakshmi Narasimhan Sundararajan @ 2021-11-10 13:11 UTC
To: lvm-devel
Hi LVM Team!
A very good day to you.
I have observed the following behavior and would like your input to understand it.
1/ Create a volume group on a single block device.
2/ Create a logical volume in the volume group.
3/ Pump I/O to the dm device.
4/ While I/O is active, force a kernel crash through the sysrq interface.
This results in a kernel hang, possibly because I/Os are still waiting to be
serviced.
The same behavior is seen with a thin pool and thin devices as well.
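For anyone trying to reproduce this, the four steps might translate into commands roughly like the ones below. This is a sketch, not the reporter's actual procedure: the device path /dev/sdX, the names testvg/testlv, the 1G size and the dd workload are all hypothetical. Because the last step crashes the machine, the script only prints the commands (a dry run) rather than executing them.

```python
import shlex
from typing import List

# Hypothetical test parameters; substitute a scratch disk you can afford to lose.
DEVICE = "/dev/sdX"
VG = "testvg"
LV = "testlv"


def reproduction_steps(device: str, vg: str, lv: str) -> List[List[str]]:
    """Return the reproduction steps as argv lists; nothing is executed here."""
    return [
        ["pvcreate", device],                    # 1/ initialise the disk as a PV
        ["vgcreate", vg, device],                # 1/ volume group on that single device
        ["lvcreate", "-n", lv, "-L", "1G", vg],  # 2/ linear LV in the VG
        # 3/ sustained direct I/O to the dm device; start it in the background
        ["dd", "if=/dev/zero", f"of=/dev/{vg}/{lv}", "bs=1M", "oflag=direct"],
        # 4/ while dd is running, force a crash via the sysrq 'c' command (root only)
        ["sh", "-c", "echo c > /proc/sysrq-trigger"],
    ]


if __name__ == "__main__":
    # Dry-run plan only: the final command would crash the machine.
    for argv in reproduction_steps(DEVICE, VG, LV):
        print(shlex.join(argv))
```

The thin pool variant would differ only in step 2, where a thin pool and a thin volume are created instead of a linear LV.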
1/ Is this behavior known, and is it understood why the kernel does
not complete a shutdown?
2/ Is there any configuration in the LVM/dm layer that would allow the
kernel to fail those incomplete I/Os and proceed to complete the
shutdown and reboot?
Please advise.
Thanks
LN
From: Zdenek Kabelac @ 2021-11-11 12:23 UTC
To: lvm-devel
On 10. 11. 21 at 14:11, Lakshmi Narasimhan Sundararajan wrote:
> Hi LVM Team!
> A very good day to you.
>
> I have observed the following behavior and would like your input to understand it.
>
> 1/ Create a volume group on a single block device.
> 2/ Create a logical volume in the volume group.
> 3/ Pump I/O to the dm device.
> 4/ While I/O is active, force a kernel crash through the sysrq interface.
>
> This results in a kernel hang, possibly because I/Os are still waiting to be
> serviced.
> The same behavior is seen with a thin pool and thin devices as well.
>
> 1/ Is this behavior known, and is it understood why the kernel does
> not complete a shutdown?
> 2/ Is there any configuration in the LVM/dm layer that would allow the
> kernel to fail those incomplete I/Os and proceed to complete the
> shutdown and reboot?
>
> Please advise.
I'm pretty sure 'echo b > /proc/sysrq-trigger' will reboot your kernel.
Other than that, I'm not quite sure I'm getting your point here.
You crash your kernel and then you wonder why it hangs??
When the kernel crashes (Oopses), the system has to be
rebooted: the kernel cannot recover from such a crash, as its internal data
structures cannot be trusted any more (i.e. it's very much like when your
user-space app core dumps after a divide by 0: you don't expect to continue
running your text editor).
Maybe you are not describing well what you are actually testing?
My guess: maybe you are actually checking what happens on disk failure?
Also, you've completely forgotten to mention your kernel version, so there
are simply way too many unknowns to give any sensible advice here...
Zdenek
From: Bryn M. Reeves @ 2021-11-11 16:22 UTC
To: lvm-devel
On Thu, Nov 11, 2021 at 01:23:03PM +0100, Zdenek Kabelac wrote:
> On 10. 11. 21 at 14:11, Lakshmi Narasimhan Sundararajan wrote:
> > Hi LVM Team!
> > A very good day to you.
> >
> > I have observed the following behavior and would like your input to understand it.
> >
> > 1/ Create a volume group on a single block device.
> > 2/ Create a logical volume in the volume group.
> > 3/ Pump I/O to the dm device.
> > 4/ While I/O is active, force a kernel crash through the sysrq interface.
> >
> > This results in a kernel hang, possibly because I/Os are still waiting to be
> > serviced.
> > The same behavior is seen with a thin pool and thin devices as well.
> >
> > 1/ Is this behavior known, and is it understood why the kernel does
> > not complete a shutdown?
> > 2/ Is there any configuration in the LVM/dm layer that would allow the
> > kernel to fail those incomplete I/Os and proceed to complete the
> > shutdown and reboot?
> >
> > Please advise.
>
> I'm pretty sure 'echo b > /proc/sysrq-trigger' will reboot your kernel.
It sounds like they are using the 'c' sysrq command; this invokes a
crash (it used to just dereference a NULL pointer) and is commonly
used to test configurations like kdump, or automatic reboot on panic.
From what the reporter is saying, it sounds like that is not happening in
the described scenario, although presumably it has been configured?
After the 'c', it's not possible to issue further sysrq commands via the
/proc interface, so the 'b' would have to be invoked via sysrq keyboard
commands.
> Other than that, I'm not quite sure I'm getting your point here.
>
> You crash your kernel and then you wonder why it hangs??
If you have kdump enabled, or /proc/sys/kernel/panic is set to a value
that is > 0 then this is definitely not expected behaviour.
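The kernel.panic setting mentioned here can be interpreted with a few lines of Python. This is a sketch that follows the kernel's sysctl documentation for /proc/sys/kernel/panic (0 loops forever, a negative value reboots immediately, a positive value reboots after that many seconds); the function names are hypothetical. It deliberately ignores kdump: as far as we understand panic(), a loaded crash kernel is tried before this timeout is ever reached.

```python
from pathlib import Path
from typing import Optional


def panic_behaviour(panic_value: int) -> str:
    """Interpret kernel.panic as described for /proc/sys/kernel/panic."""
    if panic_value == 0:
        return "loop forever (the machine appears hung)"
    if panic_value < 0:
        return "reboot immediately"
    return f"reboot after {panic_value} seconds"


def read_panic(path: str = "/proc/sys/kernel/panic") -> Optional[int]:
    """Read the current setting; None if unavailable (e.g. not running on Linux)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_panic()
    if value is None:
        print("could not read kernel.panic")
    else:
        print(f"kernel.panic = {value}: on panic the kernel will {panic_behaviour(value)}")
```

A value of 0 matches the "hang" symptom exactly; something like `sysctl -w kernel.panic=10` (10 is only an example) would make an unattended machine reboot instead.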
In this case it's often useful to enable keyboard sysrq (since those can
be issued following the 'echo c > /proc/sysrq-trigger'), and to carry
out a thread dump (sysrq-t) to print all stacks to the console. To be
useful this generally needs a serial console or some other mechanism to
capture the large volume of messages for analysis.
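Whether keyboard sysrq works is governed by kernel.sysrq (/proc/sys/kernel/sysrq). Per the kernel's sysrq documentation, 0 disables it, 1 enables every function, and a larger value is a bitmask of allowed groups; the value only restricts keyboard invocation, while root writes to /proc/sysrq-trigger are always allowed. The sketch below assumes, from our reading of that documentation, that sysrq-t falls under the "dumps" group (8) and sysrq-b under the reboot group (128); the helper names are hypothetical.

```python
from pathlib import Path

# Bitmask groups as listed in the kernel's sysrq documentation.
SYSRQ_ENABLE_LOG = 2       # console logging level
SYSRQ_ENABLE_KEYBOARD = 4  # keyboard control (SAK, unraw)
SYSRQ_ENABLE_DUMP = 8      # debugging dumps of processes etc. (e.g. sysrq-t)
SYSRQ_ENABLE_SYNC = 16     # sync command
SYSRQ_ENABLE_REMOUNT = 32  # remount read-only
SYSRQ_ENABLE_SIGNAL = 64   # signalling of processes
SYSRQ_ENABLE_BOOT = 128    # reboot/poweroff (e.g. sysrq-b)
SYSRQ_ENABLE_RTNICE = 256  # nicing of all RT tasks


def keyboard_allows(sysrq_value: int, mask: int) -> bool:
    """True if this kernel.sysrq value permits the keyboard functions in `mask`."""
    if sysrq_value == 0:
        return False
    if sysrq_value == 1:
        return True  # 1 means "enable all functions"
    return bool(sysrq_value & mask)


def value_enabling(current: int, *masks: int) -> int:
    """Return a kernel.sysrq value that keeps `current` and also enables `masks`."""
    if current == 1:
        return 1  # already fully enabled
    for mask in masks:
        current |= mask
    return current


if __name__ == "__main__":
    try:
        current = int(Path("/proc/sys/kernel/sysrq").read_text().strip())
    except (OSError, ValueError):
        current = None
    if current is not None:
        wanted = value_enabling(current, SYSRQ_ENABLE_DUMP, SYSRQ_ENABLE_BOOT)
        print(f"kernel.sysrq = {current}; for keyboard sysrq-t and sysrq-b use {wanted}")
```

For example, starting from kernel.sysrq = 16 the helper suggests 152 (16 | 8 | 128), which could be applied with `sysctl -w kernel.sysrq=152`.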
Regards,
Bryn.
From: Lakshmi Narasimhan Sundararajan @ 2021-11-12 8:16 UTC
To: lvm-devel
I am testing recovery of the node when dm devices are used as storage
drives and the kernel crashes for any reason.
My expectation was that the system would be able to sync and unmount
filesystems during a kernel crash forced through 'echo c >
/proc/sysrq-trigger'.
But the system did not complete the shutdown and hung instead.
It is generally true that after a kernel crash the internal
structures may be inconsistent, but with a crash forced through
sysrq-trigger with option 'c', it is guaranteed that none of the internal
structures are inconsistent, so I was surprised that sync/unmount over
dm devices could not complete.
Based on your input, I was able to root-cause the issue to an incorrect
kdump configuration on the kernel.
It looks like my system has hit this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1510654
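A quick way to tell whether kdump is actually armed, independent of the bug above, is to read two sysfs attributes that kernels built with kexec support expose: /sys/kernel/kexec_crash_size (bytes reserved for the crash kernel, 0 if nothing was reserved) and /sys/kernel/kexec_crash_loaded (1 once a crash kernel has been loaded). The sketch below only reports on them; the function names and messages are hypothetical.

```python
from pathlib import Path
from typing import List


def kdump_problems(crash_size: int, crash_loaded: int) -> List[str]:
    """List reasons a panic would not hand over to a kdump kernel."""
    problems = []
    if crash_size == 0:
        problems.append("no memory reserved for a crash kernel (check crashkernel=)")
    if crash_loaded != 1:
        problems.append("no crash kernel loaded (check the kdump service)")
    return problems


def read_sysfs_int(name: str) -> int:
    """Read an integer attribute from /sys/kernel; treat a missing file as 0."""
    try:
        return int((Path("/sys/kernel") / name).read_text().strip())
    except (OSError, ValueError):
        return 0


if __name__ == "__main__":
    issues = kdump_problems(read_sysfs_int("kexec_crash_size"),
                            read_sysfs_int("kexec_crash_loaded"))
    print("kdump appears armed" if not issues else "\n".join(issues))
```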
Below are the background details.
1/ OS release and kernel version
[root@ip-10-13-11-45 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
[root@ip-10-13-11-45 ~]# uname -r
5.7.12-1.el7.elrepo.x86_64
[root@ip-10-13-11-45 ~]#
2/ kdump is configured with 'auto'.
The kernel's GRUB menu entry is defined as below, with the crashkernel=auto option.
menuentry 'CentOS Linux (5.7.12-1.el7.elrepo.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.7.12-1.el7.elrepo.x86_64-advanced-adf91006-76ef-4113-850a-090c2be36f20' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1' 691ac0a0-e4a2-40e3-b3ad-a720d630f49b
        else
          search --no-floppy --fs-uuid --set=root 691ac0a0-e4a2-40e3-b3ad-a720d630f49b
        fi
        linux16 /vmlinuz-5.7.12-1.el7.elrepo.x86_64 root=UUID=adf91006-76ef-4113-850a-090c2be36f20 ro crashkernel=auto rhgb quiet biosdevname=0 net.ifnames=0
        initrd16 /initramfs-5.7.12-1.el7.elrepo.x86_64.img
}
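Since the entry above relies on crashkernel=auto, it may be worth checking what the running kernel was actually booted with. The sketch below extracts the crashkernel= value from a command line such as /proc/cmdline, keeping the last occurrence, which as far as we know matches the kernel's own parser. Flagging auto is an assumption on our part: to our understanding it is a RHEL/CentOS kernel extension rather than an upstream syntax, so a mainline build such as this 5.7 elrepo kernel may reserve nothing for it. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def crashkernel_value(cmdline: str) -> Optional[str]:
    """Return the value of the last crashkernel= parameter, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            value = token.split("=", 1)[1]
    return value


def assess_crashkernel(cmdline: str) -> str:
    """Summarise the crashkernel= setting found on a kernel command line."""
    value = crashkernel_value(cmdline)
    if value is None:
        return "crashkernel not set: no memory will be reserved for kdump"
    if value == "auto":
        # Assumption: 'auto' is a distribution extension that mainline kernels may reject.
        return "crashkernel=auto: verify the reservation or use an explicit size"
    return f"crashkernel={value}: explicit reservation requested"


if __name__ == "__main__":
    try:
        print(assess_crashkernel(Path("/proc/cmdline").read_text()))
    except OSError:
        print("could not read /proc/cmdline")
```

If auto turns out to be the problem, an explicit size (for example crashkernel=256M) is one option; the right amount depends on the system.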
Many thanks for your input.
Regards