* Need clarification
From: Lakshmi Narasimhan Sundararajan @ 2021-11-10 13:11 UTC
  To: lvm-devel

Hi LVM Team!
A very good day to you.

I have the following observation, and I need your inputs to understand behavior.

1/ create a volume group on a single block device.
2/ create a logical volume on the volume group.
3/ pump IO to the dm device
4/ while IOs are active, force kernel crash through the sysrq interface.

This results in a kernel hang, possibly because IOs are still waiting to be
serviced.
The same behavior is seen with a thin pool and thin devices as well.
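
The steps above can be sketched as a dry-run shell function. The device path
/dev/sdX, the names testvg and testlv, the 1G size and the use of dd as the IO
generator are all assumptions that do not come from the report; only the
overall sequence is taken from it. Nothing is executed unless RUN=1 is set, and
the last step crashes the machine, so this is meant for a disposable test node.

```shell
# Sketch of the reproduction steps. Prints the commands by default;
# they are only executed when RUN=1 is set in the environment.
# DEV, VG and LV are hypothetical names, override them as needed.
repro_steps() {
    dev=${DEV:-/dev/sdX}
    vg=${VG:-testvg}
    lv=${LV:-testlv}

    # Either execute a command line or just print it (dry run).
    run() {
        if [ "${RUN:-0}" = "1" ]; then
            sh -c "$1"
        else
            echo "$1"
        fi
    }

    run "pvcreate $dev"                                         # prepare the block device
    run "vgcreate $vg $dev"                                     # 1/ volume group
    run "lvcreate -n $lv -L 1G $vg"                             # 2/ logical volume
    run "dd if=/dev/zero of=/dev/$vg/$lv bs=1M oflag=direct &"  # 3/ IO in the background
    run "echo c > /proc/sysrq-trigger"                          # 4/ force a kernel crash
}
```

Calling repro_steps with no environment set prints the five command lines, which
makes it easy to review them before running anything for real.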

1/ Is this behavior known or understood well as to why the kernel does
not complete a shutdown?
2/ Is there any configuration with the lvm/dm layer that can allow the
kernel to proceed to complete shutdown and reboot failing those
incomplete IOs?

Please advise.

Thanks
LN




* Need clarification
From: Zdenek Kabelac @ 2021-11-11 12:23 UTC
  To: lvm-devel

On 10. 11. 21 at 14:11, Lakshmi Narasimhan Sundararajan wrote:
> Hi LVM Team!
> A very good day to you.
> 
> I have the following observation, and I need your inputs to understand behavior.
> 
> 1/ create a volume group on a single block device.
> 2/ create a logical volume on the volume group.
> 3/ pump IO to the dm device
> 4/ while IOs are active, force kernel crash through the sysrq interface.
> 
> This results in a kernel hang, possibly because IOs are still waiting to be
> serviced.
> The same behavior is seen with a thin pool and thin devices as well.
> 
> 1/ Is this behavior known or understood well as to why the kernel does
> not complete a shutdown?
> 2/ Is there any configuration with the lvm/dm layer that can allow the
> kernel to proceed to complete shutdown and reboot failing those
> incomplete IOs?
> 
> Please advise.

I'm pretty sure 'echo b > /proc/sysrq-trigger' will reboot your kernel.

Other than that, I'm not sure I'm getting your point here.

You crash your kernel and then you wonder why it hangs?

When the kernel crashes (oopses), the system has to be rebooted. The kernel
cannot recover from such a crash because its internal data structures cannot
be trusted any more (it's very much like a user-space app that core dumps:
you don't expect to continue running your text editor after it divides by 0).

Maybe you are not describing well what you are actually testing?

My guess: maybe you are actually checking what happens on a disk failure?

Also, you completely forgot to mention the version of your kernel, so there
are simply way too many unknowns to give any sensible advice here...

Zdenek




* Need clarification
From: Bryn M. Reeves @ 2021-11-11 16:22 UTC
  To: lvm-devel

On Thu, Nov 11, 2021 at 01:23:03PM +0100, Zdenek Kabelac wrote:
> I'm pretty sure 'echo b > /proc/sysrq-trigger' will reboot your kernel.

It sounds like they are using the 'c' sysrq command; this invokes a
crash (used to be just dereferencing a NULL pointer) and is commonly
used to test configurations like kdump, or automatic reboot on panic.

From what the reporter is saying it sounds like that is not happening in
the described scenario although presumably it has been configured?

After the 'c' it's not possible to issue further sysrq commands via the
/proc interface, so the 'b' would have to be invoked via sysrq keyboard
commands.

> Other than that, I'm not sure I'm getting your point here.
> 
> You crash your kernel and then you wonder why it hangs?

If you have kdump enabled, or /proc/sys/kernel/panic is set to a value
that is > 0 then this is definitely not expected behaviour.
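
A minimal sketch of how the two conditions above could be checked from a shell.
The function name and its output strings are made up for illustration; the two
default paths are the standard procfs/sysfs files, and both can be overridden so
the logic can be tried against ordinary files. Per the kernel sysctl
documentation, a negative kernel.panic value means an immediate reboot, which
the sketch reports as well.

```shell
# Sketch: report what should happen after a panic, based on whether a
# crash (kdump) kernel is loaded and on the kernel.panic timeout.
panic_recovery_status() {
    panic_file=${1:-/proc/sys/kernel/panic}
    kdump_file=${2:-/sys/kernel/kexec_crash_loaded}

    # Fall back to 0 when a file is missing or unreadable.
    panic_timeout=$(cat "$panic_file" 2>/dev/null || echo 0)
    kdump_loaded=$(cat "$kdump_file" 2>/dev/null || echo 0)

    if [ "$kdump_loaded" = "1" ]; then
        # A crash kernel is loaded, so a panic should boot into it.
        echo "kdump: crash kernel loaded"
    elif [ "$panic_timeout" -gt 0 ] 2>/dev/null; then
        echo "reboot: after ${panic_timeout}s"
    elif [ "$panic_timeout" -lt 0 ] 2>/dev/null; then
        echo "reboot: immediately"
    else
        # kernel.panic = 0 and no crash kernel: nothing restarts the system.
        echo "none: kernel will loop forever after a panic"
    fi
}
```

Running panic_recovery_status with no arguments on the affected node before the
test shows which of the two mechanisms, if any, is actually armed.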

In this case it's often useful to enable keyboard sysrq (since those can
be issued following the 'echo c > /proc/sysrq-trigger'), and to carry
out a thread dump (sysrq-t) to print all stacks to the console. To be
useful this generally needs a serial console or some other mechanism to
capture the large volume of messages for analysis.
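
Whether keyboard sysrq is usable depends on the kernel.sysrq value. As a rough
sketch (the helper name is hypothetical): 0 disables it, 1 enables every
function, and larger values are a bitmask in which 8 covers debugging dumps
such as sysrq-t and 128 covers reboot/poweroff such as sysrq-b. The mask only
restricts keyboard invocation; writes to /proc/sysrq-trigger by root are not
affected by it.

```shell
# Sketch: check whether a kernel.sysrq value permits a keyboard sysrq function.
#   $1 = value read from /proc/sys/kernel/sysrq
#   $2 = bit to test (8 = debugging dumps, 128 = reboot/poweroff)
sysrq_allows() {
    mask=$1
    bit=$2
    if [ "$mask" -eq 1 ]; then
        return 0    # 1 enables all sysrq functions
    fi
    # Otherwise the value is a bitmask of allowed function groups.
    [ $(( mask & bit )) -ne 0 ]
}
```

For example, sysrq_allows "$(cat /proc/sys/kernel/sysrq)" 8 succeeds when a
thread dump can be requested from the keyboard; 'sysctl -w kernel.sysrq=1'
enables everything for the duration of a test.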

Regards,
Bryn.




* Need clarification
From: Lakshmi Narasimhan Sundararajan @ 2021-11-12  8:16 UTC
  To: lvm-devel

I am testing recovery of the node when dm devices are used as storage
drives and the kernel crashes for any reason.
My expectation was that the system would be able to sync and unmount
filesystems during a kernel crash forced through (echo c >
/proc/sysrq-trigger).
But the system did not complete the shutdown and hung.
It is generally true that after a kernel crash the internal
structures may be inconsistent, but with a crash forced through
sysrq-trigger and option 'c', it is guaranteed that none of the internal
structures are inconsistent, so I was surprised that sync/unmount over
dm devices could not complete.

Based on your inputs, I was able to root-cause the issue to an incorrect
kdump configuration for the kernel.
It looks like my system has hit this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1510654

Below are the background details.
1/ linux kernel version
[root@ip-10-13-11-45 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@ip-10-13-11-45 ~]# uname -r
5.7.12-1.el7.elrepo.x86_64
[root@ip-10-13-11-45 ~]#



2/ kdump is configured with 'auto'
The GRUB menu entry for the kernel is defined as below, with the
crashkernel=auto option.
menuentry 'CentOS Linux (5.7.12-1.el7.elrepo.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.7.12-1.el7.elrepo.x86_64-advanced-adf91006-76ef-4113-850a-090c2be36f20' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1' 691ac0a0-e4a2-40e3-b3ad-a720d630f49b
        else
          search --no-floppy --fs-uuid --set=root 691ac0a0-e4a2-40e3-b3ad-a720d630f49b
        fi
        linux16 /vmlinuz-5.7.12-1.el7.elrepo.x86_64 root=UUID=adf91006-76ef-4113-850a-090c2be36f20 ro crashkernel=auto rhgb quiet biosdevname=0 net.ifnames=0
        initrd16 /initramfs-5.7.12-1.el7.elrepo.x86_64.img
}
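
One way to confirm whether a crashkernel= setting like the one above actually
took effect is to compare the boot command line with the amount of memory the
running kernel reserved for the crash kernel. The sketch below is an
illustration only: the two function names are hypothetical, and it assumes the
standard /sys/kernel/kexec_crash_size file, which reports the reserved size in
bytes. Whether a given kernel build honours the 'auto' value is something to
verify this way rather than assume.

```shell
# Sketch: extract the crashkernel= value from a kernel command line string.
# Prints the value and returns 0 if present, returns 1 otherwise.
crashkernel_arg() {
    for word in $1; do
        case $word in
            crashkernel=*)
                echo "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Sketch: succeed only if the kernel reserved memory for a crash kernel.
# A size of 0 means the crashkernel= request did not result in a reservation.
crash_memory_reserved() {
    size_file=${1:-/sys/kernel/kexec_crash_size}
    size=$(cat "$size_file" 2>/dev/null || echo 0)
    [ "$size" -gt 0 ] 2>/dev/null
}
```

For example, crashkernel_arg "$(cat /proc/cmdline)" prints 'auto' for the menu
entry above, and crash_memory_reserved then tells whether that request led to
any reservation on the running kernel.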

Many thanks for your inputs.
Regards



