linux-nvme.lists.infradead.org archive mirror
* kernel panic due to a missing work initialization in case of zero kato value
@ 2021-04-20 18:36 Engel, Amit
  2021-04-20 22:24 ` Chaitanya Kulkarni
  2021-04-21  2:32 ` kernel panic due to a missing work initialization in case of zero Hou Pu
  0 siblings, 2 replies; 5+ messages in thread
From: Engel, Amit @ 2021-04-20 18:36 UTC (permalink / raw)
  To: sagi, linux-nvme; +Cc: Engel, Amit

Hello,

We hit a kernel panic caused by the following sequence.
In the current nvmet implementation, 'nvmet_start_keep_alive_timer'
initializes the nvmet_keep_alive_timer work only if kato != 0.

When the nvme connect cmd is executed with a zero kato value,
'INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer)' is not called.

Once a keep alive cmd arrives, we call 'mod_delayed_work' on a work that has not been initialized.
This leads to a kernel WARNING:
Apr 20 10:32:59 FNM00190700796-A kernel: WARNING: CPU: 11 PID: 75133 at kernel/workqueue.c:1447 __queue_work.cold.55+0xc/0x3c
and eventually to a soft lockup.

A simple fix for this issue (I will post a patch soon) is to initialize the work (as part of 'nvmet_start_keep_alive_timer') even if kato == 0.
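
For illustration only (this is not the patch mentioned above), the sequence can be modelled in userspace C. Everything prefixed with mock_, the two start_keep_alive_timer_* variants, and work_is_initialized() are hypothetical stand-ins; they are not the kernel's struct delayed_work, INIT_DELAYED_WORK or nvmet code, and the real workqueue checks differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; NOT the kernel's struct delayed_work or nvmet_ctrl. */
struct mock_delayed_work {
	void (*fn)(struct mock_delayed_work *);	/* NULL until initialized */
};

struct mock_ctrl {
	unsigned int kato;			/* keep alive timeout, 0 = disabled */
	struct mock_delayed_work ka_work;
};

static void mock_keep_alive_timer(struct mock_delayed_work *w)
{
	(void)w;
}

/* Models the reported behaviour: the work is set up only when kato != 0. */
static void start_keep_alive_timer_reported(struct mock_ctrl *ctrl)
{
	if (ctrl->kato == 0)
		return;
	ctrl->ka_work.fn = mock_keep_alive_timer;
}

/* Models the proposed fix: set the work up first, then skip arming it. */
static void start_keep_alive_timer_proposed(struct mock_ctrl *ctrl)
{
	ctrl->ka_work.fn = mock_keep_alive_timer;
	if (ctrl->kato == 0)
		return;
	/* a real implementation would schedule the work here */
}

/* Stand-in for "is it safe to (re)queue this work?". */
static bool work_is_initialized(const struct mock_ctrl *ctrl)
{
	return ctrl->ka_work.fn != NULL;
}
```

With kato == 0, the first variant leaves ka_work untouched, so a later keep alive cmd would operate on an uninitialized work; the second variant leaves it initialized but not armed.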

Thanks
Amit E


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: kernel panic due to a missing work initialization in case of zero kato value
  2021-04-20 18:36 kernel panic due to a missing work initialization in case of zero kato value Engel, Amit
@ 2021-04-20 22:24 ` Chaitanya Kulkarni
  2021-04-21  2:32 ` kernel panic due to a missing work initialization in case of zero Hou Pu
  1 sibling, 0 replies; 5+ messages in thread
From: Chaitanya Kulkarni @ 2021-04-20 22:24 UTC (permalink / raw)
  To: Engel, Amit, sagi, linux-nvme

On 4/20/21 11:46, Engel, Amit wrote:
> A simple fix for this issue (I will post a patch soon) is to initialize the work (as part of 'nvmet_start_keep_alive_timer') even if kato == 0
>
> Thanks
> Amit 

This may need a fixes tag.




* kernel panic due to a missing work initialization in case of zero
  2021-04-20 18:36 kernel panic due to a missing work initialization in case of zero kato value Engel, Amit
  2021-04-20 22:24 ` Chaitanya Kulkarni
@ 2021-04-21  2:32 ` Hou Pu
  2021-04-21 13:29   ` Engel, Amit
  1 sibling, 1 reply; 5+ messages in thread
From: Hou Pu @ 2021-04-21  2:32 UTC (permalink / raw)
  To: amit.engel; +Cc: linux-nvme, sagi

On 4/20/21 11:46, Engel, Amit wrote:
> Hello,
> 
> We hit a kernel panic as a result of the below sequence:
> In the current nvmet implementation, as part of 'nvmet_start_keep_alive_timer'
> nvmet_keep_alive_timer work will be initialized only if kato != 0
> 
> when nvme connect cmd is being executed with a zero kato value
> 'INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer)' will not be called
> 
> once keep alive cmd arrives, we call 'mod_delayed_work' for a work that has not been initialized
> this will lead to kernel WARNING:
> Apr 20 10:32:59 FNM00190700796-A kernel: WARNING: CPU: 11 PID: 75133 at kernel/workqueue.c:1447 __queue_work.cold.55+0xc/0x3c
> And eventually to soft lockup

Hello Engel,

Could you verify this with the latest nvme-5.13 branch? I think this
might be the same problem that commit 7b96918a173 (nvmet: avoid
queuing keep-alive timer if it is disabled) fixed.
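
The diff of commit 7b96918a173 is not quoted in this thread, so the following is only a userspace C sketch of the idea its subject line names: do not re-arm the keep-alive timer when it is disabled. The names mock_ctrl, handle_keep_alive and the ka_result values are hypothetical and are not taken from the nvmet source.

```c
/* Userspace stand-in for the controller state; NOT the kernel's nvmet_ctrl. */
struct mock_ctrl {
	unsigned int kato;		/* keep-alive timeout, 0 = disabled */
	unsigned int rearm_count;	/* how often the timer was re-armed */
};

enum ka_result {
	KA_REARMED,			/* timer was pushed out again */
	KA_REJECTED_DISABLED,		/* keep-alive is disabled, nothing queued */
};

/*
 * Models the handling of a keep-alive command. The guard on kato runs
 * before anything touches the timer, so a disabled (and possibly never
 * initialized) work item is never queued.
 */
static enum ka_result handle_keep_alive(struct mock_ctrl *ctrl)
{
	if (ctrl->kato == 0)
		return KA_REJECTED_DISABLED;

	/* a real implementation would re-arm the delayed work here */
	ctrl->rearm_count++;
	return KA_REARMED;
}
```

In this model the guard sits at the point where the timer would be queued, which is why it avoids the problem without changing where the work is initialized.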

Thanks,
Hou

> 
> A simple fix for this issue (I will post a patch soon) is to initialize the work (as part of 'nvmet_start_keep_alive_timer') even if kato == 0
> 
> Thanks
> Amit E


* RE: kernel panic due to a missing work initialization in case of zero
  2021-04-21  2:32 ` kernel panic due to a missing work initialization in case of zero Hou Pu
@ 2021-04-21 13:29   ` Engel, Amit
  2021-04-22 12:16     ` Hou Pu
  0 siblings, 1 reply; 5+ messages in thread
From: Engel, Amit @ 2021-04-21 13:29 UTC (permalink / raw)
  To: Hou Pu; +Cc: linux-nvme, sagi

Hi Hou,
Yes, commit 7b96918a173 (nvmet: avoid queuing keep-alive timer if it is disabled) fixes the panic we hit.

One comment:
It might be more elegant to move
INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer);
from nvmet_start_keep_alive_timer to nvmet_alloc_ctrl.
This way, we will not INIT ka_work each time the keep alive timer is started
(each nvmet_set_feat_kato, for example, will call start_keep_alive_timer).
IMO it makes more sense to INIT_DELAYED_WORK only once (as part of alloc_ctrl).

Let me know what you think and whether you want me to provide this minor change.
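
As a rough userspace C model of the suggestion above (not a kernel patch): the work is initialized once when the controller is allocated, and starting the timer only arms it. All mock_ names, the init_count and armed counters, and the function bodies are hypothetical; they only mirror the roles of nvmet_alloc_ctrl, nvmet_start_keep_alive_timer and INIT_DELAYED_WORK.

```c
#include <stdlib.h>

/* Userspace stand-ins; NOT the kernel's struct delayed_work or nvmet_ctrl. */
struct mock_delayed_work {
	void (*fn)(struct mock_delayed_work *);
	unsigned int init_count;	/* how many times it was initialized */
};

struct mock_ctrl {
	unsigned int kato;		/* keep alive timeout, 0 = disabled */
	unsigned int armed;		/* how many times the timer was armed */
	struct mock_delayed_work ka_work;
};

static void mock_keep_alive_timer(struct mock_delayed_work *w)
{
	(void)w;
}

static void mock_init_delayed_work(struct mock_delayed_work *w,
				   void (*fn)(struct mock_delayed_work *))
{
	w->fn = fn;
	w->init_count++;
}

/* Initialization happens exactly once, at allocation time. */
static struct mock_ctrl *mock_alloc_ctrl(unsigned int kato)
{
	struct mock_ctrl *ctrl = calloc(1, sizeof(*ctrl));

	if (!ctrl)
		return NULL;
	ctrl->kato = kato;
	mock_init_delayed_work(&ctrl->ka_work, mock_keep_alive_timer);
	return ctrl;
}

/* Starting the timer only arms it; it never re-initializes the work. */
static void mock_start_keep_alive_timer(struct mock_ctrl *ctrl)
{
	if (ctrl->kato == 0)
		return;
	ctrl->armed++;
}
```

However often the timer is started or the kato value changes, init_count stays at 1 in this model, and the work is valid even for a controller that connected with kato == 0.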

Thanks
Amit



* Re: kernel panic due to a missing work initialization in case of zero
  2021-04-21 13:29   ` Engel, Amit
@ 2021-04-22 12:16     ` Hou Pu
  0 siblings, 0 replies; 5+ messages in thread
From: Hou Pu @ 2021-04-22 12:16 UTC (permalink / raw)
  To: Engel, Amit; +Cc: linux-nvme, sagi

On Wed, Apr 21, 2021 at 9:29 PM Engel, Amit <Amit.Engel@dell.com> wrote:
>
> Hi Hou,
> Yes, commit 7b96918a173 (nvmet: avoid queuing keep-alive timer if it is disabled) fixes the panic we hit.
Thanks.
>
> One comment:
> It might be more elegant to move
> INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer);
> From nvmet_start_keep_alive_timer To nvmet_alloc_ctrl
> This way, we will not INIT ka_work each time the keep alive timer is started
> (each nvmet_set_feat_kato for example, will start_keep_alive_timer)
> IMO it makes more sense to INIT_DELAYED_WORK only once (as part of alloc_ctrl)
>
> Let me know what you think and if you want me to provide this minor change
>

Yes, this makes more sense AFAIK.
I'm OK with it.

Thanks,
Hou

