From: Chao Leng <lengchao@huawei.com>
To: Hannes Reinecke <hare@suse.de>, <mwilck@suse.com>,
	Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
	Christoph Hellwig <hch@lst.de>
Cc: Daniel Wagner <dwagner@suse.de>, <linux-nvme@lists.infradead.org>
Subject: Re: [PATCH v2] nvme: rdma/tcp: call nvme_mpath_stop() from reconnect workqueue
Date: Mon, 26 Apr 2021 10:31:10 +0800	[thread overview]
Message-ID: <9c65df32-bbf5-68b3-566e-3b79f7e6f893@huawei.com> (raw)
In-Reply-To: <65167282-84e7-d08b-f97d-edb0d1372a49@suse.de>



On 2021/4/25 19:34, Hannes Reinecke wrote:
> On 4/23/21 3:38 PM, mwilck@suse.com wrote:
>> From: Martin Wilck <mwilck@suse.com>
>>
>> We have observed a few crashes in run_timer_softirq(), where a broken
>> timer_list struct belonging to an anatt_timer was encountered. The broken
>> structures look like this, and we actually see multiple ones attached to
>> the same timer base:
>>
>> crash> struct timer_list 0xffff92471bcfdc90
>> struct timer_list {
>>    entry = {
>>      next = 0xdead000000000122,  // LIST_POISON2
>>      pprev = 0x0
>>    },
>>    expires = 4296022933,
>>    function = 0xffffffffc06de5e0 <nvme_anatt_timeout>,
>>    flags = 20
>> }
>>
>> If such a timer is encountered in run_timer_softirq(), the kernel
>> crashes. The test scenario was an I/O load test with lots of NVMe
>> controllers, some of which were removed and re-added on the storage side.
>>
> ...
> 
> But isn't this the result of detach_timer()? I.e. this looks suspiciously like perfectly normal operation; if you look at expire_timers(), we first call detach_timer() before calling the timer function, i.e. every crash in the timer function would have this signature.
> And, incidentally, so would any timer function which does not crash.
> 
> Sorry to kill your analysis ...
> 
> This doesn't mean that the patch isn't valid (in the sense that it resolves the issue), but we will definitely need to work on root cause analysis.
The sequence may be:
1. ana_work adds the timer;
2. error recovery occurs, and during reconnect the timer is re-initialized
   and nvme_read_ana_log() is called;
3. nvme_read_ana_log() may add the timer again.
The same timer is then added twice, and the crash happens later.

In fact, ana_log_buf has a similar bug, which we have encountered in our testing.
To fix it, I made the same patch and tested it for more than 2 weeks.

This patch can fix both bugs.
> 
> Cheers,
> 
> Hannes

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
