linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH]nvme-pci: Fixes EEH failure on ppc
       [not found]   ` <787e4960b62a03b3888c67e73d7e1ee2@linux.vnet.ibm.com>
@ 2018-02-07  1:24     ` Ming Lei
  2018-02-07 20:19       ` wenxiong
  0 siblings, 1 reply; 2+ messages in thread
From: Ming Lei @ 2018-02-07  1:24 UTC
  To: wenxiong; +Cc: Keith Busch, wenxiong, linux-nvme, axboe, linux-kernel, wenxiong

On Tue, Feb 06, 2018 at 02:01:05PM -0600, wenxiong wrote:
> On 2018-02-06 10:33, Keith Busch wrote:
> > On Mon, Feb 05, 2018 at 03:49:40PM -0600, wenxiong@vmlinux.vnet.ibm.com wrote:
> > > @@ -1189,6 +1183,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> > >  	struct nvme_command cmd;
> > >  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
> > > 
> > > +	/* If PCI error recovery process is happening, we cannot reset or
> > > +	 * the recovery mechanism will surely fail.
> > > +	 */
> > > +	if (pci_channel_offline(to_pci_dev(dev->dev)))
> > > +		return BLK_EH_HANDLED;
> > > +
> > 
> > This patch will tell the block layer to complete the request and consider
> > it a success, but it doesn't look like the command actually completed at
> > all. You're going to get data corruption this way, right? Is returning
> > BLK_EH_HANDLED immediately really the right thing to do here?
> 
> Hi Ming,
> 
> Can you help check whether it is OK to return BLK_EH_HANDLED in this
> case?

Hi Wenxiong,

It looks like Keith is correct: if BLK_EH_HANDLED is returned, the block
layer and the NVMe driver will complete this timed-out request even though
the IO hasn't actually completed, which causes either data loss (for a
write) or a read failure.

Maybe returning BLK_EH_RESET_TIMER is fine in this situation.

Thanks,
Ming
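
The difference between the two return values discussed above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code: every name here (eh_return, model_request, model_timeout_offline, model_block_layer, model_bogus_completion, model_self_check) is hypothetical, and the model covers only the branch where the PCI channel is offline. It assumes, as described in this thread, that BLK_EH_HANDLED makes the block layer complete the request, and that BLK_EH_RESET_TIMER re-arms the timeout and leaves the request pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two handler return values in this thread. */
enum eh_return { EH_HANDLED, EH_RESET_TIMER };

/* Which answer the offline-channel branch gives. */
enum policy { POLICY_ORIGINAL_PATCH, POLICY_SUGGESTED };

struct model_request {
	bool device_finished;   /* did the device really execute the command? */
	bool reported_complete; /* was the upper layer told the I/O is done? */
	int timer_rearms;       /* how often the timeout was re-armed */
};

/* Offline-channel branch of the modelled timeout handler. */
static enum eh_return model_timeout_offline(enum policy p)
{
	return p == POLICY_ORIGINAL_PATCH ? EH_HANDLED : EH_RESET_TIMER;
}

/* What the modelled block layer does with the handler's answer. */
static void model_block_layer(struct model_request *rq, enum eh_return ret)
{
	if (ret == EH_HANDLED)
		rq->reported_complete = true; /* completed without any real I/O */
	else
		rq->timer_rearms++;           /* still pending, timer fires again */
}

/*
 * Returns true if the upper layer is told the request finished although the
 * device never executed it while the channel was offline.
 */
static bool model_bogus_completion(enum policy p, int timeouts_while_offline)
{
	struct model_request rq = { false, false, 0 };

	for (int i = 0; i < timeouts_while_offline; i++) {
		if (rq.reported_complete)
			break; /* a completed request no longer times out */
		model_block_layer(&rq, model_timeout_offline(p));
	}
	return rq.reported_complete && !rq.device_finished;
}

/* Small self-check that exercises both policies. */
static void model_self_check(void)
{
	assert(model_bogus_completion(POLICY_ORIGINAL_PATCH, 1));
	assert(!model_bogus_completion(POLICY_SUGGESTED, 1));
}
```

Calling model_self_check() from a test program exercises both cases. In this model the original patch reports a completion the device never performed on the first timeout, while the suggested return value only re-arms the timer, so the request stays pending while error recovery is in progress.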


* Re: [PATCH]nvme-pci: Fixes EEH failure on ppc
  2018-02-07  1:24     ` [PATCH]nvme-pci: Fixes EEH failure on ppc Ming Lei
@ 2018-02-07 20:19       ` wenxiong
  0 siblings, 0 replies; 2+ messages in thread
From: wenxiong @ 2018-02-07 20:19 UTC
  To: Ming Lei; +Cc: axboe, linux-kernel, linux-nvme, Keith Busch, wenxiong, wenxiong

On 2018-02-06 19:24, Ming Lei wrote:
> On Tue, Feb 06, 2018 at 02:01:05PM -0600, wenxiong wrote:
>> On 2018-02-06 10:33, Keith Busch wrote:
>> > On Mon, Feb 05, 2018 at 03:49:40PM -0600, wenxiong@vmlinux.vnet.ibm.com wrote:
>> > > @@ -1189,6 +1183,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>> > >  	struct nvme_command cmd;
>> > >  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>> > >
>> > > +	/* If PCI error recovery process is happening, we cannot reset or
>> > > +	 * the recovery mechanism will surely fail.
>> > > +	 */
>> > > +	if (pci_channel_offline(to_pci_dev(dev->dev)))
>> > > +		return BLK_EH_HANDLED;
>> > > +
>> >
>> > This patch will tell the block layer to complete the request and consider
>> > it a success, but it doesn't look like the command actually completed at
>> > all. You're going to get data corruption this way, right? Is returning
>> > BLK_EH_HANDLED immediately really the right thing to do here?
>> 
>> Hi Ming,
>> 
>> Can you help check whether it is OK to return BLK_EH_HANDLED in this
>> case?
> 
> Hi Wenxiong,
> 
> It looks like Keith is correct: if BLK_EH_HANDLED is returned, the block
> layer and the NVMe driver will complete this timed-out request even though
> the IO hasn't actually completed, which causes either data loss (for a
> write) or a read failure.
> 
> Maybe returning BLK_EH_RESET_TIMER is fine in this situation.
> 
> Thanks,
> Ming
> 
Hi Ming,

Thanks! I tried BLK_EH_RESET_TIMER, and EEH recovery works fine. I am
going to resubmit the patch.

Thanks,
Wendy


