linux-kernel.vger.kernel.org archive mirror
From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: <linasvepstas@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	<linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: [PATCH] pci-error-recover: doc cleanup
Date: Fri, 9 Dec 2016 15:59:44 +0800
Message-ID: <584A6470.60502@cn.fujitsu.com>
In-Reply-To: <CAHrUA36r3o3ziEdMz-8=w5XTymsMQZYRXXrCt=H+1F3M4+6RnQ@mail.gmail.com>



On 12/09/2016 02:44 PM, Linas Vepstas wrote:
> On Fri, Dec 9, 2016 at 2:37 PM, Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>
>>
>> On 12/09/2016 02:24 PM, Linas Vepstas wrote:
>>> I suppose I'm confused, but I recall that link resets are non-fatal.
>>> Fatal errors typically require that the PCI adapter be completely
>>> reset, any adapter firmware be reloaded from scratch, and the device
>>> driver kill all device state and start from scratch. It's huge.
>>> If the fatal error is on a PCI device that is under a block device
>>> holding a file system, then (usually) there is no way to recover,
>>> because the block layer (and file system) cannot deal with a block
>>> device that disappeared and then reappeared some few seconds later.
>>> (maybe some future zfs or lvm or btrfs might be able to deal with
>>> this, but not today)
>>>
>>> By contrast, link resets are far more gentle: the device driver might
>>> have to discard some half-full FIFOs, or cancel some in-flight
>>> commands, but can otherwise gracefully recover without telling the
>>> higher layers that there were any problems.
>>>
>>> --linas
>>>
>>
>> I am a little confused too; I'm not even sure we are talking about the
>> same *fatal error*. I mean the fatal error defined in the PCI Express
>> spec, section 6.2.2.2.1:
>>
>> Fatal errors are uncorrectable error conditions which render the
>> particular Link and related hardware unreliable. For Fatal errors, a
>> reset of the components on the Link may be required to return to
>> reliable operation. Platform handling of Fatal errors, and any efforts
>> to limit the effects of these errors, is platform implementation specific.
>>
>> Link reset means setting the *secondary bus reset* bit in the PCI bridge
>> config space. It resets the link and the device simultaneously, and is
>> the strongest kind of reset that I know of.
> 
> OK, well, it's been far too many years, and I don't have the PCI spec
> at my fingertips.
> Isn't there a link reset that can be performed, without forcing a device reset?
> 

At least, I can't find the exact words saying that.
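
For reference, the secondary bus reset I described above can be sketched
as follows. This is a minimal userspace simulation, not kernel code:
struct fake_bridge and the cfg_read16()/cfg_write16() helpers are
hypothetical stand-ins for a bridge's config space. The register offset
(0x3e) and bit (0x40) are the Bridge Control register and its Secondary
Bus Reset bit, named here as in the kernel's pci_regs.h. Real hardware
also needs delays while the bit is held and after it is cleared; those
are only noted in comments.

```c
#include <stdint.h>

/* Bridge Control register and its Secondary Bus Reset bit. */
#define PCI_BRIDGE_CONTROL        0x3e
#define PCI_BRIDGE_CTL_BUS_RESET  0x40

/* Hypothetical stand-in for a bridge's 256-byte config space. */
struct fake_bridge {
	uint8_t cfg[256];
};

/* Little-endian 16-bit read from the simulated config space. */
static uint16_t cfg_read16(const struct fake_bridge *b, int off)
{
	return (uint16_t)(b->cfg[off] | (b->cfg[off + 1] << 8));
}

/* Little-endian 16-bit write to the simulated config space. */
static void cfg_write16(struct fake_bridge *b, int off, uint16_t val)
{
	b->cfg[off] = (uint8_t)(val & 0xff);
	b->cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Set the reset bit, leaving the other Bridge Control bits untouched. */
static void bus_reset_set(struct fake_bridge *b)
{
	uint16_t ctrl = cfg_read16(b, PCI_BRIDGE_CONTROL);

	cfg_write16(b, PCI_BRIDGE_CONTROL,
		    (uint16_t)(ctrl | PCI_BRIDGE_CTL_BUS_RESET));
}

/* Clear the reset bit, leaving the other Bridge Control bits untouched. */
static void bus_reset_clear(struct fake_bridge *b)
{
	uint16_t ctrl = cfg_read16(b, PCI_BRIDGE_CONTROL);

	cfg_write16(b, PCI_BRIDGE_CONTROL,
		    (uint16_t)(ctrl & ~PCI_BRIDGE_CTL_BUS_RESET));
}

/* Full reset pulse: everything below the bridge is reset with the link. */
static void secondary_bus_reset(struct fake_bridge *b)
{
	bus_reset_set(b);
	/* Real hardware: hold the reset for a short time here. */
	bus_reset_clear(b);
	/* Real hardware: wait for the link to retrain before device access. */
}
```

Because the bit lives in the bridge and not in the device, there is no
way in this sketch to reset only the link: whatever sits on the
secondary bus is reset together with it.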

-- 
Sincerely,
Cao jin

> The intent was that some PCI link errors are due to vibration,
> ground-bounce, humidity, etc. and that these errors can be detected
> and do not corrupt the device state or the device driver state.  Since
> they are not associated with data corruption (or rather, the
> corruption is local to the link), these can be recovered by resetting
> just the link, without resetting the whole adapter. They may require
> resetting some device-driver state, but not all of it.
> 
> However, this was all decided before the PCI-E spec was written, so
> maybe the newer PCI-E specs now say something different.
> 
> --linas
> 
>>
>>> On Thu, Dec 8, 2016 at 10:13 PM, Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>>>
>>>>
>>>> On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
>>>>> On Thu, 8 Dec 2016 16:16:14 +0800
>>>>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>>>>
>>>>>>  The platform resets the link, and then calls the link_reset() callback
>>>>>>  on all affected device drivers.  This is a PCI-Express specific state
>>>>>> -and is done whenever a non-fatal error has been detected that can be
>>>>>> +and is done whenever a fatal error has been detected that can be
>>>>>>  "solved" by resetting the link. This call informs the driver of the
>>>>>
>>>>> As far as I can tell, the original text was correct here; why do you
>>>>> think this change needs to be made?
>>>>>
>>>>
>>>> See do_recovery() in the AER core: reset_link() is called only when a
>>>> fatal error is seen.
>>>>
>>>> --
>>>> Sincerely,
>>>> Cao jin
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>> Sincerely,
>> Cao jin
>>
>>
> 
> 
> .
> 
