All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yanmin Zhang <yanmin_zhang@linux.intel.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Zhang, LongX" <longx.zhang@intel.com>,
	"linasvepstas@gmail.com" <linasvepstas@gmail.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Joseph.Liu@Emulex.Com" <Joseph.Liu@Emulex.Com>
Subject: Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset
Date: Tue, 21 May 2013 15:40:01 +0800	[thread overview]
Message-ID: <1369122001.4432.14.camel@ymzhang.sh.intel.com> (raw)
In-Reply-To: <20130520224824.GA31740@google.com>

On Mon, 2013-05-20 at 16:48 -0600, Bjorn Helgaas wrote:
> On Fri, Apr 26, 2013 at 06:28:59AM +0000, Zhang, LongX wrote:
> > From: Zhang Long <longx.zhang@intel.com>
> > 
> > Specific pci device drivers might have many functions to call
> > pci_channel_offline to check device states. When slot_reset happens,
> > drivers' slot_reset callback might call such functions and eventually
> > abort the reset.
> > 
> > The patch resets pdev->error_state to pci_channel_io_normal at
> > the begining of report_slot_reset.
> > 
> > Thank Liu Joseph for pointing it out.
> > 
> > Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> > Signed-off-by: Zhang Long <longx.zhang@intel.com>
> > ---
> >  drivers/pci/pcie/aer/aerdrv_core.c |    1 +
> >  drivers/pci/pcie/portdrv_pci.c     |   12 +++++-------
> >  2 files changed, 6 insertions(+), 7 deletions(-)
Thank all for the kind comments.

> > 
> > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> > index 564d97f..c61fd44 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > @@ -286,6 +286,7 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
> >  	result_data = (struct aer_broadcast_data *) data;
> >  
> >  	device_lock(&dev->dev);
> > +	dev->error_state = pci_channel_io_normal;
> >  	if (!dev->driver ||
> >  		!dev->driver->err_handler ||
> >  		!dev->driver->err_handler->slot_reset)
> > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > index ed4d094..7abefd9 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -332,13 +332,11 @@ static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
> >  	pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
> >  	int retval;
> >  
> > -	/* If fatal, restore cfg space for possible link reset at upstream */
> > -	if (dev->error_state == pci_channel_io_frozen) {
> > -		dev->state_saved = true;
> > -		pci_restore_state(dev);
> > -		pcie_portdrv_restore_config(dev);
> > -		pci_enable_pcie_error_reporting(dev);
> > -	}
> > +	/* restore cfg space for possible link reset at upstream */
> > +	dev->state_saved = true;
It's indeed a dirty trick to change it to true. The reason is suspend. The port would
suspend/resume at suspend-to-ram. When the port is suspended, PCI power framework calls
pci_save_state. When the port is resumed, PCI framework calls pci_restore_state and
dev->state_saved is set to false.
A better solution is to add double space in pci_dev->saved_config_space/saved_cap_space,
which seems consume unnecessary resource.

> > +	pci_restore_state(dev);
> > +	pcie_portdrv_restore_config(dev);
> > +	pci_enable_pcie_error_reporting(dev);
> >  
> >  	/* get true return value from &status */
> >  	retval = device_for_each_child(&dev->dev, &status, slot_reset_iter);
> I think this patch changes the behavior in the case of a non-fatal error
> where one of the .error_detected() methods returned
> PCI_ERS_RESULT_NEED_RESET.  In that case, pcie_portdrv_slot_reset()
> previously did not restore config space, but after your patch, it *will*
> restore it.  We need an explanation of why this is safe.
AER standard doesn't define such behavior like if we need reset a slot.
When we implemented the feature in kernel, we assumed at non-fatal error,
config space is still good.

With the new patch, we change the behavior a little.

> 
> I think you should split this into two patches: the first would remove the
> "if (dev->error_state == pci_channel_io_frozen)" test from portdrv_pci.c
The first patch would depend on the 2nd patch as 2nd patch resets error_state 
to pci_channel_io_normal.

> and explain the reason, and the second would make the aerdrv_core.c change.
> 
> I'm also concerned that in that same case (a non-fatal error where one of
> the .error_detected() methods returned PCI_ERS_RESULT_NEED_RESET), I don't
> think we actually *do* any kind of device reset.  This isn't related to
> your patch, of course, so if you resolve the config space restore question,
> we can deal with the reset question later.
It's a good question. At the beginning when we enabled AER feature in kernel,
we didn't really implement the real device reset. It's in the TODO list.

Yanmin



  reply	other threads:[~2013-05-21  7:38 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-26  6:28 Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset Zhang, LongX
2013-05-02  0:30 ` Yanmin Zhang
     [not found]   ` <CAHrUA34H_B-ffebx6ueHgV1ax-PL_3hLZr1JGaP2NJZ7bYgvtw@mail.gmail.com>
2013-05-03  0:33     ` Yanmin Zhang
2013-05-03  2:00       ` Greg Kroah-Hartman
2013-05-03  3:13         ` Yanmin Zhang
2013-05-03 18:00           ` Linas Vepstas
2013-05-07  6:01             ` Yanmin Zhang
2013-05-17 23:43 ` Bjorn Helgaas
2013-05-17 23:56   ` Rafael J. Wysocki
2013-05-20 17:21     ` Bjorn Helgaas
2013-05-20 17:52       ` Linas Vepstas
2013-05-20 14:38   ` Liu, Joseph
2013-05-20 15:37     ` Linas Vepstas
2013-05-21  7:49       ` Yanmin Zhang
2013-05-21 13:38         ` Linas Vepstas
2013-05-20 22:48 ` Bjorn Helgaas
2013-05-21  7:40   ` Yanmin Zhang [this message]
2013-05-21 16:17     ` Bjorn Helgaas
2013-05-21 15:41   ` Liu, Joseph
2013-05-21 16:26     ` Linas Vepstas
2013-05-21 16:51     ` Bjorn Helgaas
2013-06-04 18:04       ` Bjorn Helgaas
2013-06-05  0:38         ` Yanmin Zhang
2013-06-05 13:30           ` Bjorn Helgaas
2013-06-06  6:29             ` Yanmin Zhang
2013-06-10 17:24               ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1369122001.4432.14.camel@ymzhang.sh.intel.com \
    --to=yanmin_zhang@linux.intel.com \
    --cc=Joseph.Liu@Emulex.Com \
    --cc=bhelgaas@google.com \
    --cc=linasvepstas@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=longx.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.