All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yanmin Zhang <yanmin_zhang@linux.intel.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Liu, Joseph" <Joseph.Liu@emulex.com>,
	"Zhang, LongX" <longx.zhang@intel.com>,
	"linasvepstas@gmail.com" <linasvepstas@gmail.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset
Date: Thu, 06 Jun 2013 14:29:16 +0800	[thread overview]
Message-ID: <1370500156.4432.32.camel@ymzhang.sh.intel.com> (raw)
In-Reply-To: <CAErSpo743raREgO4zJL0pONZyOTDBy7qLX9UZzCyRsLjBD31gQ@mail.gmail.com>

On Wed, 2013-06-05 at 07:30 -0600, Bjorn Helgaas wrote:
> On Tue, Jun 4, 2013 at 6:38 PM, Yanmin Zhang
> <yanmin_zhang@linux.intel.com> wrote:
> > On Tue, 2013-06-04 at 12:04 -0600, Bjorn Helgaas wrote:
> >> I'm not sure where we are with this patch.  I think Joseph initially
> >> reported a problem (though I haven't actually seen that), and this
> >> patch fixed it, so it seems like there's something we want to do here.
> > Yes, indeed. We checked Powerpc error handling and plan to use the
> > similar method to deal with the issue.
> >
> > Sorry for replying very late. I move to Android smartphone area and am
> > very busy on many top issues.
> 
> OK.  Can you point us at a bugzilla or email archive of Joseph's
> original problem report, or post it to this thread?  Then maybe
> somebody else can pick it up and make progress on this.

Joseph sent email to my outlook emailbox. I changed it a little to delete company
sensitive product name.

===========================Joseph's original email to me================
Hi Tom and Yanmin,
 
Hope this email can reach you well. I am working on a driver's PCI error recovery with AER. I have a question with one of my observations from platform AER driver's behavior. As your names and emails are listed to the PCIe AER driver coming with the kernel, I send this question to you:
 
During injecting AER Non-Correctable/Fatal error and PCIe error recovery process, I observed the following from our Low Level Driver (LLD):
 
1. The LLD's error_detected() callback got called and the PCI channel state passed in as "pci_channel_io_frozen", as expected;
 
2. The LLD's error_detected() callback function returned with PCI_ERS_RESULT_NEED_RESET, requesting a PCI slot reset;
 
3. The LLD's slot_reset() callback got called and it attempted to re-initialize the device hardware for the recovery. However, the PCI slot state was still in "pci_channel_io_frozen" and the pci_channel_offline() helper routine returned 1. This is not expected, and it actually prevented LLD in performing needed access to memory mapped I/O space for preparing the device for recovery;
 
4. Later, the LLD's resume() callback got called and the PCI slot state was set to "pci_channel_io_normal";
 
In the upstream Linux kernel 3.7.0's pci-error-recovery.txt, "STEP 4 Slot Reset", it stated after platform has performed PCI slot reset and then calls LLD's slot_reset() callback:
 
"This call gives drivers the chance to re-initialize the hardware (re-download firmware, etc.).  At this point, the driver may assume that the card is in a fresh state and is fully functional. The slot is unfrozen and the driver has full access to PCI config space, memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X) will also be available."
 
I looked into the kernel 3.7.0's AER driver's aerdrv_core.c and see that the PCI slot state was set to "pci_channel_io_frozen" on AER_FATAL serverity, and only set back to "pci_channel_io_normal" in report_resume() function. The PCI slot state was not set to "pci_channel_io_normal" when invoking report_slot_reset().
 
As a comparison, I also perform the EEH error recovery with the LLD driver on a PPC platform, which indeed set the PCI slot state to "pci_channel_io_normal" when calling LLD's slot_reset() callback function.
 
Can you please verify this platform AER driver's behavior is intended and will stay with the AER platform support, or it should be changed to be consistent with the PCI error recovery procedure described in the pci-error-recovery.txt documentation? I also noticed that before invoking the broadcast_error_message() function with "slot_reset", there is a comment in the 3.7.0 kernel source:
 
/*
 * TODO: Should call platform-specific
 * functions to reset slot before calling
 * drivers' slot_reset callbacks?
 */
And I do see that only the PCIe Link_Reset was performed, no PCI fundamental reset was performed even the LLD set the PCIe device's "pdev->needs_freset = 1", is this the area that later will be added for performing needed PCI hot reset or fundamental reset at this stage?
 
Your timely response is very appreciated. Thanks in advance and please let me know if you think I should redirect the question to someone else.
 
Best Regards,
Joseph Liu, Ph.D.
Senior Principal Engineer
Emulex Corporation



  reply	other threads:[~2013-06-06  6:27 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-26  6:28 Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset Zhang, LongX
2013-05-02  0:30 ` Yanmin Zhang
     [not found]   ` <CAHrUA34H_B-ffebx6ueHgV1ax-PL_3hLZr1JGaP2NJZ7bYgvtw@mail.gmail.com>
2013-05-03  0:33     ` Yanmin Zhang
2013-05-03  2:00       ` Greg Kroah-Hartman
2013-05-03  3:13         ` Yanmin Zhang
2013-05-03 18:00           ` Linas Vepstas
2013-05-07  6:01             ` Yanmin Zhang
2013-05-17 23:43 ` Bjorn Helgaas
2013-05-17 23:56   ` Rafael J. Wysocki
2013-05-20 17:21     ` Bjorn Helgaas
2013-05-20 17:52       ` Linas Vepstas
2013-05-20 14:38   ` Liu, Joseph
2013-05-20 15:37     ` Linas Vepstas
2013-05-21  7:49       ` Yanmin Zhang
2013-05-21 13:38         ` Linas Vepstas
2013-05-20 22:48 ` Bjorn Helgaas
2013-05-21  7:40   ` Yanmin Zhang
2013-05-21 16:17     ` Bjorn Helgaas
2013-05-21 15:41   ` Liu, Joseph
2013-05-21 16:26     ` Linas Vepstas
2013-05-21 16:51     ` Bjorn Helgaas
2013-06-04 18:04       ` Bjorn Helgaas
2013-06-05  0:38         ` Yanmin Zhang
2013-06-05 13:30           ` Bjorn Helgaas
2013-06-06  6:29             ` Yanmin Zhang [this message]
2013-06-10 17:24               ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1370500156.4432.32.camel@ymzhang.sh.intel.com \
    --to=yanmin_zhang@linux.intel.com \
    --cc=Joseph.Liu@emulex.com \
    --cc=bhelgaas@google.com \
    --cc=linasvepstas@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=longx.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.