From: MENG Xin <zinces@gmail.com>
To: linux-pci@vger.kernel.org
Subject: Will direct Hot Reset impact system?
Date: Tue, 4 Sep 2012 08:35:00 +0800
Message-ID: <CAE7L+6DJZMk7arka0bszNeiafMTcjPA3byT0gN0cDtK-rWz2AA@mail.gmail.com>

Hi, all,


My card is a Gen2 x8 device plugged into a Dell R710 (Intel 5520 IOH
platform with a Xeon X5560). The card carries a PLX 8624 switch with
four identical Endpoint devices connected to it; the hierarchy is
shown below. The OS is SUSE 10 (SP3), kernel version
2.6.16.60-0.54.5-smp.

-[0000:00]-+-00.0
           +-07.0-[0000:06-0c]----00.0-[0000:07-0c]--+-04.0-[0000:08]--
           |                                         +-05.0-[0000:09]----00.0
           |                                         +-06.0-[0000:0a]----00.0
           |                                         +-08.0-[0000:0b]----00.0
           |                                         \-09.0-[0000:0c]----00.0


I’m investigating a software recovery mechanism based on Hot Reset.
When a fatal error is detected and reported by the card, I use Hot
Reset to recover the card. To test the recovery flow, I also use Hot
Reset to break normal operation. Here is the sequence:
1. Turn off AER reporting in the Root Complex by clearing the “Root
Error Command Register” (0x2C).
2. Mask all Uncorrectable Errors in the Root Port’s AER.
3. Turn off conventional PCI error reporting by clearing “SERR#
Enable” in both the “Command Register” (0x04) and the “Bridge Control
Register” (0x3E).
4. Issue a Hot Reset by writing the “Secondary Bus Reset” bit in the
Root Port’s “Bridge Control Register”.
5. The card driver detects the transaction problem.
6. The driver clears “Bus Master Enable” and polls the “Transactions
Pending” bit in both the Root Port and the Switch’s upstream port,
waiting for outstanding transactions to complete.
7. The driver issues a Hot Reset by writing the “Secondary Bus Reset”
bit in the Root Port.
8. The driver performs post-reset initialization after link-up.
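
For reference, the register accesses in steps 3, 4, 6 and 7 can be
sketched roughly as below. This is only an illustration run against a
mock config-space array, not the actual driver code: struct mock_fn,
cfg_read16(), cfg_write16() and the three step functions are
hypothetical names, and the timing a real reset needs is only noted in
comments. The offsets and bit masks are the ones defined in Linux's
pci_regs.h.

```c
#include <stdint.h>

/* Offsets and bit masks below match Linux's include/linux/pci_regs.h. */
#define PCI_COMMAND              0x04
#define PCI_COMMAND_MASTER       0x0004  /* Bus Master Enable */
#define PCI_COMMAND_SERR         0x0100  /* SERR# Enable */
#define PCI_BRIDGE_CONTROL       0x3e
#define PCI_BRIDGE_CTL_SERR      0x0002  /* SERR# Enable (bridge side) */
#define PCI_BRIDGE_CTL_BUS_RESET 0x0040  /* Secondary Bus Reset */
#define PCI_EXP_DEVSTA           0x0a    /* Device Status, relative to PCIe cap */
#define PCI_EXP_DEVSTA_TRPND     0x0020  /* Transactions Pending */

/* Hypothetical stand-in for one function's 256-byte config space. */
struct mock_fn {
    uint8_t  cfg[256];
    unsigned pcie_cap;  /* assumed offset of the PCI Express capability */
};

static uint16_t cfg_read16(const struct mock_fn *f, unsigned off)
{
    return (uint16_t)(f->cfg[off] | (f->cfg[off + 1] << 8));
}

static void cfg_write16(struct mock_fn *f, unsigned off, uint16_t val)
{
    f->cfg[off]     = (uint8_t)(val & 0xff);
    f->cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Step 3: clear SERR# Enable in both Command and Bridge Control. */
static void disable_serr(struct mock_fn *port)
{
    cfg_write16(port, PCI_COMMAND,
                cfg_read16(port, PCI_COMMAND) & (uint16_t)~PCI_COMMAND_SERR);
    cfg_write16(port, PCI_BRIDGE_CONTROL,
                cfg_read16(port, PCI_BRIDGE_CONTROL) &
                (uint16_t)~PCI_BRIDGE_CTL_SERR);
}

/*
 * Step 6: clear Bus Master Enable, then poll Transactions Pending.
 * Returns 0 once the bit reads clear, -1 if it is still set after
 * max_polls reads. A real driver would sleep between polls.
 */
static int quiesce(struct mock_fn *f, int max_polls)
{
    int i;

    cfg_write16(f, PCI_COMMAND,
                cfg_read16(f, PCI_COMMAND) & (uint16_t)~PCI_COMMAND_MASTER);
    for (i = 0; i < max_polls; i++) {
        uint16_t sta = cfg_read16(f, f->pcie_cap + PCI_EXP_DEVSTA);

        if (!(sta & PCI_EXP_DEVSTA_TRPND))
            return 0;
    }
    return -1;
}

/* Steps 4 and 7: pulse Secondary Bus Reset, leaving other bits alone. */
static void hot_reset(struct mock_fn *port)
{
    uint16_t ctl = cfg_read16(port, PCI_BRIDGE_CONTROL);

    cfg_write16(port, PCI_BRIDGE_CONTROL, ctl | PCI_BRIDGE_CTL_BUS_RESET);
    /* Real hardware: the bit must stay set for a minimum hold time here. */
    cfg_write16(port, PCI_BRIDGE_CONTROL,
                ctl & (uint16_t)~PCI_BRIDGE_CTL_BUS_RESET);
    /* Real hardware: wait for the link to retrain before config access. */
}
```

In a real driver the mock accessors would be replaced by
pci_read_config_word() and pci_write_config_word() on the Root Port's
struct pci_dev, with delays inserted at the two commented points.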

After several iterations of this sequence, the link between the Root
Port and the Switch’s upstream port goes down. I tried modifying the
flow by also clearing “Bus Master Enable” and polling the
“Transactions Pending” bit before step 4, but the link still goes
down. I see that the kernel supports a “graceful” Hot Reset flow
through some system functions, but it could be a big effort for the
card driver to cooperate with that framework, so I took the shortcut.
My questions are: will such a direct Hot Reset impact overall system
functionality? Is there any chance that the IOH disables link training
after exiting from Hot Reset? Can the IOH detect such a Hot Reset even
if I have masked all Uncorrectable Errors in the Root Port’s AER?
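
One way to narrow down the link-training question is to look at the
Root Port's Link Control and Link Status registers after the failure.
The sketch below only decodes two raw 16-bit values (for example as
read with setpci); struct link_state and decode_link() are
hypothetical names, and the Data Link Layer Link Active bit is only
meaningful if the port reports that capability, which is an
assumption for this platform. The bit masks are the ones defined in
Linux's pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets (relative to the PCIe capability) and masks per pci_regs.h. */
#define PCI_EXP_LNKCTL        0x10
#define PCI_EXP_LNKCTL_LD     0x0010  /* Link Disable */
#define PCI_EXP_LNKSTA        0x12
#define PCI_EXP_LNKSTA_CLS    0x000f  /* Current Link Speed */
#define PCI_EXP_LNKSTA_NLW    0x03f0  /* Negotiated Link Width */
#define PCI_EXP_LNKSTA_LT     0x0800  /* Link Training in progress */
#define PCI_EXP_LNKSTA_DLLLA  0x2000  /* Data Link Layer Link Active */

/* Hypothetical decoded view of the two registers. */
struct link_state {
    bool     disabled;    /* Link Disable set in Link Control */
    bool     training;    /* Link Training set in Link Status */
    bool     dll_active;  /* valid only if the port reports this bit */
    unsigned speed;       /* 1 = 2.5 GT/s, 2 = 5 GT/s */
    unsigned width;       /* number of lanes */
};

static struct link_state decode_link(uint16_t lnkctl, uint16_t lnksta)
{
    struct link_state s;

    s.disabled   = (lnkctl & PCI_EXP_LNKCTL_LD) != 0;
    s.training   = (lnksta & PCI_EXP_LNKSTA_LT) != 0;
    s.dll_active = (lnksta & PCI_EXP_LNKSTA_DLLLA) != 0;
    s.speed      = lnksta & PCI_EXP_LNKSTA_CLS;
    s.width      = (lnksta & PCI_EXP_LNKSTA_NLW) >> 4;
    return s;
}
```

A Link Disable bit that reads set would suggest the link was turned
off by software or firmware, whereas a clear bit with a link that
never becomes active would suggest training is attempted but does not
complete.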

Thank you very much!


Best regards,
Xin Meng
