linux-pci.vger.kernel.org archive mirror
* Will direct Hot Reset impact system?
@ 2012-09-04  0:35 MENG Xin
From: MENG Xin @ 2012-09-04  0:35 UTC (permalink / raw)
  To: linux-pci

Hi, all,


My card is a Gen2 x8 device plugged into a Dell R710 (Intel 5520 IOH
platform with a Xeon X5560). The card carries a PLX 8624 switch, and 4
identical Endpoint devices are connected behind the switch; the
hierarchy is shown below. The OS is SUSE 10 (SP3), kernel version
2.6.16.60-0.54.5-smp.

-[0000:00]-+-00.0
           +-07.0-[0000:06-0c]----00.0-[0000:07-0c]--+-04.0-[0000:08]--
           |                                         +-05.0-[0000:09]----00.0
           |                                         +-06.0-[0000:0a]----00.0
           |                                         +-08.0-[0000:0b]----00.0
           |                                         \-09.0-[0000:0c]----00.0


I’m investigating a software recovery mechanism based on Hot Reset.
When a fatal error is detected and reported by the card, I use Hot
Reset to recover the card. To test the recovery flow, I also use Hot
Reset to break normal operation. Here is the sequence:
1. Turn off AER reporting in the Root Complex by clearing the “Root
Error Command Register” (offset 0x2C in the AER capability).
2. Mask all Uncorrectable Errors in the Root Port’s AER.
3. Turn off conventional PCI error reporting by clearing “SERR#
Enable” in both the “Command Register” (0x04) and the “Bridge Control
Register” (0x3E).
4. Issue a Hot Reset by setting the “Secondary Bus Reset” bit in the
Root Port’s “Bridge Control Register”. This is the injected fault.
5. The card driver detects the transaction problem.
6. The driver clears “Bus Master Enable” and polls the “Transactions
Pending” bit in both the Root Port and the Switch’s upstream port to
wait for outstanding transactions to complete.
7. The driver issues a Hot Reset by setting the “Secondary Bus Reset”
bit in the Root Port.
8. The driver performs post-reset initialization after link up.
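
Steps 6 and 7 can be sketched as follows. This is a minimal user-space illustration, not the driver's actual code: `struct cfg_space`, `cfg_read16`, `cfg_write16`, `delay_ms`, `quiesce` and `secondary_bus_reset` are hypothetical names that stand in for real config-space accessors and delays. The register offsets and bit values are the standard ones (the same values appear in the kernel's `pci_regs.h`). The 2 ms assertion time and 100 ms settle time are assumptions based on the usual spec minimums (reset held for at least 1 ms, no config access for at least 100 ms after it ends).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard config-space offsets and bits (PCI / PCI Express). */
#define PCI_COMMAND              0x04
#define PCI_COMMAND_MASTER       0x0004  /* Bus Master Enable */
#define PCI_BRIDGE_CONTROL       0x3E
#define PCI_BRIDGE_CTL_BUS_RESET 0x0040  /* Secondary Bus Reset */
#define PCI_EXP_DEVSTA           0x0A    /* Device Status, relative to PCIe cap */
#define PCI_EXP_DEVSTA_TRPND     0x0020  /* Transactions Pending */

/* Hypothetical stand-in for one function's config space. */
struct cfg_space {
    uint8_t bytes[256];
    uint8_t pcie_cap;   /* offset of the PCI Express capability */
};

static uint16_t cfg_read16(const struct cfg_space *c, size_t off)
{
    assert(off + 1 < sizeof c->bytes);
    return (uint16_t)(c->bytes[off] | (c->bytes[off + 1] << 8));
}

static void cfg_write16(struct cfg_space *c, size_t off, uint16_t v)
{
    assert(off + 1 < sizeof c->bytes);
    c->bytes[off] = (uint8_t)(v & 0xFF);
    c->bytes[off + 1] = (uint8_t)(v >> 8);
}

/* Hypothetical delay hook; a real driver would sleep here. */
static void delay_ms(unsigned ms) { (void)ms; }

/* Step 6: stop new DMA, then wait for outstanding requests to drain.
 * Returns 0 once Transactions Pending is clear, -1 if it never clears. */
static int quiesce(struct cfg_space *dev, unsigned max_polls)
{
    uint16_t cmd = cfg_read16(dev, PCI_COMMAND);

    cfg_write16(dev, PCI_COMMAND, (uint16_t)(cmd & ~PCI_COMMAND_MASTER));
    while (max_polls--) {
        uint16_t sta = cfg_read16(dev, dev->pcie_cap + PCI_EXP_DEVSTA);

        if (!(sta & PCI_EXP_DEVSTA_TRPND))
            return 0;
        delay_ms(1);
    }
    return -1;
}

/* Steps 4 and 7: pulse Secondary Bus Reset on the port above the card,
 * preserving the other Bridge Control bits. */
static void secondary_bus_reset(struct cfg_space *port)
{
    uint16_t ctl = cfg_read16(port, PCI_BRIDGE_CONTROL);

    cfg_write16(port, PCI_BRIDGE_CONTROL,
                (uint16_t)(ctl | PCI_BRIDGE_CTL_BUS_RESET));
    delay_ms(2);     /* assumed: hold reset longer than the 1 ms minimum */
    cfg_write16(port, PCI_BRIDGE_CONTROL,
                (uint16_t)(ctl & ~PCI_BRIDGE_CTL_BUS_RESET));
    delay_ms(100);   /* assumed: settle time before touching the card again */
}
```

A caller would run `quiesce()` on each port named in step 6 and only call `secondary_bus_reset()` on the Root Port if every call returned 0. What to do on a timeout is a policy choice this sketch leaves open.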

After several rounds of this sequence, the link between the Root Port
and the Switch’s upstream port goes down. I tried modifying the flow
by clearing “Bus Master Enable” and polling the “Transactions Pending”
bit before step 4, but the link still goes down. I see that the kernel
supports a “graceful” Hot Reset flow through some system functions,
but it would be a big effort for the card driver to cooperate with
that framework, so I took the shortcut. My questions are:
will such a direct Hot Reset impact overall system functionality? Is
there any chance that the IOH disables link training after exiting
from Hot Reset? Can the IOH detect such a Hot Reset even if I have
masked all Uncorrectable Errors in the Root Port’s AER?
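
One way to narrow down the link-down case is to read the Root Port's Link Control and Link Status registers after the reset and decode them. The sketch below is only an illustration: `struct link_state` and `decode_link` are hypothetical names, and the caller is assumed to have already read the two 16-bit registers from the PCI Express capability. The bit positions are the standard ones. The Data Link Layer Link Active bit is only meaningful if the port advertises that reporting capability, which is not confirmed here for the 5520 IOH.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI Express capability offsets and bits. */
#define PCI_EXP_LNKCTL        0x10
#define PCI_EXP_LNKCTL_LD     0x0010  /* Link Disable */
#define PCI_EXP_LNKSTA        0x12
#define PCI_EXP_LNKSTA_CLS    0x000F  /* Current Link Speed */
#define PCI_EXP_LNKSTA_NLW    0x03F0  /* Negotiated Link Width */
#define PCI_EXP_LNKSTA_LT     0x0800  /* Link Training in progress */
#define PCI_EXP_LNKSTA_DLLLA  0x2000  /* Data Link Layer Link Active */

/* Hypothetical decoded view of the two registers. */
struct link_state {
    bool disabled;    /* software or firmware set Link Disable */
    bool training;    /* LTSSM is still training */
    bool dll_active;  /* valid only if the port reports this capability */
    unsigned speed;   /* 1 = 2.5 GT/s, 2 = 5.0 GT/s */
    unsigned width;   /* number of lanes */
};

static struct link_state decode_link(uint16_t lnkctl, uint16_t lnksta)
{
    struct link_state s;

    s.disabled   = (lnkctl & PCI_EXP_LNKCTL_LD) != 0;
    s.training   = (lnksta & PCI_EXP_LNKSTA_LT) != 0;
    s.dll_active = (lnksta & PCI_EXP_LNKSTA_DLLLA) != 0;
    s.speed      = lnksta & PCI_EXP_LNKSTA_CLS;
    s.width      = (lnksta & PCI_EXP_LNKSTA_NLW) >> 4;
    return s;
}
```

If `disabled` reads true after the reset, something has set Link Disable on the port. If it is false but the link never comes up, the cause lies elsewhere (for example, training never completing).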

Thank you very much!


Best regards,
Xin Meng
