From: Sam Bobroff <sbobroff@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Subject: [PATCH 0/6] powerpc/eeh: Improve recovery of passed-through devices
Date: Thu, 29 Nov 2018 14:16:36 +1100
Message-ID: <cover.1543460917.git.sbobroff@linux.ibm.com>

Hello,

Here are changes that allow EEH to successfully recover after a failure that
affects both host and guest devices. This happens, for example, when a PHB
containing passed-through devices is fenced. (Failures that include only
passed-through devices are ignored by the host.)

Currently, when an error affects both passed-through and un-passed-through
devices, the passed-through devices are treated as if their driver was not EEH
aware. This causes them to be hot-unplugged as part of recovery.

The hot unplug request is forwarded to the guest, which checks the device status
before releasing the device. Because the host is recovering the device, it
reports the device status as EEH_STATE_UNAVAILABLE, which causes the guest to
wait for the device to become available. This deadlocks the recovery process.

This change causes the host to recover its own devices but leave
passed-through devices frozen until the guest performs its own recovery. (They
are not removed.) If the guest detects the error and begins recovery itself,
waiting for the device state to change away from EEH_STATE_UNAVAILABLE makes
it wait until the host has finished its recovery, after which the guest's own
recovery can succeed.
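
As a rough illustration of the "recover our own devices, leave passed-through
ones frozen" behaviour described above, here is a small userspace sketch. It is
a toy model only: struct toy_pe, enum toy_state and toy_clear_frozen() are
hypothetical names made up for this example and are not the kernel's EEH data
structures or functions. The include_passed parameter is borrowed from the
patch titles in this series, but its exact semantics there are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are illustrative, not kernel API. */
enum toy_state { TOY_NORMAL, TOY_FROZEN };

struct toy_pe {
	bool passed;           /* true if the PE is passed through to a guest */
	enum toy_state state;  /* current (modelled) freeze state */
};

/*
 * Thaw every frozen PE in the array. When include_passed is false,
 * passed-through PEs are skipped and therefore stay frozen, leaving
 * their recovery to the guest.
 *
 * Returns the number of PEs that were thawed.
 */
static size_t toy_clear_frozen(struct toy_pe *pes, size_t n,
			       bool include_passed)
{
	size_t thawed = 0;

	assert(pes != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (pes[i].state != TOY_FROZEN)
			continue;	/* nothing to do for this PE */
		if (pes[i].passed && !include_passed)
			continue;	/* leave it frozen for the guest */
		pes[i].state = TOY_NORMAL;
		thawed++;
	}
	return thawed;
}
```

In this model the host's recovery path would call toy_clear_frozen() with
include_passed set to false, and a path acting on behalf of the guest would
call it with include_passed set to true.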

Note that resetting a PE may implicitly thaw both it and child PEs, and to
prevent the device from being accidentally used by the guest (which may be
unaware of the failure and reset) when in this state, we re-freeze those
devices. This does leave a small window of opportunity but that will need to be
addressed with a firmware change.
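
The reset-then-re-freeze sequence from the note above can be sketched in the
same toy style. Again, everything here is hypothetical (struct tree_pe,
tree_is_under() and tree_reset_and_refreeze() are invented for illustration,
and the PE hierarchy is modelled as a flat array with parent indices, which is
not how the kernel stores it). The sketch assumes the parent links contain no
cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a PE hierarchy stored as an array with parent indices. */
struct tree_pe {
	int parent;	/* index of the parent PE, or -1 for a top-level PE */
	bool passed;	/* true if the PE is passed through to a guest */
	bool frozen;	/* modelled freeze state */
};

/* Return true if PE i is root itself or one of root's descendants. */
static bool tree_is_under(const struct tree_pe *pes, size_t i, size_t root)
{
	for (;;) {
		if (i == root)
			return true;
		if (pes[i].parent < 0)
			return false;
		i = (size_t)pes[i].parent;
	}
}

/*
 * Model a reset of PE root: the reset implicitly thaws root and every PE
 * below it, then passed-through PEs in that subtree are frozen again so a
 * guest that is unaware of the failure cannot use them.
 *
 * Returns the number of PEs that were re-frozen.
 */
static size_t tree_reset_and_refreeze(struct tree_pe *pes, size_t n,
				      size_t root)
{
	size_t refrozen = 0;

	assert(pes != NULL && root < n);

	/* Step 1: the reset thaws the whole subtree as a side effect. */
	for (size_t i = 0; i < n; i++)
		if (tree_is_under(pes, i, root))
			pes[i].frozen = false;

	/*
	 * Between step 1 and step 2 the passed-through PEs are briefly
	 * thawed; this corresponds to the small window mentioned above.
	 */

	/* Step 2: re-freeze the passed-through PEs in the subtree. */
	for (size_t i = 0; i < n; i++) {
		if (tree_is_under(pes, i, root) && pes[i].passed) {
			pes[i].frozen = true;
			refrozen++;
		}
	}
	return refrozen;
}
```

The gap between the two loops is the point of the sketch: the re-freeze can
only happen after the reset has already thawed the devices, so software alone
cannot make the sequence atomic.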

I've also included a fix to the reset function (the last patch), because
without it some scenarios still fail. An example is injecting an error into
a PHB and then exiting a guest that contains passed-through devices from that
PHB so that an EEH event is raised during the process of passing the device
back to the host.

Cheers,
Sam.

Sam Bobroff (6):
  powerpc/eeh: Cleanup eeh_pe_clear_frozen_state()
  powerpc/eeh: remove sw_state from eeh_unfreeze_pe()
  powerpc/eeh: Add include_passed to eeh_pe_state_clear()
  powerpc/eeh: Add include_passed to eeh_clear_pe_frozen_state()
  powerpc/eeh: Improve recovery of passed-through devices
  powerpc/eeh: Correct retries in eeh_pe_reset_full()

 arch/powerpc/include/asm/eeh.h     |   4 +-
 arch/powerpc/include/asm/ppc-pci.h |   4 +-
 arch/powerpc/kernel/eeh.c          | 103 +++++++++++++++++++----------
 arch/powerpc/kernel/eeh_driver.c   |  86 ++++++++++--------------
 arch/powerpc/kernel/eeh_pe.c       |  68 ++++++++-----------
 arch/powerpc/kernel/eeh_sysfs.c    |   3 +-
 drivers/vfio/vfio_spapr_eeh.c      |   6 +-
 7 files changed, 140 insertions(+), 134 deletions(-)

-- 
2.19.0.2.gcad72f5712


Thread overview: 8+ messages
2018-11-29  3:16 Sam Bobroff [this message]
2018-11-29  3:16 ` [PATCH 1/6] powerpc/eeh: Cleanup eeh_pe_clear_frozen_state() Sam Bobroff
2019-02-08 13:02   ` [1/6] " Michael Ellerman
2018-11-29  3:16 ` [PATCH 2/6] powerpc/eeh: remove sw_state from eeh_unfreeze_pe() Sam Bobroff
2018-11-29  3:16 ` [PATCH 3/6] powerpc/eeh: Add include_passed to eeh_pe_state_clear() Sam Bobroff
2018-11-29  3:16 ` [PATCH 4/6] powerpc/eeh: Add include_passed to eeh_clear_pe_frozen_state() Sam Bobroff
2018-11-29  3:16 ` [PATCH 5/6] powerpc/eeh: Improve recovery of passed-through devices Sam Bobroff
2018-11-29  3:16 ` [PATCH 6/6] powerpc/eeh: Correct retries in eeh_pe_reset_full() Sam Bobroff
