From: Dan Williams <dan.j.williams@intel.com>
To: linux-scsi@vger.kernel.org
Subject: [isci-rnc PATCH v1 00/37] remote node context rework
Date: Thu, 22 Mar 2012 17:27:48 -0700
Message-ID: <20120323002504.18065.45709.stgit@dwillia2-linux.jf.intel.com>

This series is unfortunately being posted during the merge window but has
been in development since the beginning of February.  Part of the delay
in getting it posted was due to waiting for the libsas error-handling
and hotplug work to stabilize, and waiting for the full test results to
come back showing that this development topic was effective.

With these patches, on top of the libsas changes, our hotplug /
exception handling tests are now passing.  Previously we were seeing
frequent system crashes and occasional lockups of the silicon.  I will
let Jeff describe the technical changes in more detail, but it bears
pointing out that this comes with a modest but welcome cleanup of the
code complexity:

 15 files changed, 1316 insertions(+), 1633 deletions(-)

Also available via:

  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci.git isci-rnc-v1

...this topic builds on top of isci-for-3.4-v4.

--
Dan

From: Jeff Skirvin

In the controller, devices as they appear on a SAS domain (or
direct-attached SATA devices) are represented by memory structures known
as "Remote Node Contexts" (RNCs).  These structures are transferred from
main memory to the controller using a set of register commands; these
commands include setting up the context ("posting"), removing the
context ("invalidating"), and commands to control the scheduling of
commands and connections to that remote device ("suspensions" and
"resumptions").  There is a similar path to control RNC scheduling from
the protocol engine, which interprets the results of command and data
transmission and reception.

In general, the controller chooses among non-suspended RNCs to find one
that has work requiring scheduling the transmission of command and data
frames to a target.  Likewise, when a target tries to return data back
to the initiator, the state of the RNC is used by the controller to
determine how to treat the incoming request. As an example, if the RNC
is in the state "TX/RX Suspended", incoming SSP connection requests from
the target will be rejected by the controller hardware.  When an RNC is
"TX Suspended", it will not be selected by the controller hardware to
start outgoing command or data operations (with certain priority-based
exceptions).
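
As a rough illustration of the scheduling rules above, here is a minimal
sketch in C.  The enum and function names are hypothetical and are not
taken from the driver or the hardware documentation.  The sketch assumes
that a "TX Suspended" RNC still accepts incoming SSP connection
requests (the text only says that "TX/RX Suspended" rejects them), and
it does not model the priority-based exceptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the RNC scheduling states named in the text. */
enum rnc_sched_state {
	RNC_READY,           /* not suspended */
	RNC_TX_SUSPENDED,    /* outgoing work is held off */
	RNC_TX_RX_SUSPENDED, /* outgoing and incoming work are held off */
};

/*
 * May the controller select this RNC to start outgoing command or data
 * operations?  Priority-based exceptions are deliberately not modeled.
 */
static bool rnc_may_start_tx(enum rnc_sched_state s)
{
	assert(s <= RNC_TX_RX_SUSPENDED);
	return s == RNC_READY;
}

/*
 * Would an incoming SSP connection request from the target be accepted?
 * Assumption: only the TX/RX suspended state rejects it.
 */
static bool rnc_accepts_incoming_ssp(enum rnc_sched_state s)
{
	assert(s <= RNC_TX_RX_SUSPENDED);
	return s != RNC_TX_RX_SUSPENDED;
}
```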

As mentioned above, there are two sources for management of the RNC
states: commands from driver software, and the result of transmission
and reception conditions of commands and data signaled by the controller
hardware.  As an example of the latter, if an outgoing SSP command ends
with an OPEN_REJECT(BAD_DESTINATION) status, the RNC transitions to the
"TX Suspended" state; the controller hardware signals this both in the
completion status of the pending command and in a controller hardware
event.  Examples of the former are included in the patch changelogs.
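
The hardware-driven path can be sketched as a small state update.  This
is only a model under stated assumptions: the structure, the status
names and the event counter are hypothetical, and the choice to leave an
already suspended RNC in its current state is an assumption, not
something taken from the hardware.

```c
#include <assert.h>
#include <stdbool.h>

enum rnc_sched_state { RNC_READY, RNC_TX_SUSPENDED, RNC_TX_RX_SUSPENDED };

/* Hypothetical completion statuses; only the one named in the text matters. */
enum tc_status { TC_GOOD, TC_OPEN_REJECT_BAD_DESTINATION };

struct rnc {
	enum rnc_sched_state state;
	unsigned int suspend_events; /* hardware suspension events observed */
};

/*
 * Model the hardware side: a command that ends with
 * OPEN_REJECT(BAD_DESTINATION) moves a ready RNC to TX suspended and
 * raises an event.  Returns true if the completion carried a suspension.
 */
static bool rnc_on_tc_completion(struct rnc *rnc, enum tc_status status)
{
	assert(rnc);

	if (status != TC_OPEN_REJECT_BAD_DESTINATION)
		return false;

	/* Assumption: an RNC that is already suspended keeps its state. */
	if (rnc->state == RNC_READY)
		rnc->state = RNC_TX_SUSPENDED;

	rnc->suspend_events++;
	return true;
}
```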

Driver software is required to suspend the RNC in a "TX/RX Suspended"
condition before any outstanding commands can be terminated.  Failure to
guarantee this can lead to a complete hardware hang condition.  Earlier
versions of the driver software did not guarantee that an RNC was
correctly managed before I/O termination, and so operated in an unsafe
way.

Further, the driver performed unnecessary contortions to preserve the
remote device command state and so was more complicated than it needed
to be.  The simplifying assumption is that once an I/O has entered the
error handler path without having completed in the target, the driver's
obligation is to end all use of the sas_task.
Beyond that, recovery of operation is dependent on libsas and other
components to reset, rediscover and reconfigure the device before normal
operation can restart.  In the driver, this simplifying assumption meant
that RNC management could be reduced to suspending the RNC, terminating
the targeted I/O request, and resuming the RNC as needed for
device-specific management such as an SSP Abort Task or LUN Reset
management request.
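
The reduced sequence, together with the rule that termination requires a
"TX/RX Suspended" RNC, might look like the sketch below.  All names are
hypothetical and none of this is the driver's code.  The suspend and
resume steps are modeled as synchronous assignments, whereas the real
driver has to wait for the hardware to confirm each transition.

```c
#include <assert.h>

enum rnc_sched_state { RNC_READY, RNC_TX_RX_SUSPENDED };

/* Hypothetical stand-in for a remote device and its RNC. */
struct remote_dev {
	enum rnc_sched_state rnc_state;
	int outstanding; /* I/O requests still owned by the hardware */
};

static void dev_suspend(struct remote_dev *d)
{
	d->rnc_state = RNC_TX_RX_SUSPENDED;
}

static void dev_resume(struct remote_dev *d)
{
	d->rnc_state = RNC_READY;
}

/*
 * Terminate one outstanding request.  Refuses with -1 unless the RNC is
 * TX/RX suspended, since terminating otherwise is the unsafe case.
 */
static int dev_terminate_one(struct remote_dev *d)
{
	if (d->rnc_state != RNC_TX_RX_SUSPENDED)
		return -1;
	if (d->outstanding > 0)
		d->outstanding--;
	return 0;
}

/* Suspend, terminate the targeted request, then resume. */
static int dev_abort_one(struct remote_dev *d)
{
	int ret;

	assert(d);
	dev_suspend(d);
	ret = dev_terminate_one(d);
	dev_resume(d); /* e.g. so a follow-up management request can be sent */
	return ret;
}
```

Keeping the state check inside the terminate step means a caller that
skips the suspend gets an error back instead of reaching the hardware.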

---

Jeff Skirvin (37):
      isci: Manage the link layer hang detect timer for RNC suspensions.
      isci: Fixed bug in resumption from RNC Tx/Rx suspend state.
      isci: Handle all suspending TC completions
      isci: Terminate outstanding TCs on TX/RX RNC suspensions.
      isci: Manage device suspensions during TC terminations.
      isci: Remote device must be suspended for NCQ cleanup.
      isci: Remote device stop also suspends the RNC and terminates I/O.
      isci: Escalate to I_T_Nexus_Reset when the device is gone.
      isci: Redesign device suspension, abort, cleanup.
      isci: Add suspension cases for RNC INVALIDATING, POSTING states.
      isci: Device access in the error path does not depend on IDEV_GONE.
      isci: All pending TCs are terminated when the RNC is invalidated.
      isci: Only set IDEV_GONE in the device stop path.
      isci: Remove isci_device reqs_in_process and dev_node from isci_device.
      isci: Distinguish between remote device suspension cases
      isci: Fix the terminated I/O to not call sas_task_abort().
      isci: Save the suspension hint for upcoming suspensions.
      isci: Manage the LLHANG timer enable/disable per-device.
      isci: Make sure all TCs are terminated and cleaned in LUN reset.
      isci: Implement waiting for suspend in the abort path.
      isci: When in the abort path, defeat other resume calls until done.
      isci: Callbacks to libsas occur under scic_lock and are synchronized.
      isci: Manage tag releases differently when aborting tasks.
      isci: Fix RNC suspend call for SCI_RESUMING state.
      isci: Wait for RNC resumption before leaving the abort path.
      isci: Directly control IREQ_ABORT_PATH_ACTIVE when completing TMFs.
      isci: Add protocol indicator for TMF requests.
      isci: Added timeouts to RNC suspensions in the abort path.
      isci: Change the phy control and link reset interface for HW reasons.
      isci: Don't wait for an RNC suspend if it's being destroyed.
      isci: Restore the ATAPI device RNC management code.
      isci: Check IDEV_GONE before performing abort path operations.
      isci: Remove obviated host callback list.
      isci: Manage the IREQ_NO_AUTO_FREE_TAG under scic_lock.
      isci: Fix RNC AWAIT_SUSPENSION->INVALIDATING transition.
      isci: Fixed RNC bug that lost the suspension or resumption during destroy
      isci: End the RNC resumption wait when the RNC is destroyed.

 drivers/scsi/isci/host.c                 |  143 ++---
 drivers/scsi/isci/host.h                 |    8 
 drivers/scsi/isci/init.c                 |    3 
 drivers/scsi/isci/phy.c                  |    2 
 drivers/scsi/isci/port.c                 |   45 +-
 drivers/scsi/isci/port.h                 |    5 
 drivers/scsi/isci/remote_device.c        |  547 ++++++++++++++++-----
 drivers/scsi/isci/remote_device.h        |   62 ++
 drivers/scsi/isci/remote_node_context.c  |  345 ++++++++++---
 drivers/scsi/isci/remote_node_context.h  |   43 +-
 drivers/scsi/isci/request.c              |  696 +++++++++++---------------
 drivers/scsi/isci/request.h              |  116 ----
 drivers/scsi/isci/scu_completion_codes.h |    2 
 drivers/scsi/isci/task.c                 |  800 +++++-------------------------
 drivers/scsi/isci/task.h                 |  132 -----
 15 files changed, 1316 insertions(+), 1633 deletions(-)
