All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state
@ 2017-04-11  1:17 Mauricio Faria de Oliveira
  2017-04-11  1:17 ` [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the " Mauricio Faria de Oliveira
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-11  1:17 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: linux-scsi, bart.vanassche

This patch series resolves a problem in which all paths of a multipath device
became _permanently_ failed after a storage system had moved both controllers
into a _temporarily_ unavailable state (that is SCSI_ACCESS_STATE_UNAVAILABLE).

This happened because once scsi_dh_alua had set the 'pg->state' to that value,
any IO coming to that PG via alua_prep_fn() would be immediately failed there.

It was possible to confirm that IO coming to that PG by another function path
(e.g., SG_IO) would perform normally once that PG's respective storage system
controller had transitioned back to an active state.

- Patch 1 essentially resolves that problem by allowing IO requests coming in
  the SCSI_ACCESS_STATE_UNAVAILABLE to actually proceed in alua_prep_fn(). It
  also schedules a recheck in alua_check_sense() to update pg->state properly.
  The problem/debug test-case is included in its commit message for reference.

- Patch 2 and Patch 3 address uncertainty & potentially incorrect assumptions
  when trying to reconcile the alua: RTPG information in the kernel logs with
  the actual port groups state at a given point in time and to multipath/path
  checkers status/failed/reinstated messages, since scsi_dh_alua could update
  the PG state for the 'other' PG (i.e., not the PG by which the RTPG request
  was sent to) but only present an updated state message for the 'current' PG.

- Patch 4 silences the scsi_dh_alua messages about RTPG state/information for
  the unavailable state if it is no news (i.e., not a transition to/out-of it),
  only keeping the first and (potentially) last message (when it is some news).
  That's because during the period in which the unavailable state is in place,
  the path checkers will naturally have to go through alua_check_sense() path,
  which schedules a recheck and thus alua_rtpg() goes through the sdev_printk.

This patch series has been tested with the 4.11-rc4 kernel.

For documentation purposes, I'll reply to this cover letter with the analysis
of such cases of this problem, and the accompanying messages from kernel logs.

Mauricio Faria de Oliveira (4):
  scsi: scsi_dh_alua: allow I/O in the target port unavailable state
  scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg()
    sdev_printk
  scsi: scsi_dh_alua: print changes to RTPG state of other PGs too
  scsi: scsi_dh_alua: do not print target port group state if it remains
    unavailable

 drivers/scsi/device_handler/scsi_dh_alua.c | 99 ++++++++++++++++++++++++++----
 1 file changed, 88 insertions(+), 11 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the target port unavailable state
  2017-04-11  1:17 [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira
@ 2017-04-11  1:17 ` Mauricio Faria de Oliveira
  2017-04-13 21:14   ` Bart Van Assche
  2017-04-11  1:17 ` [PATCH 2/4] scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg() sdev_printk Mauricio Faria de Oliveira
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-11  1:17 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: linux-scsi, bart.vanassche

According to SPC-4 (5.15.2.4.5 Unavailable state), the unavailable
state may (or may not) transition to other states (e.g., microcode
downloading or hardware error, which may be temporary or permanent
conditions, respectively).

But, scsi_dh_alua currently fails the I/O requests early once that
state is established (in alua_prep_fn()), which provides no chance
for path checkers going through that function path to really check
whether the path actually still fails I/O requests or recovered to
an active state.

This might cause device-mapper multipath to fail all paths to some
storage system that moves the controllers to the unavailable state
for firmware upgrades, and never recover regardless of the storage
system doing upgrades one controller at a time and get them online.

Then I/O requests are blocked indefinitely due to queue_if_no_path
but the underlying individual paths are fully operational, and can
be verified as such through other function paths (e.g., SG_IO):

    # multipath -l
    mpatha (360050764008100dac000000000000100) dm-0 IBM,2145
    size=40G features='2 queue_if_no_path retain_attached_hw_handler'
    hwhandler='1 alua' wp=rw
    |-+- policy='service-time 0' prio=0 status=enabled
    | |- 1:0:1:0 sdf 8:80  failed undef running
    | `- 2:0:1:0 sdn 8:208 failed undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      |- 1:0:0:0 sdb 8:16  failed undef running
      `- 2:0:0:0 sdj 8:144 failed undef running

    # strace -e read \
        sg_dd if=/dev/sdj of=/dev/null bs=512 count=1 iflag=direct \
        2>&1 | grep 512
    read(3, 0x3fff7ba80000, 512) = -1 EIO (Input/output error)

    # strace -e ioctl \
        sg_dd if=/dev/sdj of=/dev/null bs=512 count=1 iflag=direct \
        blk_sgio=1 \
        2>&1 | grep 512
    ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[10]=[28, 00, 00, 00,
    00, 00, 00, 00, 01, 00], <...>) = 0

So, allow I/O to target port (groups) in the unavailable state, so the
path checkers can actually check them, and schedule a recheck whenever
the unavailable state is detected so pg->state can be updated properly
(and further SCSI IO error messages then silenced through alua_prep_fn()).

Once a path checker eventually detects an active state again, the port
group state will be updated by the path activation call, alua_activate(),
as it schedules an alua_rtpg() check.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index c01b47e5b55a..5e5a33cac951 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -431,6 +431,20 @@ static int alua_check_sense(struct scsi_device *sdev,
 			alua_check(sdev, false);
 			return NEEDS_RETRY;
 		}
+		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0c) {
+			/*
+			 * LUN Not Accessible - target port in unavailable state.
+			 *
+			 * It may (not) be possible to transition to other states;
+			 * the transition might take a while or not happen at all,
+			 * depending on the storage system model, error type, etc.
+			 *
+			 * Do not retry, so failover to another target port occur.
+			 * Schedule a recheck to update state for other functions.
+			 */
+			alua_check(sdev, true);
+			return SUCCESS;
+		}
 		break;
 	case UNIT_ATTENTION:
 		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
@@ -1057,6 +1071,8 @@ static void alua_check(struct scsi_device *sdev, bool force)
  *
  * Fail I/O to all paths not in state
  * active/optimized or active/non-optimized.
+ * Allow I/O to all paths in state unavailable
+ * so path checkers can actually check them.
  */
 static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
 {
@@ -1072,6 +1088,8 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
 	rcu_read_unlock();
 	if (state == SCSI_ACCESS_STATE_TRANSITIONING)
 		ret = BLKPREP_DEFER;
+	else if (state == SCSI_ACCESS_STATE_UNAVAILABLE)
+		req->rq_flags |= RQF_QUIET;
 	else if (state != SCSI_ACCESS_STATE_OPTIMAL &&
 		 state != SCSI_ACCESS_STATE_ACTIVE &&
 		 state != SCSI_ACCESS_STATE_LBA) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg() sdev_printk
  2017-04-11  1:17 [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira
  2017-04-11  1:17 ` [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the " Mauricio Faria de Oliveira
@ 2017-04-11  1:17 ` Mauricio Faria de Oliveira
  2017-04-13 21:18   ` Bart Van Assche
  2017-04-11  1:18 ` [PATCH 3/4] scsi: scsi_dh_alua: print changes to RTPG state of other PGs too Mauricio Faria de Oliveira
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-11  1:17 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: linux-scsi, bart.vanassche

Factor out the sdev_printk() statement with the RTPG information
in alua_rtpg() into a new function, alua_rtpg_print(), that will
also be used in the following patch.

The only functional difference is that the 'valid_states' value
is now referenced via a pointer, and can be NULL (optional), in
which case the 'valid_states' information is not printed.  That
is for the following patch too.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 43 ++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 5e5a33cac951..0d3508f2e285 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -523,6 +523,37 @@ static int alua_tur(struct scsi_device *sdev)
 }
 
 /*
+ * alua_rtpg_print - Print REPORT TARGET GROUP STATES information
+ * @sdev: the device evaluated (or associated with this port group).
+ * @pg: the port group the information is associated with.
+ * @valid_states: pointer to valid_states value.
+ *                (optional; e.g., obtained via another port group)
+ *
+ * Must be called with pg->lock held.
+ */
+static void alua_rtpg_print(struct scsi_device *sdev, struct alua_port_group *pg,
+			    int *valid_states)
+{
+	if (valid_states)
+		sdev_printk(KERN_INFO, sdev,
+		    "%s: port group %02x state %c %s supports %c%c%c%c%c%c%c\n",
+		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),
+		    pg->pref ? "preferred" : "non-preferred",
+		    (*valid_states) & TPGS_SUPPORT_TRANSITION ? 'T' : 't',
+		    (*valid_states) & TPGS_SUPPORT_OFFLINE ? 'O' : 'o',
+		    (*valid_states) & TPGS_SUPPORT_LBA_DEPENDENT ? 'L' : 'l',
+		    (*valid_states) & TPGS_SUPPORT_UNAVAILABLE ? 'U' : 'u',
+		    (*valid_states) & TPGS_SUPPORT_STANDBY ? 'S' : 's',
+		    (*valid_states) & TPGS_SUPPORT_NONOPTIMIZED ? 'N' : 'n',
+		    (*valid_states) & TPGS_SUPPORT_OPTIMIZED  ?  'A' : 'a');
+	else
+		sdev_printk(KERN_INFO, sdev,
+		    "%s: port group %02x state %c %s\n",
+		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),
+		    pg->pref ? "preferred" : "non-preferred");
+}
+
+/*
  * alua_rtpg - Evaluate REPORT TARGET GROUP STATES
  * @sdev: the device to be evaluated.
  *
@@ -679,17 +710,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 	}
 
 	spin_lock_irqsave(&pg->lock, flags);
-	sdev_printk(KERN_INFO, sdev,
-		    "%s: port group %02x state %c %s supports %c%c%c%c%c%c%c\n",
-		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),
-		    pg->pref ? "preferred" : "non-preferred",
-		    valid_states&TPGS_SUPPORT_TRANSITION?'T':'t',
-		    valid_states&TPGS_SUPPORT_OFFLINE?'O':'o',
-		    valid_states&TPGS_SUPPORT_LBA_DEPENDENT?'L':'l',
-		    valid_states&TPGS_SUPPORT_UNAVAILABLE?'U':'u',
-		    valid_states&TPGS_SUPPORT_STANDBY?'S':'s',
-		    valid_states&TPGS_SUPPORT_NONOPTIMIZED?'N':'n',
-		    valid_states&TPGS_SUPPORT_OPTIMIZED?'A':'a');
+	alua_rtpg_print(sdev, pg, &valid_states);
 
 	switch (pg->state) {
 	case SCSI_ACCESS_STATE_TRANSITIONING:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] scsi: scsi_dh_alua: print changes to RTPG state of other PGs too
  2017-04-11  1:17 [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira
  2017-04-11  1:17 ` [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the " Mauricio Faria de Oliveira
  2017-04-11  1:17 ` [PATCH 2/4] scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg() sdev_printk Mauricio Faria de Oliveira
@ 2017-04-11  1:18 ` Mauricio Faria de Oliveira
  2017-04-13 21:35   ` Bart Van Assche
  2017-04-11  1:18 ` [PATCH 4/4] scsi: scsi_dh_alua: do not print target port group state if it remains unavailable Mauricio Faria de Oliveira
  2017-04-11  1:21 ` [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira
  4 siblings, 1 reply; 10+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-11  1:18 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: linux-scsi, bart.vanassche

Currently, alua_rtpg() can change the 'state' and 'preferred'
values for the current port group _and_ of other port groups
present in the response buffer/descriptors.

However, it reports such changes _only_ for the current port
group (i.e., only for 'pg' but not for 'tmp_pg').

This might cause uncertainty and confusion when going through
the kernel logs for analyzing/debugging scsi_dh_alua behavior,
which is not helpful during support and development scenarios.

So, print such changes for other port groups than the current
one.

This requires finding a scsi_device to call sdev_printk() with
for that other port group. Using 'tmp_pg->rtpg_sdev' is not an
option because in that 'if' conditional the 'tmp_pg' is not in
the ALUA_PG_RUNNING state, so that pointer may be NULL. So the
for-loop over the tmp->pg device_handler structures is used to
find a valid scsi_device that is associated to this port group.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 0d3508f2e285..c2c9173fd883 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -688,6 +688,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 				if ((tmp_pg == pg) ||
 				    !(tmp_pg->flags & ALUA_PG_RUNNING)) {
 					struct alua_dh_data *h;
+					struct scsi_device *tmp_pg_sdev = NULL;
 
 					tmp_pg->state = desc[0] & 0x0f;
 					tmp_pg->pref = desc[0] >> 7;
@@ -697,6 +698,17 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 						/* h->sdev should always be valid */
 						BUG_ON(!h->sdev);
 						h->sdev->access_state = desc[0];
+
+						/*
+						 * If tmp_pg is not the running pg, and its worker
+						 * is not running, tmp_pg->rtpg_sdev might be NULL.
+						 * Use another/one associated scsi_device to print
+						 * its RTPG information.
+						 */
+						if ((tmp_pg != pg) && !tmp_pg_sdev) {
+							tmp_pg_sdev = h->sdev;
+							alua_rtpg_print(tmp_pg_sdev, tmp_pg, NULL);
+						}
 					}
 					rcu_read_unlock();
 				}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] scsi: scsi_dh_alua: do not print target port group state if it remains unavailable
  2017-04-11  1:17 [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira
                   ` (2 preceding siblings ...)
  2017-04-11  1:18 ` [PATCH 3/4] scsi: scsi_dh_alua: print changes to RTPG state of other PGs too Mauricio Faria de Oliveira
@ 2017-04-11  1:18 ` Mauricio Faria de Oliveira
  2017-04-13 21:40   ` Bart Van Assche
  2017-04-11  1:21 ` [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira
  4 siblings, 1 reply; 10+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-11  1:18 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: linux-scsi, bart.vanassche

Path checkers will periodically check all paths to a target port group
in unavailable state more often (as they are 'failed'), possibly for a
long or indefinite period of time, or for a large number of paths.

That might end up flooding the kernel log with the scsi_dh_alua target
port group state messages, which are triggered by the rtpg recheck
scheduled in alua_check_sense().

So, modify alua_rtpg() not to print such message when that unavailable
state remains (i.e., the target port group state did not transition to
or from the unavailable state). All other cases continue to be printed.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index c2c9173fd883..8025e024c011 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -554,6 +554,17 @@ static void alua_rtpg_print(struct scsi_device *sdev, struct alua_port_group *pg
 }
 
 /*
+ * alua_state_remains - Whether a RTPG state remains the same across 2 values.
+ * @state: the state value to check for.
+ * @old_state: the old state value.
+ * @new_state: the new state value.
+ */
+static bool alua_state_remains(int state, int old_state, int new_state)
+{
+	return ((old_state == state) && (new_state == state));
+}
+
+/*
  * alua_rtpg - Evaluate REPORT TARGET GROUP STATES
  * @sdev: the device to be evaluated.
  *
@@ -571,6 +582,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 	unsigned int tpg_desc_tbl_off;
 	unsigned char orig_transition_tmo;
 	unsigned long flags;
+	int orig_state;
 
 	if (!pg->expiry) {
 		unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
@@ -656,6 +668,8 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 		goto retry;
 	}
 
+	orig_state = pg->state;
+
 	orig_transition_tmo = pg->transition_tmo;
 	if ((buff[4] & RTPG_FMT_MASK) == RTPG_FMT_EXT_HDR && buff[5] != 0)
 		pg->transition_tmo = buff[5];
@@ -689,6 +703,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 				    !(tmp_pg->flags & ALUA_PG_RUNNING)) {
 					struct alua_dh_data *h;
 					struct scsi_device *tmp_pg_sdev = NULL;
+					int tmp_pg_orig_state = tmp_pg->state;
 
 					tmp_pg->state = desc[0] & 0x0f;
 					tmp_pg->pref = desc[0] >> 7;
@@ -704,8 +719,11 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 						 * is not running, tmp_pg->rtpg_sdev might be NULL.
 						 * Use another/one associated scsi_device to print
 						 * its RTPG information.
+						 * Don't print it if state remains 'unavailable'.
 						 */
-						if ((tmp_pg != pg) && !tmp_pg_sdev) {
+						if ((tmp_pg != pg) && !tmp_pg_sdev &&
+						    !alua_state_remains(SCSI_ACCESS_STATE_UNAVAILABLE,
+									tmp_pg_orig_state, tmp_pg->state)) {
 							tmp_pg_sdev = h->sdev;
 							alua_rtpg_print(tmp_pg_sdev, tmp_pg, NULL);
 						}
@@ -722,7 +740,11 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 	}
 
 	spin_lock_irqsave(&pg->lock, flags);
-	alua_rtpg_print(sdev, pg, &valid_states);
+
+	/* Print RTPG information (except if state remains 'unavailable'). */
+	if (likely(!alua_state_remains(SCSI_ACCESS_STATE_UNAVAILABLE,
+					orig_state, pg->state)))
+		alua_rtpg_print(sdev, pg, &valid_states);
 
 	switch (pg->state) {
 	case SCSI_ACCESS_STATE_TRANSITIONING:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state
  2017-04-11  1:17 [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira
                   ` (3 preceding siblings ...)
  2017-04-11  1:18 ` [PATCH 4/4] scsi: scsi_dh_alua: do not print target port group state if it remains unavailable Mauricio Faria de Oliveira
@ 2017-04-11  1:21 ` Mauricio Faria de Oliveira
  4 siblings, 0 replies; 10+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-11  1:21 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: linux-scsi, bart.vanassche

On 04/10/2017 10:17 PM, Mauricio Faria de Oliveira wrote:
> For documentation purposes, I'll reply to this cover letter with the analysis
> of such cases of this problem, and the accompanying messages from kernel logs.

Here it goes, for anyone interested.
Scenario: 4 LUNs, 2 target port groups (PGs), 2 paths per PG.

mpatha (360050764008100dac000000000000100) dm-0 IBM,2145
size=40G features='2 queue_if_no_path retain_attached_hw_handler' 
hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 1:0:1:0 sdf 8:80  active undef running
| `- 2:0:1:0 sdn 8:208 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
   |- 1:0:0:0 sdb 8:16  active undef running
   `- 2:0:0:0 sdj 8:144 active undef running

This LUN's active port group becomes Unavailable.
The ongoing I/O requests sense it via SCSI command errors.

	Apr  6 07:14:56 ltcalpine-lp2 kernel: sd 1:0:1:0: [sdf] tag#0 Add. 
Sense: Logical unit not accessible, target port in unavailable state
	Apr  6 07:14:56 ltcalpine-lp2 kernel: sd 2:0:1:0: [sdn] tag#0 Add. 
Sense: Logical unit not accessible, target port in unavailable state

Accordingly, dm-mpath fails both paths:

	Apr  6 07:14:56 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:80.
	Apr  6 07:14:56 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:208.

The SCSI command error has gone through alua_check_sense().
It scheduled an alua_rtpg(), which confirms its port group
(for 1:0:1:0) and the other (for 2:0:0:0) are unavailable.

	Apr  6 07:14:56 ltcalpine-lp2 kernel: sd 2:0:0:0: alua: port group 01 
state U non-preferred
	Apr  6 07:14:56 ltcalpine-lp2 kernel: sd 1:0:1:0: alua: port group 00 
state U non-preferred supports tolusna

Failing over, dm-mpath switched path groups, to the other/enabled port 
group.

Notice that further SCSI command errors _happen_, but are not _reported_,
as alua_rtpg() has set pg->state = unavailable before more I/O requests
eventually reached alua_prep_fn(), which sets them with RQF_QUIET.

Accordingly, dm-mpath fails both paths:

	Apr  6 07:14:56 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:144.
	Apr  6 07:14:56 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:16.

The path checkers are running, and constantly receiving senses of 
unavailable state
(seen through debug messages), and alua_rtpg() rechecks are scheduled, 
but no
repeated messages for 'state U' are printed, as it has not changed at all.

After a few minutes, both port groups become active 
(optimized/non-optimized),
and dm-mpath reinstate all paths:

	Apr  6 07:21:01 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:16.
	Apr  6 07:21:01 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:80.
	Apr  6 07:21:01 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:144.
	Apr  6 07:21:01 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:208.

The dm-mpath reinstate calls alua_activate() -> alua_rtpg() for both 
paths in the active port group.
The first alua_rtpg() worker is still running when the second 
alua_rtpg() is scheduled,
so pg->rtpg_sdev does not change, and it's reported as 1:0:1:0 twice 
(but called for 1:0:1:0 and 2:0:1:0).

	Apr  6 07:21:01 ltcalpine-lp2 kernel: sd 2:0:0:0: alua: port group 01 
state N non-preferred
	Apr  6 07:21:01 ltcalpine-lp2 kernel: sd 1:0:1:0: alua: port group 00 
state A non-preferred supports tolusna
	Apr  6 07:21:01 ltcalpine-lp2 kernel: sd 2:0:0:0: alua: port group 01 
state N non-preferred
	Apr  6 07:21:01 ltcalpine-lp2 kernel: sd 1:0:1:0: alua: port group 00 
state A non-preferred supports tolusna



...


mpathb (360050764008100dac000000000000101) dm-1 IBM,2145
size=40G features='2 queue_if_no_path retain_attached_hw_handler' 
hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 1:0:0:1 sdc 8:32  active undef running
| `- 2:0:0:1 sdk 8:160 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
   |- 1:0:1:1 sdg 8:96  active undef running
   `- 2:0:1:1 sdo 8:224 active undef running


Similarly,

	Apr  5 19:59:52 ltcalpine-lp2 kernel: sd 2:0:0:1: [sdk] tag#2 Add. 
Sense: Logical unit not accessible, target port in unavailable state
	Apr  5 19:59:52 ltcalpine-lp2 kernel: sd 1:0:0:1: [sdc] tag#5 Add. 
Sense: Logical unit not accessible, target port in unavailable state

	Apr  5 19:59:52 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:160.
	Apr  5 19:59:52 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:32.

In this case alua_rtpg() does not print messages for the 'other' port group
(without 'supports'), because the other PG worker was running at the time.
There was an alua_rtpg() scheduled due to alua_check_sense() for 2:0:0:1,
and another for alua_activate() of 1:0:1:1 (from the 'enabled' path group).

	Apr  5 19:59:52 ltcalpine-lp2 kernel: sd 2:0:0:1: alua: port group 00 
state U non-preferred supports tolusna
	Apr  5 19:59:52 ltcalpine-lp2 kernel: sd 1:0:1:1: alua: port group 01 
state U non-preferred supports tolusna

	Apr  5 19:59:52 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:96.
	Apr  5 19:59:52 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:224.

After a few minutes,

	Apr  5 20:53:01 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:96.
	Apr  5 20:53:01 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:224.

	Apr  5 20:53:01 ltcalpine-lp2 kernel: sd 2:0:0:1: alua: port group 00 
state A non-preferred
	Apr  5 20:53:01 ltcalpine-lp2 kernel: sd 1:0:1:1: alua: port group 01 
state N non-preferred supports tolusna
	Apr  5 20:53:01 ltcalpine-lp2 kernel: sd 2:0:0:1: alua: port group 00 
state A non-preferred
	Apr  5 20:53:01 ltcalpine-lp2 kernel: sd 1:0:1:1: alua: port group 01 
state N non-preferred supports tolusna

	Apr  5 20:53:06 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:32.
	Apr  5 20:53:06 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:160.


...


mpathc (360050764008100dac000000000000141) dm-2 IBM,2145
size=40G features='2 queue_if_no_path retain_attached_hw_handler' 
hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 1:0:0:2 sdd 8:48  active undef running
| `- 2:0:0:2 sdl 8:176 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
   |- 1:0:1:2 sdh 8:112 active undef running
   `- 2:0:1:2 sdp 8:240 active undef running

With this one, no SCSI command error messages because
the path checkers noticed the unavailable state first.

	Apr  5 19:59:52 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:176.
	Apr  5 19:59:52 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:48.

In this case alua_rtpg() does not print messages for the 'other' port group
(same reason as described in previous case). Both port groups are 
unavailable.

	Apr  5 19:59:52 ltcalpine-lp2 kernel: sd 2:0:0:2: alua: port group 00 
state U non-preferred supports tolusna
	Apr  5 19:59:52 ltcalpine-lp2 kernel: sd 1:0:1:2: alua: port group 01 
state U non-preferred supports tolusna

	Apr  5 19:59:52 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:240.
	Apr  5 19:59:52 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:112.

After a few minutes, they return to active (opt/non-opt) state.

	Apr  5 20:53:01 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:112.
	Apr  5 20:53:01 ltcalpine-lp2 kernel: sd 2:0:0:2: alua: port group 00 
state A non-preferred
	Apr  5 20:53:01 ltcalpine-lp2 kernel: sd 1:0:1:2: alua: port group 01 
state N non-preferred supports tolusna
	Apr  5 20:53:04 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:176.
	Apr  5 20:53:04 ltcalpine-lp2 kernel: sd 2:0:1:2: alua: port group 01 
state N non-preferred
	Apr  5 20:53:04 ltcalpine-lp2 kernel: sd 2:0:0:2: alua: port group 00 
state A non-preferred supports tolusna
	Apr  5 20:53:05 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:48.
	Apr  5 20:53:05 ltcalpine-lp2 kernel: sd 2:0:1:2: alua: port group 01 
state N non-preferred
	Apr  5 20:53:05 ltcalpine-lp2 kernel: sd 1:0:0:2: alua: port group 00 
state A non-preferred supports tolusna
	Apr  5 20:53:06 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:240.

...

mpathd (360050764008100dac000000000000142) dm-3 IBM,2145
size=40G features='2 queue_if_no_path retain_attached_hw_handler' 
hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 1:0:1:3 sdi 8:128 active undef running
| `- 2:0:1:3 sdq 65:0  active undef running
`-+- policy='service-time 0' prio=0 status=enabled
   |- 1:0:0:3 sde 8:64  active undef running
   `- 2:0:0:3 sdm 8:192 active undef running

Similarly to the cases above, with some duplicated messages for the RTPG 
information
because alua_activate() schedules both ALUA_PG_RUN_[RS]TPG, so 
alua_rtpg_work() first
runs ALUA_PG_RUN_RTPG / alua_rtpg() which prints the message once, and 
then goes into
ALUA_PG_RUN_STPG / alua_stpg(), but as this storage system only supports 
implict TGPS,
it falls into SCSI_DH_RETRY, which schedules another 
ALUA_PG_RUN_RTPG/alua_rtpg() and
it prints the message again.

	Apr  6 07:14:56 ltcalpine-lp2 kernel: sd 2:0:1:3: [sdq] tag#5 Add. 
Sense: Logical unit not accessible, target port in unavailable state
	Apr  6 07:14:56 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 65:0.
	Apr  6 07:14:56 ltcalpine-lp2 kernel: sd 1:0:1:3: [sdi] tag#8 Add. 
Sense: Logical unit not accessible, target port in unavailable state
	Apr  6 07:14:56 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:128.

	Apr  6 07:14:56 ltcalpine-lp2 kernel: sd 2:0:0:3: alua: port group 01 
state U non-preferred
	Apr  6 07:14:56 ltcalpine-lp2 kernel: sd 1:0:1:3: alua: port group 00 
state U non-preferred supports tolusna

	Apr  6 07:14:56 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:192.
	Apr  6 07:14:56 ltcalpine-lp2 kernel: device-mapper: multipath: Failing 
path 8:64.

	Apr  6 07:47:25 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:192.
	Apr  6 07:47:25 ltcalpine-lp2 kernel: sd 2:0:1:3: alua: port group 00 
state A non-preferred
	Apr  6 07:47:25 ltcalpine-lp2 kernel: sd 2:0:0:3: alua: port group 01 
state N non-preferred supports tolusna
	Apr  6 07:47:25 ltcalpine-lp2 kernel: sd 2:0:1:3: alua: port group 00 
state A non-preferred
	Apr  6 07:47:25 ltcalpine-lp2 kernel: sd 2:0:0:3: alua: port group 01 
state N non-preferred supports tolusna
	Apr  6 07:47:26 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:64.
	Apr  6 07:47:26 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 8:128.
	Apr  6 07:47:26 ltcalpine-lp2 kernel: device-mapper: multipath: 
Reinstating path 65:0.
	Apr  6 07:47:26 ltcalpine-lp2 kernel: sd 2:0:1:3: alua: port group 00 
state A non-preferred
	Apr  6 07:47:26 ltcalpine-lp2 kernel: sd 1:0:0:3: alua: port group 01 
state N non-preferred supports tolusna
	Apr  6 07:47:26 ltcalpine-lp2 kernel: sd 2:0:1:3: alua: port group 00 
state A non-preferred
	Apr  6 07:47:26 ltcalpine-lp2 kernel: sd 1:0:0:3: alua: port group 01 
state N non-preferred supports tolusna
	Apr  6 07:47:26 ltcalpine-lp2 kernel: sd 2:0:0:3: alua: port group 01 
state N non-preferred
	Apr  6 07:47:26 ltcalpine-lp2 kernel: sd 1:0:1:3: alua: port group 00 
state A non-preferred supports tolusna
	Apr  6 07:47:26 ltcalpine-lp2 kernel: sd 2:0:0:3: alua: port group 01 
state N non-preferred
	Apr  6 07:47:26 ltcalpine-lp2 kernel: sd 1:0:1:3: alua: port group 00 
state A non-preferred supports tolusna



-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the target port unavailable state
  2017-04-11  1:17 ` [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the " Mauricio Faria de Oliveira
@ 2017-04-13 21:14   ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2017-04-13 21:14 UTC (permalink / raw)
  To: mauricfo, hare, martin.petersen; +Cc: linux-scsi

On Mon, 2017-04-10 at 22:17 -0300, Mauricio Faria de Oliveira wrote:
> According to SPC-4 (5.15.2.4.5 Unavailable state), the unavailable
> state may (or may not) transition to other states (e.g., microcode
> downloading or hardware error, which may be temporary or permanent
> conditions, respectively).
> 
> But, scsi_dh_alua currently fails the I/O requests early once that
> state is established (in alua_prep_fn()), which provides no chance
> for path checkers going through that function path to really check
> whether the path actually still fails I/O requests or recovered to
> an active state.
> 
> This might cause device-mapper multipath to fail all paths to some
> storage system that moves the controllers to the unavailable state
> for firmware upgrades, and never recover regardless of the storage
> system doing upgrades one controller at a time and get them online.
> 
> Then I/O requests are blocked indefinitely due to queue_if_no_path
> but the underlying individual paths are fully operational, and can
> be verified as such through other function paths (e.g., SG_IO):
> 
>     # multipath -l
>     mpatha (360050764008100dac000000000000100) dm-0 IBM,2145
>     size=40G features='2 queue_if_no_path retain_attached_hw_handler'
>     hwhandler='1 alua' wp=rw
>     |-+- policy='service-time 0' prio=0 status=enabled
>     | |- 1:0:1:0 sdf 8:80  failed undef running
>     | `- 2:0:1:0 sdn 8:208 failed undef running
>     `-+- policy='service-time 0' prio=0 status=enabled
>       |- 1:0:0:0 sdb 8:16  failed undef running
>       `- 2:0:0:0 sdj 8:144 failed undef running
> 
>     # strace -e read \
>         sg_dd if=/dev/sdj of=/dev/null bs=512 count=1 iflag=direct \
>         2>&1 | grep 512
>     read(3, 0x3fff7ba80000, 512) = -1 EIO (Input/output error)
> 
>     # strace -e ioctl \
>         sg_dd if=/dev/sdj of=/dev/null bs=512 count=1 iflag=direct \
>         blk_sgio=1 \
>         2>&1 | grep 512
>     ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[10]=[28, 00, 00, 00,
>     00, 00, 00, 00, 01, 00], <...>) = 0
> 
> So, allow I/O to target port (groups) in the unavailable state, so the
> path checkers can actually check them, and schedule a recheck whenever
> the unavailable state is detected so pg->state can be updated properly
> (and further SCSI IO error messages then silenced through alua_prep_fn()).
> 
> Once a path checker eventually detects an active state again, the port
> group state will be updated by the path activation call, alua_activate(),
> as it schedules an alua_rtpg() check.
> 
> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
> Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index c01b47e5b55a..5e5a33cac951 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -431,6 +431,20 @@ static int alua_check_sense(struct scsi_device *sdev,
>  			alua_check(sdev, false);
>  			return NEEDS_RETRY;
>  		}
> +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0c) {
> +			/*
> +			 * LUN Not Accessible - target port in unavailable state.
> +			 *
> +			 * It may (not) be possible to transition to other states;
> +			 * the transition might take a while or not happen at all,
> +			 * depending on the storage system model, error type, etc.
> +			 *
> +			 * Do not retry, so failover to another target port occur.
> +			 * Schedule a recheck to update state for other functions.
> +			 */
> +			alua_check(sdev, true);
> +			return SUCCESS;
> +		}
>  		break;
>  	case UNIT_ATTENTION:
>  		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
> @@ -1057,6 +1071,8 @@ static void alua_check(struct scsi_device *sdev, bool force)
>   *
>   * Fail I/O to all paths not in state
>   * active/optimized or active/non-optimized.
> + * Allow I/O to all paths in state unavailable
> + * so path checkers can actually check them.
>   */
>  static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>  {
> @@ -1072,6 +1088,8 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>  	rcu_read_unlock();
>  	if (state == SCSI_ACCESS_STATE_TRANSITIONING)
>  		ret = BLKPREP_DEFER;
> +	else if (state == SCSI_ACCESS_STATE_UNAVAILABLE)
> +		req->rq_flags |= RQF_QUIET;
>  	else if (state != SCSI_ACCESS_STATE_OPTIMAL &&
>  		 state != SCSI_ACCESS_STATE_ACTIVE &&
>  		 state != SCSI_ACCESS_STATE_LBA) {

Hello Mauricio,

Please also add support for the "standby" state to both alua_check_sense()
and alua_prep_fn() while you are modifying these functions.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg() sdev_printk
  2017-04-11  1:17 ` [PATCH 2/4] scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg() sdev_printk Mauricio Faria de Oliveira
@ 2017-04-13 21:18   ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2017-04-13 21:18 UTC (permalink / raw)
  To: mauricfo, hare, martin.petersen; +Cc: linux-scsi

On Mon, 2017-04-10 at 22:17 -0300, Mauricio Faria de Oliveira wrote:
>  /*
> + * alua_rtpg_print - Print REPORT TARGET GROUP STATES information
> + * @sdev: the device evaluated (or associated with this port group).
> + * @pg: the port group the information is associated with.
> + * @valid_states: pointer to valid_states value.
> + *                (optional; e.g., obtained via another port group)
> + *
> + * Must be called with pg->lock held.
> + */

Please use lockdep_assert_held() instead of documenting locking conventions as
comments. lockdep_assert_held() is verified at runtime with CONFIG_PROVE_LOCKING=y
but comments not.

> +static void alua_rtpg_print(struct scsi_device *sdev, struct alua_port_group *pg,
> +			    int *valid_states)
> +{
> +	if (valid_states)
> +		sdev_printk(KERN_INFO, sdev,
> +		    "%s: port group %02x state %c %s supports %c%c%c%c%c%c%c\n",
> +		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),
> +		    pg->pref ? "preferred" : "non-preferred",
> +		    (*valid_states) & TPGS_SUPPORT_TRANSITION ? 'T' : 't',
> +		    (*valid_states) & TPGS_SUPPORT_OFFLINE ? 'O' : 'o',
> +		    (*valid_states) & TPGS_SUPPORT_LBA_DEPENDENT ? 'L' : 'l',
> +		    (*valid_states) & TPGS_SUPPORT_UNAVAILABLE ? 'U' : 'u',
> +		    (*valid_states) & TPGS_SUPPORT_STANDBY ? 'S' : 's',
> +		    (*valid_states) & TPGS_SUPPORT_NONOPTIMIZED ? 'N' : 'n',
> +		    (*valid_states) & TPGS_SUPPORT_OPTIMIZED  ?  'A' : 'a');
> +	else
> +		sdev_printk(KERN_INFO, sdev,
> +		    "%s: port group %02x state %c %s\n",
> +		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),
> +		    pg->pref ? "preferred" : "non-preferred");
> +}

Please define two functions - one for valid_states != NULL and one for valid_states
== NULL such that valid_states can be passed by value instead of by reference.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] scsi: scsi_dh_alua: print changes to RTPG state of other PGs too
  2017-04-11  1:18 ` [PATCH 3/4] scsi: scsi_dh_alua: print changes to RTPG state of other PGs too Mauricio Faria de Oliveira
@ 2017-04-13 21:35   ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2017-04-13 21:35 UTC (permalink / raw)
  To: mauricfo, hare, martin.petersen; +Cc: linux-scsi

On Mon, 2017-04-10 at 22:18 -0300, Mauricio Faria de Oliveira wrote:
> Currently, alua_rtpg() can change the 'state' and 'preferred'
> values for the current port group _and_ of other port groups
> present in the response buffer/descriptors.
> 
> However, it reports such changes _only_ for the current port
> group (i.e., only for 'pg' but not for 'tmp_pg').
> 
> This might cause uncertainty and confusion when going through
> the kernel logs for analyzing/debugging scsi_dh_alua behavior,
> which is not helpful during support and development scenarios.
> 
> So, print such changes for other port groups than the current
> one.
> 
> This requires finding a scsi_device to call sdev_printk() with
> for that other port group. Using 'tmp_pg->rtpg_sdev' is not an
> option because in that 'if' conditional the 'tmp_pg' is not in
> the ALUA_PG_RUNNING state, so that pointer may be NULL. So the
> for-loop over the tmp->pg device_handler structures is used to
> find a valid scsi_device that is associated to this port group.
> 
> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index 0d3508f2e285..c2c9173fd883 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -688,6 +688,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>  				if ((tmp_pg == pg) ||
>  				    !(tmp_pg->flags & ALUA_PG_RUNNING)) {
>  					struct alua_dh_data *h;
> +					struct scsi_device *tmp_pg_sdev = NULL;
>  
>  					tmp_pg->state = desc[0] & 0x0f;
>  					tmp_pg->pref = desc[0] >> 7;
> @@ -697,6 +698,17 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>  						/* h->sdev should always be valid */
>  						BUG_ON(!h->sdev);
>  						h->sdev->access_state = desc[0];
> +
> +						/*
> +						 * If tmp_pg is not the running pg, and its worker
> +						 * is not running, tmp_pg->rtpg_sdev might be NULL.
> +						 * Use another/one associated scsi_device to print
> +						 * its RTPG information.
> +						 */
> +						if ((tmp_pg != pg) && !tmp_pg_sdev) {
> +							tmp_pg_sdev = h->sdev;
> +							alua_rtpg_print(tmp_pg_sdev, tmp_pg, NULL);
> +						}
>  					}
>  					rcu_read_unlock();
>  				}

Hello Mauricio,

Please consider moving the alua_rtpg_print() call out of the loop. That will
not only allow to eliminate the tmp_pg_sdev variable but will also reduce the
nesting depth of the code. E.g. something like the following (untested) code:

					list_for_each_entry_rcu(h,
						&tmp_pg->dh_list, node) {
						[ ... ]
					}
+ 					h = list_first_or_null_rcu(&tmp_pg->dh_list, typeof(*h), node);
+ 					if (tmp_pg != pg && h)
+ 						alua_rtpg_print(h->sdev, tmp_pg, NULL);
 					rcu_read_unlock();

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] scsi: scsi_dh_alua: do not print target port group state if it remains unavailable
  2017-04-11  1:18 ` [PATCH 4/4] scsi: scsi_dh_alua: do not print target port group state if it remains unavailable Mauricio Faria de Oliveira
@ 2017-04-13 21:40   ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2017-04-13 21:40 UTC (permalink / raw)
  To: mauricfo, hare, martin.petersen; +Cc: linux-scsi

On Mon, 2017-04-10 at 22:18 -0300, Mauricio Faria de Oliveira wrote:
>  /*
> + * alua_state_remains - Whether a RTPG state remains the same across 2 values.
> + * @state: the state value to check for.
> + * @old_state: the old state value.
> + * @new_state: the new state value.
> + */
> +static bool alua_state_remains(int state, int old_state, int new_state)
> +{
> +	return ((old_state == state) && (new_state == state));
> +}

Hello Mauricio,

All parentheses in the return statement are superfluous. Please consider
removing these.

> +/*
> -	alua_rtpg_print(sdev, pg, &valid_states);
> +
> +	/* Print RTPG information (except if state remains 'unavailable'). */
> +	if (likely(!alua_state_remains(SCSI_ACCESS_STATE_UNAVAILABLE,
> +					orig_state, pg->state)))
> +		alua_rtpg_print(sdev, pg, &valid_states);

Using "likely()" may prevent the CPU branch predictor to do it's work so in
kernel code usually likely() is only used in code that is in the hot path.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-04-13 21:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11  1:17 [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira
2017-04-11  1:17 ` [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the " Mauricio Faria de Oliveira
2017-04-13 21:14   ` Bart Van Assche
2017-04-11  1:17 ` [PATCH 2/4] scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg() sdev_printk Mauricio Faria de Oliveira
2017-04-13 21:18   ` Bart Van Assche
2017-04-11  1:18 ` [PATCH 3/4] scsi: scsi_dh_alua: print changes to RTPG state of other PGs too Mauricio Faria de Oliveira
2017-04-13 21:35   ` Bart Van Assche
2017-04-11  1:18 ` [PATCH 4/4] scsi: scsi_dh_alua: do not print target port group state if it remains unavailable Mauricio Faria de Oliveira
2017-04-13 21:40   ` Bart Van Assche
2017-04-11  1:21 ` [PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state Mauricio Faria de Oliveira

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.