* [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires
@ 2021-05-28 18:34 Brian Bunker
  2021-06-07 12:42 ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Bunker @ 2021-05-28 18:34 UTC (permalink / raw)
  To: linux-scsi

Do not return an error to multipath, which would result in a failed path, until the transition time expires.

The current patch, which returns BLK_STS_AGAIN for ALUA transitioning, breaks the assumptions our target makes about ALUA states. With that change an error is returned to multipath very quickly, and multipath in turn immediately fails the path. The assumption in that patch seems to be that another path will be available for multipath to use. I don't believe that assumption is fair to make: while one path is in a transitioning state, it is reasonable to assume that other paths may also be in non-active states.

The SPC spec has a note on this:
The IMPLICIT TRANSITION TIME field indicates the minimum amount of time in seconds the application client should wait prior to timing out an implicit state transition (see 5.15.2.2). A value of zero indicates that the implicit transition time is not specified.

In the SCSI ALUA device handler a value of 0 translates to a transition time of 60 seconds. The current approach of failing I/O on the transitioning path much sooner than that seems to violate this aspect of the specification.

#define ALUA_FAILOVER_TIMEOUT		60
unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;

This patch uses the expiry the same way it is used elsewhere in the device handler. Once the transition state is entered, keep retrying until the expiry is reached. If that happens, return the error to multipath the same way it is currently done with BLK_STS_AGAIN.

Signed-off-by: Brian Bunker <brian@purestorage.com>
Acked-by: Krishna Kant <krishna.kant@purestorage.com>
Acked-by: Seamus Connor <sconnor@purestorage.com>

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c86c01bfecdb..8fd4f677f9c9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1395,7 +1395,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
                        break;
                default:
                        errors++;
-                       blk_mq_end_request(rq, ret);
+                       blk_mq_end_request(rq, BLK_STS_IOERR);
                }
        } while (!list_empty(list));
 out:
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index efa8c0381476..dd51200665b9 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -417,13 +417,36 @@ static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
                        /*
                         * LUN Not Accessible - ALUA state transition
                         */
+                       unsigned long expiry = 0;
                        rcu_read_lock();
                        pg = rcu_dereference(h->pg);
-                       if (pg)
-                               pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
+                       if (pg) {
+                               if (pg->state != SCSI_ACCESS_STATE_TRANSITIONING) {
+                                       unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
+                                       pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
+
+                                       if (pg->transition_tmo)
+                                               transition_tmo = pg->transition_tmo * HZ;
+                                       pg->expiry = round_jiffies_up(jiffies + transition_tmo);
+                                       sdev_printk(KERN_INFO, sdev,
+                                               "%s: port group %02x expiry set to %lu\n",
+                                               ALUA_DH_NAME, pg->group_id, pg->expiry);
+                               }
+                               expiry = pg->expiry;
+                       }
+                       rcu_read_unlock();
+                       if (expiry != 0 && time_before_eq(jiffies, expiry)) {
+                               alua_check(sdev, false);
+                               return NEEDS_RETRY;
+                       }
+                       rcu_read_lock();
+                       pg = rcu_dereference(h->pg);
+                       if (pg && pg->state == SCSI_ACCESS_STATE_TRANSITIONING)
+                               pg->state = SCSI_ACCESS_STATE_UNAVAILABLE;
                        rcu_read_unlock();
-                       alua_check(sdev, false);
-                       return NEEDS_RETRY;
+                       sdev_printk(KERN_ERR, sdev,
+                               "%s: transition timeout (expiry %lu) exceeded\n",
+                               ALUA_DH_NAME, expiry);
                }
                break;
        case UNIT_ATTENTION:
@@ -1109,7 +1132,7 @@ static blk_status_t alua_prep_fn(struct scsi_device *sdev, struct request *req)
        case SCSI_ACCESS_STATE_LBA:
                return BLK_STS_OK;
        case SCSI_ACCESS_STATE_TRANSITIONING:
-               return BLK_STS_AGAIN;
+               return BLK_STS_RESOURCE;
        default:
                req->rq_flags |= RQF_QUIET;
                return BLK_STS_IOERR;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 532304d42f00..bdc5d015de77 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -735,9 +735,6 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
                                case 0x24: /* depopulation in progress */
                                        action = ACTION_DELAYED_RETRY;
                                        break;
-                               case 0x0a: /* ALUA state transition */
-                                       blk_stat = BLK_STS_AGAIN;
-                                       fallthrough;
                                default:
                                        action = ACTION_FAIL;
                                        break;


Brian Bunker
SW Eng
brian@purestorage.com




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires
  2021-05-28 18:34 [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires Brian Bunker
@ 2021-06-07 12:42 ` Hannes Reinecke
  2021-06-08  0:03   ` Brian Bunker
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2021-06-07 12:42 UTC (permalink / raw)
  To: Brian Bunker, linux-scsi

On 5/28/21 8:34 PM, Brian Bunker wrote:
> Do not return an error to multipath which will result in a failed path until the transition time expires.
> 
> The current patch which returns BLK_STS_AGAIN for ALUA transitioning breaks the assumptions in our target
> regarding ALUA states. With that change an error is very quickly returned to multipath which in turn
> immediately fails the path. The assumption in that patch seems to be that another path will be available
> for multipath to use. That assumption I don’t believe is fair to make since while one path is in a
> transitioning state it is reasonable to assume that other paths may also be in non active states. 
> 
I beg to disagree. Path groups are nominally independent, and might
change states independently of the other path groups.
While for some arrays a 'transitioning' state is indeed system-wide,
other arrays might be able to serve I/O on other paths whilst one is in
transitioning.
So I'd rather not presume anything here.


> The SPC spec has a note on this:
> The IMPLICIT TRANSITION TIME field indicates the minimum amount of time in seconds the application client
> should wait prior to timing out an implicit state transition (see 5.15.2.2). A value of zero indicates that
> the implicit transition time is not specified.
> 
Oh, I know _that_ one. What with me being one of the implementors asking
for it :-)

But again, this is _per path_. One cannot assume anything about _other_
paths here.

> In the SCSI ALUA device handler a value of 0 translates to the transition time being set to 60 seconds.
> The current approach of failing I/O on the transitioning path in a much quicker time than what is stated
> seems to violate this aspect of the specification.
> 
> #define ALUA_FAILOVER_TIMEOUT		60
> unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
> 

No. The implicit transitioning timeout is used to determine for how long
we will be sending a 'TEST UNIT READY'/'REPORT TARGET PORT GROUPS' combo
to figure out if this particular path is still in transitioning. Once
this timeout is exceeded we're setting the path to 'standby'.
And this 'setting port to standby' is our action for 'timing out an
implicit state transition' as defined by the spec.

> This patch uses the expiry the same way it is used elsewhere in the device handler. Once the transition
> state is entered keep retrying until the expiry value is reached. If that happens, return the error to
> multipath the same way that is currently done with BLK_STS_AGAIN.
> 
And that is precisely what I want to avoid.

As outlined above, we cannot assume that all paths will be set to
'transitioning' once we hit the 'transitioning' condition on one path.
As such, we need to retry the I/O on other paths, to ensure failover
does work in these cases. Hence it's perfectly okay to set this path to
'failed' as we cannot currently send I/O to that path.

If, however, we are hitting a 'transitioning' status on _all_ paths (ie
all paths are set to 'failed') we need to ensure that we do _not_ fail
the I/O (as technically the paths are still alive), but retry with
TUR/RTPG until one path reaches a final state.
Then we should reinstate that path and continue with I/O.

I thought that this is what we do already; but might be that there are
some issues lurking here.

So what is the error you are seeing?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


* Re: [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires
  2021-06-07 12:42 ` Hannes Reinecke
@ 2021-06-08  0:03   ` Brian Bunker
  2021-06-09  7:03     ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Bunker @ 2021-06-08  0:03 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-scsi

> Do not return an error to multipath which will result in a failed path until the \
> transition time expires. 
> The current patch which returns BLK_STS_AGAIN for ALUA transitioning breaks the \
> assumptions in our target regarding ALUA states. With that change an error is very \
> quickly returned to multipath which in turn immediately fails the path. The \
> assumption in that patch seems to be that another path will be available for \
> multipath to use. That assumption I don't believe is fair to make since while one \
> path is in a transitioning state it is reasonable to assume that other paths may \
> also be in non active states.  

> I beg to disagree. Path groups are nominally independent, and might
> change states independent on the other path groups.
> While for some arrays a 'transitioning' state is indeed system-wide,
> other arrays might be able to serve I/O on other paths whilst one is in
> transitioning.
> So I'd rather not presume anything here.

I agree. No problem there. Our array could and does return transitioning on
some portal groups while others might be active/online or unavailable.

> As outlined above, we cannot assume that all paths will be set to
> 'transitioning' once we hit the 'transitioning' condition on one path.
> As such, we need to retry the I/O on other paths, to ensure failover
> does work in these cases. Hence it's perfectly okay to set this path to
> 'failed' as we cannot currently send I/O to that path.

> If, however, we are hitting a 'transitioning' status on _all_ paths (ie
> all paths are set to 'failed') we need to ensure that we do _not_ fail
> the I/O (as technically the paths are still alive), but retry with
> TUR/RTPG until one path reaches a final state.
> Then we should reinstate that path and continue with I/O.

I am not saying that all paths should be changed to transitioning, but
I/O sent to the path that is transitioning should not immediately
fail when there is no online path, which is what happens without
this patch or one like it.

The other paths, which are in other states, should succeed or fail
I/O as they would based on their state. I am only concerned about
the portal group in the transitioning state, and about making sure it
doesn't immediately bubble errors back to the multipath layer, which
fails the path; that is what we see and don't want to see.

> So what is the error you are seeing?

Right now, here is what fails but used to work before the patch.
This worked in previous Linux versions and continues to work
in Windows, ESXi, Solaris, AIX, and HP-UX; I have tested those.
It might work on others as well, but that list is good enough for me.

We have an array with two controllers, and when all is good
each controller reports active/optimized for all of its ports.
There is a TPG per controller.

CT0 - Primary - AO - TPG 0
CT1 - Secondary - AO - TPG 1

In any upgrade there is a point where we have to have the
secondary promote to primary. In our world we call this a
giveback. This is done by returning unavailable for I/O
that is sent to the previous primary CT0 and transitioning
for CT1, the promoting secondary:

CT0 - was primary - unavailable - TPG 0
CT1 - promoting not yet primary - transitioning - TPG 1

This is where we hit the issue. The paths to CT0 fail
since its ALUA state is unavailable, as expected. The paths
to CT1 also fail quickly, within the same second, after some
retries. For a short time, while the secondary promotes to
primary, there are no paths which can serve I/O. We expect
the ALUA transitioning state to protect this path against
an I/O error returning to multipath, which it no longer does.

If it worked we would expect:
CT0 - becoming secondary - still unavailable - TPG 0
CT1 - Primary - AO - TPG 1

And a short time later:
CT0 - secondary - AO - TPG 0
CT1 - primary - AO - TPG 1

Hopefully that helps with the context and why we
are proposing what we are.

Thanks,
Brian

Brian Bunker
SW Eng
brian@purestorage.com



> On Jun 7, 2021, at 5:42 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> On 5/28/21 8:34 PM, Brian Bunker wrote:
>> Do not return an error to multipath which will result in a failed path until the transition time expires.
>> 
>> The current patch which returns BLK_STS_AGAIN for ALUA transitioning breaks the assumptions in our target
>> regarding ALUA states. With that change an error is very quickly returned to multipath which in turn
>> immediately fails the path. The assumption in that patch seems to be that another path will be available
>> for multipath to use. That assumption I don’t believe is fair to make since while one path is in a
>> transitioning state it is reasonable to assume that other paths may also be in non active states. 
>> 
> I beg to disagree. Path groups are nominally independent, and might
> change states independent on the other path groups.
> While for some arrays a 'transitioning' state is indeed system-wide,
> other arrays might be able to serve I/O on other paths whilst one is in
> transitioning.
> So I'd rather not presume anything here.
> 
> 
>> The SPC spec has a note on this:
>> The IMPLICIT TRANSITION TIME field indicates the minimum amount of time in seconds the application client
>> should wait prior to timing out an implicit state transition (see 5.15.2.2). A value of zero indicates that
>> the implicit transition time is not specified.
>> 
> Oh, I know _that_ one. What with me being one of the implementors asking
> for it :-)
> 
> But again, this is _per path_. One cannot assume anything about _other_
> paths here.
> 
>> In the SCSI ALUA device handler a value of 0 translates to the transition time being set to 60 seconds.
>> The current approach of failing I/O on the transitioning path in a much quicker time than what is stated
>> seems to violate this aspect of the specification.
>> 
>> #define ALUA_FAILOVER_TIMEOUT		60
>> unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
>> 
> 
> No. The implicit transitioning timeout is used to determine for how long
> we will be sending a 'TEST UNIT READY'/'REPORT TARGET PORT GROUPS' combo
> to figure out if this particular path is still in transitioning. Once
> this timeout is exceeded we're setting the path to 'standby'.
> And this 'setting port to standby' is our action for 'timing out an
> implicit state transition' as defined by the spec.
> 
>> This patch uses the expiry the same way it is used elsewhere in the device handler. Once the transition
>> state is entered keep retrying until the expiry value is reached. If that happens, return the error to
>> multipath the same way that is currently done with BLK_STS_AGAIN.
>> 
> And that is precisely what I want to avoid.
> 
> As outlined above, we cannot assume that all paths will be set to
> 'transitioning' once we hit the 'transitioning' condition on one path.
> As such, we need to retry the I/O on other paths, to ensure failover
> does work in these cases. Hence it's perfectly okay to set this path to
> 'failed' as we cannot currently send I/O to that path.
> 
> If, however, we are hitting a 'transitioning' status on _all_ paths (ie
> all paths are set to 'failed') we need to ensure that we do _not_ fail
> the I/O (as technically the paths are still alive), but retry with
> TUR/RTPG until one path reaches a final state.
> Then we should reinstate that path and continue with I/O.
> 
> I thought that this is what we do already; but might be that there are
> some issues lurking here.
> 
> So what is the error you are seeing?
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		        Kernel Storage Architect
> hare@suse.de			               +49 911 74053 688
> SUSE Software Solutions Germany GmbH, 90409 Nürnberg
> GF: F. Imendörffer, HRB 36809 (AG Nürnberg)



* Re: [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires
  2021-06-08  0:03   ` Brian Bunker
@ 2021-06-09  7:03     ` Hannes Reinecke
  2021-06-10 21:08       ` Brian Bunker
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2021-06-09  7:03 UTC (permalink / raw)
  To: Brian Bunker; +Cc: linux-scsi

On 6/8/21 2:03 AM, Brian Bunker wrote:
>> Do not return an error to multipath which will result in a failed path until the \
>> transition time expires.
>> The current patch which returns BLK_STS_AGAIN for ALUA transitioning breaks the \
>> assumptions in our target regarding ALUA states. With that change an error is very \
>> quickly returned to multipath which in turn immediately fails the path. The \
>> assumption in that patch seems to be that another path will be available for \
>> multipath to use. That assumption I don't believe is fair to make since while one \
>> path is in a transitioning state it is reasonable to assume that other paths may \
>> also be in non active states.
> 
>> I beg to disagree. Path groups are nominally independent, and might
>> change states independent on the other path groups.
>> While for some arrays a 'transitioning' state is indeed system-wide,
>> other arrays might be able to serve I/O on other paths whilst one is in
>> transitioning.
>> So I'd rather not presume anything here.
> 
> I agree. No problem there. Our array could and does return transitioning on
> some portal groups while others might be active/online or unavailable.
> 
>> As outlined above, we cannot assume that all paths will be set to
>> 'transitioning' once we hit the 'transitioning' condition on one path.
>> As such, we need to retry the I/O on other paths, to ensure failover
>> does work in these cases. Hence it's perfectly okay to set this path to
>> 'failed' as we cannot currently send I/O to that path.
> 
>> If, however, we are hitting a 'transitioning' status on _all_ paths (ie
>> all paths are set to 'failed') we need to ensure that we do _not_ fail
>> the I/O (as technically the paths are still alive), but retry with
>> TUR/RTPG until one path reaches a final state.
>> Then we should reinstate that path and continue with I/O.
> 
> I am not saying that all paths should be changed to transitioning, but
> I/Os sent to the path that is in transitioning should not immediately
> fail if there is not an online path like what does happen without
> this patch or one like it.
> 
> The other paths which are in other states should succeed or fail
> I/O as they would based on their state. I am only concerned about
> the portal group in the transitioning state and making sure it doesn’t
> immediately bubble errors back to the multipath layer which fails the
> path which is what we see and don’t want to see.
> 
>> So what is the error you are seeing?
> 
> Right now this is what fails and used to work before the patch
> This worked in previous Linux versions and continues to work
> in Windows, ESXi, Solaris, AIX, and HP-UX. I have tested those.
> It might work on others as well, but that list is good enough for me.
> 
> We have an array with two controllers and when all is good
> each controller reports active/optimized for all of it ports. There
> Is a TPG per controller.
> 
> CT0 - Primary - AO - TPG 0
> CT1 - Secondary - AO - TPG 1
> 
> In any upgrade there is a point where we have to have the
> secondary promote to primary. In our world we call this a
> giveback. This is done by returning unavailable for I/O
> that is sent to the previous primary CT0 and transitioning
> for CT1, the promoting secondary:
> 
> CT0 - was primary - unavailable - TPG 0
> CT1 - promoting not yet primary - transitioning - TPG 1
> 
> This is where we hit the issue. The paths to CT0 fail
> since its ALUA state is unavailable as expected. The paths
> to CT1 also quickly fail in the same second after some
> retries. There are no paths which can serve I/O for a
> short time as the secondary promotes to primary. We
> expect ALUA state transitioning to protect this path
> against an I/O error returning to multipath which it
> no longer does.
> 
> If it worked we would expect:
> CT0 - becoming secondary - still unavailable - TPG 0
> CT1 - Primary - AO - TPG 1
> 
> And a short time later:
> CT0 - secondary - AO - TPG 0
> CT1 - primary - AO - TPG 1
> 
> Hopefully that helps with the context and why we
> are proposing what we are.
Ah-ha.
'Unavailable' state. Right.

Hmm. Seems that we need to distinguish (at the device-mapper multipath 
layer) between temporarily failed paths (like transitioning), which 
could become available at a later time, and permanently failed paths 
(like unavailable or standby), for which a retry would not yield 
different results. I thought we did that, but apparently there's an 
issue somewhere.

Lemme see ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires
  2021-06-09  7:03     ` Hannes Reinecke
@ 2021-06-10 21:08       ` Brian Bunker
  2021-06-17 19:19         ` Brian Bunker
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Bunker @ 2021-06-10 21:08 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-scsi


> On Jun 9, 2021, at 12:03 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> On 6/8/21 2:03 AM, Brian Bunker wrote:
>>> Do not return an error to multipath which will result in a failed path until the \
>>> transition time expires.
>>> The current patch which returns BLK_STS_AGAIN for ALUA transitioning breaks the \
>>> assumptions in our target regarding ALUA states. With that change an error is very \
>>> quickly returned to multipath which in turn immediately fails the path. The \
>>> assumption in that patch seems to be that another path will be available for \
>>> multipath to use. That assumption I don't believe is fair to make since while one \
>>> path is in a transitioning state it is reasonable to assume that other paths may \
>>> also be in non active states.
>>> I beg to disagree. Path groups are nominally independent, and might
>>> change states independent on the other path groups.
>>> While for some arrays a 'transitioning' state is indeed system-wide,
>>> other arrays might be able to serve I/O on other paths whilst one is in
>>> transitioning.
>>> So I'd rather not presume anything here.
>> I agree. No problem there. Our array could and does return transitioning on
>> some portal groups while others might be active/online or unavailable.
>>> As outlined above, we cannot assume that all paths will be set to
>>> 'transitioning' once we hit the 'transitioning' condition on one path.
>>> As such, we need to retry the I/O on other paths, to ensure failover
>>> does work in these cases. Hence it's perfectly okay to set this path to
>>> 'failed' as we cannot currently send I/O to that path.
>>> If, however, we are hitting a 'transitioning' status on _all_ paths (ie
>>> all paths are set to 'failed') we need to ensure that we do _not_ fail
>>> the I/O (as technically the paths are still alive), but retry with
>>> TUR/RTPG until one path reaches a final state.
>>> Then we should reinstate that path and continue with I/O.
>> I am not saying that all paths should be changed to transitioning, but
>> I/Os sent to the path that is in transitioning should not immediately
>> fail if there is not an online path like what does happen without
>> this patch or one like it.
>> The other paths which are in other states should succeed or fail
>> I/O as they would based on their state. I am only concerned about
>> the portal group in the transitioning state and making sure it doesn’t
>> immediately bubble errors back to the multipath layer which fails the
>> path which is what we see and don’t want to see.
>>> So what is the error you are seeing?
>> Right now this is what fails and used to work before the patch
>> This worked in previous Linux versions and continues to work
>> in Windows, ESXi, Solaris, AIX, and HP-UX. I have tested those.
>> It might work on others as well, but that list is good enough for me.
>> We have an array with two controllers and when all is good
>> each controller reports active/optimized for all of it ports. There
>> Is a TPG per controller.
>> CT0 - Primary - AO - TPG 0
>> CT1 - Secondary - AO - TPG 1
>> In any upgrade there is a point where we have to have the
>> secondary promote to primary. In our world we call this a
>> giveback. This is done by returning unavailable for I/O
>> that is sent to the previous primary CT0 and transitioning
>> for CT1, the promoting secondary:
>> CT0 - was primary - unavailable - TPG 0
>> CT1 - promoting not yet primary - transitioning - TPG 1
>> This is where we hit the issue. The paths to CT0 fail
>> since its ALUA state is unavailable as expected. The paths
>> to CT1 also quickly fail in the same second after some
>> retries. There are no paths which can serve I/O for a
>> short time as the secondary promotes to primary. We
>> expect ALUA state transitioning to protect this path
>> against an I/O error returning to multipath which it
>> no longer does.
>> If it worked we would expect:
>> CT0 - becoming secondary - still unavailable - TPG 0
>> CT1 - Primary - AO - TPG 1
>> And a short time later:
>> CT0 - secondary - AO - TPG 0
>> CT1 - primary - AO - TPG 1
>> Hopefully that helps with the context and why we
>> are proposing what we are.
> Ah-ha.
> 'Unavailable' state. Right.
> 
> Hmm. Seems that we need to distinguish (at the device-mapper multipath layer) between temporarily failed paths (like transitioning), which could become available at a later time, and permanently failed paths (like unavailable or standby), for which a retry would not yield different results. I thought we did that, but apparently there's an issue somewhere.
> 
> Lemme see ...
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

Thanks. For us, the current behavior after the fix I mentioned leaves us in a bad state. The only thing we can do is use no_path_retry in multipath.conf, but that brings a new set of problems with it. There are times when we do want an I/O error returned to the caller as quickly as possible (when we actually don't have any paths that can serve I/O). The state where one controller is unavailable and the other is transitioning is not a condition where we would want an I/O error returned. We expect the transitioning path to protect us until it transitions to an online state. These transition times are very short in our case, just a few seconds, but we would prefer not to hold on to I/O on the target if we can't serve it. The transitioning state had been working great for this until CentOS 8 Stream picked up the change.
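For reference, a no_path_retry override of the kind mentioned above might look like this in multipath.conf. The device section and values shown are illustrative assumptions, not a recommendation:

```
devices {
	device {
		vendor  "PURE"
		product "FlashArray"
		# queue I/O instead of failing it up when no paths are
		# usable; 12 retries at the default 5s polling_interval
		# is roughly a 60s window
		no_path_retry 12
	}
}
```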

Thanks,
Brian



* Re: [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires
  2021-06-10 21:08       ` Brian Bunker
@ 2021-06-17 19:19         ` Brian Bunker
  2021-07-07 20:16           ` Brian Bunker
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Bunker @ 2021-06-17 19:19 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-scsi

Hannes,

I ran with what you mentioned about the multipath layer and chased down where the issue is happening. What I see is new I/O coming in and hitting the device handler's prep_fn when the state is transitioning:

	case SCSI_ACCESS_STATE_TRANSITIONING:
		return BLK_STS_AGAIN;

The issue is that when this hits multipath_end_io, this block status results in the path getting failed, even though BLK_STS_AGAIN implies the request should be retried. Adding a check like this makes our problem go away:

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index bced42f082b0..d5d6be96068d 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1657,7 +1657,7 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone,
                else
                        r = DM_ENDIO_REQUEUE;
 
-               if (pgpath)
+               if (pgpath && (error != BLK_STS_AGAIN))
                        fail_path(pgpath);
 
                if (!atomic_read(&m->nr_valid_paths) &&

I am not sure if this is where you were going, but it does work without any other changes.

Thanks,
Brian

Brian Bunker
SW Eng
brian@purestorage.com



> On Jun 10, 2021, at 2:08 PM, Brian Bunker <brian@purestorage.com> wrote:
> 
> 
>> On Jun 9, 2021, at 12:03 AM, Hannes Reinecke <hare@suse.de> wrote:
>> 
>> On 6/8/21 2:03 AM, Brian Bunker wrote:
>>>>> Do not return an error to multipath which will result in a failed
>>>>> path until the transition time expires.
>>>>> The current patch which returns BLK_STS_AGAIN for ALUA transitioning
>>>>> breaks the assumptions in our target regarding ALUA states. With that
>>>>> change an error is very quickly returned to multipath which in turn
>>>>> immediately fails the path. The assumption in that patch seems to be
>>>>> that another path will be available for multipath to use. That
>>>>> assumption I don't believe is fair to make since while one path is in
>>>>> a transitioning state it is reasonable to assume that other paths may
>>>>> also be in non active states.
>>>> I beg to disagree. Path groups are nominally independent, and might
>>>> change states independent on the other path groups.
>>>> While for some arrays a 'transitioning' state is indeed system-wide,
>>>> other arrays might be able to serve I/O on other paths whilst one is in
>>>> transitioning.
>>>> So I'd rather not presume anything here.
>>> I agree. No problem there. Our array could and does return transitioning on
>>> some portal groups while others might be active/online or unavailable.
>>>> As outlined above, we cannot assume that all paths will be set to
>>>> 'transitioning' once we hit the 'transitioning' condition on one path.
>>>> As such, we need to retry the I/O on other paths, to ensure failover
>>>> does work in these cases. Hence it's perfectly okay to set this path to
>>>> 'failed' as we cannot currently send I/O to that path.
>>>> If, however, we are hitting a 'transitioning' status on _all_ paths (ie
>>>> all paths are set to 'failed') we need to ensure that we do _not_ fail
>>>> the I/O (as technically the paths are still alive), but retry with
>>>> TUR/RTPG until one path reaches a final state.
>>>> Then we should reinstate that path and continue with I/O.
>>> I am not saying that all paths should be changed to transitioning, but
>>> I/Os sent to the path that is in transitioning should not immediately
>>> fail if there is not an online path like what does happen without
>>> this patch or one like it.
>>> The other paths which are in other states should succeed or fail
>>> I/O as they would based on their state. I am only concerned about
>>> the portal group in the transitioning state and making sure it doesn’t
>>> immediately bubble errors back to the multipath layer which fails the
>>> path which is what we see and don’t want to see.
>>>> So what is the error you are seeing?
>>> Right now this is what fails but used to work before the patch.
>>> This worked in previous Linux versions and continues to work
>>> in Windows, ESXi, Solaris, AIX, and HP-UX. I have tested those.
>>> It might work on others as well, but that list is good enough for me.
>>> We have an array with two controllers, and when all is good
>>> each controller reports active/optimized for all of its ports.
>>> There is a TPG per controller.
>>> CT0 - Primary - AO - TPG 0
>>> CT1 - Secondary - AO - TPG 1
>>> In any upgrade there is a point where we have to have the
>>> secondary promote to primary. In our world we call this a
>>> giveback. This is done by returning unavailable for I/O
>>> that is sent to the previous primary CT0 and transitioning
>>> for CT1, the promoting secondary:
>>> CT0 - was primary - unavailable - TPG 0
>>> CT1 - promoting not yet primary - transitioning - TPG 1
>>> This is where we hit the issue. The paths to CT0 fail
>>> since its ALUA state is unavailable as expected. The paths
>>> to CT1 also quickly fail in the same second after some
>>> retries. There are no paths which can serve I/O for a
>>> short time as the secondary promotes to primary. We
>>> expect ALUA state transitioning to protect this path
>>> against an I/O error returning to multipath which it
>>> no longer does.
>>> If it worked we would expect:
>>> CT0 - becoming secondary - still unavailable - TPG 0
>>> CT1 - Primary - AO - TPG 1
>>> And a short time later:
>>> CT0 - secondary - AO - TPG 0
>>> CT1 - primary - AO - TPG 1
>>> Hopefully that helps with the context and why we
>>> are proposing what we are.
>> Ah-ha.
>> 'Unavailable' state. Right.
>> 
>> Hmm. Seems that we need to distinguish (at the device-mapper multipath layer) between temporarily failed paths (like transitioning), which could become available at a later time, and permanently failed paths (like unavailable or standby), for which a retry would not yield different results. I thought we did that, but apparently there's an issue somewhere.
>> 
>> Lemme see ...
>> 
>> Cheers,
>> 
>> Hannes
>> -- 
>> Dr. Hannes Reinecke                Kernel Storage Architect
>> hare@suse.de                              +49 911 74053 688
>> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
> 
> Thanks. For us, the current behavior after the change I mentioned leaves us in a bad state. The only thing we can do is use no_path_retry in multipath.conf, but that brings with it a new set of problems. There are times we do want an I/O error returned to the caller as quickly as possible (we actually don’t have any paths that can serve I/O). The state where one controller is unavailable and the other is transitioning is not a condition where we would want an I/O error returned. We expect the transitioning path to protect us until it transitions to an online state. These transition times are very short in our case, just a few seconds, but we would prefer not to hold on to I/O on the target if we can’t serve it. The transitioning state had been working great for this until CentOS 8 Stream picked up the change.
> 
> Thanks,
> Brian
> 



* Re: [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires
  2021-06-17 19:19         ` Brian Bunker
@ 2021-07-07 20:16           ` Brian Bunker
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Bunker @ 2021-07-07 20:16 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-scsi

Just checking back in. There appear to be at least two ways to fix this problem: in the ALUA device handler or in drivers/md/dm-mpath.c. At this point, however, failing the path when the ALUA transitioning state is hit breaks what used to work and what we believe should work. Where should we go from here?

Thanks,
Brian

Brian Bunker
SW Eng
brian@purestorage.com



> On Jun 17, 2021, at 12:19 PM, Brian Bunker <brian@purestorage.com> wrote:
> 
> drivers/md/dm-mpath.c




Thread overview: 7+ messages
-- links below jump to the message on this page --
2021-05-28 18:34 [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires Brian Bunker
2021-06-07 12:42 ` Hannes Reinecke
2021-06-08  0:03   ` Brian Bunker
2021-06-09  7:03     ` Hannes Reinecke
2021-06-10 21:08       ` Brian Bunker
2021-06-17 19:19         ` Brian Bunker
2021-07-07 20:16           ` Brian Bunker
