* [PATCH] ibmvfc: fix command state accounting and stale response detection
@ 2021-07-16 20:52 Tyrel Datwyler
2021-07-17 15:34 ` Christophe Leroy
2021-07-29 3:37 ` Martin K. Petersen
0 siblings, 2 replies; 3+ messages in thread
From: Tyrel Datwyler @ 2021-07-16 20:52 UTC (permalink / raw)
To: james.bottomley
Cc: martin.petersen, linux-scsi, linuxppc-dev, linux-kernel, brking,
Tyrel Datwyler, stable
Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside
the host/queue lock") responses to commands were completed sequentially
with the host lock held such that a command had a basic binary state of
active or free. It was therefore a simple affair of ensuring the
assocaiated ibmvfc_event to a VIOS response was valid by testing that it
was not already free. The lock relexation work to complete commands
outside the lock inadverdently made it a trinary command state such that
a command is either in flight, received and being completed, or
completed and now free. This breaks the stale command detection logic as
a command may be still marked active and been placed on the delayed
completion list when a second stale response for the same command
arrives. This can lead to double completions and list corruption. This
issue was exposed by a recent VIOS regression were a missing memory
barrier could occasionally result in the ibmvfc client receiveing a
duplicate response for the same command.
Fix the issue by introducing the atomic ibmvfc_event.active to track the
trinary state of a command. The state is explicitly set to 1 when a
command is successfully sent. The CRQ response handlers use
atomic_dec_if_positive() to test for stale responses and correctly
transition to the completion state when a active command is received.
Finally, atomic_dec_and_test() is used to sanity check transistions
when commands are freed as a result of a completion, or moved to the
purge list as a result of error handling or adapter reset.
Cc: stable@vger.kernel.org
Fixes: 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside the host/queue lock")
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 19 +++++++++++++++++--
drivers/scsi/ibmvscsi/ibmvfc.h | 1 +
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index bee1bec49c09..935b01ee44b7 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -807,6 +807,13 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost,
for (i = 0; i < size; ++i) {
struct ibmvfc_event *evt = &pool->events[i];
+ /*
+ * evt->active states
+ * 1 = in flight
+ * 0 = being completed
+ * -1 = free/freed
+ */
+ atomic_set(&evt->active, -1);
atomic_set(&evt->free, 1);
evt->crq.valid = 0x80;
evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i));
@@ -1017,6 +1024,7 @@ static void ibmvfc_free_event(struct ibmvfc_event *evt)
BUG_ON(!ibmvfc_valid_event(pool, evt));
BUG_ON(atomic_inc_return(&evt->free) != 1);
+ BUG_ON(atomic_dec_and_test(&evt->active));
spin_lock_irqsave(&evt->queue->l_lock, flags);
list_add_tail(&evt->queue_list, &evt->queue->free);
@@ -1072,6 +1080,12 @@ static void ibmvfc_complete_purge(struct list_head *purge_list)
**/
static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code)
{
+ /*
+ * Anything we are failing should still be active. Otherwise, it
+ * implies we already got a response for the command and are doing
+ * something bad like double completing it.
+ */
+ BUG_ON(!atomic_dec_and_test(&evt->active));
if (evt->cmnd) {
evt->cmnd->result = (error_code << 16);
evt->done = ibmvfc_scsi_eh_done;
@@ -1723,6 +1737,7 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
evt->done(evt);
} else {
+ atomic_set(&evt->active, 1);
spin_unlock_irqrestore(&evt->queue->l_lock, flags);
ibmvfc_trc_start(evt);
}
@@ -3251,7 +3266,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost,
return;
}
- if (unlikely(atomic_read(&evt->free))) {
+ if (unlikely(atomic_dec_if_positive(&evt->active))) {
dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n",
crq->ioba);
return;
@@ -3778,7 +3793,7 @@ static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost
return;
}
- if (unlikely(atomic_read(&evt->free))) {
+ if (unlikely(atomic_dec_if_positive(&evt->active))) {
dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n",
crq->ioba);
return;
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 4f0f3baefae4..92fb889d7eb0 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -745,6 +745,7 @@ struct ibmvfc_event {
struct ibmvfc_target *tgt;
struct scsi_cmnd *cmnd;
atomic_t free;
+ atomic_t active;
union ibmvfc_iu *xfer_iu;
void (*done)(struct ibmvfc_event *evt);
void (*_done)(struct ibmvfc_event *evt);
--
2.27.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] ibmvfc: fix command state accounting and stale response detection
2021-07-16 20:52 [PATCH] ibmvfc: fix command state accounting and stale response detection Tyrel Datwyler
@ 2021-07-17 15:34 ` Christophe Leroy
2021-07-29 3:37 ` Martin K. Petersen
1 sibling, 0 replies; 3+ messages in thread
From: Christophe Leroy @ 2021-07-17 15:34 UTC (permalink / raw)
To: Tyrel Datwyler
Cc: linuxppc-dev, brking, stable, linux-kernel, linux-scsi,
martin.petersen, james.bottomley
Tyrel Datwyler <tyreld@linux.ibm.com> a écrit :
> Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside
> the host/queue lock") responses to commands were completed sequentially
> with the host lock held such that a command had a basic binary state of
> active or free. It was therefore a simple affair of ensuring the
> assocaiated ibmvfc_event to a VIOS response was valid by testing that it
> was not already free. The lock relexation work to complete commands
> outside the lock inadverdently made it a trinary command state such that
> a command is either in flight, received and being completed, or
> completed and now free. This breaks the stale command detection logic as
> a command may be still marked active and been placed on the delayed
> completion list when a second stale response for the same command
> arrives. This can lead to double completions and list corruption. This
> issue was exposed by a recent VIOS regression were a missing memory
> barrier could occasionally result in the ibmvfc client receiveing a
> duplicate response for the same command.
>
> Fix the issue by introducing the atomic ibmvfc_event.active to track the
> trinary state of a command. The state is explicitly set to 1 when a
> command is successfully sent. The CRQ response handlers use
> atomic_dec_if_positive() to test for stale responses and correctly
> transition to the completion state when a active command is received.
> Finally, atomic_dec_and_test() is used to sanity check transistions
> when commands are freed as a result of a completion, or moved to the
> purge list as a result of error handling or adapter reset.
>
> Cc: stable@vger.kernel.org
> Fixes: 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside the
> host/queue lock")
> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> drivers/scsi/ibmvscsi/ibmvfc.c | 19 +++++++++++++++++--
> drivers/scsi/ibmvscsi/ibmvfc.h | 1 +
> 2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index bee1bec49c09..935b01ee44b7 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -807,6 +807,13 @@ static int ibmvfc_init_event_pool(struct
> ibmvfc_host *vhost,
> for (i = 0; i < size; ++i) {
> struct ibmvfc_event *evt = &pool->events[i];
>
> + /*
> + * evt->active states
> + * 1 = in flight
> + * 0 = being completed
> + * -1 = free/freed
> + */
> + atomic_set(&evt->active, -1);
> atomic_set(&evt->free, 1);
> evt->crq.valid = 0x80;
> evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i));
> @@ -1017,6 +1024,7 @@ static void ibmvfc_free_event(struct ibmvfc_event *evt)
>
> BUG_ON(!ibmvfc_valid_event(pool, evt));
> BUG_ON(atomic_inc_return(&evt->free) != 1);
> + BUG_ON(atomic_dec_and_test(&evt->active));
Avoid new BUG_ONs. See
https://www.kernel.org/doc/html/latest/process/deprecated.html
>
> spin_lock_irqsave(&evt->queue->l_lock, flags);
> list_add_tail(&evt->queue_list, &evt->queue->free);
> @@ -1072,6 +1080,12 @@ static void ibmvfc_complete_purge(struct
> list_head *purge_list)
> **/
> static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code)
> {
> + /*
> + * Anything we are failing should still be active. Otherwise, it
> + * implies we already got a response for the command and are doing
> + * something bad like double completing it.
> + */
> + BUG_ON(!atomic_dec_and_test(&evt->active));
Same
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] ibmvfc: fix command state accounting and stale response detection
2021-07-16 20:52 [PATCH] ibmvfc: fix command state accounting and stale response detection Tyrel Datwyler
2021-07-17 15:34 ` Christophe Leroy
@ 2021-07-29 3:37 ` Martin K. Petersen
1 sibling, 0 replies; 3+ messages in thread
From: Martin K. Petersen @ 2021-07-29 3:37 UTC (permalink / raw)
To: james.bottomley, Tyrel Datwyler
Cc: Martin K . Petersen, linuxppc-dev, brking, linux-kernel, stable,
linux-scsi
On Fri, 16 Jul 2021 14:52:20 -0600, Tyrel Datwyler wrote:
> Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside
> the host/queue lock") responses to commands were completed sequentially
> with the host lock held such that a command had a basic binary state of
> active or free. It was therefore a simple affair of ensuring the
> assocaiated ibmvfc_event to a VIOS response was valid by testing that it
> was not already free. The lock relexation work to complete commands
> outside the lock inadverdently made it a trinary command state such that
> a command is either in flight, received and being completed, or
> completed and now free. This breaks the stale command detection logic as
> a command may be still marked active and been placed on the delayed
> completion list when a second stale response for the same command
> arrives. This can lead to double completions and list corruption. This
> issue was exposed by a recent VIOS regression were a missing memory
> barrier could occasionally result in the ibmvfc client receiveing a
> duplicate response for the same command.
>
> [...]
Applied to 5.14/scsi-fixes, thanks!
[1/1] ibmvfc: fix command state accounting and stale response detection
https://git.kernel.org/mkp/scsi/c/73bfdf707d01
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-07-29 3:37 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-16 20:52 [PATCH] ibmvfc: fix command state accounting and stale response detection Tyrel Datwyler
2021-07-17 15:34 ` Christophe Leroy
2021-07-29 3:37 ` Martin K. Petersen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).