From: Ming Lei <ming.lei@redhat.com>
To: mwilck@suse.com
Cc: Donald Buczek <buczek@molgen.mpg.de>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	James Bottomley <jejb@linux.vnet.ibm.com>,
	linux-scsi@vger.kernel.org, Hannes Reinecke <hare@suse.de>,
	Don Brace <Don.Brace@microchip.com>,
	Kevin Barnett <Kevin.Barnett@microchip.com>,
	John Garry <john.garry@huawei.com>,
	Paul Menzel <pmenzel@molgen.mpg.de>
Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
Date: Fri, 22 Jan 2021 11:23:40 +0800
Message-ID: <20210122032340.GB509982@T590>
In-Reply-To: <20210120184548.20219-1-mwilck@suse.com>

On Wed, Jan 20, 2021 at 07:45:48PM +0100, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Donald: please give this patch a try.
> 
> Commit 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> contained this hunk:
> 
> -       busy = atomic_inc_return(&shost->host_busy) - 1;
>         if (atomic_read(&shost->host_blocked) > 0) {
> -               if (busy)
> +               if (scsi_host_busy(shost) > 0)
>                         goto starved;
> 
> The previous code would increase the busy count before checking host_blocked.
> With 6eb045e092ef, the busy count would be increased (by setting the
> SCMD_STATE_INFLIGHT bit) after the if clause for host_blocked above.
> 
> Users have reported a regression with the smartpqi driver [1] which has been
> shown to be caused by this commit [2].
> 
> It seems that by moving the increase of the busy counter further down, it could
> happen that the can_queue limit of the controller could be exceeded if several
> CPUs were executing this code in parallel on different queues.

The can_queue limit should never be exceeded, because blk-mq enforces
it: each hw queue's depth is set to .can_queue.

smartpqi's issue is that its .can_queue does not represent the depth of
each hw queue; instead, .can_queue represents the queue depth of the
whole HBA.
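
To make the arithmetic concrete, here is a minimal userspace sketch (not
kernel code) of the upper bound on in-flight commands as far as tag
allocation is concerned. The struct, its fields, and max_in_flight() are
hypothetical names made up for illustration, and the numbers in the note
below are invented, not smartpqi's real values.

```c
/* Hypothetical userspace model of a SCSI host; not kernel code. */
struct host_model {
	unsigned int nr_hw_queues;  /* number of blk-mq hardware queues */
	unsigned int can_queue;     /* value the driver puts in .can_queue */
	int host_wide_tags;         /* nonzero: one tag space shared by all hw queues */
};

/*
 * Upper bound on commands that tag allocation alone allows in flight.
 * With per-queue tags, every hw queue gets its own .can_queue tags, so
 * the bound scales with the number of hw queues.
 */
static unsigned int max_in_flight(const struct host_model *h)
{
	if (h->host_wide_tags)
		return h->can_queue;
	return h->nr_hw_queues * h->can_queue;
}
```

With 8 hw queues and .can_queue = 1024 (made-up values), the bound is
8192 with per-queue tags and 1024 with host-wide tags. If the HBA as a
whole can only take 1024 commands, only the second setup respects that.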

As John mentioned, smartpqi should have switched to hosttags.

BTW, it looks like the following code carries a soft lockup risk:

pqi_alloc_io_request():
        while (1) {
                io_request = &ctrl_info->io_request_pool[i];
                if (atomic_inc_return(&io_request->refcount) == 1)
                        break;
                atomic_dec(&io_request->refcount);
                i = (i + 1) % ctrl_info->max_io_slots;
        }
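
The loop above never terminates while every slot is held. A minimal
userspace sketch of one way to bound the search is shown here, using C11
atomics. This is an assumption about a possible shape of a fix, not the
driver's code: struct io_slot, struct slot_pool, slot_alloc_bounded()
and slot_free() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model of the request pool; not driver code. */
struct io_slot {
	atomic_int refcount;        /* 0 = free, 1 = owned */
};

struct slot_pool {
	struct io_slot *slots;
	unsigned int max_io_slots;
	atomic_uint next;           /* hint: where the next search starts */
};

/*
 * Visit each slot at most once per call. Returns NULL when all slots
 * are busy, so the caller can back off instead of spinning forever.
 */
static struct io_slot *slot_alloc_bounded(struct slot_pool *pool)
{
	unsigned int i, tries;

	assert(pool->max_io_slots > 0);
	i = atomic_load(&pool->next) % pool->max_io_slots;

	for (tries = 0; tries < pool->max_io_slots; tries++) {
		struct io_slot *slot = &pool->slots[i];

		/* atomic_fetch_add() returns the old value; 0 means we own it */
		if (atomic_fetch_add(&slot->refcount, 1) == 0) {
			atomic_store(&pool->next, (i + 1) % pool->max_io_slots);
			return slot;
		}
		/* Someone else owns the slot: undo our increment, move on */
		atomic_fetch_sub(&slot->refcount, 1);
		i = (i + 1) % pool->max_io_slots;
	}
	return NULL;
}

/* Release a slot obtained from slot_alloc_bounded(). */
static void slot_free(struct io_slot *slot)
{
	atomic_fetch_sub(&slot->refcount, 1);
}
```

What the caller does on NULL is left open here. In a queuecommand path,
returning a busy status to the midlayer would be one option, but that is
an assumption and not something the driver is known to do.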

> 
> This patch attempts to fix it by moving setting the SCMD_STATE_INFLIGHT before
> the host_blocked test again. It also inserts barriers to make sure
> scsi_host_busy() on one CPU will notice the increase of the count from another.
> 
> [1]: https://marc.info/?l=linux-scsi&m=160271263114829&w=2
> [2]: https://marc.info/?l=linux-scsi&m=161116163722099&w=2

If the above is true of smartpqi's .can_queue usage, your patch may not
completely fix the issue that you describe as '.can_queue is exceeded'.

> 
> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> 
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Don Brace <Don.Brace@microchip.com>
> Cc: Kevin Barnett <Kevin.Barnett@microchip.com>
> Cc: Donald Buczek <buczek@molgen.mpg.de>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  drivers/scsi/hosts.c    | 2 ++
>  drivers/scsi/scsi_lib.c | 8 +++++---
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 2f162603876f..1c452a1c18fd 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -564,6 +564,8 @@ static bool scsi_host_check_in_flight(struct request *rq, void *data,
>  	int *count = data;
>  	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
>  
> +	/* This pairs with set_bit() in scsi_host_queue_ready() */
> +	smp_mb__before_atomic();

So the above barrier orders atomic_read(&shost->host_blocked) and
test_bit()?

>  	if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
>  		(*count)++;
>  
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b3f14f05340a..0a9a36c349ee 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1353,8 +1353,12 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  	if (scsi_host_in_recovery(shost))
>  		return 0;
>  
> +	set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> +	/* This pairs with test_bit() in scsi_host_check_in_flight() */
> +	smp_mb__after_atomic();
> +
>  	if (atomic_read(&shost->host_blocked) > 0) {
> -		if (scsi_host_busy(shost) > 0)
> +		if (scsi_host_busy(shost) > 1)
>  			goto starved;
>  
>  		/*
> @@ -1379,8 +1383,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  		spin_unlock_irq(shost->host_lock);
>  	}
>  
> -	__set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> -

This patch looks fine.

However, I'd suggest confirming smartpqi's .can_queue usage first, which
looks like a big issue.

-- 
Ming

