From: Martin Wilck <mwilck@suse.com>
To: John Garry <john.garry@huawei.com>,
	Don.Brace@microchip.com, pmenzel@molgen.mpg.de,
	Kevin.Barnett@microchip.com, Scott.Teel@microchip.com,
	Justin.Lindley@microchip.com, Scott.Benesh@microchip.com,
	Gerry.Morong@microchip.com, Mahesh.Rajashekhara@microchip.com,
	hch@infradead.org, joseph.szczypek@hpe.com, POSWALD@suse.com,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	Ming Lei <ming.lei@redhat.com>
Cc: linux-scsi@vger.kernel.org, it+linux-scsi@molgen.mpg.de,
	buczek@molgen.mpg.de, gregkh@linuxfoundation.org
Subject: Re: [PATCH V3 15/25] smartpqi: fix host qdepth limit
Date: Tue, 19 Jan 2021 15:12:43 +0100
Message-ID: <4555695d649afada5d4358485f0a146aa0848f65.camel@suse.com>
In-Reply-To: <b3e4e597-779b-7c1e-0d3c-07bc3dab1bb5@huawei.com>

On Tue, 2021-01-19 at 10:33 +0000, John Garry wrote:
> > > 
> > > On 10.12.20 at 21:35, Don Brace wrote:
> > > > From: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
> > > > 
> > > > * Correct scsi-mid-layer sending more requests than
> > > >     exposed host Q depth causing firmware ASSERT issue.
> > > >     * Add host Qdepth counter.
> > > 
> > > This supposedly fixes the regression between Linux 5.4 and 5.9,
> > > which we reported in [1].
> > > 
> > >       kernel: smartpqi 0000:89:00.0: controller is offline: status code 0x6100c
> > >       kernel: smartpqi 0000:89:00.0: controller offline
> > > 
> > > Thank you for looking into this issue and fixing it. We are going
> > > to test this.
> > > 
> > > For easily finding these things in the git history or the WWW, it
> > > would be great if these log messages could be included (in the
> > > future).
> > > 
> > > DON> Thanks for your suggestion. We'll add them next time.
> > > 
> > > Also, that means that the regression is still present in Linux
> > > 5.10, released yesterday, and this commit does not apply to these
> > > versions.
> > > 
> > > DON> They have started 5.10-RC7 now. So possibly 5.11 or 5.12,
> > > depending on when all of the patches are applied. The patch in
> > > question is among 28 other patches.
> > > 
> > > Mahesh, do you have any idea what commit caused the regression
> > > and why the issue started to show up?
> > > 
> > > DON> The smartpqi driver sets two scsi_host_template member
> > > fields: .can_queue and .nr_hw_queues. But we have not yet converted
> > > to host_tagset. So the queue_depth becomes nr_hw_queues * can_queue,
> > > which is more than the hw can support. That can be verified by
> > > looking at scsi_host.h.
> > >          /*
> > >           * In scsi-mq mode, the number of hardware queues supported
> > >           * by the LLD.
> > >           *
> > >           * Note: it is assumed that each hardware queue has a queue
> > >           * depth of can_queue. In other words, the total queue depth
> > >           * per host is nr_hw_queues * can_queue. However, for when
> > >           * host_tagset is set, the total queue depth is can_queue.
> > >           */
> > > 
> > > So, until we make this change, the queue_depth change prevents
> > > the above issue from happening.
> > 
> > can_queue and nr_hw_queues have been set like this as long as the
> > driver existed. Why did Paul observe a regression with 5.9?
> > 
> > > And why can't you simply set can_queue to
> > > (ctrl_info->scsi_ml_can_queue / nr_hw_queues)?
> > 
> > Don: I did this in an internal patch, but this patch seemed to work
> > the best for our driver. HBA performance remained steady when
> > running benchmarks.

That was a stupid suggestion on my part. Sorry.
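
For reference, the arithmetic from the scsi_host.h comment quoted above
can be sketched as follows. This is a toy model, not kernel code:
struct toy_host and toy_total_queue_depth() are made-up names, and only
the three field names are taken from the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the relevant host fields (hypothetical struct). */
struct toy_host {
	unsigned int can_queue;    /* depth per hw queue, or total with host_tagset */
	unsigned int nr_hw_queues; /* number of hardware queues */
	bool host_tagset;          /* true if tags are shared host-wide */
};

/* Total queue depth per host, as described by the scsi_host.h comment. */
static unsigned int toy_total_queue_depth(const struct toy_host *h)
{
	assert(h->nr_hw_queues > 0);
	return h->host_tagset ? h->can_queue
			      : h->can_queue * h->nr_hw_queues;
}
```

With a hypothetical can_queue of 1000 and 8 hardware queues, this gives
8000 without host_tagset and 1000 with it.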

> I guess that this is a fallout from commit 6eb045e092ef ("scsi:
> core: avoid host-wide host_busy counter for scsi_mq"). But that
> commit is correct.

It would be good if someone (Paul?) could verify whether that commit
actually caused the regression they saw.

Looking at 6eb045e092ef, I notice this hunk:

 
-       busy = atomic_inc_return(&shost->host_busy) - 1;
        if (atomic_read(&shost->host_blocked) > 0) {
-               if (busy)
+               if (scsi_host_busy(shost) > 0)
                        goto starved;

Before 6eb045e092ef, the busy count was incremented atomically (with
the memory barrier implied by atomic_inc_return()) before looking at
"host_blocked". The new code does this instead:

@@ -1403,6 +1400,8 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
                spin_unlock_irq(shost->host_lock);
        }
 
+       __set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
+

but it happens *after* the "host_blocked" check. Could that perhaps
have caused the regression?
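
To illustrate what I mean, here is a toy model of the two orderings. It
is not kernel code and I have not verified that this interleaving can
actually happen in scsi_host_queue_ready(); all names (toy_shost,
old_check(), new_check(), ...) are made up, and a plain counter stands
in for counting SCMD_STATE_INFLIGHT bits. It replays "A checks, B
checks, then both are marked" and counts how many submitters saw a busy
count of zero.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy host state (hypothetical, for illustration only). */
struct toy_shost {
	atomic_int host_busy; /* old scheme: host-wide busy counter */
	atomic_int inflight;  /* new scheme: stand-in for in-flight commands */
};

/* Old scheme: count ourselves first, return how many others were busy. */
static int old_check(struct toy_shost *h)
{
	/* Same value as atomic_inc_return(&host_busy) - 1. */
	return atomic_fetch_add(&h->host_busy, 1);
}

/* New scheme, step 1: observe only; this command is not counted yet. */
static int new_check(struct toy_shost *h)
{
	return atomic_load(&h->inflight);
}

/* New scheme, step 2: mark the command in flight, after the check. */
static void new_mark_inflight(struct toy_shost *h)
{
	atomic_fetch_add(&h->inflight, 1);
}

/* Old ordering: how many of two submitters see busy == 0? */
static int old_zero_observers(void)
{
	struct toy_shost h;

	atomic_init(&h.host_busy, 0);
	atomic_init(&h.inflight, 0);
	int a = old_check(&h); /* A increments and checks in one step */
	int b = old_check(&h); /* B already sees A's increment */
	assert(atomic_load(&h.host_busy) == 2);
	return (a == 0) + (b == 0);
}

/* New ordering, with both checks happening before either mark. */
static int new_zero_observers(void)
{
	struct toy_shost h;

	atomic_init(&h.host_busy, 0);
	atomic_init(&h.inflight, 0);
	int a = new_check(&h);
	int b = new_check(&h); /* A is not marked yet, so B also sees 0 */
	new_mark_inflight(&h);
	new_mark_inflight(&h);
	assert(atomic_load(&h.inflight) == 2);
	return (a == 0) + (b == 0);
}
```

In this model, old_zero_observers() returns 1 and new_zero_observers()
returns 2: both submitters end up accounted for either way, but with
the new ordering both can pass a check that only one passed before.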

Thanks
Martin

