From: Sergei Trofimovich <slyich@gmail.com>
To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>,
	Don Brace <don.brace@microchip.com>,
	storagedev@microchip.com, linux-scsi@vger.kernel.org
Cc: linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org,
	Joe Szczypek <jszczype@redhat.com>,
	Scott Benesh <scott.benesh@microchip.com>,
	Scott Teel <scott.teel@microchip.com>,
	Tomas Henzl <thenzl@redhat.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: Re: [bisected] 5.12-rc1 hpsa regression: "scsi: hpsa: Correct dev cmds outstanding for retried cmds" breaks hpsa P600
Date: Wed, 3 Mar 2021 08:55:33 +0000
Message-ID: <20210303085533.505b1590@sf>
In-Reply-To: <20210303002236.2f4ec01f@sf>

On Wed, 3 Mar 2021 00:22:36 +0000
Sergei Trofimovich <slyich@gmail.com> wrote:

> On Tue, 2 Mar 2021 23:31:32 +0100
> John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
> 
> > Hi Sergei!
> > 
> > On 3/2/21 11:26 PM, Sergei Trofimovich wrote:  
> > > Gave v5.12-rc1 a try today and got a similar boot failure around
> > > hpsa queue initialization, but my failure is later:
> > >     https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
> > > Maybe I get a different error because I enabled most kernel debugging
> > > options :)
> > > 
> > > It looks like the 'ERROR: Invalid distance value range' messages, while
> > > very scary, are harmless. It's just a new, spammy way for the kernel
> > > to report the lack of a NUMA config on the machine (no SRAT and SLIT
> > > ACPI tables).
> > > 
> > > At least hpsa gets detected on the PCI bus. But I guess its discovered
> > > configuration is very wrong, as I get unaligned accesses:
> > >     [   19.811570] kernel unaligned access to 0xe000000105dd8295, ip=0xa000000100b874d1
> > > 
> > > Bisecting now.    
> > 
> > Sounds good. I guess we should get Jens' fix for the signal regression
> > merged as well as your two fixes for strace.  
> 
> I "bisected" it (cheated halfway through) and verified that reverting
> f749d8b7a9896bc6e5ffe104cc64345037e0b152 makes the rx3600 boot again.
> 
> CCing authors who might be able to help us here.
> 
> commit f749d8b7a9896bc6e5ffe104cc64345037e0b152
> Author: Don Brace <don.brace@microchip.com>
> Date:   Mon Feb 15 16:26:57 2021 -0600
> 
>     scsi: hpsa: Correct dev cmds outstanding for retried cmds
> 
>     Prevent incrementing device->commands_outstanding for ioaccel command
>     retries that are driver initiated.  If the command goes through the retry
>     path, the device->commands_outstanding counter has already accounted for
>     the number of commands outstanding to the device.  Only commands going
>     through function hpsa_cmd_resolve_events decrement this counter.
> 
>      - ioaccel commands go to either HBA disks or to logical volumes comprised
>        of SSDs.
> 
>     The extra increment is causing device resets to hang.
> 
>      - Resets wait for all device outstanding commands to complete before
>        returning.
> 
>     Replace unused field abort_pending with retry_pending. This is a
>     maintenance driver so these changes have the least impact/risk.
> 
>     Link: https://lore.kernel.org/r/161342801747.29388.13045495968308188518.stgit@brunhilda
>     Tested-by: Joe Szczypek <jszczype@redhat.com>
>     Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
>     Reviewed-by: Scott Teel <scott.teel@microchip.com>
>     Reviewed-by: Tomas Henzl <thenzl@redhat.com>
>     Signed-off-by: Don Brace <don.brace@microchip.com>
>     Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> Don, do you happen to know why this patch causes a controller init
> failure on the following device?
>     14:01.0 RAID bus controller: Hewlett-Packard Company Smart Array P600
> 
> Boot failure: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
> Boot success: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1-good
> 
> The difference between the two boots is that
> f749d8b7a9896bc6e5ffe104cc64345037e0b152 is reverted on top of 5.12-rc1
> in the -good case.
> 
> It looks like the hpsa controller fails to initialize in the bad case (could be a race?).

Also CCing the hpsa maintainer mailing lists.

Looking more into the suspect commit
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f749d8b7a9896bc6e5ffe104cc64345037e0b152
it roughly does the following:

@@ -448,7 +448,7 @@ struct CommandList {
 	 */
 	struct hpsa_scsi_dev_t *phys_disk;
 
-	int abort_pending;
+	bool retry_pending;
 	struct hpsa_scsi_dev_t *device;
 	atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
 } __aligned(COMMANDLIST_ALIGNMENT);
...
@@ -1151,7 +1151,10 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 {
        dial_down_lockup_detection_during_fw_flash(h, c);
        atomic_inc(&h->commands_outstanding);
-       if (c->device)
+       /*
+        * Check to see if the command is being retried.
+        */
+       if (c->device && !c->retry_pending)
                atomic_inc(&c->device->commands_outstanding);

But I don't immediately see anything wrong with it.

-- 

  Sergei
