linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
@ 2019-06-27 16:14 Konstantin Khorenko
  2019-06-27 16:14 ` [PATCH 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Khorenko @ 2019-06-27 16:14 UTC (permalink / raw)
  To: Adaptec OEM Raid Solutions, Prasad B Munirathnam,
	Raghava Aditya Renukunta
  Cc: Konstantin Khorenko, linux-scsi, linux-kernel,
	James E . J . Bottomley, Martin K . Petersen

Problem description:
====================
A node with an Adaptec 6405 controller, latest BIOS V5.3-0[19204].
Many disks are attached to the controller.
Simple test: run mkfs.ext4 on many disks on the same controller in
parallel.
(mkfs itself is not important here; any serious I/O load triggers
controller aborts.)

Results:
* no problems (no controller resets) with kernels prior to
  395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

* latest mainline kernel v5.2-rc6-15-g249155c20f9b - mkfs processes are stuck
  in D state, with many complaints in the logs such as:

  [  654.894633] aacraid: Host adapter abort request.
  aacraid: Outstanding commands on (0,1,43,0):
  [  699.441034] aacraid: Host adapter abort request.
  aacraid: Outstanding commands on (0,1,40,0):
  [  699.442950] aacraid: Host adapter reset request. SCSI hang ?
  [  714.457428] aacraid: Host adapter reset request. SCSI hang ?
  ...
  [  759.514759] aacraid: Host adapter reset request. SCSI hang ?
  [  759.514869] aacraid 0000:03:00.0: outstanding cmd: midlevel-0
  [  759.514870] aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
  [  759.514872] aacraid 0000:03:00.0: outstanding cmd: error handler-498
  [  759.514873] aacraid 0000:03:00.0: outstanding cmd: firmware-471
  [  759.514875] aacraid 0000:03:00.0: outstanding cmd: kernel-60
  [  759.514912] aacraid 0000:03:00.0: Controller reset type is 3
  [  759.515013] aacraid 0000:03:00.0: Issuing IOP reset
  [  850.296705] aacraid 0000:03:00.0: IOP reset succeeded

The same complaints appear on Ubuntu kernel 4.15.0-50-generic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586

Controller:
===========
03:00.0 RAID bus controller: Adaptec Series 6 - 6G SAS/PCIe 2 (rev 01)
         Subsystem: Adaptec Series 6 - ASR-6405 - 4 internal 6G SAS ports

Test:
=====
# cat dev.list
/dev/sdq1
/dev/sde1
/dev/sds1
/dev/sdb1
/dev/sdk1
/dev/sdaj1
/dev/sdaf1
/dev/sdd1
/dev/sdac1
/dev/sdai1
/dev/sdz1
/dev/sdj1
/dev/sdy1
/dev/sdn1
/dev/sdae1
/dev/sdg1
/dev/sdi1
/dev/sdc1
/dev/sdf1
/dev/sdl1
/dev/sda1
/dev/sdab1
/dev/sdr1
/dev/sdo1
/dev/sdah1
/dev/sdm1
/dev/sdt1
/dev/sdp1
/dev/sdad1
/dev/sdh1

===========================================
# cat run_mkfs.sh
#!/bin/bash

# start one background mkfs per device path read from stdin
while read -r i; do
   mkfs.ext4 "$i" -q -E lazy_itable_init=1 -O uninit_bg -m 0 &
done

=================================
# cat dev.list | ./run_mkfs.sh

The issue is 100% reproducible.

I've bisected the problem to this commit:
395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

It changes the arc ctrl checks for Series-6 controllers, and I've
verified that restoring the original logic in those checks
eliminates the controller hangs/resets.
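
To make the difference concrete, here is a small userspace sketch (not
driver code). It assumes only what the diff in patch 1/1 shows: after
395e5df79a95 every touched check matches Series 6, 7 and 8, while the patch
restores a Series 7/8-only check at three call sites. The table, the
function names is_s6_s7_s8(), is_s7_s8() and count_changed_sites() are
illustrative and do not exist in aacraid; the sketch does not model why the
controller hangs, only which checks change outcome for a given device ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PCI device IDs, as defined in drivers/scsi/aacraid/aacraid.h. */
#define PMC_DEVICE_S6 0x28b
#define PMC_DEVICE_S7 0x28c
#define PMC_DEVICE_S8 0x28d

/* Check used at every call site after 395e5df79a95 (body of aac_is_src()). */
static bool is_s6_s7_s8(uint16_t device)
{
	return device == PMC_DEVICE_S6 ||
	       device == PMC_DEVICE_S7 ||
	       device == PMC_DEVICE_S8;
}

/* Narrower check that the patch puts back at three call sites. */
static bool is_s7_s8(uint16_t device)
{
	return device == PMC_DEVICE_S7 || device == PMC_DEVICE_S8;
}

typedef bool (*id_check_fn)(uint16_t device);

/* Hypothetical table: one row per check touched by the patch. */
struct check_site {
	const char *where;
	id_check_fn regressed;	/* check in the regressed kernel */
	id_check_fn restored;	/* check with the patch applied */
};

static const struct check_site check_sites[] = {
	{ "aac_is_msix_mode",                  is_s6_s7_s8, is_s6_s7_s8 },
	{ "aac_send_shutdown",                 is_s6_s7_s8, is_s7_s8    },
	{ "aac_init_adapter (can_queue)",      is_s6_s7_s8, is_s7_s8    },
	{ "aac_init_adapter (define_int_mode)", is_s6_s7_s8, is_s6_s7_s8 },
	{ "aac_free_irq",                      is_s6_s7_s8, is_s6_s7_s8 },
	{ "__aac_shutdown",                    is_s6_s7_s8, is_s6_s7_s8 },
	{ "aac_acquire_resources",             is_s6_s7_s8, is_s7_s8    },
};

/* Number of call sites whose outcome differs for this device ID. */
static inline size_t count_changed_sites(uint16_t device)
{
	size_t i, changed = 0;

	for (i = 0; i < sizeof(check_sites) / sizeof(check_sites[0]); i++) {
		const struct check_site *s = &check_sites[i];

		/* Every row must name both checks. */
		assert(s->regressed && s->restored);
		if (s->regressed(device) != s->restored(device))
			changed++;
	}
	return changed;
}
```

With this table, count_changed_sites(PMC_DEVICE_S6) reports three affected
call sites, while the Series-7 and Series-8 IDs report none, which matches
the observation that only Series-6 behaviour was altered.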


Konstantin Khorenko (1):
  scsi: aacraid: resurrect correct arc ctrl checks for Series-6

 drivers/scsi/aacraid/aacraid.h  | 11 -----------
 drivers/scsi/aacraid/comminit.c | 14 ++++++++++----
 drivers/scsi/aacraid/commsup.c  |  4 +++-
 drivers/scsi/aacraid/linit.c    |  7 +++++--
 4 files changed, 18 insertions(+), 18 deletions(-)

-- 
2.15.1



* [PATCH 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6
  2019-06-27 16:14 [PATCH 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
@ 2019-06-27 16:14 ` Konstantin Khorenko
  2019-07-07 10:09   ` Andrey Jr. Melnikov
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Khorenko @ 2019-06-27 16:14 UTC (permalink / raw)
  To: Adaptec OEM Raid Solutions, Prasad B Munirathnam,
	Raghava Aditya Renukunta
  Cc: Konstantin Khorenko, linux-scsi, linux-kernel,
	James E . J . Bottomley, Martin K . Petersen

This partially reverts mainline commit
395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

That commit not only drops the Series-9 card checks but also
changes the logic for Series-6 controllers, which leads to controller
hangs/resets under high I/O load.

So revert to the original arc ctrl checks for Series-6 controllers.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586
https://bugzilla.redhat.com/show_bug.cgi?id=1724077
https://jira.sw.ru/browse/PSBM-95736

Fixes: 395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")
Cc: stable@vger.kernel.org

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 drivers/scsi/aacraid/aacraid.h  | 11 -----------
 drivers/scsi/aacraid/comminit.c | 14 ++++++++++----
 drivers/scsi/aacraid/commsup.c  |  4 +++-
 drivers/scsi/aacraid/linit.c    |  7 +++++--
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 3fa03230f6ba..b674fb645523 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -2729,17 +2729,6 @@ int _aac_rx_init(struct aac_dev *dev);
 int aac_rx_select_comm(struct aac_dev *dev, int comm);
 int aac_rx_deliver_producer(struct fib * fib);
 
-static inline int aac_is_src(struct aac_dev *dev)
-{
-	u16 device = dev->pdev->device;
-
-	if (device == PMC_DEVICE_S6 ||
-		device == PMC_DEVICE_S7 ||
-		device == PMC_DEVICE_S8)
-		return 1;
-	return 0;
-}
-
 static inline int aac_supports_2T(struct aac_dev *dev)
 {
 	return (dev->adapter_info.options & AAC_OPT_NEW_COMM_64);
diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
index d4fcfa1e54e0..b8046b6c1239 100644
--- a/drivers/scsi/aacraid/comminit.c
+++ b/drivers/scsi/aacraid/comminit.c
@@ -41,7 +41,9 @@ static inline int aac_is_msix_mode(struct aac_dev *dev)
 {
 	u32 status = 0;
 
-	if (aac_is_src(dev))
+	if (dev->pdev->device == PMC_DEVICE_S6 ||
+	    dev->pdev->device == PMC_DEVICE_S7 ||
+	    dev->pdev->device == PMC_DEVICE_S8)
 		status = src_readl(dev, MUnit.OMR);
 	return (status & AAC_INT_MODE_MSIX);
 }
@@ -349,7 +351,8 @@ int aac_send_shutdown(struct aac_dev * dev)
 	/* FIB should be freed only after getting the response from the F/W */
 	if (status != -ERESTARTSYS)
 		aac_fib_free(fibctx);
-	if (aac_is_src(dev) &&
+	if ((dev->pdev->device == PMC_DEVICE_S7 ||
+	     dev->pdev->device == PMC_DEVICE_S8) &&
 	     dev->msi_enabled)
 		aac_set_intx_mode(dev);
 	return status;
@@ -605,7 +608,8 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 		dev->max_fib_size = status[1] & 0xFFE0;
 		host->sg_tablesize = status[2] >> 16;
 		dev->sg_tablesize = status[2] & 0xFFFF;
-		if (aac_is_src(dev)) {
+		if (dev->pdev->device == PMC_DEVICE_S7 ||
+		    dev->pdev->device == PMC_DEVICE_S8) {
 			if (host->can_queue > (status[3] >> 16) -
 					AAC_NUM_MGT_FIB)
 				host->can_queue = (status[3] >> 16) -
@@ -624,7 +628,9 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 			pr_warn("numacb=%d ignored\n", numacb);
 	}
 
-	if (aac_is_src(dev))
+	if (dev->pdev->device == PMC_DEVICE_S6 ||
+	    dev->pdev->device == PMC_DEVICE_S7 ||
+	    dev->pdev->device == PMC_DEVICE_S8)
 		aac_define_int_mode(dev);
 	/*
 	 *	Ok now init the communication subsystem
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 2142a649e865..705e003caa95 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -2574,7 +2574,9 @@ void aac_free_irq(struct aac_dev *dev)
 {
 	int i;
 
-	if (aac_is_src(dev)) {
+	if (dev->pdev->device == PMC_DEVICE_S6 ||
+	    dev->pdev->device == PMC_DEVICE_S7 ||
+	    dev->pdev->device == PMC_DEVICE_S8) {
 		if (dev->max_msix > 1) {
 			for (i = 0; i < dev->max_msix; i++)
 				free_irq(pci_irq_vector(dev->pdev, i),
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 644f7f5c61a2..3b7968b17169 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1560,7 +1560,9 @@ static void __aac_shutdown(struct aac_dev * aac)
 
 	aac_adapter_disable_int(aac);
 
-	if (aac_is_src(aac)) {
+	if (aac->pdev->device == PMC_DEVICE_S6 ||
+	    aac->pdev->device == PMC_DEVICE_S7 ||
+	    aac->pdev->device == PMC_DEVICE_S8) {
 		if (aac->max_msix > 1) {
 			for (i = 0; i < aac->max_msix; i++) {
 				free_irq(pci_irq_vector(aac->pdev, i),
@@ -1835,7 +1837,8 @@ static int aac_acquire_resources(struct aac_dev *dev)
 	aac_adapter_enable_int(dev);
 
 
-	if (aac_is_src(dev))
+	if (dev->pdev->device == PMC_DEVICE_S7 ||
+	    dev->pdev->device == PMC_DEVICE_S8)
 		aac_define_int_mode(dev);
 
 	if (dev->msi_enabled)
-- 
2.15.1



* Re: [PATCH 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6
  2019-06-27 16:14 ` [PATCH 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
@ 2019-07-07 10:09   ` Andrey Jr. Melnikov
  2019-07-07 23:49     ` Finn Thain
  0 siblings, 1 reply; 26+ messages in thread
From: Andrey Jr. Melnikov @ 2019-07-07 10:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-scsi

In gmane.linux.scsi Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> This partially reverts ms commit
> 395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

> The patch above not only drops Series-9 cards checks but also
> changes logic for Series-6 controllers which leads to controller
> hangs/resets under high io load.

> So revert to original arc ctrl checks for Series-6 controllers.

> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586
> https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> https://jira.sw.ru/browse/PSBM-95736

> Fixes: 395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")
> Cc: stable@vger.kernel.org

> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
> ---
>  drivers/scsi/aacraid/aacraid.h  | 11 -----------
>  drivers/scsi/aacraid/comminit.c | 14 ++++++++++----
>  drivers/scsi/aacraid/commsup.c  |  4 +++-
>  drivers/scsi/aacraid/linit.c    |  7 +++++--
>  4 files changed, 18 insertions(+), 18 deletions(-)

> diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
> index 3fa03230f6ba..b674fb645523 100644
> --- a/drivers/scsi/aacraid/aacraid.h
> +++ b/drivers/scsi/aacraid/aacraid.h
> @@ -2729,17 +2729,6 @@ int _aac_rx_init(struct aac_dev *dev);
>  int aac_rx_select_comm(struct aac_dev *dev, int comm);
>  int aac_rx_deliver_producer(struct fib * fib);
>  
> -static inline int aac_is_src(struct aac_dev *dev)
> -{
> -       u16 device = dev->pdev->device;
> -
> -       if (device == PMC_DEVICE_S6 ||
> -               device == PMC_DEVICE_S7 ||
> -               device == PMC_DEVICE_S8)
> -               return 1;
> -       return 0;
> -}
> -

Why remove helper?

>  static inline int aac_supports_2T(struct aac_dev *dev)
>  {
>         return (dev->adapter_info.options & AAC_OPT_NEW_COMM_64);
> diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
> index d4fcfa1e54e0..b8046b6c1239 100644
> --- a/drivers/scsi/aacraid/comminit.c
> +++ b/drivers/scsi/aacraid/comminit.c
> @@ -41,7 +41,9 @@ static inline int aac_is_msix_mode(struct aac_dev *dev)
>  {
>         u32 status = 0;
>  
> -       if (aac_is_src(dev))
> +       if (dev->pdev->device == PMC_DEVICE_S6 ||
> +           dev->pdev->device == PMC_DEVICE_S7 ||
> +           dev->pdev->device == PMC_DEVICE_S8)
>                 status = src_readl(dev, MUnit.OMR);
>         return (status & AAC_INT_MODE_MSIX);
>  }
> @@ -349,7 +351,8 @@ int aac_send_shutdown(struct aac_dev * dev)
>         /* FIB should be freed only after getting the response from the F/W */
>         if (status != -ERESTARTSYS)
>                 aac_fib_free(fibctx);
Fix this
> -       if (aac_is_src(dev) &&
> +       if ((dev->pdev->device == PMC_DEVICE_S7 ||
> +            dev->pdev->device == PMC_DEVICE_S8) &&
>              dev->msi_enabled)
>                 aac_set_intx_mode(dev);
>         return status;
> @@ -605,7 +608,8 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
>                 dev->max_fib_size = status[1] & 0xFFE0;
>                 host->sg_tablesize = status[2] >> 16;
>                 dev->sg_tablesize = status[2] & 0xFFFF;
this one
> -               if (aac_is_src(dev)) {
> +               if (dev->pdev->device == PMC_DEVICE_S7 ||
> +                   dev->pdev->device == PMC_DEVICE_S8) {
>                         if (host->can_queue > (status[3] >> 16) -
>                                         AAC_NUM_MGT_FIB)
>                                 host->can_queue = (status[3] >> 16) -
> @@ -624,7 +628,9 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
>                         pr_warn("numacb=%d ignored\n", numacb);
>         }
>  
> -       if (aac_is_src(dev))
> +       if (dev->pdev->device == PMC_DEVICE_S6 ||
> +           dev->pdev->device == PMC_DEVICE_S7 ||
> +           dev->pdev->device == PMC_DEVICE_S8)
>                 aac_define_int_mode(dev);
>         /*
>          *      Ok now init the communication subsystem
> diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
> index 2142a649e865..705e003caa95 100644
> --- a/drivers/scsi/aacraid/commsup.c
> +++ b/drivers/scsi/aacraid/commsup.c
> @@ -2574,7 +2574,9 @@ void aac_free_irq(struct aac_dev *dev)
>  {
>         int i;
>  
> -       if (aac_is_src(dev)) {
> +       if (dev->pdev->device == PMC_DEVICE_S6 ||
> +           dev->pdev->device == PMC_DEVICE_S7 ||
> +           dev->pdev->device == PMC_DEVICE_S8) {
>                 if (dev->max_msix > 1) {
>                         for (i = 0; i < dev->max_msix; i++)
>                                 free_irq(pci_irq_vector(dev->pdev, i),
> diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
> index 644f7f5c61a2..3b7968b17169 100644
> --- a/drivers/scsi/aacraid/linit.c
> +++ b/drivers/scsi/aacraid/linit.c
> @@ -1560,7 +1560,9 @@ static void __aac_shutdown(struct aac_dev * aac)
>  
>         aac_adapter_disable_int(aac);
>  
> -       if (aac_is_src(aac)) {
> +       if (aac->pdev->device == PMC_DEVICE_S6 ||
> +           aac->pdev->device == PMC_DEVICE_S7 ||
> +           aac->pdev->device == PMC_DEVICE_S8) {
>                 if (aac->max_msix > 1) {
>                         for (i = 0; i < aac->max_msix; i++) {
>                                 free_irq(pci_irq_vector(aac->pdev, i),
> @@ -1835,7 +1837,8 @@ static int aac_acquire_resources(struct aac_dev *dev)
>         aac_adapter_enable_int(dev);
>  
>  
and this.
> -       if (aac_is_src(dev))
> +       if (dev->pdev->device == PMC_DEVICE_S7 ||
> +           dev->pdev->device == PMC_DEVICE_S8)
>                 aac_define_int_mode(dev);
>  
>         if (dev->msi_enabled)




* Re: [PATCH 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6
  2019-07-07 10:09   ` Andrey Jr. Melnikov
@ 2019-07-07 23:49     ` Finn Thain
  2019-07-10  9:24       ` Konstantin Khorenko
  0 siblings, 1 reply; 26+ messages in thread
From: Finn Thain @ 2019-07-07 23:49 UTC (permalink / raw)
  To: Andrey Jr. Melnikov, Konstantin Khorenko, Raghava Aditya Renukunta
  Cc: linux-scsi, linux-kernel

Andrey,

It is helpful to send your review to the patch author. I've added 
Konstantin to the Cc list, as well as Raghava (who introduced the 
regression addressed by Konstantin's patch).

If I'm not mistaken, your review misunderstands the patch description.

FWIW, Konstantin's patch might have been easier to follow if it was a 
simple 'git revert'.


* Re: [PATCH 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6
  2019-07-07 23:49     ` Finn Thain
@ 2019-07-10  9:24       ` Konstantin Khorenko
  2019-07-10  9:31         ` [PATCH v2 0/2] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Khorenko @ 2019-07-10  9:24 UTC (permalink / raw)
  To: Finn Thain, Andrey Jr. Melnikov, Raghava Aditya Renukunta
  Cc: linux-scsi, linux-kernel

On 07/08/2019 02:49 AM, Finn Thain wrote:
> Andrey,
>
> It is helpful to send your review to the patch author. I've added
> Konstantin to the Cc list, as well as Raghava (who introduced the
> regression addressed by Konstantin's patch).
>
> If I'm not mistaken, your review misunderstands the patch description.
>
> FWIW, Konstantin's patch might have been easier to follow if it was a
> simple 'git revert'.

Hi Finn, Andrey,

Finn,
thank you for adding me back to the thread, much appreciated.
I agree with you: a git revert followed by an independent patch
that removes only the Series-9 mentions is probably easier to read, so
I'm sending the second version in that form.


Andrey,
please take a look at the new version of the patches; I hope it's easier
to understand.

As for the helper: I thought about keeping it, but several places check
for Series 7 and 8 only and several others check for Series 6, 7 and 8,
so we would need one of the following:
- two helpers
- one helper that checks for Series 7 and 8 only, plus an extra Series 6
  check next to it in some places
- a helper that takes a parameter

Honestly, I don't like any of these variants, so I just left the code
without a helper: there are not that many checks, and the code is easier
to read this way IMHO.
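
For illustration only, the third option (a helper that takes a parameter)
could look roughly like the userspace sketch below. The enum
aac_series_mask and the function aac_device_in_series() are hypothetical
names that do not exist in the driver; only the PMC_DEVICE_* IDs come from
aacraid.h. The sketch takes the PCI device ID directly instead of a
struct aac_dev so that it compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCI device IDs, as defined in drivers/scsi/aacraid/aacraid.h. */
#define PMC_DEVICE_S6 0x28b
#define PMC_DEVICE_S7 0x28c
#define PMC_DEVICE_S8 0x28d

/* Hypothetical: one bit per controller series, plus the two sets in use. */
enum aac_series_mask {
	AAC_SERIES_6     = 1 << 0,
	AAC_SERIES_7     = 1 << 1,
	AAC_SERIES_8     = 1 << 2,
	AAC_SERIES_7_8   = AAC_SERIES_7 | AAC_SERIES_8,
	AAC_SERIES_6_7_8 = AAC_SERIES_6 | AAC_SERIES_7 | AAC_SERIES_8,
};

/*
 * Hypothetical helper: returns true if the PCI device ID belongs to one of
 * the series selected in mask. Any other device ID never matches.
 */
static inline bool aac_device_in_series(uint16_t device, unsigned int mask)
{
	/* An empty mask can never match and is treated as a caller bug. */
	assert(mask != 0);

	switch (device) {
	case PMC_DEVICE_S6:
		return (mask & AAC_SERIES_6) != 0;
	case PMC_DEVICE_S7:
		return (mask & AAC_SERIES_7) != 0;
	case PMC_DEVICE_S8:
		return (mask & AAC_SERIES_8) != 0;
	default:
		return false;
	}
}
```

A call site that must skip Series 6 would pass AAC_SERIES_7_8, and one that
covers all three would pass AAC_SERIES_6_7_8. Whether the extra indirection
reads better than the open-coded comparisons is exactly the trade-off
discussed above.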

Thank you!

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team


* [PATCH v2 0/2] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2019-07-10  9:24       ` Konstantin Khorenko
@ 2019-07-10  9:31         ` Konstantin Khorenko
  2019-07-10  9:31           ` [PATCH v2 1/2] Revert "scsi: aacraid: Remove reference to Series-9" Konstantin Khorenko
  2019-07-10  9:31           ` [PATCH v2 2/2] scsi: aacraid: Remove references to Series-9 (only) Konstantin Khorenko
  0 siblings, 2 replies; 26+ messages in thread
From: Konstantin Khorenko @ 2019-07-10  9:31 UTC (permalink / raw)
  To: Prasad B Munirathnam, Raghava Aditya Renukunta, Finn Thain,
	Andrey Jr . Melnikov
  Cc: Konstantin Khorenko, linux-scsi, linux-kernel,
	Adaptec OEM Raid Solutions

Problem description:
====================
A node with an Adaptec 6405 controller, latest BIOS V5.3-0[19204].
Many disks are attached to the controller.
Simple test: run mkfs.ext4 on many disks on the same controller in
parallel.
(mkfs itself is not important here; any serious I/O load triggers
controller aborts.)

Results:
* no problems (no controller resets) with kernels prior to
  395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

* latest mainline kernel v5.2-rc6-15-g249155c20f9b - mkfs processes are stuck
  in D state, with many complaints in the logs such as:

  [  654.894633] aacraid: Host adapter abort request.
  aacraid: Outstanding commands on (0,1,43,0):
  [  699.441034] aacraid: Host adapter abort request.
  aacraid: Outstanding commands on (0,1,40,0):
  [  699.442950] aacraid: Host adapter reset request. SCSI hang ?
  [  714.457428] aacraid: Host adapter reset request. SCSI hang ?
  ...
  [  759.514759] aacraid: Host adapter reset request. SCSI hang ?
  [  759.514869] aacraid 0000:03:00.0: outstanding cmd: midlevel-0
  [  759.514870] aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
  [  759.514872] aacraid 0000:03:00.0: outstanding cmd: error handler-498
  [  759.514873] aacraid 0000:03:00.0: outstanding cmd: firmware-471
  [  759.514875] aacraid 0000:03:00.0: outstanding cmd: kernel-60
  [  759.514912] aacraid 0000:03:00.0: Controller reset type is 3
  [  759.515013] aacraid 0000:03:00.0: Issuing IOP reset
  [  850.296705] aacraid 0000:03:00.0: IOP reset succeeded

The same complaints appear on Ubuntu kernel 4.15.0-50-generic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586

Controller:
===========
03:00.0 RAID bus controller: Adaptec Series 6 - 6G SAS/PCIe 2 (rev 01)
         Subsystem: Adaptec Series 6 - ASR-6405 - 4 internal 6G SAS ports

Test:
=====
# cat dev.list
/dev/sdq1
/dev/sde1
/dev/sds1
/dev/sdb1
/dev/sdk1
/dev/sdaj1
/dev/sdaf1
/dev/sdd1
/dev/sdac1
/dev/sdai1
/dev/sdz1
/dev/sdj1
/dev/sdy1
/dev/sdn1
/dev/sdae1
/dev/sdg1
/dev/sdi1
/dev/sdc1
/dev/sdf1
/dev/sdl1
/dev/sda1
/dev/sdab1
/dev/sdr1
/dev/sdo1
/dev/sdah1
/dev/sdm1
/dev/sdt1
/dev/sdp1
/dev/sdad1
/dev/sdh1

===========================================
# cat run_mkfs.sh
#!/bin/bash

# start one background mkfs per device path read from stdin
while read -r i; do
   mkfs.ext4 "$i" -q -E lazy_itable_init=1 -O uninit_bg -m 0 &
done

=================================
# cat dev.list | ./run_mkfs.sh

The issue is 100% reproducible.

I've bisected the problem to this commit:
395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

It changes the arc ctrl checks for Series-6 controllers, and I've
verified that restoring the original logic in those checks
eliminates the controller hangs/resets.

Konstantin Khorenko (2):
  Revert "scsi: aacraid: Remove reference to Series-9"
  scsi: aacraid: Remove references to Series-9 (only)

 drivers/scsi/aacraid/aacraid.h  | 11 -----------
 drivers/scsi/aacraid/comminit.c | 15 +++++++++++----
 drivers/scsi/aacraid/commsup.c  |  4 +++-
 drivers/scsi/aacraid/linit.c    |  8 +++++---
 4 files changed, 19 insertions(+), 19 deletions(-)

-- 
2.15.1



* [PATCH v2 1/2] Revert "scsi: aacraid: Remove reference to Series-9"
  2019-07-10  9:31         ` [PATCH v2 0/2] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
@ 2019-07-10  9:31           ` Konstantin Khorenko
  2019-07-10  9:31           ` [PATCH v2 2/2] scsi: aacraid: Remove references to Series-9 (only) Konstantin Khorenko
  1 sibling, 0 replies; 26+ messages in thread
From: Konstantin Khorenko @ 2019-07-10  9:31 UTC (permalink / raw)
  To: Prasad B Munirathnam, Raghava Aditya Renukunta, Finn Thain,
	Andrey Jr . Melnikov
  Cc: Konstantin Khorenko, linux-scsi, linux-kernel,
	Adaptec OEM Raid Solutions

This reverts commit 395e5df79a9588abf1099ea746f11872c9086252.

The commit being reverted not only drops the Series-9 card
checks but also changes the logic for Series-6 controllers, which
leads to controller hangs/resets under high I/O load.

So revert the original commit; references to Series 9 are
removed by the next patch.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586
https://bugzilla.redhat.com/show_bug.cgi?id=1724077
https://jira.sw.ru/browse/PSBM-95736

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 drivers/scsi/aacraid/aacraid.h  | 12 +-----------
 drivers/scsi/aacraid/comminit.c | 18 ++++++++++++++----
 drivers/scsi/aacraid/commsup.c  |  5 ++++-
 drivers/scsi/aacraid/linit.c    | 10 +++++++---
 4 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 3fa03230f6ba..aef47d0e718c 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -416,6 +416,7 @@ struct aac_ciss_identify_pd {
 #define PMC_DEVICE_S6	0x28b
 #define PMC_DEVICE_S7	0x28c
 #define PMC_DEVICE_S8	0x28d
+#define PMC_DEVICE_S9	0x28f
 
 #define aac_phys_to_logical(x)  ((x)+1)
 #define aac_logical_to_phys(x)  ((x)?(x)-1:0)
@@ -2729,17 +2730,6 @@ int _aac_rx_init(struct aac_dev *dev);
 int aac_rx_select_comm(struct aac_dev *dev, int comm);
 int aac_rx_deliver_producer(struct fib * fib);
 
-static inline int aac_is_src(struct aac_dev *dev)
-{
-	u16 device = dev->pdev->device;
-
-	if (device == PMC_DEVICE_S6 ||
-		device == PMC_DEVICE_S7 ||
-		device == PMC_DEVICE_S8)
-		return 1;
-	return 0;
-}
-
 static inline int aac_supports_2T(struct aac_dev *dev)
 {
 	return (dev->adapter_info.options & AAC_OPT_NEW_COMM_64);
diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
index d4fcfa1e54e0..edaa2d53e704 100644
--- a/drivers/scsi/aacraid/comminit.c
+++ b/drivers/scsi/aacraid/comminit.c
@@ -41,8 +41,11 @@ static inline int aac_is_msix_mode(struct aac_dev *dev)
 {
 	u32 status = 0;
 
-	if (aac_is_src(dev))
+	if (dev->pdev->device == PMC_DEVICE_S6 ||
+		dev->pdev->device == PMC_DEVICE_S7 ||
+		dev->pdev->device == PMC_DEVICE_S8) {
 		status = src_readl(dev, MUnit.OMR);
+	}
 	return (status & AAC_INT_MODE_MSIX);
 }
 
@@ -349,7 +352,9 @@ int aac_send_shutdown(struct aac_dev * dev)
 	/* FIB should be freed only after getting the response from the F/W */
 	if (status != -ERESTARTSYS)
 		aac_fib_free(fibctx);
-	if (aac_is_src(dev) &&
+	if ((dev->pdev->device == PMC_DEVICE_S7 ||
+	     dev->pdev->device == PMC_DEVICE_S8 ||
+	     dev->pdev->device == PMC_DEVICE_S9) &&
 	     dev->msi_enabled)
 		aac_set_intx_mode(dev);
 	return status;
@@ -605,7 +610,9 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 		dev->max_fib_size = status[1] & 0xFFE0;
 		host->sg_tablesize = status[2] >> 16;
 		dev->sg_tablesize = status[2] & 0xFFFF;
-		if (aac_is_src(dev)) {
+		if (dev->pdev->device == PMC_DEVICE_S7 ||
+		    dev->pdev->device == PMC_DEVICE_S8 ||
+		    dev->pdev->device == PMC_DEVICE_S9) {
 			if (host->can_queue > (status[3] >> 16) -
 					AAC_NUM_MGT_FIB)
 				host->can_queue = (status[3] >> 16) -
@@ -624,7 +631,10 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 			pr_warn("numacb=%d ignored\n", numacb);
 	}
 
-	if (aac_is_src(dev))
+	if (dev->pdev->device == PMC_DEVICE_S6 ||
+	    dev->pdev->device == PMC_DEVICE_S7 ||
+	    dev->pdev->device == PMC_DEVICE_S8 ||
+	    dev->pdev->device == PMC_DEVICE_S9)
 		aac_define_int_mode(dev);
 	/*
 	 *	Ok now init the communication subsystem
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 2142a649e865..b047b1e2215a 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -2574,7 +2574,10 @@ void aac_free_irq(struct aac_dev *dev)
 {
 	int i;
 
-	if (aac_is_src(dev)) {
+	if (dev->pdev->device == PMC_DEVICE_S6 ||
+	    dev->pdev->device == PMC_DEVICE_S7 ||
+	    dev->pdev->device == PMC_DEVICE_S8 ||
+	    dev->pdev->device == PMC_DEVICE_S9) {
 		if (dev->max_msix > 1) {
 			for (i = 0; i < dev->max_msix; i++)
 				free_irq(pci_irq_vector(dev->pdev, i),
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 644f7f5c61a2..f669a4405217 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1559,8 +1559,10 @@ static void __aac_shutdown(struct aac_dev * aac)
 	aac_send_shutdown(aac);
 
 	aac_adapter_disable_int(aac);
-
-	if (aac_is_src(aac)) {
+	if (aac->pdev->device == PMC_DEVICE_S6 ||
+	    aac->pdev->device == PMC_DEVICE_S7 ||
+	    aac->pdev->device == PMC_DEVICE_S8 ||
+	    aac->pdev->device == PMC_DEVICE_S9) {
 		if (aac->max_msix > 1) {
 			for (i = 0; i < aac->max_msix; i++) {
 				free_irq(pci_irq_vector(aac->pdev, i),
@@ -1835,7 +1837,9 @@ static int aac_acquire_resources(struct aac_dev *dev)
 	aac_adapter_enable_int(dev);
 
 
-	if (aac_is_src(dev))
+	if ((dev->pdev->device == PMC_DEVICE_S7 ||
+	     dev->pdev->device == PMC_DEVICE_S8 ||
+	     dev->pdev->device == PMC_DEVICE_S9))
 		aac_define_int_mode(dev);
 
 	if (dev->msi_enabled)
-- 
2.15.1
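
For readers tracing which call sites treat Series 6 differently, the
device checks restored by the diff above can be summarized in a small
userspace sketch. It is not driver code: the bit flags, the struct and
the helper names are hypothetical. Only the device IDs and the
per-function device lists are taken from the hunks in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* PCI device IDs as they appear in the aacraid.h hunk in this thread. */
#define PMC_DEVICE_S6 0x28b
#define PMC_DEVICE_S7 0x28c
#define PMC_DEVICE_S8 0x28d
#define PMC_DEVICE_S9 0x28f

/* Hypothetical bit flags, one per adapter series; not part of the driver. */
enum series_bit { S6 = 1 << 0, S7 = 1 << 1, S8 = 1 << 2, S9 = 1 << 3 };

struct call_site {
	const char *function;	/* function containing the check */
	const char *action;	/* what the guarded branch does */
	unsigned int series;	/* series for which the branch is taken */
};

/* One entry per check restored by the diff, in diff order. */
static const struct call_site restored_checks[] = {
	{ "aac_is_msix_mode",      "read MUnit.OMR",        S6 | S7 | S8 },
	{ "aac_send_shutdown",     "aac_set_intx_mode()",   S7 | S8 | S9 },
	{ "aac_init_adapter",      "clamp host->can_queue", S7 | S8 | S9 },
	{ "aac_init_adapter",      "aac_define_int_mode()", S6 | S7 | S8 | S9 },
	{ "aac_free_irq",          "IRQ teardown branch",   S6 | S7 | S8 | S9 },
	{ "__aac_shutdown",        "IRQ teardown branch",   S6 | S7 | S8 | S9 },
	{ "aac_acquire_resources", "aac_define_int_mode()", S7 | S8 | S9 },
};

#define NUM_SITES (sizeof(restored_checks) / sizeof(restored_checks[0]))

/* Map a PCI device ID to its series flag; 0 for any other device. */
static unsigned int series_bit_for(uint16_t device)
{
	switch (device) {
	case PMC_DEVICE_S6: return S6;
	case PMC_DEVICE_S7: return S7;
	case PMC_DEVICE_S8: return S8;
	case PMC_DEVICE_S9: return S9;
	default:            return 0;
	}
}

/* True when the guarded branch at this call site is taken for the device. */
static bool check_matches(const struct call_site *cs, uint16_t device)
{
	return (cs->series & series_bit_for(device)) != 0;
}

/* Number of call sites whose guarded branch is NOT taken for the device. */
static size_t sites_skipping(uint16_t device)
{
	size_t i, skipped = 0;

	for (i = 0; i < NUM_SITES; i++)
		if (!check_matches(&restored_checks[i], device))
			skipped++;
	return skipped;
}

/* Print one line per call site with a y/n column for each series. */
static void print_restored_checks(FILE *out)
{
	size_t i;

	assert(out != NULL);
	for (i = 0; i < NUM_SITES; i++) {
		const struct call_site *cs = &restored_checks[i];

		fprintf(out, "%-22s %-22s S6:%c S7:%c S8:%c S9:%c\n",
			cs->function, cs->action,
			(cs->series & S6) ? 'y' : 'n',
			(cs->series & S7) ? 'y' : 'n',
			(cs->series & S8) ? 'y' : 'n',
			(cs->series & S9) ? 'y' : 'n');
	}
}
```

Calling print_restored_checks(stdout) from a main() lists the seven
checks, and sites_skipping(PMC_DEVICE_S6) counts the paths that a
Series 6 adapter does not take.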


* [PATCH v2 2/2] scsi: aacraid: Remove references to Series-9 (only)
  2019-07-10  9:31         ` [PATCH v2 0/2] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
  2019-07-10  9:31           ` [PATCH v2 1/2] Revert "scsi: aacraid: Remove reference to Series-9" Konstantin Khorenko
@ 2019-07-10  9:31           ` Konstantin Khorenko
  2019-07-12  1:30             ` Martin K. Petersen
  1 sibling, 1 reply; 26+ messages in thread
From: Konstantin Khorenko @ 2019-07-10  9:31 UTC (permalink / raw)
  To: Prasad B Munirathnam, Raghava Aditya Renukunta, Finn Thain,
	Andrey Jr . Melnikov
  Cc: Konstantin Khorenko, linux-scsi, linux-kernel,
	Adaptec OEM Raid Solutions

The patch removes references to Series 9 adapters following
395e5df79a95 ("scsi: aacraid: Remove reference to Series-9"),
but doesn't touch Series 6 adapters logic.

Leaving the Series 6 adapter logic untouched avoids controller
hangs/resets under high I/O load.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586
https://bugzilla.redhat.com/show_bug.cgi?id=1724077
https://jira.sw.ru/browse/PSBM-95736

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 drivers/scsi/aacraid/aacraid.h  | 1 -
 drivers/scsi/aacraid/comminit.c | 9 +++------
 drivers/scsi/aacraid/commsup.c  | 3 +--
 drivers/scsi/aacraid/linit.c    | 8 +++-----
 4 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index aef47d0e718c..b674fb645523 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -416,7 +416,6 @@ struct aac_ciss_identify_pd {
 #define PMC_DEVICE_S6	0x28b
 #define PMC_DEVICE_S7	0x28c
 #define PMC_DEVICE_S8	0x28d
-#define PMC_DEVICE_S9	0x28f
 
 #define aac_phys_to_logical(x)  ((x)+1)
 #define aac_logical_to_phys(x)  ((x)?(x)-1:0)
diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
index edaa2d53e704..c8db6614b712 100644
--- a/drivers/scsi/aacraid/comminit.c
+++ b/drivers/scsi/aacraid/comminit.c
@@ -353,8 +353,7 @@ int aac_send_shutdown(struct aac_dev * dev)
 	if (status != -ERESTARTSYS)
 		aac_fib_free(fibctx);
 	if ((dev->pdev->device == PMC_DEVICE_S7 ||
-	     dev->pdev->device == PMC_DEVICE_S8 ||
-	     dev->pdev->device == PMC_DEVICE_S9) &&
+	     dev->pdev->device == PMC_DEVICE_S8) &&
 	     dev->msi_enabled)
 		aac_set_intx_mode(dev);
 	return status;
@@ -611,8 +610,7 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 		host->sg_tablesize = status[2] >> 16;
 		dev->sg_tablesize = status[2] & 0xFFFF;
 		if (dev->pdev->device == PMC_DEVICE_S7 ||
-		    dev->pdev->device == PMC_DEVICE_S8 ||
-		    dev->pdev->device == PMC_DEVICE_S9) {
+		    dev->pdev->device == PMC_DEVICE_S8) {
 			if (host->can_queue > (status[3] >> 16) -
 					AAC_NUM_MGT_FIB)
 				host->can_queue = (status[3] >> 16) -
@@ -633,8 +631,7 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 
 	if (dev->pdev->device == PMC_DEVICE_S6 ||
 	    dev->pdev->device == PMC_DEVICE_S7 ||
-	    dev->pdev->device == PMC_DEVICE_S8 ||
-	    dev->pdev->device == PMC_DEVICE_S9)
+	    dev->pdev->device == PMC_DEVICE_S8)
 		aac_define_int_mode(dev);
 	/*
 	 *	Ok now init the communication subsystem
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index b047b1e2215a..705e003caa95 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -2576,8 +2576,7 @@ void aac_free_irq(struct aac_dev *dev)
 
 	if (dev->pdev->device == PMC_DEVICE_S6 ||
 	    dev->pdev->device == PMC_DEVICE_S7 ||
-	    dev->pdev->device == PMC_DEVICE_S8 ||
-	    dev->pdev->device == PMC_DEVICE_S9) {
+	    dev->pdev->device == PMC_DEVICE_S8) {
 		if (dev->max_msix > 1) {
 			for (i = 0; i < dev->max_msix; i++)
 				free_irq(pci_irq_vector(dev->pdev, i),
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index f669a4405217..d5082b191aa8 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1561,8 +1561,7 @@ static void __aac_shutdown(struct aac_dev * aac)
 	aac_adapter_disable_int(aac);
 	if (aac->pdev->device == PMC_DEVICE_S6 ||
 	    aac->pdev->device == PMC_DEVICE_S7 ||
-	    aac->pdev->device == PMC_DEVICE_S8 ||
-	    aac->pdev->device == PMC_DEVICE_S9) {
+	    aac->pdev->device == PMC_DEVICE_S8) {
 		if (aac->max_msix > 1) {
 			for (i = 0; i < aac->max_msix; i++) {
 				free_irq(pci_irq_vector(aac->pdev, i),
@@ -1837,9 +1836,8 @@ static int aac_acquire_resources(struct aac_dev *dev)
 	aac_adapter_enable_int(dev);
 
 
-	if ((dev->pdev->device == PMC_DEVICE_S7 ||
-	     dev->pdev->device == PMC_DEVICE_S8 ||
-	     dev->pdev->device == PMC_DEVICE_S9))
+	if (dev->pdev->device == PMC_DEVICE_S7 ||
+	    dev->pdev->device == PMC_DEVICE_S8)
 		aac_define_int_mode(dev);
 
 	if (dev->msi_enabled)
-- 
2.15.1
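
The queue-depth clamp that appears as context in the aac_init_adapter
hunk above can be tried out in isolation. This is a minimal sketch,
assuming status[3] is a 32-bit unsigned value and using a placeholder
value for AAC_NUM_MGT_FIB, whose definition is not shown in this
thread. clamp_can_queue() is a hypothetical helper, not a driver
function.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for illustration; the real definition is not shown here. */
#define AAC_NUM_MGT_FIB 8

/*
 * Mirror of the guarded branch in the hunk:
 *
 *	if (host->can_queue > (status[3] >> 16) - AAC_NUM_MGT_FIB)
 *		host->can_queue = (status[3] >> 16) - AAC_NUM_MGT_FIB;
 *
 * Only the upper 16 bits of status3 are used. The arithmetic is unsigned,
 * so an upper half below AAC_NUM_MGT_FIB wraps around and leaves
 * can_queue unchanged.
 */
static int clamp_can_queue(int can_queue, uint32_t status3)
{
	uint32_t limit = (status3 >> 16) - AAC_NUM_MGT_FIB;

	assert(can_queue >= 0);
	if ((uint32_t)can_queue > limit)
		can_queue = (int)limit;
	return can_queue;
}
```

With the checks shown in the hunk, this branch is taken only for
Series 7 and 8 devices, so a Series 6 adapter does not have its queue
depth limited by this expression.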


* Re: [PATCH v2 2/2] scsi: aacraid: Remove references to Series-9 (only)
  2019-07-10  9:31           ` [PATCH v2 2/2] scsi: aacraid: Remove references to Series-9 (only) Konstantin Khorenko
@ 2019-07-12  1:30             ` Martin K. Petersen
  2019-08-19 16:35               ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
  0 siblings, 1 reply; 26+ messages in thread
From: Martin K. Petersen @ 2019-07-12  1:30 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: Prasad B Munirathnam, Raghava Aditya Renukunta, David Carroll,
	Finn Thain, Andrey Jr . Melnikov, linux-scsi, linux-kernel,
	Adaptec OEM Raid Solutions


Hi Konstantin,

> The patch removes references to Series 9 adapters following
> 395e5df79a95 ("scsi: aacraid: Remove reference to Series-9"),
> but doesn't touch Series 6 adapters logic.

We'll need some guidance from the Microsemi folks on this issue.

> https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> https://jira.sw.ru/browse/PSBM-95736

These two links don't appear to be publicly accessible and therefore do
not belong in the patch.

-- 
Martin K. Petersen	Oracle Linux Engineering

* [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2019-07-12  1:30             ` Martin K. Petersen
@ 2019-08-19 16:35               ` Konstantin Khorenko
  2019-08-19 16:35                 ` [PATCH v3 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
                                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Konstantin Khorenko @ 2019-08-19 16:35 UTC (permalink / raw)
  To: Martin K . Petersen, Sagar Biradar
  Cc: Konstantin Khorenko, linux-scsi, linux-kernel,
	Adaptec OEM Raid Solutions

Problem description:
====================
A node with Adaptec 6405 controller, latest BIOS V5.3-0[19204]
A lot of disks attached to the controller.
Simple test: running mkfs.ext4 on many disks on the same controller in
parallel (mkfs is not important here, any serious io load triggers controller
aborts)

Results:
* no problems (controller resets) with kernels prior to
  395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

* latest mainstream kernel v5.2-rc6-15-g249155c20f9b - mkfs processes are stuck in D state,
  lots of complaints in the logs like:

  [  654.894633] aacraid: Host adapter abort request.
  aacraid: Outstanding commands on (0,1,43,0):
  [  699.441034] aacraid: Host adapter abort request.
  aacraid: Outstanding commands on (0,1,40,0):
  [  699.442950] aacraid: Host adapter reset request. SCSI hang ?
  [  714.457428] aacraid: Host adapter reset request. SCSI hang ?
  ...
  [  759.514759] aacraid: Host adapter reset request. SCSI hang ?
  [  759.514869] aacraid 0000:03:00.0: outstanding cmd: midlevel-0
  [  759.514870] aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
  [  759.514872] aacraid 0000:03:00.0: outstanding cmd: error handler-498
  [  759.514873] aacraid 0000:03:00.0: outstanding cmd: firmware-471
  [  759.514875] aacraid 0000:03:00.0: outstanding cmd: kernel-60
  [  759.514912] aacraid 0000:03:00.0: Controller reset type is 3
  [  759.515013] aacraid 0000:03:00.0: Issuing IOP reset
  [  850.296705] aacraid 0000:03:00.0: IOP reset succeeded

The same complaints appear on Ubuntu kernel 4.15.0-50-generic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586

Controller:
===========
03:00.0 RAID bus controller: Adaptec Series 6 - 6G SAS/PCIe 2 (rev 01)
         Subsystem: Adaptec Series 6 - ASR-6405 - 4 internal 6G SAS ports

Test:
=====
# cat dev.list
/dev/sdq1
/dev/sde1
/dev/sds1
/dev/sdb1
/dev/sdk1
/dev/sdaj1
/dev/sdaf1
/dev/sdd1
/dev/sdac1
/dev/sdai1
/dev/sdz1
/dev/sdj1
/dev/sdy1
/dev/sdn1
/dev/sdae1
/dev/sdg1
/dev/sdi1
/dev/sdc1
/dev/sdf1
/dev/sdl1
/dev/sda1
/dev/sdab1
/dev/sdr1
/dev/sdo1
/dev/sdah1
/dev/sdm1
/dev/sdt1
/dev/sdp1
/dev/sdad1
/dev/sdh1

===========================================
# cat run_mkfs.sh
#!/bin/bash

while read -r i; do
   mkfs.ext4 "$i" -q -E lazy_itable_init=1 -O uninit_bg -m 0 &
done

=================================
# cat dev.list | ./run_mkfs.sh

The issue is 100% reproducible.

I've bisected the issue to the culprit patch:
395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

It changes the arc ctrl checks for Series-6 controllers,
and I've verified that restoring the original logic in those checks
eliminates the controller hangs/resets.

Konstantin Khorenko (1):
  scsi: aacraid: resurrect correct arc ctrl checks for Series-6

--
v3 changes:
 * introduced another wrapper to check for devices except for Series 6
   controllers upon request from Sagar Biradar (Microchip)

 * dropped mentions of private bug ids


 drivers/scsi/aacraid/aacraid.h  | 11 +++++++++++
 drivers/scsi/aacraid/comminit.c |  5 ++---
 drivers/scsi/aacraid/linit.c    |  2 +-
 3 files changed, 14 insertions(+), 4 deletions(-)

-- 
2.15.1


* [PATCH v3 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6
  2019-08-19 16:35               ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
@ 2019-08-19 16:35                 ` Konstantin Khorenko
  2019-08-29 21:52                 ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Martin K. Petersen
  2021-05-06 22:22                 ` James Hilliard
  2 siblings, 0 replies; 26+ messages in thread
From: Konstantin Khorenko @ 2019-08-19 16:35 UTC (permalink / raw)
  To: Martin K . Petersen, Sagar Biradar
  Cc: Konstantin Khorenko, linux-scsi, linux-kernel,
	Adaptec OEM Raid Solutions

This patch introduces another wrapper, aac_is_srcv(), similar to
aac_is_src() but without the check for Series 6 devices.

Use this new wrapper to restore the original arc ctrl checks for
Series-6 controllers, which were unintentionally changed by commit
395e5df79a95 ("scsi: aacraid: Remove reference to Series-9").

That commit not only drops the Series-9 card checks but also
changes the logic for Series-6 controllers, which leads to controller
hangs/resets under high I/O load.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 drivers/scsi/aacraid/aacraid.h  | 11 +++++++++++
 drivers/scsi/aacraid/comminit.c |  5 ++---
 drivers/scsi/aacraid/linit.c    |  2 +-
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 3fa03230f6ba..ddfa78c05728 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -2740,6 +2740,17 @@ static inline int aac_is_src(struct aac_dev *dev)
 	return 0;
 }
 
+static inline int aac_is_srcv(struct aac_dev *dev)
+{
+	u16 device = dev->pdev->device;
+
+	if (device == PMC_DEVICE_S7 ||
+	    device == PMC_DEVICE_S8)
+		return 1;
+
+	return 0;
+}
+
 static inline int aac_supports_2T(struct aac_dev *dev)
 {
 	return (dev->adapter_info.options & AAC_OPT_NEW_COMM_64);
diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
index d4fcfa1e54e0..1918e46ae3ec 100644
--- a/drivers/scsi/aacraid/comminit.c
+++ b/drivers/scsi/aacraid/comminit.c
@@ -349,8 +349,7 @@ int aac_send_shutdown(struct aac_dev * dev)
 	/* FIB should be freed only after getting the response from the F/W */
 	if (status != -ERESTARTSYS)
 		aac_fib_free(fibctx);
-	if (aac_is_src(dev) &&
-	     dev->msi_enabled)
+	if (aac_is_srcv(dev) && dev->msi_enabled)
 		aac_set_intx_mode(dev);
 	return status;
 }
@@ -605,7 +604,7 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 		dev->max_fib_size = status[1] & 0xFFE0;
 		host->sg_tablesize = status[2] >> 16;
 		dev->sg_tablesize = status[2] & 0xFFFF;
-		if (aac_is_src(dev)) {
+		if (aac_is_srcv(dev)) {
 			if (host->can_queue > (status[3] >> 16) -
 					AAC_NUM_MGT_FIB)
 				host->can_queue = (status[3] >> 16) -
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 644f7f5c61a2..c8badc9d9ae7 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1835,7 +1835,7 @@ static int aac_acquire_resources(struct aac_dev *dev)
 	aac_adapter_enable_int(dev);
 
 
-	if (aac_is_src(dev))
+	if (aac_is_srcv(dev))
 		aac_define_int_mode(dev);
 
 	if (dev->msi_enabled)
-- 
2.15.1


* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2019-08-19 16:35               ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
  2019-08-19 16:35                 ` [PATCH v3 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
@ 2019-08-29 21:52                 ` Martin K. Petersen
  2021-05-06 22:22                 ` James Hilliard
  2 siblings, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2019-08-29 21:52 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: Martin K . Petersen, Sagar Biradar, linux-scsi, linux-kernel,
	Adaptec OEM Raid Solutions


> Problem description:
> ====================
> A node with Adaptec 6405 controller, latest BIOS V5.3-0[19204] A lot
> of disks attached to the controller.  Simple test: running mkfs.ext4
> on many disks on the same controller in parallel (mkfs is not
> important here, any serious io load triggers controller aborts)

Microchip folks: Please review!

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2019-08-19 16:35               ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
  2019-08-19 16:35                 ` [PATCH v3 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
  2019-08-29 21:52                 ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Martin K. Petersen
@ 2021-05-06 22:22                 ` James Hilliard
       [not found]                   ` <ffdb2223-eed3-75b4-a003-4e4c96b49947@grossegger.com>
  2 siblings, 1 reply; 26+ messages in thread
From: James Hilliard @ 2021-05-06 22:22 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: Martin K . Petersen, Sagar Biradar, linux-scsi,
	Linux Kernel Mailing List, Adaptec OEM Raid Solutions

On Mon, Aug 19, 2019 at 10:35 AM Konstantin Khorenko
<khorenko@virtuozzo.com> wrote:
>
> Problem description:
> ====================
> A node with Adaptec 6405 controller, latest BIOS V5.3-0[19204]
Hitting this on an Adaptec RAID 71605 as well, with BIOS V7.5.0[32118]
> A lot of disks attached to the controller.
> Simple test: running mkfs.ext4 on many disks on the same controller in
> parallel (mkfs is not important here, any serious io load triggers controller
> aborts)
I saw a zfs resilver trigger this.
>
>
> Results:
> * no problems (controller resets) with kernels prior to
>   395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")
>
> * latest ms kernel v5.2-rc6-15-g249155c20f9b - mkfs processes are in D state,
>   lot of complains in logs like:
>
>   [  654.894633] aacraid: Host adapter abort request.
>   aacraid: Outstanding commands on (0,1,43,0):
>   [  699.441034] aacraid: Host adapter abort request.
>   aacraid: Outstanding commands on (0,1,40,0):
>   [  699.442950] aacraid: Host adapter reset request. SCSI hang ?
>   [  714.457428] aacraid: Host adapter reset request. SCSI hang ?
>   ...
>   [  759.514759] aacraid: Host adapter reset request. SCSI hang ?
>   [  759.514869] aacraid 0000:03:00.0: outstanding cmd: midlevel-0
>   [  759.514870] aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
>   [  759.514872] aacraid 0000:03:00.0: outstanding cmd: error handler-498
>   [  759.514873] aacraid 0000:03:00.0: outstanding cmd: firmware-471
>   [  759.514875] aacraid 0000:03:00.0: outstanding cmd: kernel-60
>   [  759.514912] aacraid 0000:03:00.0: Controller reset type is 3
>   [  759.515013] aacraid 0000:03:00.0: Issuing IOP reset
>   [  850.296705] aacraid 0000:03:00.0: IOP reset succeeded
>
> Same complains on Ubuntu kernel 4.15.0-50-generic:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586
It looks like it's popping up in Proxmox as well:
https://forum.proxmox.com/threads/aacraid-host-adapter-abort-request-errors.86903/

When I tested this patch, it appeared to reduce the frequency of the
issue, although I did still hit an abort request:
aacraid: Host adapter abort request.
aacraid: Outstanding commands on (0,1,47,0):
>
>
>
> Controller:
> ===========
> 03:00.0 RAID bus controller: Adaptec Series 6 - 6G SAS/PCIe 2 (rev 01)
>          Subsystem: Adaptec Series 6 - ASR-6405 - 4 internal 6G SAS ports
>
> Test:
> =====
> # cat dev.list
> /dev/sdq1
> /dev/sde1
> /dev/sds1
> /dev/sdb1
> /dev/sdk1
> /dev/sdaj1
> /dev/sdaf1
> /dev/sdd1
> /dev/sdac1
> /dev/sdai1
> /dev/sdz1
> /dev/sdj1
> /dev/sdy1
> /dev/sdn1
> /dev/sdae1
> /dev/sdg1
> /dev/sdi1
> /dev/sdc1
> /dev/sdf1
> /dev/sdl1
> /dev/sda1
> /dev/sdab1
> /dev/sdr1
> /dev/sdo1
> /dev/sdah1
> /dev/sdm1
> /dev/sdt1
> /dev/sdp1
> /dev/sdad1
> /dev/sdh1
>
> ===========================================
> # cat run_mkfs.sh
> #!/bin/bash
>
> while read i; do
>    mkfs.ext4 $i -q -E lazy_itable_init=1 -O uninit_bg -m 0 &
> done
>
> =================================
> # cat dev.list | ./run_mkfs.sh
>
> The issue is 100% reproducible.
>
> i've bisected to the culprit patch, it's
> 395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")
>
> it changes arc ctrl checks for Series-6 controllers
> and i've checked that resurrection of original logic in arc ctrl checks
> eliminates controller hangs/resets.
>
> Konstantin Khorenko (1):
>   scsi: aacraid: resurrect correct arc ctrl checks for Series-6
>
> --
> v3 changes:
>  * introduced another wrapper to check for devices except for Series 6
>    controllers upon request from Sagar Biradar (Microchip)
>
>  * dropped mentions of private bug ids
>
>
>  drivers/scsi/aacraid/aacraid.h  | 11 +++++++++++
>  drivers/scsi/aacraid/comminit.c |  5 ++---
>  drivers/scsi/aacraid/linit.c    |  2 +-
>  3 files changed, 14 insertions(+), 4 deletions(-)
>
> --
> 2.15.1
>
>

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
       [not found]                   ` <ffdb2223-eed3-75b4-a003-4e4c96b49947@grossegger.com>
@ 2022-02-23  2:41                     ` Martin K. Petersen
  2022-10-10 12:31                       ` James Hilliard
  0 siblings, 1 reply; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-23  2:41 UTC (permalink / raw)
  To: Christian Großegger
  Cc: linux-scsi, Adaptec OEM Raid Solutions, Martin K . Petersen,
	Sagar Biradar, Linux Kernel Mailing List, Konstantin Khorenko,
	James Hilliard, Don Brace


Christian,

> The faulty patch (Commit: 395e5df79a9588abf) from 2017 should be
> repaired with Konstantin Khorenko (1):
>
>   scsi: aacraid: resurrect correct arc ctrl checks for Series-6

It would be great to get this patch resubmitted by Konstantin and acked
by Microchip.

Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-02-23  2:41                     ` Martin K. Petersen
@ 2022-10-10 12:31                       ` James Hilliard
  2022-10-19 18:00                         ` Konstantin Khorenko
  0 siblings, 1 reply; 26+ messages in thread
From: James Hilliard @ 2022-10-10 12:31 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christian Großegger, linux-scsi, Adaptec OEM Raid Solutions,
	Sagar Biradar, Linux Kernel Mailing List, Konstantin Khorenko,
	Don Brace

On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
>
> Christian,
>
> > The faulty patch (Commit: 395e5df79a9588abf) from 2017 should be
> > repaired with Konstantin Khorenko (1):
> >
> >   scsi: aacraid: resurrect correct arc ctrl checks for Series-6
>
> It would be great to get this patch resubmitted by Konstantin and acked
> by Microchip.

Does the patch need to be rebased?

Based on this it looks like someone at microchip may have already reviewed:
v3 changes:
 * introduced another wrapper to check for devices except for Series 6
   controllers upon request from Sagar Biradar (Microchip)


>
> Thanks!
>
> --
> Martin K. Petersen      Oracle Linux Engineering

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-10-10 12:31                       ` James Hilliard
@ 2022-10-19 18:00                         ` Konstantin Khorenko
  2022-10-26 20:10                           ` James Hilliard
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Khorenko @ 2022-10-19 18:00 UTC (permalink / raw)
  To: James Hilliard, Martin K. Petersen
  Cc: Christian Großegger, linux-scsi, Adaptec OEM Raid Solutions,
	Sagar Biradar, Linux Kernel Mailing List, Don Brace

On 10.10.2022 14:31, James Hilliard wrote:
> On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
> <martin.petersen@oracle.com> wrote:
>>
>>
>> Christian,
>>
>>> The faulty patch (Commit: 395e5df79a9588abf) from 2017 should be
>>> repaired with Konstantin Khorenko (1):
>>>
>>>    scsi: aacraid: resurrect correct arc ctrl checks for Series-6
>>
>> It would be great to get this patch resubmitted by Konstantin and acked
>> by Microchip.
> 
> Does the patch need to be rebased?

James, I have just checked: the old patch (v3) applies cleanly onto the latest master branch.

> Based on this it looks like someone at microchip may have already reviewed:
> v3 changes:
>   * introduced another wrapper to check for devices except for Series 6
>     controllers upon request from Sagar Biradar (Microchip)

Well, back in 2019 I filed a bug in the Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1724077
(the bug is private; this is the default for Red Hat bugs).

In that bug, Sagar Biradar (with an @microchip.com email address) suggested that I rework the patch.
I did that and sent v3.

Nothing happened after that, but about a year later (2020-06-19) the bug was closed with the
resolution NOTABUG and a comment that S6 users will find the patch useful.

I suppose S6 is so old that Red Hat simply has no customers using it, and Microchip itself
is not that interested in handling issues with such old hardware either.

Sorry, I was unable to get a final ack from Microchip.
I wrote direct emails to the addresses I found on the internet and tried to connect via
LinkedIn, with no luck.

--
Konstantin Khorenko

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-10-19 18:00                         ` Konstantin Khorenko
@ 2022-10-26 20:10                           ` James Hilliard
       [not found]                             ` <BYAPR11MB36066925274C38555F20FB17FA339@BYAPR11MB3606.namprd11.prod.outlook.com>
  0 siblings, 1 reply; 26+ messages in thread
From: James Hilliard @ 2022-10-26 20:10 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Konstantin Khorenko, Christian Großegger, linux-scsi,
	Adaptec OEM Raid Solutions, Sagar Biradar,
	Linux Kernel Mailing List, Don Brace

On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko
<khorenko@virtuozzo.com> wrote:
>
> On 10.10.2022 14:31, James Hilliard wrote:
> > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
> > <martin.petersen@oracle.com> wrote:
> >>
> >>
> >> Christian,
> >>
> >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017 should be
> >>> repaired with Konstantin Khorenko (1):
> >>>
> >>>    scsi: aacraid: resurrect correct arc ctrl checks for Series-6
> >>
> >> It would be great to get this patch resubmitted by Konstantin and acked
> >> by Microchip.

Can we merge this as is, since Microchip does not appear to be maintaining
this driver any more or responding?

> >
> > Does the patch need to be rebased?
>
> James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
>
> > Based on this it looks like someone at microchip may have already reviewed:
> > v3 changes:
> >   * introduced another wrapper to check for devices except for Series 6
> >     controllers upon request from Sagar Biradar (Microchip)
>
> Well, back in the year 2019 i've created a bug in RedHat bugzilla
> https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> (the bug is private, this is default for Redhat bugs)
>
> In this bug Sagar Biradar (with the email @microchip.com) suggested me to rework the patch - i've done
> that and sent the v3.
>
> And nothing happened after that, but in a ~year (2020-06-19) the bug was closed with the resolution
> NOTABUG and a comment that S6 users will find the patch useful.
>
> i suppose S6 is so old that RedHat just does not have customers using it and Microchip company itself
> is also not that interested in handling so old hardware issues.
>
> Sorry, i was unable to get a final ack from Microchip,
> i've written direct emails to the addresses which is found in the internet, tried to connect via
> linkedin, no luck.
>
> --
> Konstantin Khorenko

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
       [not found]                             ` <BYAPR11MB36066925274C38555F20FB17FA339@BYAPR11MB3606.namprd11.prod.outlook.com>
@ 2022-11-13 18:42                               ` James Hilliard
  2022-11-15 14:05                                 ` Sagar.Biradar
  0 siblings, 1 reply; 26+ messages in thread
From: James Hilliard @ 2022-11-13 18:42 UTC (permalink / raw)
  To: Sagar.Biradar
  Cc: martin.petersen, khorenko, christian, aacraid, Don.Brace,
	Tom.White, linux-scsi, Linux Kernel Mailing List

On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
>
> Hi James and Konstantin,
>
> *Limiting the audience to avoid spamming*
>
> Sorry for delayed response as I was on vacation.
> This one got missed somehow as someone else was looking into this and is no longer with the company.
>
> I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
> I will get back to you with some more questions or the confirmation in a day or two max.

Did this ever get looked at?

Since this exact patch was merged into the vendor aacraid driver a while ago, I'm not sure
why it wouldn't be good to merge it into mainline as well.

Vendor aacraid release with this patch merged:
https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz

>
>
> Thanks for your patience.
> Sagar
>
>
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Thursday, October 27, 2022 1:40 AM
> To: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian Großegger <christian@grossegger.com>; linux-scsi@vger.kernel.org; Adaptec OEM Raid Solutions <aacraid@microsemi.com>; Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Don Brace - C33706 <Don.Brace@microchip.com>
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
>
> On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> >
> > On 10.10.2022 14:31, James Hilliard wrote:
> > > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
> > > <martin.petersen@oracle.com> wrote:
> > >>
> > >>
> > >> Christian,
> > >>
> > >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017 should be
> > >>> repaired with Konstantin Khorenko (1):
> > >>>
> > >>>    scsi: aacraid: resurrect correct arc ctrl checks for Series-6
> > >>
> > >> It would be great to get this patch resubmitted by Konstantin and
> > >> acked by Microchip.
>
> Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
>
> > >
> > > Does the patch need to be rebased?
> >
> > James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
> >
> > > Based on this it looks like someone at microchip may have already reviewed:
> > > v3 changes:
> > >   * introduced another wrapper to check for devices except for Series 6
> > >     controllers upon request from Sagar Biradar (Microchip)
> >
> > Well, back in the year 2019 i've created a bug in RedHat bugzilla
> > https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> > (the bug is private, this is default for Redhat bugs)
> >
> > In this bug Sagar Biradar (with the email @microchip.com) suggested me
> > to rework the patch - i've done that and sent the v3.
> >
> > And nothing happened after that, but in a ~year (2020-06-19) the bug
> > was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
> >
> > i suppose S6 is so old that RedHat just does not have customers using
> > it and Microchip company itself is also not that interested in handling so old hardware issues.
> >
> > Sorry, i was unable to get a final ack from Microchip, i've written
> > direct emails to the addresses which is found in the internet, tried
> > to connect via linkedin, no luck.
> >
> > --
> > Konstantin Khorenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-11-13 18:42                               ` James Hilliard
@ 2022-11-15 14:05                                 ` Sagar.Biradar
  2022-11-16 21:55                                   ` James Hilliard
  0 siblings, 1 reply; 26+ messages in thread
From: Sagar.Biradar @ 2022-11-15 14:05 UTC (permalink / raw)
  To: james.hilliard1
  Cc: martin.petersen, khorenko, christian, aacraid, Don.Brace,
	Tom.White, linux-scsi, linux-kernel

Hi James,
I have looked into the patch thoroughly.
We suspect this change might expose an old legacy interrupt issue on some processors.

We are still debugging and gathering further details so that we can explain it more fully.
I will keep the thread posted as soon as we have something interesting.

Sagar

-----Original Message-----
From: James Hilliard <james.hilliard1@gmail.com> 
Sent: Monday, November 14, 2022 12:13 AM
To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load

On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
>
> Hi James and Konstantin,
>
> *Limiting the audience to avoid spamming*
>
> Sorry for delayed response as I was on vacation.
> This one got missed somehow as someone else was looking into this and is no longer with the company.
>
> I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
> I will get back to you with some more questions or the confirmation in a day or two max.

Did this ever get looked at?

As this exact patch was merged into the vendor aacraid a while ago I'm not sure why it wouldn't be good to merge to mainline as well.

Vendor aacraid release with this patch merged:
https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz

>
>
> Thanks for your patience.
> Sagar
>
>
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Thursday, October 27, 2022 1:40 AM
> To: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian Großegger 
> <christian@grossegger.com>; linux-scsi@vger.kernel.org; Adaptec OEM 
> Raid Solutions <aacraid@microsemi.com>; Sagar Biradar - C34249 
> <Sagar.Biradar@microchip.com>; Linux Kernel Mailing List 
> <linux-kernel@vger.kernel.org>; Don Brace - C33706 
> <Don.Brace@microchip.com>
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 
> constantly resets under high io load
>
> On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> >
> > On 10.10.2022 14:31, James Hilliard wrote:
> > > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen 
> > > <martin.petersen@oracle.com> wrote:
> > >>
> > >>
> > >> Christian,
> > >>
> > >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017 should be 
> > >>> repaired with Konstantin Khorenko (1):
> > >>>
> > >>>    scsi: aacraid: resurrect correct arc ctrl checks for Series-6
> > >>
> > >> It would be great to get this patch resubmitted by Konstantin and 
> > >> acked by Microchip.
>
> Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
>
> > >
> > > Does the patch need to be rebased?
> >
> > James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
> >
> > > Based on this it looks like someone at microchip may have already reviewed:
> > > v3 changes:
> > >   * introduced another wrapper to check for devices except for Series 6
> > >     controllers upon request from Sagar Biradar (Microchip)
> >
> > Well, back in the year 2019 i've created a bug in RedHat bugzilla
> > https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> > (the bug is private, this is default for Redhat bugs)
> >
> > In this bug Sagar Biradar (with the email @microchip.com) suggested 
> > me to rework the patch - i've done that and sent the v3.
> >
> > And nothing happened after that, but in a ~year (2020-06-19) the bug 
> > was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
> >
> > i suppose S6 is so old that RedHat just does not have customers 
> > using it and Microchip company itself is also not that interested in handling so old hardware issues.
> >
> > Sorry, i was unable to get a final ack from Microchip, i've written 
> > direct emails to the addresses which is found in the internet, tried 
> > to connect via linkedin, no luck.
> >
> > --
> > Konstantin Khorenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-11-15 14:05                                 ` Sagar.Biradar
@ 2022-11-16 21:55                                   ` James Hilliard
  2022-11-18  3:36                                     ` Sagar.Biradar
  0 siblings, 1 reply; 26+ messages in thread
From: James Hilliard @ 2022-11-16 21:55 UTC (permalink / raw)
  To: Sagar.Biradar
  Cc: martin.petersen, khorenko, christian, aacraid, Don.Brace,
	Tom.White, linux-scsi, linux-kernel

On Tue, Nov 15, 2022 at 10:05 AM <Sagar.Biradar@microchip.com> wrote:
>
> Hi James,
> I have looked into the patch thoroughly.
> We suspect this change might expose an old legacy interrupt issue on some processors.

I did see this error once with this patch when a drive was having issues:
[ 4306.357531] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 4335.030025] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 4335.030111] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 4335.030172] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 4335.189886] aacraid: Host bus reset request. SCSI hang ?
[ 4335.189951] aacraid 0000:81:00.0: outstanding cmd: midlevel-0
[ 4335.189989] aacraid 0000:81:00.0: outstanding cmd: lowlevel-0
[ 4335.190101] aacraid 0000:81:00.0: outstanding cmd: error handler-3
[ 4335.190141] aacraid 0000:81:00.0: outstanding cmd: firmware-0
[ 4335.190177] aacraid 0000:81:00.0: outstanding cmd: kernel-0
[ 4335.274070] aacraid 0000:81:00.0: Controller reset type is 3
[ 4335.274142] aacraid 0000:81:00.0: Issuing IOP reset
[ 4365.862127] aacraid 0000:81:00.0: IOP reset succeeded
[ 4365.895079] aacraid: Comm Interface type2 enabled
[ 4374.938119] aacraid 0000:81:00.0: Scheduling bus rescan
[ 4387.022913] sd 0:1:41:0: [sdi] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
[ 4387.022988] sd 0:1:41:0: [sdi] 4096-byte physical blocks
[ 5643.714301] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 5672.349423] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 5672.351532] #PF: supervisor read access in kernel mode
[ 5672.353262] #PF: error_code(0x0000) - not-present page
[ 5672.354860] PGD 8000007ad6ac7067 P4D 8000007ad6ac7067 PUD 7af0892067 PMD 0
[ 5672.356444] Oops: 0000 [#1] SMP PTI
[ 5672.358075] CPU: 9 PID: 644201 Comm: cc1plus Tainted: P           O      5.15.64-1-pve #1
[ 5672.359749] Hardware name: Supermicro Super Server/X10DRC, BIOS 3.4 05/21/2021
[ 5672.361465] RIP: 0010:dma_direct_unmap_sg+0x49/0x1a0
[ 5672.363223] Code: ec 20 89 4d d4 4c 89 45 c8 85 d2 0f 8e bb 00 00 00 49 89 fe 49 89 f7 89 d3 45 31 ed 4c 8b 05 ae fd b0 01 49 8b be 60 02 00 00 <45> 8b 4f 18 49 8b 77 10 49 f7 d0 48 85 ff 0f 84 06 01 00 00 4c 8b
[ 5672.367024] RSP: 0000:ffffa4ff58c7cde0 EFLAGS: 00010046
[ 5672.369020] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000001
[ 5672.371073] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
[ 5672.373007] RBP: ffffa4ff58c7ce28 R08: 0000000000000000 R09: 0000000000000001
[ 5672.374795] R10: 0000000000000000 R11: ffffa4ff58c7cff8 R12: 0000000000000000
[ 5672.376418] R13: 0000000000000000 R14: ffff88968e1ec0d0 R15: 0000000000000000
[ 5672.378136] FS:  00007ff103d25ac0(0000) GS:ffff89547fac0000(0000) knlGS:0000000000000000
[ 5672.379760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5672.381402] CR2: 0000000000000018 CR3: 0000007ae90cc004 CR4: 00000000001706e0
[ 5672.383023] Call Trace:
[ 5672.384673]  <IRQ>
[ 5672.386282]  ? task_tick_fair+0x88/0x530
[ 5672.386469] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 5672.387921]  dma_unmap_sg_attrs+0x32/0x50
[ 5672.391431] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 5672.393273]  scsi_dma_unmap+0x3b/0x50
[ 5672.397079] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 5672.398180]  aac_srb_callback+0x88/0x3c0 [aacraid]

Does that look related?
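
One possible reading of the oops above, offered only as a sketch and not
confirmed by anyone in this thread: CR2 is 0x18 while R15 is 0, and the
faulting bytes (<45> 8b 4f 18) decode to a 32-bit load from 0x18(%r15).
If struct scatterlist on this x86_64 build has the usual layout with
CONFIG_NEED_SG_DMA_LENGTH set and CONFIG_DEBUG_SG unset (an assumption about
the kernel config), offset 0x18 is the dma_length field, which would suggest
that dma_direct_unmap_sg() was handed a NULL scatterlist pointer. The Python
below only mirrors that assumed layout with ctypes to check the offset
arithmetic; the names ScatterlistSketch and field_at are hypothetical.

```python
import ctypes


class ScatterlistSketch(ctypes.Structure):
    # Assumed mirror of the kernel's struct scatterlist on x86_64 with
    # CONFIG_NEED_SG_DMA_LENGTH=y and CONFIG_DEBUG_SG=n. This is an
    # illustration, not a definition taken from this thread.
    _fields_ = [
        ("page_link", ctypes.c_uint64),    # unsigned long on x86_64
        ("offset", ctypes.c_uint32),
        ("length", ctypes.c_uint32),
        ("dma_address", ctypes.c_uint64),  # dma_addr_t, assumed 64-bit
        ("dma_length", ctypes.c_uint32),
    ]


FAULT_ADDRESS = 0x18  # CR2 value reported in the oops
SG_POINTER = 0x0      # R15 value reported in the oops


def field_at(struct_type, byte_offset):
    """Return the name of the field that starts at byte_offset, or None."""
    for name, _ctype in struct_type._fields_:
        if getattr(struct_type, name).offset == byte_offset:
            return name
    return None


# Which field would a load at (pointer + 0x18) touch if the pointer is NULL?
faulting_field = field_at(ScatterlistSketch, FAULT_ADDRESS - SG_POINTER)
print(faulting_field)
```

If that reading holds, the crash would sit in the completion path
(aac_srb_callback -> scsi_dma_unmap) and not in the controller-type checks
that the patch touches, but that remains to be confirmed.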

>
> We are currently debugging and digging further details to be able to explain it in much detailed fashion.
> I will keep you the thread posted as soon as we have something interesting.
>
> Sagar
>
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Monday, November 14, 2022 12:13 AM
> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
>
> On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
> >
> > Hi James and Konstantin,
> >
> > *Limiting the audience to avoid spamming*
> >
> > Sorry for delayed response as I was on vacation.
> > This one got missed somehow as someone else was looking into this and is no longer with the company.
> >
> > I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
> > I will get back to you with some more questions or the confirmation in a day or two max.
>
> Did this ever get looked at?
>
> As this exact patch was merged into the vendor aacraid a while ago I'm not sure why it wouldn't be good to merge to mainline as well.
>
> Vendor aacraid release with this patch merged:
> https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz
>
> >
> >
> > Thanks for your patience.
> > Sagar
> >
> >
> > -----Original Message-----
> > From: James Hilliard <james.hilliard1@gmail.com>
> > Sent: Thursday, October 27, 2022 1:40 AM
> > To: Martin K. Petersen <martin.petersen@oracle.com>
> > Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian Großegger
> > <christian@grossegger.com>; linux-scsi@vger.kernel.org; Adaptec OEM
> > Raid Solutions <aacraid@microsemi.com>; Sagar Biradar - C34249
> > <Sagar.Biradar@microchip.com>; Linux Kernel Mailing List
> > <linux-kernel@vger.kernel.org>; Don Brace - C33706
> > <Don.Brace@microchip.com>
> > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > constantly resets under high io load
> >
> > On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> > >
> > > On 10.10.2022 14:31, James Hilliard wrote:
> > > > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
> > > > <martin.petersen@oracle.com> wrote:
> > > >>
> > > >>
> > > >> Christian,
> > > >>
> > > >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017 should be
> > > >>> repaired with Konstantin Khorenko (1):
> > > >>>
> > > >>>    scsi: aacraid: resurrect correct arc ctrl checks for Series-6
> > > >>
> > > >> It would be great to get this patch resubmitted by Konstantin and
> > > >> acked by Microchip.
> >
> > Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
> >
> > > >
> > > > Does the patch need to be rebased?
> > >
> > > James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
> > >
> > > > Based on this it looks like someone at microchip may have already reviewed:
> > > > v3 changes:
> > > >   * introduced another wrapper to check for devices except for Series 6
> > > >     controllers upon request from Sagar Biradar (Microchip)
> > >
> > > Well, back in the year 2019 i've created a bug in RedHat bugzilla
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> > > (the bug is private, this is default for Redhat bugs)
> > >
> > > In this bug Sagar Biradar (with the email @microchip.com) suggested
> > > me to rework the patch - i've done that and sent the v3.
> > >
> > > And nothing happened after that, but in a ~year (2020-06-19) the bug
> > > was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
> > >
> > > i suppose S6 is so old that RedHat just does not have customers
> > > using it and Microchip company itself is also not that interested in handling so old hardware issues.
> > >
> > > Sorry, i was unable to get a final ack from Microchip, i've written
> > > direct emails to the addresses which is found in the internet, tried
> > > to connect via linkedin, no luck.
> > >
> > > --
> > > Konstantin Khorenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-11-16 21:55                                   ` James Hilliard
@ 2022-11-18  3:36                                     ` Sagar.Biradar
  2022-12-03 23:55                                       ` James Hilliard
  0 siblings, 1 reply; 26+ messages in thread
From: Sagar.Biradar @ 2022-11-18  3:36 UTC (permalink / raw)
  To: james.hilliard1
  Cc: martin.petersen, khorenko, christian, aacraid, Don.Brace,
	Tom.White, linux-scsi, linux-kernel

Hi James,
Thanks for your response.
This issue seems to be slightly different and may have originated from the drive itself (I am not sure).

The original issue I was talking about would still occur with the missing legacy interrupt on certain processors.
We are still actively looking into the old "int-x missing" issue, which we suspect might originate from the patch.



-----Original Message-----
From: James Hilliard <james.hilliard1@gmail.com> 
Sent: Thursday, November 17, 2022 3:26 AM
To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load

On Tue, Nov 15, 2022 at 10:05 AM <Sagar.Biradar@microchip.com> wrote:
>
> Hi James,
> I have looked into the patch thoroughly.
> We suspect this change might expose an old legacy interrupt issue on some processors.

I did see this error once with this patch when a drive was having issues:
[ 4306.357531] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 4335.030025] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 4335.030111] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 4335.030172] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 4335.189886] aacraid: Host bus reset request. SCSI hang ?
[ 4335.189951] aacraid 0000:81:00.0: outstanding cmd: midlevel-0
[ 4335.189989] aacraid 0000:81:00.0: outstanding cmd: lowlevel-0
[ 4335.190101] aacraid 0000:81:00.0: outstanding cmd: error handler-3
[ 4335.190141] aacraid 0000:81:00.0: outstanding cmd: firmware-0
[ 4335.190177] aacraid 0000:81:00.0: outstanding cmd: kernel-0
[ 4335.274070] aacraid 0000:81:00.0: Controller reset type is 3
[ 4335.274142] aacraid 0000:81:00.0: Issuing IOP reset
[ 4365.862127] aacraid 0000:81:00.0: IOP reset succeeded
[ 4365.895079] aacraid: Comm Interface type2 enabled
[ 4374.938119] aacraid 0000:81:00.0: Scheduling bus rescan
[ 4387.022913] sd 0:1:41:0: [sdi] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
[ 4387.022988] sd 0:1:41:0: [sdi] 4096-byte physical blocks
[ 5643.714301] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 5672.349423] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 5672.351532] #PF: supervisor read access in kernel mode
[ 5672.353262] #PF: error_code(0x0000) - not-present page
[ 5672.354860] PGD 8000007ad6ac7067 P4D 8000007ad6ac7067 PUD 7af0892067 PMD 0
[ 5672.356444] Oops: 0000 [#1] SMP PTI
[ 5672.358075] CPU: 9 PID: 644201 Comm: cc1plus Tainted: P           O      5.15.64-1-pve #1
[ 5672.359749] Hardware name: Supermicro Super Server/X10DRC, BIOS 3.4 05/21/2021
[ 5672.361465] RIP: 0010:dma_direct_unmap_sg+0x49/0x1a0
[ 5672.363223] Code: ec 20 89 4d d4 4c 89 45 c8 85 d2 0f 8e bb 00 00 00 49 89 fe 49 89 f7 89 d3 45 31 ed 4c 8b 05 ae fd b0 01 49 8b be 60 02 00 00 <45> 8b 4f 18 49 8b 77 10 49 f7 d0 48 85 ff 0f 84 06 01 00 00 4c 8b
[ 5672.367024] RSP: 0000:ffffa4ff58c7cde0 EFLAGS: 00010046
[ 5672.369020] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000001
[ 5672.371073] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
[ 5672.373007] RBP: ffffa4ff58c7ce28 R08: 0000000000000000 R09: 0000000000000001
[ 5672.374795] R10: 0000000000000000 R11: ffffa4ff58c7cff8 R12: 0000000000000000
[ 5672.376418] R13: 0000000000000000 R14: ffff88968e1ec0d0 R15: 0000000000000000
[ 5672.378136] FS:  00007ff103d25ac0(0000) GS:ffff89547fac0000(0000) knlGS:0000000000000000
[ 5672.379760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5672.381402] CR2: 0000000000000018 CR3: 0000007ae90cc004 CR4: 00000000001706e0
[ 5672.383023] Call Trace:
[ 5672.384673]  <IRQ>
[ 5672.386282]  ? task_tick_fair+0x88/0x530
[ 5672.386469] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 5672.387921]  dma_unmap_sg_attrs+0x32/0x50
[ 5672.391431] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 5672.393273]  scsi_dma_unmap+0x3b/0x50
[ 5672.397079] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,1,41,0):
[ 5672.398180]  aac_srb_callback+0x88/0x3c0 [aacraid]

Does that look related?

>
> We are currently debugging and digging further details to be able to explain it in much detailed fashion.
> I will keep you the thread posted as soon as we have something interesting.
>
> Sagar
>
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Monday, November 14, 2022 12:13 AM
> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; 
> christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 
> <Don.Brace@microchip.com>; Tom White - C33503 
> <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; Linux Kernel 
> Mailing List <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 
> constantly resets under high io load
>
> On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
> >
> > Hi James and Konstantin,
> >
> > *Limiting the audience to avoid spamming*
> >
> > Sorry for delayed response as I was on vacation.
> > This one got missed somehow as someone else was looking into this and is no longer with the company.
> >
> > I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
> > I will get back to you with some more questions or the confirmation in a day or two max.
>
> Did this ever get looked at?
>
> As this exact patch was merged into the vendor aacraid a while ago I'm not sure why it wouldn't be good to merge to mainline as well.
>
> Vendor aacraid release with this patch merged:
> https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz
>
> >
> >
> > Thanks for your patience.
> > Sagar
> >
> >
> > -----Original Message-----
> > From: James Hilliard <james.hilliard1@gmail.com>
> > Sent: Thursday, October 27, 2022 1:40 AM
> > To: Martin K. Petersen <martin.petersen@oracle.com>
> > Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian 
> > Großegger <christian@grossegger.com>; linux-scsi@vger.kernel.org; 
> > Adaptec OEM Raid Solutions <aacraid@microsemi.com>; Sagar Biradar - 
> > C34249 <Sagar.Biradar@microchip.com>; Linux Kernel Mailing List 
> > <linux-kernel@vger.kernel.org>; Don Brace - C33706 
> > <Don.Brace@microchip.com>
> > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 
> > constantly resets under high io load
> >
> > On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> > >
> > > On 10.10.2022 14:31, James Hilliard wrote:
> > > > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen 
> > > > <martin.petersen@oracle.com> wrote:
> > > >>
> > > >>
> > > >> Christian,
> > > >>
> > > >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017 should 
> > > >>> be repaired with Konstantin Khorenko (1):
> > > >>>
> > > > >>>    scsi: aacraid: resurrect correct arc ctrl checks for Series-6
> > > >>
> > > >> It would be great to get this patch resubmitted by Konstantin 
> > > >> and acked by Microchip.
> >
> > Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
> >
> > > >
> > > > Does the patch need to be rebased?
> > >
> > > James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
> > >
> > > > Based on this it looks like someone at microchip may have already reviewed:
> > > > v3 changes:
> > > >   * introduced another wrapper to check for devices except for Series 6
> > > >     controllers upon request from Sagar Biradar (Microchip)
> > >
> > > Well, back in the year 2019 i've created a bug in RedHat bugzilla
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> > > (the bug is private, this is default for Redhat bugs)
> > >
> > > In this bug Sagar Biradar (with the email @microchip.com) 
> > > suggested me to rework the patch - i've done that and sent the v3.
> > >
> > > And nothing happened after that, but in a ~year (2020-06-19) the 
> > > bug was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
> > >
> > > i suppose S6 is so old that RedHat just does not have customers 
> > > using it and Microchip company itself is also not that interested in handling so old hardware issues.
> > >
> > > Sorry, i was unable to get a final ack from Microchip, i've 
> > > written direct emails to the addresses which is found in the 
> > > internet, tried to connect via linkedin, no luck.
> > >
> > > --
> > > Konstantin Khorenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-11-18  3:36                                     ` Sagar.Biradar
@ 2022-12-03 23:55                                       ` James Hilliard
  2022-12-06  5:59                                         ` Sagar.Biradar
  0 siblings, 1 reply; 26+ messages in thread
From: James Hilliard @ 2022-12-03 23:55 UTC (permalink / raw)
  To: Sagar.Biradar
  Cc: martin.petersen, khorenko, christian, aacraid, Don.Brace,
	Tom.White, linux-scsi, linux-kernel

On Thu, Nov 17, 2022 at 11:36 PM <Sagar.Biradar@microchip.com> wrote:
>
> Hi James,
> Thanks for your response.
> This issue seems to be slightly different and may have been originating from the drive itself (not too sure).

Yeah, the drive was having hardware issues, although it does sound like a
potential error condition that's not being correctly handled by aacraid.

>
> The original issue I was talking about would still occur with the missing legacy interrupt on certain processors.
> We are still actively looking into the old "int-x missing" issue that we suspect might possibly originate from the patch.

Hmm, are there any available details on this "int-x missing" issue? I
couldn't find any public details/reports relating to that.

Is there a list of CPUs known to be affected?

Does it occur in the vendor aacraid release that has this patch merged?

>
>
>
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Thursday, November 17, 2022 3:26 AM
> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
>
> On Tue, Nov 15, 2022 at 10:05 AM <Sagar.Biradar@microchip.com> wrote:
> >
> > Hi James,
> > I have looked into the patch thoroughly.
> > We suspect this change might expose an old legacy interrupt issue on some processors.
>
> I did see this error once with this patch when a drive was having issues:
> [ 4306.357531] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 4335.030025] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 4335.030111] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 4335.030172] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 4335.189886] aacraid: Host bus reset request. SCSI hang ?
> [ 4335.189951] aacraid 0000:81:00.0: outstanding cmd: midlevel-0
> [ 4335.189989] aacraid 0000:81:00.0: outstanding cmd: lowlevel-0
> [ 4335.190101] aacraid 0000:81:00.0: outstanding cmd: error handler-3
> [ 4335.190141] aacraid 0000:81:00.0: outstanding cmd: firmware-0
> [ 4335.190177] aacraid 0000:81:00.0: outstanding cmd: kernel-0
> [ 4335.274070] aacraid 0000:81:00.0: Controller reset type is 3
> [ 4335.274142] aacraid 0000:81:00.0: Issuing IOP reset
> [ 4365.862127] aacraid 0000:81:00.0: IOP reset succeeded
> [ 4365.895079] aacraid: Comm Interface type2 enabled
> [ 4374.938119] aacraid 0000:81:00.0: Scheduling bus rescan
> [ 4387.022913] sd 0:1:41:0: [sdi] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
> [ 4387.022988] sd 0:1:41:0: [sdi] 4096-byte physical blocks
> [ 5643.714301] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 5672.349423] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [ 5672.351532] #PF: supervisor read access in kernel mode
> [ 5672.353262] #PF: error_code(0x0000) - not-present page
> [ 5672.354860] PGD 8000007ad6ac7067 P4D 8000007ad6ac7067 PUD 7af0892067 PMD 0
> [ 5672.356444] Oops: 0000 [#1] SMP PTI
> [ 5672.358075] CPU: 9 PID: 644201 Comm: cc1plus Tainted: P           O      5.15.64-1-pve #1
> [ 5672.359749] Hardware name: Supermicro Super Server/X10DRC, BIOS 3.4 05/21/2021
> [ 5672.361465] RIP: 0010:dma_direct_unmap_sg+0x49/0x1a0
> [ 5672.363223] Code: ec 20 89 4d d4 4c 89 45 c8 85 d2 0f 8e bb 00 00 00 49 89 fe 49 89 f7 89 d3 45 31 ed 4c 8b 05 ae fd b0 01 49 8b be 60 02 00 00 <45> 8b 4f 18 49 8b 77 10 49 f7 d0 48 85 ff 0f 84 06 01 00 00 4c 8b
> [ 5672.367024] RSP: 0000:ffffa4ff58c7cde0 EFLAGS: 00010046
> [ 5672.369020] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000001
> [ 5672.371073] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
> [ 5672.373007] RBP: ffffa4ff58c7ce28 R08: 0000000000000000 R09: 0000000000000001
> [ 5672.374795] R10: 0000000000000000 R11: ffffa4ff58c7cff8 R12: 0000000000000000
> [ 5672.376418] R13: 0000000000000000 R14: ffff88968e1ec0d0 R15: 0000000000000000
> [ 5672.378136] FS:  00007ff103d25ac0(0000) GS:ffff89547fac0000(0000) knlGS:0000000000000000
> [ 5672.379760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5672.381402] CR2: 0000000000000018 CR3: 0000007ae90cc004 CR4: 00000000001706e0
> [ 5672.383023] Call Trace:
> [ 5672.384673]  <IRQ>
> [ 5672.386282]  ? task_tick_fair+0x88/0x530
> [ 5672.386469] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 5672.387921]  dma_unmap_sg_attrs+0x32/0x50
> [ 5672.391431] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 5672.393273]  scsi_dma_unmap+0x3b/0x50
> [ 5672.397079] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 5672.398180]  aac_srb_callback+0x88/0x3c0 [aacraid]
>
> Does that look related?
>
> >
> > We are currently debugging and digging further details to be able to explain it in much detailed fashion.
> > I will keep you the thread posted as soon as we have something interesting.
> >
> > Sagar
> >
> > -----Original Message-----
> > From: James Hilliard <james.hilliard1@gmail.com>
> > Sent: Monday, November 14, 2022 12:13 AM
> > To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> > Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com;
> > christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706
> > <Don.Brace@microchip.com>; Tom White - C33503
> > <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; Linux Kernel
> > Mailing List <linux-kernel@vger.kernel.org>
> > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > constantly resets under high io load
> >
> > On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
> > >
> > > Hi James and Konstantin,
> > >
> > > *Limiting the audience to avoid spamming*
> > >
> > > Sorry for delayed response as I was on vacation.
> > > This one got missed somehow as someone else was looking into this and is no longer with the company.
> > >
> > > I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
> > > I will get back to you with some more questions or the confirmation in a day or two max.
> >
> > Did this ever get looked at?
> >
> > As this exact patch was merged into the vendor aacraid a while ago I'm not sure why it wouldn't be good to merge to mainline as well.
> >
> > Vendor aacraid release with this patch merged:
> > https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz
> >
> > >
> > >
> > > Thanks for your patience.
> > > Sagar
> > >
> > >
> > > -----Original Message-----
> > > From: James Hilliard <james.hilliard1@gmail.com>
> > > Sent: Thursday, October 27, 2022 1:40 AM
> > > To: Martin K. Petersen <martin.petersen@oracle.com>
> > > Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian
> > > Großegger <christian@grossegger.com>; linux-scsi@vger.kernel.org;
> > > Adaptec OEM Raid Solutions <aacraid@microsemi.com>; Sagar Biradar -
> > > C34249 <Sagar.Biradar@microchip.com>; Linux Kernel Mailing List
> > > <linux-kernel@vger.kernel.org>; Don Brace - C33706
> > > <Don.Brace@microchip.com>
> > > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > > constantly resets under high io load
> > >
> > > On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> > > >
> > > > On 10.10.2022 14:31, James Hilliard wrote:
> > > > > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
> > > > > <martin.petersen@oracle.com> wrote:
> > > > >>
> > > > >>
> > > > >> Christian,
> > > > >>
> > > > >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017 should
> > > > >>> be repaired with Konstantin Khorenko (1):
> > > > >>>
> > > > >>>    scsi: aacraid: resurrect correct arc ctrl checks for
> > > > >>> Series-6
> > > > >>
> > > > >> It would be great to get this patch resubmitted by Konstantin
> > > > >> and acked by Microchip.
> > >
> > > Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
> > >
> > > > >
> > > > > Does the patch need to be rebased?
> > > >
> > > > James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
> > > >
> > > > > Based on this it looks like someone at microchip may have already reviewed:
> > > > > v3 changes:
> > > > >   * introduced another wrapper to check for devices except for Series 6
> > > > >     controllers upon request from Sagar Biradar (Microchip)
> > > >
> > > > Well, back in the year 2019 i've created a bug in RedHat bugzilla
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> > > > (the bug is private, this is default for Redhat bugs)
> > > >
> > > > In this bug Sagar Biradar (with the email @microchip.com)
> > > > suggested me to rework the patch - i've done that and sent the v3.
> > > >
> > > > And nothing happened after that, but in a ~year (2020-06-19) the
> > > > bug was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
> > > >
> > > > i suppose S6 is so old that RedHat just does not have customers
> > > > using it and Microchip company itself is also not that interested in handling so old hardware issues.
> > > >
> > > > Sorry, i was unable to get a final ack from Microchip, i've
> > > > written direct emails to the addresses which is found in the
> > > > internet, tried to connect via linkedin, no luck.
> > > >
> > > > --
> > > > Konstantin Khorenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-12-03 23:55                                       ` James Hilliard
@ 2022-12-06  5:59                                         ` Sagar.Biradar
  2022-12-16 20:44                                           ` Sagar.Biradar
  0 siblings, 1 reply; 26+ messages in thread
From: Sagar.Biradar @ 2022-12-06  5:59 UTC (permalink / raw)
  To: james.hilliard1
  Cc: martin.petersen, khorenko, christian, aacraid, Don.Brace,
	Tom.White, linux-scsi, linux-kernel

Hi James,
We were in the process of finding the related information and we have finally found some details.
I am reviewing them as I write this email.
I will get back to you with more details once I have reviewed and sorted that information.

Thanks
Sagar

-----Original Message-----
From: James Hilliard <james.hilliard1@gmail.com> 
Sent: Sunday, December 4, 2022 5:26 AM
To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load

On Thu, Nov 17, 2022 at 11:36 PM <Sagar.Biradar@microchip.com> wrote:
>
> Hi James,
> Thanks for your response.
> This issue seems to be slightly different and may have been originating from the drive itself (not too sure).

Yeah, the drive was having hardware issues, although it does sound like a potential error condition that's not being correctly handled by aacraid.

>
> The original issue I was talking about would still occur with the missing legacy interrupt on certain processors.
> We are still actively looking into the old "int-x missing" issue that we suspect might possibly originate from the patch.

Hmm, are there any available details on this "int-x missing" issue, I couldn't find any public details/reports relating to that.

Is there a list of CPU's known to be affected?

Does it occur in the vendor aacraid release that has this patch merged?

>
>
>
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Thursday, November 17, 2022 3:26 AM
> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; 
> christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 
> <Don.Brace@microchip.com>; Tom White - C33503 
> <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 
> constantly resets under high io load
>
> On Tue, Nov 15, 2022 at 10:05 AM <Sagar.Biradar@microchip.com> wrote:
> >
> > Hi James,
> > I have looked into the patch thoroughly.
> > We suspect this change might expose an old legacy interrupt issue on some processors.
>
> I did see this error once with this patch when a drive was having issues:
> [ 4306.357531] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 4335.030025] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 4335.030111] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 4335.030172] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 4335.189886] aacraid: Host bus reset request. SCSI hang ?
> [ 4335.189951] aacraid 0000:81:00.0: outstanding cmd: midlevel-0
> [ 4335.189989] aacraid 0000:81:00.0: outstanding cmd: lowlevel-0
> [ 4335.190101] aacraid 0000:81:00.0: outstanding cmd: error handler-3
> [ 4335.190141] aacraid 0000:81:00.0: outstanding cmd: firmware-0
> [ 4335.190177] aacraid 0000:81:00.0: outstanding cmd: kernel-0
> [ 4335.274070] aacraid 0000:81:00.0: Controller reset type is 3
> [ 4335.274142] aacraid 0000:81:00.0: Issuing IOP reset
> [ 4365.862127] aacraid 0000:81:00.0: IOP reset succeeded
> [ 4365.895079] aacraid: Comm Interface type2 enabled
> [ 4374.938119] aacraid 0000:81:00.0: Scheduling bus rescan
> [ 4387.022913] sd 0:1:41:0: [sdi] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
> [ 4387.022988] sd 0:1:41:0: [sdi] 4096-byte physical blocks
> [ 5643.714301] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 5672.349423] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [ 5672.351532] #PF: supervisor read access in kernel mode
> [ 5672.353262] #PF: error_code(0x0000) - not-present page
> [ 5672.354860] PGD 8000007ad6ac7067 P4D 8000007ad6ac7067 PUD 7af0892067 PMD 0
> [ 5672.356444] Oops: 0000 [#1] SMP PTI
> [ 5672.358075] CPU: 9 PID: 644201 Comm: cc1plus Tainted: P           O      5.15.64-1-pve #1
> [ 5672.359749] Hardware name: Supermicro Super Server/X10DRC, BIOS 3.4 05/21/2021
> [ 5672.361465] RIP: 0010:dma_direct_unmap_sg+0x49/0x1a0
> [ 5672.363223] Code: ec 20 89 4d d4 4c 89 45 c8 85 d2 0f 8e bb 00 00 00 49 89 fe 49 89 f7 89 d3 45 31 ed 4c 8b 05 ae fd b0 01 49 8b be 60 02 00 00 <45> 8b 4f 18 49 8b 77 10 49 f7 d0 48 85 ff 0f 84 06 01 00 00 4c 8b
> [ 5672.367024] RSP: 0000:ffffa4ff58c7cde0 EFLAGS: 00010046
> [ 5672.369020] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000001
> [ 5672.371073] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
> [ 5672.373007] RBP: ffffa4ff58c7ce28 R08: 0000000000000000 R09: 0000000000000001
> [ 5672.374795] R10: 0000000000000000 R11: ffffa4ff58c7cff8 R12: 0000000000000000
> [ 5672.376418] R13: 0000000000000000 R14: ffff88968e1ec0d0 R15: 0000000000000000
> [ 5672.378136] FS:  00007ff103d25ac0(0000) GS:ffff89547fac0000(0000) knlGS:0000000000000000
> [ 5672.379760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5672.381402] CR2: 0000000000000018 CR3: 0000007ae90cc004 CR4: 00000000001706e0
> [ 5672.383023] Call Trace:
> [ 5672.384673]  <IRQ>
> [ 5672.386282]  ? task_tick_fair+0x88/0x530
> [ 5672.386469] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 5672.387921]  dma_unmap_sg_attrs+0x32/0x50
> [ 5672.391431] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 5672.393273]  scsi_dma_unmap+0x3b/0x50
> [ 5672.397079] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (0,1,41,0):
> [ 5672.398180]  aac_srb_callback+0x88/0x3c0 [aacraid]
>
> Does that look related?
>
> >
> > We are currently debugging and digging further details to be able to explain it in much detailed fashion.
> > I will keep you the thread posted as soon as we have something interesting.
> >
> > Sagar
> >
> > -----Original Message-----
> > From: James Hilliard <james.hilliard1@gmail.com>
> > Sent: Monday, November 14, 2022 12:13 AM
> > To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> > Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; 
> > christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 
> > <Don.Brace@microchip.com>; Tom White - C33503 
> > <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; Linux Kernel 
> > Mailing List <linux-kernel@vger.kernel.org>
> > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 
> > constantly resets under high io load
> >
> > On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
> > >
> > > Hi James and Konstantin,
> > >
> > > *Limiting the audience to avoid spamming*
> > >
> > > Sorry for delayed response as I was on vacation.
> > > This one got missed somehow as someone else was looking into this and is no longer with the company.
> > >
> > > I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
> > > I will get back to you with some more questions or the confirmation in a day or two max.
> >
> > Did this ever get looked at?
> >
> > As this exact patch was merged into the vendor aacraid a while ago I'm not sure why it wouldn't be good to merge to mainline as well.
> >
> > Vendor aacraid release with this patch merged:
> > https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz
> >
> > >
> > >
> > > Thanks for your patience.
> > > Sagar
> > >
> > >
> > > -----Original Message-----
> > > From: James Hilliard <james.hilliard1@gmail.com>
> > > Sent: Thursday, October 27, 2022 1:40 AM
> > > To: Martin K. Petersen <martin.petersen@oracle.com>
> > > Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian 
> > > Großegger <christian@grossegger.com>; linux-scsi@vger.kernel.org; 
> > > Adaptec OEM Raid Solutions <aacraid@microsemi.com>; Sagar Biradar -
> > > C34249 <Sagar.Biradar@microchip.com>; Linux Kernel Mailing List
> > > <linux-kernel@vger.kernel.org>; Don Brace - C33706 
> > > <Don.Brace@microchip.com>
> > > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 
> > > constantly resets under high io load
> > >
> > > On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> > > >
> > > > On 10.10.2022 14:31, James Hilliard wrote:
> > > > > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen 
> > > > > <martin.petersen@oracle.com> wrote:
> > > > >>
> > > > >>
> > > > >> Christian,
> > > > >>
> > > > >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017 
> > > > >>> should be repaired with Konstantin Khorenko (1):
> > > > >>>
> > > > >>>    scsi: aacraid: resurrect correct arc ctrl checks for
> > > > >>> Series-6
> > > > >>
> > > > >> It would be great to get this patch resubmitted by Konstantin 
> > > > >> and acked by Microchip.
> > >
> > > Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
> > >
> > > > >
> > > > > Does the patch need to be rebased?
> > > >
> > > > James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
> > > >
> > > > > Based on this it looks like someone at microchip may have already reviewed:
> > > > > v3 changes:
> > > > >   * introduced another wrapper to check for devices except for Series 6
> > > > >     controllers upon request from Sagar Biradar (Microchip)
> > > >
> > > > Well, back in the year 2019 i've created a bug in RedHat 
> > > > bugzilla
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> > > > (the bug is private, this is default for Redhat bugs)
> > > >
> > > > In this bug Sagar Biradar (with the email @microchip.com) 
> > > > suggested me to rework the patch - i've done that and sent the v3.
> > > >
> > > > And nothing happened after that, but in a ~year (2020-06-19) the 
> > > > bug was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
> > > >
> > > > i suppose S6 is so old that RedHat just does not have customers 
> > > > using it and Microchip company itself is also not that interested in handling so old hardware issues.
> > > >
> > > > Sorry, i was unable to get a final ack from Microchip, i've 
> > > > written direct emails to the addresses which is found in the 
> > > > internet, tried to connect via linkedin, no luck.
> > > >
> > > > --
> > > > Konstantin Khorenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-12-06  5:59                                         ` Sagar.Biradar
@ 2022-12-16 20:44                                           ` Sagar.Biradar
  2022-12-20  1:12                                             ` James Hilliard
  2022-12-20 19:44                                             ` Konstantin Khorenko
  0 siblings, 2 replies; 26+ messages in thread
From: Sagar.Biradar @ 2022-12-16 20:44 UTC (permalink / raw)
  To: Sagar.Biradar, james.hilliard1
  Cc: martin.petersen, khorenko, christian, aacraid, Don.Brace,
	Tom.White, linux-scsi, linux-kernel, Gilbert.Wu

Hi James / Konstantin,
Here are the details that we have compiled so far.
I will restate the problem definition and the concerns discussed so far, to avoid back and forth.

Issue : Series 6 Patch [regression] aacraid: Host adapter constantly aborts under load (https://bugzilla.redhat.com/show_bug.cgi?id=1724077)

Synopsis: running mkfs.ext4 on different disks on the same controller in parallel. (Nothing seems to break, appears to always recover, but there are a lot of timeouts.)
[  699.442950] aacraid: Host adapter reset request. SCSI hang ?
[  759.515013] aacraid 0000:03:00.0: Issuing IOP reset
[  850.296705] aacraid 0000:03:00.0: IOP reset succeeded
* with kernel 3.10.0-862.20.2.el7.x86_64 - PASS
* with kernel 3.10.0-957.21.3.el7.x86_64 - FAIL
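
To put a number on how long each reset stalls I/O, the timestamps in the log excerpt above can be paired up. The following is only a minimal sketch, assuming dmesg-style lines with a bracketed timestamp; the function name iop_reset_durations is a hypothetical helper, not part of any existing tool.

```python
import re

# Matches a dmesg-style line such as
# "[  759.515013] aacraid 0000:03:00.0: Issuing IOP reset".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def iop_reset_durations(log_text):
    """Return the seconds between each 'Issuing IOP reset' message and the
    next 'IOP reset succeeded' message (hypothetical helper)."""
    durations = []
    started = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines carry no timestamp
        stamp, message = float(match.group(1)), match.group(2)
        if "Issuing IOP reset" in message:
            started = stamp
        elif "IOP reset succeeded" in message and started is not None:
            durations.append(stamp - started)
            started = None
    return durations


# The three log lines quoted in the synopsis above.
sample = """\
[  699.442950] aacraid: Host adapter reset request. SCSI hang ?
[  759.515013] aacraid 0000:03:00.0: Issuing IOP reset
[  850.296705] aacraid 0000:03:00.0: IOP reset succeeded
"""
durations = iop_reset_durations(sample)
for seconds in durations:
    print(f"IOP reset took {seconds:.1f} s")
```

For the excerpt above this reports a single reset of roughly 90 seconds, which is consistent with the mkfs processes sitting in D state for long periods.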

Konstantin’s patch (https://lkml.org/lkml/2019/8/19/758): the patch was tested on the Virtuozzo kernel and found to work fine; the same issue was later observed on Ubuntu.
However, MCHP knows that this change will have issues with interrupts on Xeon V2 processors (Intel Xeon E5-2609/2630/2650 v2, i.e. E5-26XX V2), so adding it to the tree could harm customers who use these processors.
The patch may work fine on Xeon V3/V4 and later processors.

The following Adaptec ASK article describes our concern: https://ask.adaptec.com/app/answers/detail/a_id/17400/kw/msi
Although the article appears to be VMware-specific, the issue is independent of the operating system.
We have discovered a conflict between the Series 6 and 6E RAID controllers, VMware ESXi 5.5 and Intel Xeon V2 processors that is caused by incorrect interrupt handling. 
The system is using the legacy interrupt handling but needs to be switched to MSI (Message Signaled Interrupts) instead.
This issue, which is caused by switching to legacy mode, occurs on Intel Xeon E5-2609/2630/2650 v2 CPUs (E5-26XX V2).
* Note: Xeon V2 is “Ivy Bridge”
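
For anyone wanting to check whether a test machine falls into the processor family named above, here is a rough sketch that classifies the "model name" strings from /proc/cpuinfo. The regular expression and the three labels are assumptions made for illustration; the only facts taken from this thread are the model numbers 2609/2630/2650 and the "E5-26XX V2" family description.

```python
import re

# Model numbers named explicitly in this thread.
NAMED_MODELS = {"2609", "2630", "2650"}

# Assumed shape of the marketing name, e.g.
# "Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz".
MODEL_RE = re.compile(r"E5-(26\d\d)[A-Z]*\s+v2\b", re.IGNORECASE)


def classify_cpu(model_name):
    """Return 'named', 'same-family' or 'not-listed' for one model name."""
    match = MODEL_RE.search(model_name)
    if not match:
        return "not-listed"
    return "named" if match.group(1) in NAMED_MODELS else "same-family"


def cpu_model_names(cpuinfo_text):
    """Collect the distinct 'model name' values from /proc/cpuinfo text."""
    names = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            names.add(value.strip())
    return sorted(names)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            for name in cpu_model_names(handle.read()):
                print(f"{name}: {classify_cpu(name)}")
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
```

A "not-listed" result only means the name does not match the family described here; it says nothing about whether the interrupt issue can occur on that system.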

Workaround: the proposed solution is to let the driver use MSI by setting the aacraid driver parameter "msi" to 1 ("msi=1"), for example with "echo 1 > /sys/module/aacraid/parameters/msi".
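
Before and after applying the workaround it may help to record what the parameter is currently set to. The sketch below reads it from the sysfs path given above; read_module_param is a hypothetical helper, and the sysfs_root argument exists only so the function can be exercised without the driver loaded.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the current value of a module parameter as a string, or None
    if the module is not loaded or the file cannot be read."""
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    value = read_module_param("aacraid", "msi")
    if value is None:
        print("aacraid msi parameter not available (driver not loaded?)")
    else:
        print(f"aacraid msi = {value}")
```

Whether a runtime write to this file takes effect without reloading the driver is not established in this thread. Assuming the standard modprobe.d convention, a line such as "options aacraid msi=1" in a file under /etc/modprobe.d/ would apply the setting at load time instead.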

Konstantin,
Would it be possible for you, or someone you know, to test on your original test bed with "msi" set to "1" and post the results?
We are running additional tests locally in parallel.
Please write to me if you need more information.


Thanks in advance
Sagar



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-12-16 20:44                                           ` Sagar.Biradar
@ 2022-12-20  1:12                                             ` James Hilliard
  2022-12-20 19:44                                             ` Konstantin Khorenko
  1 sibling, 0 replies; 26+ messages in thread
From: James Hilliard @ 2022-12-20  1:12 UTC (permalink / raw)
  To: Sagar.Biradar
  Cc: martin.petersen, khorenko, christian, aacraid, Don.Brace,
	Tom.White, linux-scsi, linux-kernel, Gilbert.Wu

On Fri, Dec 16, 2022 at 1:44 PM <Sagar.Biradar@microchip.com> wrote:
>
> Hi James / Konstantin,
> Here are the details that we have compiled so far.
> I will restate the problem definition and the concerns discussed so far, to avoid back and forth.
>
> Issue : Series 6 Patch [regression] aacraid: Host adapter constantly aborts under load (https://bugzilla.redhat.com/show_bug.cgi?id=1724077)
>
> Synopsis: running mkfs.ext4 on different disks on the same controller in parallel. (Nothing seems to break, appears to always recover, but there are a lot of timeouts.)
> [  699.442950] aacraid: Host adapter reset request. SCSI hang ?
> [  759.515013] aacraid 0000:03:00.0: Issuing IOP reset
> [  850.296705] aacraid 0000:03:00.0: IOP reset succeeded
> * with kernel 3.10.0-862.20.2.el7.x86_64 - PASS
> * with kernel 3.10.0-957.21.3.el7.x86_64 - FAIL
>
> Konstantin’s patch (https://lkml.org/lkml/2019/8/19/758): the patch was tested on the Virtuozzo kernel and found to work fine; the same issue was later observed on Ubuntu.
> However, MCHP knows that this change will have issues with interrupts on Xeon V2 processors (Intel Xeon E5-2609/2630/2650 v2, i.e. E5-26XX V2), so adding it to the tree could harm customers who use these processors.
> The patch may work fine on Xeon V3/V4 and later processors.
>
> The following Adaptec ASK article describes our concern: https://ask.adaptec.com/app/answers/detail/a_id/17400/kw/msi
> Although the article appears to be VMware-specific, the issue is independent of the operating system.
> We have discovered a conflict between the Series 6 and 6E RAID controllers, VMware ESXi 5.5 and Intel Xeon V2 processors that is caused by incorrect interrupt handling.
> The system is using the legacy interrupt handling but needs to be switched to MSI (Message Signaled Interrupts) instead.
> This issue caused by switching to the legacy mode occurs on CPU Intel Xeon E5-2609/2630/2650 v2 ( E5-26XX V2).
> * Note: Xeon V2 is “Ivy Bridge”
>
> Workaround: The proposed solution would be to let the driver use the MSI mechanism with the aacraid driver parameter "msi" set to 1 ("msi=1"), e.g. "echo 1 > /sys/module/aacraid/parameters/msi".

Hmm, so this commit indicates that series 6 raid cards should be always using
MSI interrupts regardless of that msi param:
https://github.com/torvalds/linux/commit/9022d375bd22869ba3e5ad3635f00427cfb934fc

However it appears that the aac_msi check wasn't removed here, maybe it
should have been?:
https://github.com/torvalds/linux/blob/v6.1/drivers/scsi/aacraid/rx.c#L647
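
For anyone who wants to check which mode a given machine is actually in, here is a minimal sketch. It assumes the sysfs path from the workaround quoted above and that the driver's interrupt is listed under the name "aacraid" in /proc/interrupts; the function names and the optional path arguments are hypothetical, added only so the helpers can be tried on a machine without the module loaded. It only reports state and changes nothing.

```shell
#!/bin/sh
# Sketch only: helper names and the optional path arguments are
# hypothetical; the default sysfs path comes from the quoted workaround.

# Print the current value of the aacraid "msi" module parameter.
aac_msi_param() {
    param="${1:-/sys/module/aacraid/parameters/msi}"
    if [ ! -r "$param" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$param")
    echo "msi=$value"
}

# Print the interrupt type the kernel lists for aacraid.
# Assumes the handler is registered under the name "aacraid".
aac_irq_mode() {
    table="${1:-/proc/interrupts}"
    lines=$(grep aacraid "$table" 2>/dev/null)
    if [ -z "$lines" ]; then
        echo "none"
        return 1
    fi
    case "$lines" in
        *MSI*) echo "msi" ;;     # matches PCI-MSI and MSI-X entries
        *)     echo "legacy" ;;  # e.g. IO-APIC lines
    esac
}
```

Called with no arguments, both helpers read the live system, which should show whether the msi parameter and the interrupt type actually in use agree.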

>
> Konstantin,
> Is it possible for you or someone you know to test on your original test bed with the "msi" set to "1", and post the results?
> We are working on additional tests locally in parallel.
> Please write to me if you need more information
>
>
> Thanks in advance
> Sagar
>
>
> -----Original Message-----
> From: Sagar.Biradar@microchip.com <Sagar.Biradar@microchip.com>
> Sent: Tuesday, December 6, 2022 11:30 AM
> To: james.hilliard1@gmail.com
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
>
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> Hi James,
> We were in the process of finding the related information and we have finally found some details.
> I am reviewing that as I write this email.
> I will get back to you once I review and sort that information with more details.
>
> Thanks
> Sagar
>
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Sunday, December 4, 2022 5:26 AM
> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
>
> On Thu, Nov 17, 2022 at 11:36 PM <Sagar.Biradar@microchip.com> wrote:
> >
> > Hi James,
> > Thanks for your response.
> > This issue seems to be slightly different and may have been originating from the drive itself (not too sure).
>
> Yeah, the drive was having hardware issues, although it does sound like a potential error condition that's not being correctly handled by aacraid.
>
> >
> > The original issue I was talking about would still occur with the missing legacy interrupt on certain processors.
> > We are still actively looking into the old "int-x missing" issue that we suspect might possibly originate from the patch.
>
> Hmm, are there any available details on this "int-x missing" issue, I couldn't find any public details/reports relating to that.
>
> Is there a list of CPU's known to be affected?
>
> Does it occur in the vendor aacraid release that has this patch merged?
>
> >
> >
> >
> > -----Original Message-----
> > From: James Hilliard <james.hilliard1@gmail.com>
> > Sent: Thursday, November 17, 2022 3:26 AM
> > To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> > Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com;
> > christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706
> > <Don.Brace@microchip.com>; Tom White - C33503
> > <Tom.White@microchip.com>; linux-scsi@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > constantly resets under high io load
> >
> > On Tue, Nov 15, 2022 at 10:05 AM <Sagar.Biradar@microchip.com> wrote:
> > >
> > > Hi James,
> > > I have looked into the patch thoroughly.
> > > We suspect this change might expose an old legacy interrupt issue on some processors.
> >
> > I did see this error once with this patch when a drive was having issues:
> > [ 4306.357531] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 4335.030025] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 4335.030111] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 4335.030172] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 4335.189886] aacraid: Host bus reset request. SCSI hang ?
> > [ 4335.189951] aacraid 0000:81:00.0: outstanding cmd: midlevel-0
> > [ 4335.189989] aacraid 0000:81:00.0: outstanding cmd: lowlevel-0
> > [ 4335.190101] aacraid 0000:81:00.0: outstanding cmd: error handler-3
> > [ 4335.190141] aacraid 0000:81:00.0: outstanding cmd: firmware-0
> > [ 4335.190177] aacraid 0000:81:00.0: outstanding cmd: kernel-0
> > [ 4335.274070] aacraid 0000:81:00.0: Controller reset type is 3
> > [ 4335.274142] aacraid 0000:81:00.0: Issuing IOP reset
> > [ 4365.862127] aacraid 0000:81:00.0: IOP reset succeeded
> > [ 4365.895079] aacraid: Comm Interface type2 enabled
> > [ 4374.938119] aacraid 0000:81:00.0: Scheduling bus rescan
> > [ 4387.022913] sd 0:1:41:0: [sdi] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
> > [ 4387.022988] sd 0:1:41:0: [sdi] 4096-byte physical blocks
> > [ 5643.714301] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 5672.349423] BUG: kernel NULL pointer dereference, address: 0000000000000018
> > [ 5672.351532] #PF: supervisor read access in kernel mode
> > [ 5672.353262] #PF: error_code(0x0000) - not-present page
> > [ 5672.354860] PGD 8000007ad6ac7067 P4D 8000007ad6ac7067 PUD 7af0892067 PMD 0
> > [ 5672.356444] Oops: 0000 [#1] SMP PTI
> > [ 5672.358075] CPU: 9 PID: 644201 Comm: cc1plus Tainted: P           O      5.15.64-1-pve #1
> > [ 5672.359749] Hardware name: Supermicro Super Server/X10DRC, BIOS 3.4 05/21/2021
> > [ 5672.361465] RIP: 0010:dma_direct_unmap_sg+0x49/0x1a0
> > [ 5672.363223] Code: ec 20 89 4d d4 4c 89 45 c8 85 d2 0f 8e bb 00 00 00 49 89 fe 49 89 f7 89 d3 45 31 ed 4c 8b 05 ae fd b0 01 49 8b be 60 02 00 00 <45> 8b 4f 18 49 8b 77 10 49 f7 d0 48 85 ff 0f 84 06 01 00 00 4c 8b
> > [ 5672.367024] RSP: 0000:ffffa4ff58c7cde0 EFLAGS: 00010046
> > [ 5672.369020] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000001
> > [ 5672.371073] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
> > [ 5672.373007] RBP: ffffa4ff58c7ce28 R08: 0000000000000000 R09: 0000000000000001
> > [ 5672.374795] R10: 0000000000000000 R11: ffffa4ff58c7cff8 R12: 0000000000000000
> > [ 5672.376418] R13: 0000000000000000 R14: ffff88968e1ec0d0 R15: 0000000000000000
> > [ 5672.378136] FS: 00007ff103d25ac0(0000) GS:ffff89547fac0000(0000) knlGS:0000000000000000
> > [ 5672.379760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5672.381402] CR2: 0000000000000018 CR3: 0000007ae90cc004 CR4: 00000000001706e0
> > [ 5672.383023] Call Trace:
> > [ 5672.384673]  <IRQ>
> > [ 5672.386282]  ? task_tick_fair+0x88/0x530
> > [ 5672.386469] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 5672.387921]  dma_unmap_sg_attrs+0x32/0x50
> > [ 5672.391431] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 5672.393273]  scsi_dma_unmap+0x3b/0x50
> > [ 5672.397079] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 5672.398180]  aac_srb_callback+0x88/0x3c0 [aacraid]
> >
> > Does that look related?
> >
> > >
> > > We are currently debugging and digging up further details to be able to explain it in more detail.
> > > I will keep the thread posted as soon as we have something interesting.
> > >
> > > Sagar
> > >
> > > -----Original Message-----
> > > From: James Hilliard <james.hilliard1@gmail.com>
> > > Sent: Monday, November 14, 2022 12:13 AM
> > > To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> > > Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com;
> > > christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706
> > > <Don.Brace@microchip.com>; Tom White - C33503
> > > <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; Linux Kernel
> > > Mailing List <linux-kernel@vger.kernel.org>
> > > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > > constantly resets under high io load
> > >
> > > On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
> > > >
> > > > Hi James and Konstantin,
> > > >
> > > > *Limiting the audience to avoid spamming*
> > > >
> > > > Sorry for delayed response as I was on vacation.
> > > > This one got missed somehow as someone else was looking into this and is no longer with the company.
> > > >
> > > > I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
> > > > I will get back to you with some more questions or the confirmation in a day or two max.
> > >
> > > Did this ever get looked at?
> > >
> > > As this exact patch was merged into the vendor aacraid a while ago I'm not sure why it wouldn't be good to merge to mainline as well.
> > >
> > > Vendor aacraid release with this patch merged:
> > > https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz
> > >
> > > >
> > > >
> > > > Thanks for your patience.
> > > > Sagar
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: James Hilliard <james.hilliard1@gmail.com>
> > > > Sent: Thursday, October 27, 2022 1:40 AM
> > > > To: Martin K. Petersen <martin.petersen@oracle.com>
> > > > Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian
> > > > Großegger <christian@grossegger.com>; linux-scsi@vger.kernel.org;
> > > > Adaptec OEM Raid Solutions <aacraid@microsemi.com>; Sagar Biradar
> > > > -
> > > > C34249 <Sagar.Biradar@microchip.com>; Linux Kernel Mailing List
> > > > <linux-kernel@vger.kernel.org>; Don Brace - C33706
> > > > <Don.Brace@microchip.com>
> > > > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > > > constantly resets under high io load
> > > >
> > > > On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> > > > >
> > > > > On 10.10.2022 14:31, James Hilliard wrote:
> > > > > > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
> > > > > > <martin.petersen@oracle.com> wrote:
> > > > > >>
> > > > > >>
> > > > > >> Christian,
> > > > > >>
> > > > > >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017
> > > > > >>> should be repaired with Konstantin Khorenko (1):
> > > > > >>>
> > > > > >>>    scsi: aacraid: resurrect correct arc ctrl checks for
> > > > > >>> Series-6
> > > > > >>
> > > > > >> It would be great to get this patch resubmitted by Konstantin
> > > > > >> and acked by Microchip.
> > > >
> > > > Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
> > > >
> > > > > >
> > > > > > Does the patch need to be rebased?
> > > > >
> > > > > James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
> > > > >
> > > > > > Based on this it looks like someone at microchip may have already reviewed:
> > > > > > v3 changes:
> > > > > >   * introduced another wrapper to check for devices except for Series 6
> > > > > >     controllers upon request from Sagar Biradar (Microchip)
> > > > >
> > > > > Well, back in the year 2019 i've created a bug in RedHat
> > > > > bugzilla
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> > > > > (the bug is private, this is default for Redhat bugs)
> > > > >
> > > > > In this bug Sagar Biradar (with the email @microchip.com)
> > > > > suggested me to rework the patch - i've done that and sent the v3.
> > > > >
> > > > > And nothing happened after that, but in a ~year (2020-06-19) the
> > > > > bug was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
> > > > >
> > > > > i suppose S6 is so old that RedHat just does not have customers
> > > > > using it and Microchip company itself is also not that interested in handling so old hardware issues.
> > > > >
> > > > > Sorry, i was unable to get a final ack from Microchip, i've
> > > > > written direct emails to the addresses which is found in the
> > > > > internet, tried to connect via linkedin, no luck.
> > > > >
> > > > > --
> > > > > Konstantin Khorenko


* Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
  2022-12-16 20:44                                           ` Sagar.Biradar
  2022-12-20  1:12                                             ` James Hilliard
@ 2022-12-20 19:44                                             ` Konstantin Khorenko
  1 sibling, 0 replies; 26+ messages in thread
From: Konstantin Khorenko @ 2022-12-20 19:44 UTC (permalink / raw)
  To: Sagar.Biradar, james.hilliard1
  Cc: martin.petersen, christian, aacraid, Don.Brace, Tom.White,
	linux-scsi, linux-kernel, Gilbert.Wu

On 16.12.2022 21:44, Sagar.Biradar@microchip.com wrote:
> Hi James / Konstantin,

<skipped>

> Konstantin,
> Is it possible for you or someone you know to test on your original test bed with the "msi" set to "1", and post the results?

Hi Sagar,

thank you for looking into this.
i'm very sorry, in my case that was a customer complaint about a node in production, and it was long,
long ago; unfortunately we definitely won't be able to test anything nowadays.

--
Best regards,

Konstantin Khorenko
Virtuozzo Linux Kernel Team

> We are working on additional tests locally in parallel.
> Please write to me if you need more information
> 
> 
> Thanks in advance
> Sagar
> 
> 
> -----Original Message-----
> From: Sagar.Biradar@microchip.com <Sagar.Biradar@microchip.com>
> Sent: Tuesday, December 6, 2022 11:30 AM
> To: james.hilliard1@gmail.com
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
> 
> Hi James,
> We were in the process of finding the related information and we have finally found some details.
> I am reviewing that as I write this email.
> I will get back to you once I review and sort that information with more details.
> 
> Thanks
> Sagar
> 
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Sunday, December 4, 2022 5:26 AM
> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
> 
> On Thu, Nov 17, 2022 at 11:36 PM <Sagar.Biradar@microchip.com> wrote:
>>
>> Hi James,
>> Thanks for your response.
>> This issue seems to be slightly different and may have been originating from the drive itself (not too sure).
> 
> Yeah, the drive was having hardware issues, although it does sound like a potential error condition that's not being correctly handled by aacraid.
> 
>>
>> The original issue I was talking about would still occur with the missing legacy interrupt on certain processors.
>> We are still actively looking into the old "int-x missing" issue that we suspect might possibly originate from the patch.
> 
> Hmm, are there any available details on this "int-x missing" issue, I couldn't find any public details/reports relating to that.
> 
> Is there a list of CPU's known to be affected?
> 
> Does it occur in the vendor aacraid release that has this patch merged?
> 
>>
>>
>>
>> -----Original Message-----
>> From: James Hilliard <james.hilliard1@gmail.com>
>> Sent: Thursday, November 17, 2022 3:26 AM
>> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
>> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com;
>> christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706
>> <Don.Brace@microchip.com>; Tom White - C33503
>> <Tom.White@microchip.com>; linux-scsi@vger.kernel.org;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
>> constantly resets under high io load
>>
>> On Tue, Nov 15, 2022 at 10:05 AM <Sagar.Biradar@microchip.com> wrote:
>>>
>>> Hi James,
>>> I have looked into the patch thoroughly.
>>> We suspect this change might expose an old legacy interrupt issue on some processors.
>>
>> I did see this error once with this patch when a drive was having issues:
>> [ 4306.357531] aacraid: Host adapter abort request.
>>                 aacraid: Outstanding commands on (0,1,41,0):
>> [ 4335.030025] aacraid: Host adapter abort request.
>>                 aacraid: Outstanding commands on (0,1,41,0):
>> [ 4335.030111] aacraid: Host adapter abort request.
>>                 aacraid: Outstanding commands on (0,1,41,0):
>> [ 4335.030172] aacraid: Host adapter abort request.
>>                 aacraid: Outstanding commands on (0,1,41,0):
>> [ 4335.189886] aacraid: Host bus reset request. SCSI hang ?
>> [ 4335.189951] aacraid 0000:81:00.0: outstanding cmd: midlevel-0
>> [ 4335.189989] aacraid 0000:81:00.0: outstanding cmd: lowlevel-0
>> [ 4335.190101] aacraid 0000:81:00.0: outstanding cmd: error handler-3
>> [ 4335.190141] aacraid 0000:81:00.0: outstanding cmd: firmware-0
>> [ 4335.190177] aacraid 0000:81:00.0: outstanding cmd: kernel-0
>> [ 4335.274070] aacraid 0000:81:00.0: Controller reset type is 3
>> [ 4335.274142] aacraid 0000:81:00.0: Issuing IOP reset
>> [ 4365.862127] aacraid 0000:81:00.0: IOP reset succeeded
>> [ 4365.895079] aacraid: Comm Interface type2 enabled
>> [ 4374.938119] aacraid 0000:81:00.0: Scheduling bus rescan
>> [ 4387.022913] sd 0:1:41:0: [sdi] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
>> [ 4387.022988] sd 0:1:41:0: [sdi] 4096-byte physical blocks
>> [ 5643.714301] aacraid: Host adapter abort request.
>>                 aacraid: Outstanding commands on (0,1,41,0):
>> [ 5672.349423] BUG: kernel NULL pointer dereference, address: 0000000000000018
>> [ 5672.351532] #PF: supervisor read access in kernel mode
>> [ 5672.353262] #PF: error_code(0x0000) - not-present page
>> [ 5672.354860] PGD 8000007ad6ac7067 P4D 8000007ad6ac7067 PUD 7af0892067 PMD 0
>> [ 5672.356444] Oops: 0000 [#1] SMP PTI
>> [ 5672.358075] CPU: 9 PID: 644201 Comm: cc1plus Tainted: P           O      5.15.64-1-pve #1
>> [ 5672.359749] Hardware name: Supermicro Super Server/X10DRC, BIOS 3.4 05/21/2021
>> [ 5672.361465] RIP: 0010:dma_direct_unmap_sg+0x49/0x1a0
>> [ 5672.363223] Code: ec 20 89 4d d4 4c 89 45 c8 85 d2 0f 8e bb 00 00 00 49 89 fe 49 89 f7 89 d3 45 31 ed 4c 8b 05 ae fd b0 01 49 8b be 60 02 00 00 <45> 8b 4f 18 49 8b 77 10 49 f7 d0 48 85 ff 0f 84 06 01 00 00 4c 8b
>> [ 5672.367024] RSP: 0000:ffffa4ff58c7cde0 EFLAGS: 00010046
>> [ 5672.369020] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000001
>> [ 5672.371073] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
>> [ 5672.373007] RBP: ffffa4ff58c7ce28 R08: 0000000000000000 R09: 0000000000000001
>> [ 5672.374795] R10: 0000000000000000 R11: ffffa4ff58c7cff8 R12: 0000000000000000
>> [ 5672.376418] R13: 0000000000000000 R14: ffff88968e1ec0d0 R15: 0000000000000000
>> [ 5672.378136] FS: 00007ff103d25ac0(0000) GS:ffff89547fac0000(0000) knlGS:0000000000000000
>> [ 5672.379760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 5672.381402] CR2: 0000000000000018 CR3: 0000007ae90cc004 CR4: 00000000001706e0
>> [ 5672.383023] Call Trace:
>> [ 5672.384673]  <IRQ>
>> [ 5672.386282]  ? task_tick_fair+0x88/0x530
>> [ 5672.386469] aacraid: Host adapter abort request.
>>                 aacraid: Outstanding commands on (0,1,41,0):
>> [ 5672.387921]  dma_unmap_sg_attrs+0x32/0x50
>> [ 5672.391431] aacraid: Host adapter abort request.
>>                 aacraid: Outstanding commands on (0,1,41,0):
>> [ 5672.393273]  scsi_dma_unmap+0x3b/0x50
>> [ 5672.397079] aacraid: Host adapter abort request.
>>                 aacraid: Outstanding commands on (0,1,41,0):
>> [ 5672.398180]  aac_srb_callback+0x88/0x3c0 [aacraid]
>>
>> Does that look related?
>>
>>>
>>> We are currently debugging and digging up further details to be able to explain it in more detail.
>>> I will keep the thread posted as soon as we have something interesting.
>>>
>>> Sagar
>>>
>>> -----Original Message-----
>>> From: James Hilliard <james.hilliard1@gmail.com>
>>> Sent: Monday, November 14, 2022 12:13 AM
>>> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
>>> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com;
>>> christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706
>>> <Don.Brace@microchip.com>; Tom White - C33503
>>> <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; Linux Kernel
>>> Mailing List <linux-kernel@vger.kernel.org>
>>> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
>>> constantly resets under high io load
>>>
>>> On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
>>>>
>>>> Hi James and Konstantin,
>>>>
>>>> *Limiting the audience to avoid spamming*
>>>>
>>>> Sorry for delayed response as I was on vacation.
>>>> This one got missed somehow as someone else was looking into this and is no longer with the company.
>>>>
>>>> I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
>>>> I will get back to you with some more questions or the confirmation in a day or two max.
>>>
>>> Did this ever get looked at?
>>>
>>> As this exact patch was merged into the vendor aacraid a while ago I'm not sure why it wouldn't be good to merge to mainline as well.
>>>
>>> Vendor aacraid release with this patch merged:
>>> https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz
>>>
>>>>
>>>>
>>>> Thanks for your patience.
>>>> Sagar
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: James Hilliard <james.hilliard1@gmail.com>
>>>> Sent: Thursday, October 27, 2022 1:40 AM
>>>> To: Martin K. Petersen <martin.petersen@oracle.com>
>>>> Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian
>>>> Großegger <christian@grossegger.com>; linux-scsi@vger.kernel.org;
>>>> Adaptec OEM Raid Solutions <aacraid@microsemi.com>; Sagar Biradar
>>>> -
>>>> C34249 <Sagar.Biradar@microchip.com>; Linux Kernel Mailing List
>>>> <linux-kernel@vger.kernel.org>; Don Brace - C33706
>>>> <Don.Brace@microchip.com>
>>>> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
>>>> constantly resets under high io load
>>>>
>>>> On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
>>>>>
>>>>> On 10.10.2022 14:31, James Hilliard wrote:
>>>>>> On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
>>>>>> <martin.petersen@oracle.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Christian,
>>>>>>>
>>>>>>>> The faulty patch (Commit: 395e5df79a9588abf) from 2017
>>>>>>>> should be repaired with Konstantin Khorenko (1):
>>>>>>>>
>>>>>>>>     scsi: aacraid: resurrect correct arc ctrl checks for
>>>>>>>> Series-6
>>>>>>>
>>>>>>> It would be great to get this patch resubmitted by Konstantin
>>>>>>> and acked by Microchip.
>>>>
>>>> Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
>>>>
>>>>>>
>>>>>> Does the patch need to be rebased?
>>>>>
>>>>> James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
>>>>>
>>>>>> Based on this it looks like someone at microchip may have already reviewed:
>>>>>> v3 changes:
>>>>>>    * introduced another wrapper to check for devices except for Series 6
>>>>>>      controllers upon request from Sagar Biradar (Microchip)
>>>>>
>>>>> Well, back in the year 2019 i've created a bug in RedHat
>>>>> bugzilla
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1724077
>>>>> (the bug is private, this is default for Redhat bugs)
>>>>>
>>>>> In this bug Sagar Biradar (with the email @microchip.com)
>>>>> suggested me to rework the patch - i've done that and sent the v3.
>>>>>
>>>>> And nothing happened after that, but in a ~year (2020-06-19) the
>>>>> bug was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
>>>>>
>>>>> i suppose S6 is so old that RedHat just does not have customers
>>>>> using it and Microchip company itself is also not that interested in handling so old hardware issues.
>>>>>
>>>>> Sorry, i was unable to get a final ack from Microchip, i've
>>>>> written direct emails to the addresses which is found in the
>>>>> internet, tried to connect via linkedin, no luck.
>>>>>
>>>>> --
>>>>> Konstantin Khorenko


end of thread, other threads:[~2022-12-20 19:44 UTC | newest]

Thread overview: 26+ messages
2019-06-27 16:14 [PATCH 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
2019-06-27 16:14 ` [PATCH 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
2019-07-07 10:09   ` Andrey Jr. Melnikov
2019-07-07 23:49     ` Finn Thain
2019-07-10  9:24       ` Konstantin Khorenko
2019-07-10  9:31         ` [PATCH v2 0/2] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
2019-07-10  9:31           ` [PATCH v2 1/2] Revert "scsi: aacraid: Remove reference to Series-9" Konstantin Khorenko
2019-07-10  9:31           ` [PATCH v2 2/2] scsi: aacraid: Remove references to Series-9 (only) Konstantin Khorenko
2019-07-12  1:30             ` Martin K. Petersen
2019-08-19 16:35               ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
2019-08-19 16:35                 ` [PATCH v3 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
2019-08-29 21:52                 ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Martin K. Petersen
2021-05-06 22:22                 ` James Hilliard
     [not found]                   ` <ffdb2223-eed3-75b4-a003-4e4c96b49947@grossegger.com>
2022-02-23  2:41                     ` Martin K. Petersen
2022-10-10 12:31                       ` James Hilliard
2022-10-19 18:00                         ` Konstantin Khorenko
2022-10-26 20:10                           ` James Hilliard
     [not found]                             ` <BYAPR11MB36066925274C38555F20FB17FA339@BYAPR11MB3606.namprd11.prod.outlook.com>
2022-11-13 18:42                               ` James Hilliard
2022-11-15 14:05                                 ` Sagar.Biradar
2022-11-16 21:55                                   ` James Hilliard
2022-11-18  3:36                                     ` Sagar.Biradar
2022-12-03 23:55                                       ` James Hilliard
2022-12-06  5:59                                         ` Sagar.Biradar
2022-12-16 20:44                                           ` Sagar.Biradar
2022-12-20  1:12                                             ` James Hilliard
2022-12-20 19:44                                             ` Konstantin Khorenko
