From: James Hilliard <james.hilliard1@gmail.com>
To: Sagar.Biradar@microchip.com
Cc: martin.petersen@oracle.com, khorenko@virtuozzo.com,
	christian@grossegger.com, aacraid@microsemi.com,
	Don.Brace@microchip.com, Tom.White@microchip.com,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Gilbert.Wu@microchip.com
Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
Date: Mon, 19 Dec 2022 18:12:09 -0700
Message-ID: <CADvTj4qJZH5PoFJRKVF9zfQZAG-GOt2QHC7fDGiLPzo+iOX0cw@mail.gmail.com>
In-Reply-To: <BYAPR11MB3606E15393A4C11CCFAF9C53FAE69@BYAPR11MB3606.namprd11.prod.outlook.com>

On Fri, Dec 16, 2022 at 1:44 PM <Sagar.Biradar@microchip.com> wrote:
>
> Hi James / Konstantin,
> Here are the details that we have compiled so far . .
> I will just repost the problem definition and the concerns discussed so far (to avoid back and forth)...
>
> Issue : Series 6 Patch [regression] aacraid: Host adapter constantly aborts under load (https://bugzilla.redhat.com/show_bug.cgi?id=1724077)
>
> Synopsis: running mkfs.ext4 on different disks on the same controller in parallel. (Nothing seems to break, appears to always recover, but there are a lot of timeouts.)
> [  699.442950] aacraid: Host adapter reset request. SCSI hang ?
> [  759.515013] aacraid 0000:03:00.0: Issuing IOP reset
> [  850.296705] aacraid 0000:03:00.0: IOP reset succeeded
> * with kernel 3.10.0-862.20.2.el7.x86_64 - PASS
> * with kernel 3.10.0-957.21.3.el7.x86_64 - FAIL
>
> Konstantin’s patch (https://lkml.org/lkml/2019/8/19/758) : upon testing the patch on the Virtuozzo kernel, it was found to be working fine, and the same issue was observed on Ubuntu later.
> But MCHP knows this patch/change will have issues with Xeon V2 interrupts, adding this change into the tree can harm the customers who use this processor. (CPU Intel Xeon E5-2609/2630/2650 v2 ( E5-26XX V2))
> However, the patch may work fine on Xeon V3/V4 and later processors.
>
> Adaptec ASK Article references our concern : https://ask.adaptec.com/app/answers/detail/a_id/17400/kw/msi
> Though the article appears to be VMware-specific, the issue is independent of the operating system.
> We have discovered a conflict between the Series 6 and 6E RAID controllers, VMware ESXi 5.5 and Intel Xeon V2 processors that is caused by incorrect interrupt handling.
> The system is using the legacy interrupt handling but needs to be switched to MSI (Message Signaled Interrupts) instead.
> This issue caused by switching to the legacy mode occurs on CPU Intel Xeon E5-2609/2630/2650 v2 ( E5-26XX V2).
> * Note: Xeon V2 is “Ivy Bridge”
>
> Workaround: The proposed solution would be to let the driver use the MSI mechanism by setting the aacraid driver parameter "msi" to 1 ("msi=1"), e.g. "echo 1 > /sys/module/aacraid/parameters/msi".
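
For anyone comparing kernels on their own hardware, the reset messages quoted above can be turned into a rough "how long did each IOP reset take" number. The following is only a minimal sketch in Python, not part of the driver or of any patch in this thread. The function name and regular expression are hypothetical, and it assumes dmesg-style "[ seconds.micros ]" timestamps and the message strings exactly as they appear in the quoted logs.

```python
import re

# Assumption: timestamps look like "[  759.515013]" and the two message
# strings below are spelled exactly as in the logs quoted in this thread.
EVENT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s*aacraid[^\[]*?(Issuing IOP reset|IOP reset succeeded)"
)


def iop_reset_durations(log_text):
    """Return the seconds between each 'Issuing IOP reset' and the
    following 'IOP reset succeeded' found in log_text."""
    durations = []
    started = None
    for match in EVENT_RE.finditer(log_text):
        timestamp = float(match.group(1))
        if match.group(2) == "Issuing IOP reset":
            # Remember when the reset was issued.
            started = timestamp
        elif started is not None:
            # Matching "succeeded" message: record the elapsed time.
            durations.append(timestamp - started)
            started = None
    return durations
```

Fed the three-line excerpt from the synopsis, this gives a single duration of about 90.8 seconds (759.515013 to 850.296705). Quote prefixes such as "> " in front of the timestamps do not matter to the pattern.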

Hmm, so this commit indicates that Series 6 RAID cards should always be using
MSI interrupts regardless of that msi param:
https://github.com/torvalds/linux/commit/9022d375bd22869ba3e5ad3635f00427cfb934fc

However it appears that the aac_msi check wasn't removed here, maybe it
should have been?:
https://github.com/torvalds/linux/blob/v6.1/drivers/scsi/aacraid/rx.c#L647
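
When testing the workaround it may help to record what the "msi" parameter is currently set to. Below is a minimal sketch in Python, assuming the sysfs path given in the quoted workaround. The helper name is hypothetical. It only reports the configured module parameter, which is not necessarily the interrupt mode a given controller ends up using (that depends on the driver code paths linked above).

```python
from pathlib import Path

# Assumption: sysfs location taken from the workaround quoted in this thread.
MSI_PARAM = Path("/sys/module/aacraid/parameters/msi")


def read_msi_param(path=MSI_PARAM):
    """Return the configured aacraid 'msi' value as an int, or None if the
    file is missing, unreadable, or does not contain an integer."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_msi_param()
    if value is None:
        print("aacraid msi parameter not available (module not loaded?)")
    else:
        print(f"aacraid msi parameter is set to {value}")
```

The path is a parameter so the helper can be pointed at a copy of the file when checking results collected from another machine.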

>
> Konstantin,
> Is it possible for you or someone you know to test on your original test bed with the "msi" set to "1", and post the results?
> We are working on additional tests locally in parallel.
> Please write to me if you need more information.
>
>
> Thanks in advance
> Sagar
>
>
> -----Original Message-----
> From: Sagar.Biradar@microchip.com <Sagar.Biradar@microchip.com>
> Sent: Tuesday, December 6, 2022 11:30 AM
> To: james.hilliard1@gmail.com
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
>
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> Hi James,
> We were in the process of finding the related information and we have finally found some details.
> I am reviewing that as I write this email.
> I will get back to you once I review and sort that information with more details.
>
> Thanks
> Sagar
>
> -----Original Message-----
> From: James Hilliard <james.hilliard1@gmail.com>
> Sent: Sunday, December 4, 2022 5:26 AM
> To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com; christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503 <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load
>
> On Thu, Nov 17, 2022 at 11:36 PM <Sagar.Biradar@microchip.com> wrote:
> >
> > Hi James,
> > Thanks for your response.
> > This issue seems to be slightly different and may have been originating from the drive itself (not too sure).
>
> Yeah, the drive was having hardware issues, although it does sound like a potential error condition that's not being correctly handled by aacraid.
>
> >
> > The original issue I was talking about would still occur with the missing legacy interrupt on certain processors.
> > We are still actively looking into the old "int-x missing" issue that we suspect might originate from the patch.
>
> Hmm, are there any available details on this "int-x missing" issue? I couldn't find any public details/reports relating to that.
>
> Is there a list of CPUs known to be affected?
>
> Does it occur in the vendor aacraid release that has this patch merged?
>
> >
> >
> >
> > -----Original Message-----
> > From: James Hilliard <james.hilliard1@gmail.com>
> > Sent: Thursday, November 17, 2022 3:26 AM
> > To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> > Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com;
> > christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706
> > <Don.Brace@microchip.com>; Tom White - C33503
> > <Tom.White@microchip.com>; linux-scsi@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > constantly resets under high io load
> >
> > On Tue, Nov 15, 2022 at 10:05 AM <Sagar.Biradar@microchip.com> wrote:
> > >
> > > Hi James,
> > > I have looked into the patch thoroughly.
> > > We suspect this change might expose an old legacy interrupt issue on some processors.
> >
> > I did see this error once with this patch when a drive was having issues:
> > [ 4306.357531] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 4335.030025] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 4335.030111] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 4335.030172] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 4335.189886] aacraid: Host bus reset request. SCSI hang ?
> > [ 4335.189951] aacraid 0000:81:00.0: outstanding cmd: midlevel-0
> > [ 4335.189989] aacraid 0000:81:00.0: outstanding cmd: lowlevel-0
> > [ 4335.190101] aacraid 0000:81:00.0: outstanding cmd: error handler-3
> > [ 4335.190141] aacraid 0000:81:00.0: outstanding cmd: firmware-0
> > [ 4335.190177] aacraid 0000:81:00.0: outstanding cmd: kernel-0
> > [ 4335.274070] aacraid 0000:81:00.0: Controller reset type is 3
> > [ 4335.274142] aacraid 0000:81:00.0: Issuing IOP reset
> > [ 4365.862127] aacraid 0000:81:00.0: IOP reset succeeded
> > [ 4365.895079] aacraid: Comm Interface type2 enabled
> > [ 4374.938119] aacraid 0000:81:00.0: Scheduling bus rescan
> > [ 4387.022913] sd 0:1:41:0: [sdi] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
> > [ 4387.022988] sd 0:1:41:0: [sdi] 4096-byte physical blocks
> > [ 5643.714301] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 5672.349423] BUG: kernel NULL pointer dereference, address: 0000000000000018
> > [ 5672.351532] #PF: supervisor read access in kernel mode
> > [ 5672.353262] #PF: error_code(0x0000) - not-present page
> > [ 5672.354860] PGD 8000007ad6ac7067 P4D 8000007ad6ac7067 PUD 7af0892067 PMD 0
> > [ 5672.356444] Oops: 0000 [#1] SMP PTI
> > [ 5672.358075] CPU: 9 PID: 644201 Comm: cc1plus Tainted: P           O      5.15.64-1-pve #1
> > [ 5672.359749] Hardware name: Supermicro Super Server/X10DRC, BIOS 3.4 05/21/2021
> > [ 5672.361465] RIP: 0010:dma_direct_unmap_sg+0x49/0x1a0
> > [ 5672.363223] Code: ec 20 89 4d d4 4c 89 45 c8 85 d2 0f 8e bb 00 00 00 49 89 fe 49 89 f7 89 d3 45 31 ed 4c 8b 05 ae fd b0 01 49 8b be 60 02 00 00 <45> 8b 4f 18 49 8b 77 10 49 f7 d0 48 85 ff 0f 84 06 01 00 00 4c 8b
> > [ 5672.367024] RSP: 0000:ffffa4ff58c7cde0 EFLAGS: 00010046
> > [ 5672.369020] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000001
> > [ 5672.371073] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
> > [ 5672.373007] RBP: ffffa4ff58c7ce28 R08: 0000000000000000 R09: 0000000000000001
> > [ 5672.374795] R10: 0000000000000000 R11: ffffa4ff58c7cff8 R12: 0000000000000000
> > [ 5672.376418] R13: 0000000000000000 R14: ffff88968e1ec0d0 R15: 0000000000000000
> > [ 5672.378136] FS: 00007ff103d25ac0(0000) GS:ffff89547fac0000(0000) knlGS:0000000000000000
> > [ 5672.379760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5672.381402] CR2: 0000000000000018 CR3: 0000007ae90cc004 CR4: 00000000001706e0
> > [ 5672.383023] Call Trace:
> > [ 5672.384673]  <IRQ>
> > [ 5672.386282]  ? task_tick_fair+0x88/0x530
> > [ 5672.386469] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 5672.387921]  dma_unmap_sg_attrs+0x32/0x50
> > [ 5672.391431] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 5672.393273]  scsi_dma_unmap+0x3b/0x50
> > [ 5672.397079] aacraid: Host adapter abort request.
> >                aacraid: Outstanding commands on (0,1,41,0):
> > [ 5672.398180]  aac_srb_callback+0x88/0x3c0 [aacraid]
> >
> > Does that look related?
> >
> > >
> > > We are currently debugging and digging for further details so that we can explain it more thoroughly.
> > > I will keep the thread posted as soon as we have something interesting.
> > >
> > > Sagar
> > >
> > > -----Original Message-----
> > > From: James Hilliard <james.hilliard1@gmail.com>
> > > Sent: Monday, November 14, 2022 12:13 AM
> > > To: Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>
> > > Cc: martin.petersen@oracle.com; khorenko@virtuozzo.com;
> > > christian@grossegger.com; aacraid@microsemi.com; Don Brace - C33706
> > > <Don.Brace@microchip.com>; Tom White - C33503
> > > <Tom.White@microchip.com>; linux-scsi@vger.kernel.org; Linux Kernel
> > > Mailing List <linux-kernel@vger.kernel.org>
> > > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > > constantly resets under high io load
> > >
> > > On Thu, Oct 27, 2022 at 1:17 PM <Sagar.Biradar@microchip.com> wrote:
> > > >
> > > > Hi James and Konstantin,
> > > >
> > > > *Limiting the audience to avoid spamming*
> > > >
> > > > Sorry for delayed response as I was on vacation.
> > > > This one got missed somehow as someone else was looking into this and is no longer with the company.
> > > >
> > > > I will look into this, meanwhile I wanted to check if you (or someone else you know) had a chance to test this thoroughly with the latest kernel?
> > > > I will get back to you with some more questions or the confirmation in a day or two max.
> > >
> > > Did this ever get looked at?
> > >
> > > As this exact patch was merged into the vendor aacraid a while ago I'm not sure why it wouldn't be good to merge to mainline as well.
> > >
> > > Vendor aacraid release with this patch merged:
> > > https://download.adaptec.com/raid/aac/linux/aacraid-linux-src-1.2.1-60001.tgz
> > >
> > > >
> > > >
> > > > Thanks for your patience.
> > > > Sagar
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: James Hilliard <james.hilliard1@gmail.com>
> > > > Sent: Thursday, October 27, 2022 1:40 AM
> > > > To: Martin K. Petersen <martin.petersen@oracle.com>
> > > > Cc: Konstantin Khorenko <khorenko@virtuozzo.com>; Christian
> > > > Großegger <christian@grossegger.com>; linux-scsi@vger.kernel.org;
> > > > Adaptec OEM Raid Solutions <aacraid@microsemi.com>;
> > > > Sagar Biradar - C34249 <Sagar.Biradar@microchip.com>;
> > > > Linux Kernel Mailing List <linux-kernel@vger.kernel.org>;
> > > > Don Brace - C33706 <Don.Brace@microchip.com>
> > > > Subject: Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405
> > > > constantly resets under high io load
> > > >
> > > > On Wed, Oct 19, 2022 at 2:03 PM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> > > > >
> > > > > On 10.10.2022 14:31, James Hilliard wrote:
> > > > > > On Tue, Feb 22, 2022 at 10:41 PM Martin K. Petersen
> > > > > > <martin.petersen@oracle.com> wrote:
> > > > > >>
> > > > > >>
> > > > > >> Christian,
> > > > > >>
> > > > > >>> The faulty patch (Commit: 395e5df79a9588abf) from 2017
> > > > > >>> should be repaired with Konstantin Khorenko (1):
> > > > > >>>
> > > > > >>>    scsi: aacraid: resurrect correct arc ctrl checks for
> > > > > >>> Series-6
> > > > > >>
> > > > > >> It would be great to get this patch resubmitted by Konstantin
> > > > > >> and acked by Microchip.
> > > >
> > > > Can we merge this as is since microchip does not appear to be maintaining this driver any more or responding?
> > > >
> > > > > >
> > > > > > Does the patch need to be rebased?
> > > > >
> > > > > James, i have just checked - the old patch (v3) applies cleanly onto latest master branch.
> > > > >
> > > > > > Based on this it looks like someone at microchip may have already reviewed:
> > > > > > v3 changes:
> > > > > >   * introduced another wrapper to check for devices except for Series 6
> > > > > >     controllers upon request from Sagar Biradar (Microchip)
> > > > >
> > > > > Well, back in the year 2019 i've created a bug in RedHat
> > > > > bugzilla
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1724077
> > > > > (the bug is private, this is default for Redhat bugs)
> > > > >
> > > > > > In this bug Sagar Biradar (with the email @microchip.com)
> > > > > > suggested that I rework the patch - I did that and sent the v3.
> > > > > >
> > > > > > Nothing happened after that, but about a year later (2020-06-19) the
> > > > > > bug was closed with the resolution NOTABUG and a comment that S6 users will find the patch useful.
> > > > > >
> > > > > > I suppose S6 is so old that RedHat just does not have customers
> > > > > > using it, and Microchip itself is also not that interested in handling issues with such old hardware.
> > > > > >
> > > > > > Sorry, I was unable to get a final ack from Microchip. I have
> > > > > > written direct emails to the addresses I found on the
> > > > > > internet and tried to connect via LinkedIn, with no luck.
> > > > >
> > > > > --
> > > > > Konstantin Khorenko

Thread overview: 26+ messages
2019-06-27 16:14 [PATCH 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
2019-06-27 16:14 ` [PATCH 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
2019-07-07 10:09   ` Andrey Jr. Melnikov
2019-07-07 23:49     ` Finn Thain
2019-07-10  9:24       ` Konstantin Khorenko
2019-07-10  9:31         ` [PATCH v2 0/2] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
2019-07-10  9:31           ` [PATCH v2 1/2] Revert "scsi: aacraid: Remove reference to Series-9" Konstantin Khorenko
2019-07-10  9:31           ` [PATCH v2 2/2] scsi: aacraid: Remove references to Series-9 (only) Konstantin Khorenko
2019-07-12  1:30             ` Martin K. Petersen
2019-08-19 16:35               ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Konstantin Khorenko
2019-08-19 16:35                 ` [PATCH v3 1/1] scsi: aacraid: resurrect correct arc ctrl checks for Series-6 Konstantin Khorenko
2019-08-29 21:52                 ` [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load Martin K. Petersen
2021-05-06 22:22                 ` James Hilliard
     [not found]                   ` <ffdb2223-eed3-75b4-a003-4e4c96b49947@grossegger.com>
2022-02-23  2:41                     ` Martin K. Petersen
2022-10-10 12:31                       ` James Hilliard
2022-10-19 18:00                         ` Konstantin Khorenko
2022-10-26 20:10                           ` James Hilliard
     [not found]                             ` <BYAPR11MB36066925274C38555F20FB17FA339@BYAPR11MB3606.namprd11.prod.outlook.com>
2022-11-13 18:42                               ` James Hilliard
2022-11-15 14:05                                 ` Sagar.Biradar
2022-11-16 21:55                                   ` James Hilliard
2022-11-18  3:36                                     ` Sagar.Biradar
2022-12-03 23:55                                       ` James Hilliard
2022-12-06  5:59                                         ` Sagar.Biradar
2022-12-16 20:44                                           ` Sagar.Biradar
2022-12-20  1:12                                             ` James Hilliard [this message]
2022-12-20 19:44                                             ` Konstantin Khorenko
