regressions.lists.linux.dev archive mirror
* scsi regression that after months is still not addressed and now bothering 6.1.y users, too
@ 2023-11-21  9:50 Thorsten Leemhuis
  2023-11-21  9:57 ` Thorsten Leemhuis
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Thorsten Leemhuis @ 2023-11-21  9:50 UTC (permalink / raw)
  To: Greg KH, Sagar Biradar, James Bottomley, Martin K. Petersen,
	Adaptec OEM Raid Solutions
  Cc: stable, Sasha Levin, Linux kernel regressions list,
	Hannes Reinecke, scsi, LKML, Sasha Levin, Gilbert Wu, John Garry

* @SCSI maintainers: could you please look into the issue below?

* @Stable team: you might want to take a look as well and consider a
revert in 6.1.y (yes, I know, those are normally avoided, but here it
might make sense).

Hi everyone!

TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
hangs for a while) that was reported months ago already but is still not
fixed. Not only that, apparently more and more users have run into it
lately, as the culprit was recently integrated into 6.1.y; I wonder if
it would be best to revert it there, unless a fix for mainline comes
into reach soon.

Details:

Quite a few machines with Adaptec controllers seem to hang for a few
tens of seconds to a few minutes before things start to work normally
again for a while:
https://bugzilla.kernel.org/show_bug.cgi?id=217599

That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. Despite
a warning of mine to Sasha, that commit recently made it into 6.1.53
and that way apparently reached more users, as quite a few have joined
that ticket.

The culprit was authored by Sagar Biradar who, unless I missed
something, never replied even once to the ticket or earlier mails about
it. Lore has no messages from him since early June.

Hannes Reinecke at least tried to fix it a few weeks ago (many thx), but
that didn't work out (see the ticket for details). Since then things
look stalled again, which is, ehh, unfortunate when it comes to
regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
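
A rough way to check whether a given upstream kernel version falls into the
ranges named above (6.1.y from 6.1.53 on, mainline from v6.4-rc7 on) is
sketched below. The function name `may_contain_culprit` is hypothetical. The
mail does not say which other stable series received the backport, so those
return `None`, and a kernel that already carries a revert of 9dc704dcc09eae
would still be reported as affected.

```python
import re
from typing import Optional

# Matches the leading "X.Y", "X.Y.Z" or "X.Y-rcN" part of a version string.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?")


def may_contain_culprit(version: str) -> Optional[bool]:
    """Return True/False where the mail gives a bound, None where it does not.

    `version` must be an upstream version such as "6.1.53" or "6.4-rc7".
    Some distributions report an ABI name instead of the upstream version in
    `uname -r`, so that output may need translating first (assumption).
    """
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")

    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    rc = int(m.group(4)) if m.group(4) else None

    if (major, minor) == (6, 1):
        # The backport landed in 6.1.53.
        return patch >= 53
    if (major, minor) == (6, 4) and patch == 0 and rc is not None:
        # The commit entered mainline in v6.4-rc7.
        return rc >= 7
    if (major, minor) >= (6, 4):
        # No upper bound is known from the mail; a later revert is not modelled.
        return True
    # Other series (for example 5.15.y or 6.3.y) are not covered by the mail.
    return None
```

For example, `may_contain_culprit("6.1.52")` is `False` and
`may_contain_culprit("6.1.53")` is `True`.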

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-21  9:50 scsi regression that after months is still not addressed and now bothering 6.1.y users, too Thorsten Leemhuis
@ 2023-11-21  9:57 ` Thorsten Leemhuis
  2023-11-21 11:30 ` John Garry
  2023-11-24 16:25 ` Greg KH
  2 siblings, 0 replies; 12+ messages in thread
From: Thorsten Leemhuis @ 2023-11-21  9:57 UTC (permalink / raw)
  To: Greg KH, Sagar Biradar, James Bottomley, Martin K. Petersen,
	Adaptec OEM Raid Solutions
  Cc: stable, Sasha Levin, Linux kernel regressions list,
	Hannes Reinecke, scsi, LKML, Gilbert Wu, John Garry

On 21.11.23 10:50, Thorsten Leemhuis wrote:
> * @SCSI maintainers: could you please look into below please?
> 
> * @Stable team: you might want to take a look as well and consider a
> revert in 6.1.y (yes, I know, those are normally avoided, but here it
> might make sense).
> 
> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
> hangs for a while) that was reported months ago already but is still not
> fixed. Not only that, it apparently more and more users run into this
> recently, as the culprit was recently integrated into 6.1.y; I wonder if
> it would be best to revert it there, unless a fix for mainline comes
> into reach soon.
>
> Details:
> 
> Quite a few machines with Adaptec controllers seems to hang for a few
> tens of seconds to a few minutes before things start to work normally
> again for a while:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599

Quick follow up, only saw this now while posting something to the
ticket: according to one reporter the problem even causes data damage.
To quote:

'''
if you run fsck.ext4 on ext4 file system with buggy kernel it will
damage file system and its data

using buggy kernel BTRFS scrub also says that checksums are wrong
'''

Ciao, Thorsten

> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> commit despite a warning of mine to Sasha recently made it into 6.1.53
> -- and that way apparently recently reached more users recently, as
> quite a few joined that ticket.
> 
> The culprit is authored by Sagar Biradar who unless I missed something
> never replied even once to the ticket or earlier mails about it. Lore
> has no messages from him since early June.
> 
> Hannes Reinecke at least tried to fix it a few weeks ago (many thx), but
> that didn't work out (see the ticket for details). Since then things
> look stalled again, which is, ehh, unfortunate when it comes to
> regressions.
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.

* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-21  9:50 scsi regression that after months is still not addressed and now bothering 6.1.y users, too Thorsten Leemhuis
  2023-11-21  9:57 ` Thorsten Leemhuis
@ 2023-11-21 11:30 ` John Garry
  2023-11-21 12:24   ` Linux regression tracking (Thorsten Leemhuis)
  2023-11-24 16:25 ` Greg KH
  2 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2023-11-21 11:30 UTC (permalink / raw)
  To: Thorsten Leemhuis, Greg KH, Sagar Biradar, James Bottomley,
	Martin K. Petersen, Adaptec OEM Raid Solutions
  Cc: stable, Sasha Levin, Linux kernel regressions list,
	Hannes Reinecke, scsi, LKML, Gilbert Wu

On 21/11/2023 09:50, Thorsten Leemhuis wrote:
> Quite a few machines with Adaptec controllers seems to hang for a few
> tens of seconds to a few minutes before things start to work normally
> again for a while:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> 
> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> commit despite a warning of mine to Sasha recently made it into 6.1.53
> -- and that way apparently recently reached more users recently, as
> quite a few joined that ticket.

Is there a full kernel log for this hanging system?

I can only see snippets in the ticket.

And what does /sys/class/scsi_host/host*/nr_hw_queues show?

Thanks,
John
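
For anyone on an affected machine who wants to collect the value asked about
above, a minimal sketch follows. It only reads the sysfs attribute named in
the question; the helper name `read_nr_hw_queues` and its `sysfs_root`
parameter are made up for illustration, and hosts whose attribute is missing
are simply skipped.

```python
import glob
import os


def read_nr_hw_queues(sysfs_root="/sys/class/scsi_host"):
    """Map each SCSI host name (e.g. "host0") to its nr_hw_queues value."""
    result = {}
    pattern = os.path.join(sysfs_root, "host*", "nr_hw_queues")
    for path in sorted(glob.glob(pattern)):
        host = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                result[host] = f.read().strip()
        except OSError:
            # Attribute vanished or is unreadable; record that explicitly.
            result[host] = None
    return result


if __name__ == "__main__":
    for host, value in read_nr_hw_queues().items():
        print(f"{host}: {value}")
```

The full kernel log still has to be gathered separately, and both should be
attached to the ticket rather than sent here.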



* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-21 11:30 ` John Garry
@ 2023-11-21 12:24   ` Linux regression tracking (Thorsten Leemhuis)
  2023-11-21 13:05     ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-11-21 12:24 UTC (permalink / raw)
  To: John Garry, Greg KH, Sagar Biradar, James Bottomley,
	Martin K. Petersen, Adaptec OEM Raid Solutions
  Cc: stable, Sasha Levin, Linux kernel regressions list,
	Hannes Reinecke, scsi, LKML, Gilbert Wu

On 21.11.23 12:30, John Garry wrote:
> On 21/11/2023 09:50, Thorsten Leemhuis wrote:
>> Quite a few machines with Adaptec controllers seems to hang for a few
>> tens of seconds to a few minutes before things start to work normally
>> again for a while:
>> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
>> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
>> commit despite a warning of mine to Sasha recently made it into 6.1.53
>> -- and that way apparently recently reached more users recently, as
>> quite a few joined that ticket.
> 
> Is there a full kernel log for this hanging system?
> 
> I can only see snippets in the ticket.
> 
> And what does /sys/class/scsi_host/host*/nr_hw_queues show?

Sorry, I'm just the man-in-the-middle: you need to ask in the ticket, as
the privacy policy for bugzilla.kernel.org does not allow CCing the
reporters from the ticket here without their consent.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.




* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-21 12:24   ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-11-21 13:05     ` James Bottomley
  2023-11-21 13:24       ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2023-11-21 13:05 UTC (permalink / raw)
  To: Linux regressions mailing list, John Garry, Greg KH,
	Sagar Biradar, Martin K. Petersen, Adaptec OEM Raid Solutions
  Cc: stable, Sasha Levin, Hannes Reinecke, scsi, LKML, Gilbert Wu

On Tue, 2023-11-21 at 13:24 +0100, Linux regression tracking (Thorsten
Leemhuis) wrote:
> On 21.11.23 12:30, John Garry wrote:
[...]
> > Is there a full kernel log for this hanging system?
> > 
> > I can only see snippets in the ticket.
> > 
> > And what does /sys/class/scsi_host/host*/nr_hw_queues show?
> 
> Sorry, I'm just the man-in-the-middle: you need to ask in the ticket,
> as  the privacy policy for bugzilla.kernel.org does not allow to CC
> the reporters from the ticket here without their consent.

How did you arrive at that conclusion?  Tickets for linux-scsi are
vectored to the list:

https://lore.kernel.org/linux-scsi/bug-217599-11613@https.bugzilla.kernel.org%2F/

So all the email addresses in the bugzilla are already archived on our
list.

James


* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-21 13:05     ` James Bottomley
@ 2023-11-21 13:24       ` Linux regression tracking (Thorsten Leemhuis)
  2023-11-21 13:31         ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-11-21 13:24 UTC (permalink / raw)
  To: James Bottomley, Linux regressions mailing list, John Garry,
	Greg KH, Sagar Biradar, Martin K. Petersen,
	Adaptec OEM Raid Solutions
  Cc: stable, Sasha Levin, Hannes Reinecke, scsi, LKML, Gilbert Wu

On 21.11.23 14:05, James Bottomley wrote:
> On Tue, 2023-11-21 at 13:24 +0100, Linux regression tracking (Thorsten
> Leemhuis) wrote:
>> On 21.11.23 12:30, John Garry wrote:
> [...]
>>> Is there a full kernel log for this hanging system?
>>> I can only see snippets in the ticket.
>>> And what does /sys/class/scsi_host/host*/nr_hw_queues show?
>>
>> Sorry, I'm just the man-in-the-middle: you need to ask in the ticket,
>> as  the privacy policy for bugzilla.kernel.org does not allow to CC
>> the reporters from the ticket here without their consent.
> 
> How did you arrive at that conclusion?

To quote https://bugzilla.kernel.org/createaccount.cgi:
"""
Note that your email address will never be displayed to logged out
users. Only registered users will be able to see it.
"""

Not sure since when it's there. Maybe it was added due to EU GDPR?
Konstantin should know. But for me that's enough to not CC people. I
even heard from one well known kernel developer that his company got a
GDPR complaint because he had mentioned the reporter's name and email
address in a Reported-by: tag.

Side note: bugbot afaics can solve the initial problem (i.e. interact
with reporters in bugzilla by mail without exposing their email
address). But to use bugbot one *afaik* still has to reassign a ticket
to a specific product and component in bugzilla. Some subsystem
maintainers don't want that, as the issue then does not show up in the
usual queries.

Ciao, Thorsten

* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-21 13:24       ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-11-21 13:31         ` James Bottomley
  0 siblings, 0 replies; 12+ messages in thread
From: James Bottomley @ 2023-11-21 13:31 UTC (permalink / raw)
  To: Linux regressions mailing list, John Garry, Greg KH,
	Sagar Biradar, Martin K. Petersen, Adaptec OEM Raid Solutions
  Cc: stable, Sasha Levin, Hannes Reinecke, scsi, LKML, Gilbert Wu

On Tue, 2023-11-21 at 14:24 +0100, Linux regression tracking (Thorsten
Leemhuis) wrote:
> On 21.11.23 14:05, James Bottomley wrote:
> > On Tue, 2023-11-21 at 13:24 +0100, Linux regression tracking
> > (Thorsten
> > Leemhuis) wrote:
> > > On 21.11.23 12:30, John Garry wrote:
> > [...]
> > > > Is there a full kernel log for this hanging system?
> > > > I can only see snippets in the ticket.
> > > > And what does /sys/class/scsi_host/host*/nr_hw_queues show?
> > > 
> > > Sorry, I'm just the man-in-the-middle: you need to ask in the
> > > ticket, as  the privacy policy for bugzilla.kernel.org does not
> > > allow to CC the reporters from the ticket here without their
> > > consent.
> > 
> > How did you arrive at that conclusion?
> 
> To quote https://bugzilla.kernel.org/createaccount.cgi:
> """
> Note that your email address will never be displayed to logged out
> users. Only registered users will be able to see it.
> """

OK, so someone needs to update that to reflect reality.

> Not sure since when it's there. Maybe it was added due to EU GDPR?
> Konstantin should know. But for me that's enough to not CC people. I
> even heard from one well known kernel developer that his company got
> a
> GDPR complaint because he had mentioning the reporters name and email
> address in a Reported-by: tag.
> 
> Side note: bugbot afaics can solve the initial problem (e.g. interact
> with reporters in bugzilla by mail without exposing their email
> address). But to use bugbot one *afaik* still has to reassign a
> ticket to a specific product and component in bugzilla. Some
> subsystem maintainers don't want that, as that issues then does not
> show up in the usual queries.

I'm not sure we need to solve a problem that doesn't exist. Switching
to email is a standard maintainer response:

https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
https://lore.kernel.org/all/20230314144145.07a3e680362eb77061fe6d0e@linux-foundation.org/
...

James


* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-21  9:50 scsi regression that after months is still not addressed and now bothering 6.1.y users, too Thorsten Leemhuis
  2023-11-21  9:57 ` Thorsten Leemhuis
  2023-11-21 11:30 ` John Garry
@ 2023-11-24 16:25 ` Greg KH
  2023-11-24 22:44   ` Martin K. Petersen
  2023-11-25  7:10   ` Thorsten Leemhuis
  2 siblings, 2 replies; 12+ messages in thread
From: Greg KH @ 2023-11-24 16:25 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Sagar Biradar, James Bottomley, Martin K. Petersen,
	Adaptec OEM Raid Solutions, stable, Sasha Levin,
	Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
	Gilbert Wu, John Garry

On Tue, Nov 21, 2023 at 10:50:57AM +0100, Thorsten Leemhuis wrote:
> * @SCSI maintainers: could you please look into below please?
> 
> * @Stable team: you might want to take a look as well and consider a
> revert in 6.1.y (yes, I know, those are normally avoided, but here it
> might make sense).
> 
> Hi everyone!
> 
> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
> hangs for a while) that was reported months ago already but is still not
> fixed. Not only that, it apparently more and more users run into this
> recently, as the culprit was recently integrated into 6.1.y; I wonder if
> it would be best to revert it there, unless a fix for mainline comes
> into reach soon.
> 
> Details:
> 
> Quite a few machines with Adaptec controllers seems to hang for a few
> tens of seconds to a few minutes before things start to work normally
> again for a while:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> 
> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> commit despite a warning of mine to Sasha recently made it into 6.1.53
> -- and that way apparently recently reached more users recently, as
> quite a few joined that ticket.
> 
> The culprit is authored by Sagar Biradar who unless I missed something
> never replied even once to the ticket or earlier mails about it. Lore
> has no messages from him since early June.
> 
> Hannes Reinecke at least tried to fix it a few weeks ago (many thx), but
> that didn't work out (see the ticket for details). Since then things
> look stalled again, which is, ehh, unfortunate when it comes to
> regressions.

I am loath to revert a stable patch that has been there for so long as
any upgrade will just cause the same bug to show back up.  Why can't we
just revert it in Linus's tree now and I'll take that revert in the
stable trees as well?

thanks,

greg k-h

* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-24 16:25 ` Greg KH
@ 2023-11-24 22:44   ` Martin K. Petersen
  2023-11-25  7:10   ` Thorsten Leemhuis
  1 sibling, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2023-11-24 22:44 UTC (permalink / raw)
  To: Greg KH
  Cc: Thorsten Leemhuis, Sagar Biradar, James Bottomley,
	Martin K. Petersen, Adaptec OEM Raid Solutions, stable,
	Sasha Levin, Linux kernel regressions list, Hannes Reinecke,
	scsi, LKML, Gilbert Wu, John Garry


Greg,

> I am loath to revert a stable patch that has been there for so long as
> any upgrade will just cause the same bug to show back up. Why can't we
> just revert it in Linus's tree now and I'll take that revert in the
> stable trees as well?

Hannes just posted another tentative patch. I'd prefer an incremental
fix if possible.

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-24 16:25 ` Greg KH
  2023-11-24 22:44   ` Martin K. Petersen
@ 2023-11-25  7:10   ` Thorsten Leemhuis
  2023-12-29 20:13     ` Salvatore Bonaccorso
  1 sibling, 1 reply; 12+ messages in thread
From: Thorsten Leemhuis @ 2023-11-25  7:10 UTC (permalink / raw)
  To: Greg KH
  Cc: Sagar Biradar, James Bottomley, Martin K. Petersen,
	Adaptec OEM Raid Solutions, stable, Sasha Levin,
	Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
	Gilbert Wu, John Garry

On 24.11.23 17:25, Greg KH wrote:
> On Tue, Nov 21, 2023 at 10:50:57AM +0100, Thorsten Leemhuis wrote:
>> * @SCSI maintainers: could you please look into below please?
>>
>> * @Stable team: you might want to take a look as well and consider a
>> revert in 6.1.y (yes, I know, those are normally avoided, but here it
>> might make sense).
>>
>> Hi everyone!
>>
>> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
>> hangs for a while) that was reported months ago already but is still not
>> fixed. Not only that, it apparently more and more users run into this
>> recently, as the culprit was recently integrated into 6.1.y; I wonder if
>> it would be best to revert it there, unless a fix for mainline comes
>> into reach soon.
>>
>> Details:
>>
>> Quite a few machines with Adaptec controllers seems to hang for a few
>> tens of seconds to a few minutes before things start to work normally
>> again for a while:
>> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>>
>> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
>> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
>> commit despite a warning of mine to Sasha recently made it into 6.1.53
>> -- and that way apparently recently reached more users recently, as
>> quite a few joined that ticket.
>[...]
> I am loath to revert a stable patch that has been there for so long as
> any upgrade will just cause the same bug to show back up. Why can't we
> just revert it in Linus's tree now and I'll take that revert in the
> stable trees as well?

FWIW, I know and in general agree with that strategy, that's why I
normally wouldn't have brought a stable-only revert up for
consideration. But this issue to me looked somewhat special and urgent
for two and a half reasons: (1) that backport apparently made a lot more
people suddenly hit the issue (2) there was also this data corruption
aspect one of the reporters mentioned (not sure if that is real and/or
if this might be just a 6.1.y thing). Furthermore for 6.1.y it was
recently confirmed that reverting the change fixes things, while we iirc
had no such confirmation for recent mainline kernels at that point. So
it looked like it would take a while to get this sorted out in mainline.
But it seems we finally might get closer to that now, so yeah, maybe
it's not worth a stable revert.

Ciao, Thorsten

* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-11-25  7:10   ` Thorsten Leemhuis
@ 2023-12-29 20:13     ` Salvatore Bonaccorso
  2023-12-30 10:58       ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Salvatore Bonaccorso @ 2023-12-29 20:13 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Greg KH, Sagar Biradar, James Bottomley, Martin K. Petersen,
	Adaptec OEM Raid Solutions, stable, Sasha Levin,
	Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
	Gilbert Wu, John Garry

Hi all,

On Sat, Nov 25, 2023 at 08:10:35AM +0100, Thorsten Leemhuis wrote:
> On 24.11.23 17:25, Greg KH wrote:
> > On Tue, Nov 21, 2023 at 10:50:57AM +0100, Thorsten Leemhuis wrote:
> >> * @SCSI maintainers: could you please look into below please?
> >>
> >> * @Stable team: you might want to take a look as well and consider a
> >> revert in 6.1.y (yes, I know, those are normally avoided, but here it
> >> might make sense).
> >>
> >> Hi everyone!
> >>
> >> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
> >> hangs for a while) that was reported months ago already but is still not
> >> fixed. Not only that, it apparently more and more users run into this
> >> recently, as the culprit was recently integrated into 6.1.y; I wonder if
> >> it would be best to revert it there, unless a fix for mainline comes
> >> into reach soon.
> >>
> >> Details:
> >>
> >> Quite a few machines with Adaptec controllers seems to hang for a few
> >> tens of seconds to a few minutes before things start to work normally
> >> again for a while:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> >>
> >> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> >> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> >> commit despite a warning of mine to Sasha recently made it into 6.1.53
> >> -- and that way apparently recently reached more users recently, as
> >> quite a few joined that ticket.
> >[...]
> > I am loath to revert a stable patch that has been there for so long as
> > any upgrade will just cause the same bug to show back up. Why can't we
> > just revert it in Linus's tree now and I'll take that revert in the
> > stable trees as well?
> 
> FWIW, I know and in general agree with that strategy, that's why I
> normally wouldn't have brought a stable-only revert up for
> consideration. But this issue to me looked somewhat special and urgent
> for two and a half reasons: (1) that backport apparently made a lot more
> people suddenly hit the issue (2) there was also this data corruption
> aspect one of the reporters mentioned (not sure if that is real and/or
> if this might be just a 6.1.y thing). Furthermore for 6.1.y it was
> recently confirmed that reverting the change fixes things, while we iirc
> had no such confirmation for recent mainline kernels at that point. So
> it looked like it would take a while to get this sorted out in mainline.
> But it seems we finally might get closer to that now, so yeah, maybe
> it's not worth a stable revert.

If I'm not completely wrong, the commit has indeed finally been
reverted in mainline, with c5becf57dd56 ("Revert "scsi: aacraid: Reply
queue mapping to CPUs based on IRQ affinity"").

This is what was mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=217599#c52

So should/can it be reverted now as well in the 6.1.y stable series
(and the other series, as needed)?

#regzbot link: https://bugs.debian.org/1059624
#regzbot fixed-by: c5becf57dd56

Thorsten, hope I got the above right.

Regards,
Salvatore
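
As a side illustration of the two `#regzbot` lines above, the sketch below
pulls such commands out of a mail body as (command, value) pairs. It is not
regzbot's own parser; the function name `extract_regzbot_commands` and the
rule of ignoring quoted lines are assumptions made for this example.

```python
import re

# A command line looks like "#regzbot <command>: <value>".
_REGZBOT_RE = re.compile(r"^#regzbot\s+([\w-]+):?\s*(.*)$")


def extract_regzbot_commands(body):
    """Return a list of (command, value) tuples found in unquoted lines."""
    commands = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            # Quoted text from an earlier mail; skip it (assumption).
            continue
        m = _REGZBOT_RE.match(stripped)
        if m:
            commands.append((m.group(1), m.group(2).strip()))
    return commands
```

Applied to the two lines in this mail, it yields a `link` entry with the
Debian bug URL and a `fixed-by` entry with c5becf57dd56.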

* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
  2023-12-29 20:13     ` Salvatore Bonaccorso
@ 2023-12-30 10:58       ` Greg KH
  0 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2023-12-30 10:58 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Thorsten Leemhuis, Sagar Biradar, James Bottomley,
	Martin K. Petersen, Adaptec OEM Raid Solutions, stable,
	Sasha Levin, Linux kernel regressions list, Hannes Reinecke,
	scsi, LKML, Gilbert Wu, John Garry

On Fri, Dec 29, 2023 at 09:13:18PM +0100, Salvatore Bonaccorso wrote:
> Hi all,
> 
> On Sat, Nov 25, 2023 at 08:10:35AM +0100, Thorsten Leemhuis wrote:
> > On 24.11.23 17:25, Greg KH wrote:
> > > On Tue, Nov 21, 2023 at 10:50:57AM +0100, Thorsten Leemhuis wrote:
> > >> * @SCSI maintainers: could you please look into below please?
> > >>
> > >> * @Stable team: you might want to take a look as well and consider a
> > >> revert in 6.1.y (yes, I know, those are normally avoided, but here it
> > >> might make sense).
> > >>
> > >> Hi everyone!
> > >>
> > >> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
> > >> hangs for a while) that was reported months ago already but is still not
> > >> fixed. Not only that, it apparently more and more users run into this
> > >> recently, as the culprit was recently integrated into 6.1.y; I wonder if
> > >> it would be best to revert it there, unless a fix for mainline comes
> > >> into reach soon.
> > >>
> > >> Details:
> > >>
> > >> Quite a few machines with Adaptec controllers seems to hang for a few
> > >> tens of seconds to a few minutes before things start to work normally
> > >> again for a while:
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> > >>
> > >> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> > >> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> > >> commit despite a warning of mine to Sasha recently made it into 6.1.53
> > >> -- and that way apparently recently reached more users recently, as
> > >> quite a few joined that ticket.
> > >[...]
> > > I am loath to revert a stable patch that has been there for so long as
> > > any upgrade will just cause the same bug to show back up. Why can't we
> > > just revert it in Linus's tree now and I'll take that revert in the
> > > stable trees as well?
> > 
> > FWIW, I know and in general agree with that strategy, that's why I
> > normally wouldn't have brought a stable-only revert up for
> > consideration. But this issue to me looked somewhat special and urgent
> > for two and a half reasons: (1) that backport apparently made a lot more
> > people suddenly hit the issue (2) there was also this data corruption
> > aspect one of the reporters mentioned (not sure if that is real and/or
> > if this might be just a 6.1.y thing). Furthermore for 6.1.y it was
> > recently confirmed that reverting the change fixes things, while we iirc
> > had no such confirmation for recent mainline kernels at that point. So
> > it looked like it would take a while to get this sorted out in mainline.
> > But it seems we finally might get closer to that now, so yeah, maybe
> > it's not worth a stable revert.
> 
> If I'm not completely wrong, finally indeed the commit has been
> reverted in mainline, with c5becf57dd56 ("Revert "scsi: aacraid: Reply
> queue mapping to CPUs based on IRQ affinity"") .
> 
> This is what was mentioned here:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599#c52
> 
> So should/can it be reverted it now as well on the 6.1.y stable series
> (and the others up as needed?)

Now queued up, thanks.

greg k-h

Thread overview: 12+ messages
2023-11-21  9:50 scsi regression that after months is still not addressed and now bothering 6.1.y users, too Thorsten Leemhuis
2023-11-21  9:57 ` Thorsten Leemhuis
2023-11-21 11:30 ` John Garry
2023-11-21 12:24   ` Linux regression tracking (Thorsten Leemhuis)
2023-11-21 13:05     ` James Bottomley
2023-11-21 13:24       ` Linux regression tracking (Thorsten Leemhuis)
2023-11-21 13:31         ` James Bottomley
2023-11-24 16:25 ` Greg KH
2023-11-24 22:44   ` Martin K. Petersen
2023-11-25  7:10   ` Thorsten Leemhuis
2023-12-29 20:13     ` Salvatore Bonaccorso
2023-12-30 10:58       ` Greg KH
