* Reported regressions for 4.7 as of Sunday, 2016-06-19
@ 2016-06-19 14:52 Thorsten Leemhuis
  2016-06-20 10:21 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Thorsten Leemhuis @ 2016-06-19 14:52 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel Mailing List

Hi! Here is my second regression report for 4.7. It has 19 entries; 8 of
them are new; 8 regressions were fixed since the last report(¹) (those are
not included in this report) and I dropped 2 which turned out to not be
regressions after all (at least that's what I think right now).

FWIW, it's still a lot of work to generate this report (as expected). I'm still
thinking about how to make the whole tracking process easier and more
attractive for everyone, but it will take a few weeks before I come up with a
concrete plan.

HTH, CU, Thorsten

(¹) last week's report was http://article.gmane.org/gmane.linux.kernel/2241805

P.S.: Please let me know if a regression is missing in the list; or if there is 
something on the list which shouldn't be there.
----

Description:    ath10k no longer authenticates and freezes system
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=119151
Latest status:  http://thread.gmane.org/gmane.linux.kernel.wireless.general/152513/focus=152535
Date rep/stat:  2016-05-27 / 2016-06-02
Notes:          forgotten? poked bug report on Friday

Description:    Bad flicker on skylake HQD due to code in the 4.7 merge window
Report:         http://thread.gmane.org/gmane.linux.kernel/2230377
Latest status:  http://thread.gmane.org/gmane.linux.kernel/2230377/focus=92602
Date rep/stat:  2016-05-30 / 2016-06-18
Notes:          investigation ongoing

Description:    we noticed reaim.jobs_per_min -49.1% regression
Report:         http://thread.gmane.org/gmane.linux.kernel/2231025/
Latest status:  http://thread.gmane.org/gmane.linux.kernel/2231025/focus=2233571
Date rep/stat:  2016-05-31 / 2016-06-13
Notes:          wip? http://article.gmane.org/gmane.linux.kernel/2241911

Description:    NULL pointer dereference with BCM4350 wireless device
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=119451
Latest status:  7.6.
Date rep/stat:  2016-06-01 / 2016-06-07
Notes:          poked bugzilla, likely fixed in mainline by https://git.kernel.org/torvalds/c/31143e2933

Description:    795ae7a0de: pixz.throughput -9.1% regression
Report:         http://thread.gmane.org/gmane.linux.kernel/2233056/
Latest status:  http://thread.gmane.org/gmane.linux.kernel/2233056/focus=2238208
Date rep/stat:  2016-06-02 / 2016-06-08
Notes:          @regression tracker: poke someone

Description:    RadeonSI gets a huge performance dip when used with the nine state tracker
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=119631
Latest status:  https://bugzilla.kernel.org/show_bug.cgi?id=119631#c12
Date rep/stat:  2016-06-04 / 2016-06-15
Notes:          investigation ongoing, waiting for reporter

Description:    5c0a85fad9: unixbench.score -6.3% regression
Report:         http://thread.gmane.org/gmane.linux.kernel/2235794
Latest status:  http://thread.gmane.org/gmane.linux.kernel.mm/153151/focus=153409
Date rep/stat:  2016-06-06 / 2016-06-17
Notes:          wip, revert discussed

Description:    System hang possibly due to brcmfmac regression
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=119761
Latest status:  https://bugzilla.kernel.org/show_bug.cgi?id=119761#c1
Date rep/stat:  2016-06-07 / 2016-06-12
Notes:          might be fixed, waiting for clarification from reporter

Description:    Regression in kbuild: fix if_change and friends to consider argument 
Report:         http://thread.gmane.org/gmane.linux.kbuild.devel/14981/
Latest status:  http://thread.gmane.org/gmane.linux.kbuild.devel/14981/focus=15000
Date rep/stat:  2016-06-07 / 2016-06-09
Notes:          patch in linux-next

Description:    BUG: using smp_processor_id() in preemptible [00000000] code] when using a USB Mass Storage device
Report:         http://thread.gmane.org/gmane.linux.usb.general/143504
Latest status:  http://thread.gmane.org/gmane.linux.usb.general/143504/focus=153154 https://lkml.org/lkml/2016/6/15/397
Date rep/stat:  2016-06-09 / 2016-06-15
Notes:          investigation ongoing

Description:    Notebook Clevo N350DW i5-6500T freezes on shutdown (but reboots fine)
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=119871
Latest status:  https://bugzilla.kernel.org/show_bug.cgi?id=119871#c8
Date rep/stat:  2016-06-09 / 2016-06-16
Notes:          reporter needs help to provide more details to debug the problem

Description:    BUG() in dmesg after loading nouveau module
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=120591
Latest status:  https://bugzilla.kernel.org/show_bug.cgi?id=120591#c3
Date rep/stat:  2016-06-18 / 2016-06-19
Notes:          wip

Description:    BUG: unable to handle kernel NULL pointer dereference […] qla24xx_process_response_queue+0x49/0x4b0 [qla2xxx]
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=120201
Latest status:  n/a
Date rep/stat:  2016-06-14 / n/a
Notes:          poked bugzilla, a bit unsure how to proceed

Description:    Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=120481
Latest status:  n/a
Date rep/stat:  2016-06-16 / n/a
Notes:          real reason unknown

Description:    performance drop on SFC interface around 30 %
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=120461
Latest status:  https://bugzilla.kernel.org/show_bug.cgi?id=120461#c9
Date rep/stat:  2016-06-17 / 2016-06-17
Notes:          wip

Description:    System hang when plug/un-plug USB 3.1 key via thunderbolt port on Dell XPS 13
Report:         https://bugzilla.kernel.org/show_bug.cgi?id=120241
Latest status:  n/a
Date rep/stat:  2016-06-14 / n/a
Notes:          waiting for reporter

Description:    performance regression on Jetson TK1 since 4.7-rc1: moving windows under X would become insufferably slow, and graphical performance under X in general is seriously degraded
Report:         http://thread.gmane.org/gmane.linux.ports.tegra/26983/focus=2245415
Latest status:  n/a
Date rep/stat:  2016-06-16 / n/a
Notes:          wip, fix available

Description:    lk 4.7 regression: EDAC, amd64_edac: Drop pci_register_driver() use
Report:         http://thread.gmane.org/gmane.linux.kernel/2245115/
Latest status:  http://thread.gmane.org/gmane.linux.kernel/2246008/focus=2246009
Date rep/stat:  2016-06-15 / 2016-06-16
Notes:          wip, fix available

Description:    regression in 8250 uart driver
Report:         http://thread.gmane.org/gmane.linux.kernel/2243130/focus=2243653
Latest status:  http://thread.gmane.org/gmane.linux.kernel/2243130/focus=2243653
Date rep/stat:  2016-06-14 / 2016-06-14
Notes:          wip, fix available


* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-19 14:52 Reported regressions for 4.7 as of Sunday, 2016-06-19 Thorsten Leemhuis
@ 2016-06-20 10:21 ` Christoph Hellwig
  2016-06-21 11:11 ` Josh Boyer
  2016-06-22  6:36 ` Kalle Valo
  2 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-06-20 10:21 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel,
	linux-ext4, xfs

Another important one is the rename regression in XFS and ext4 that
I suspect is due to the VFS changes in 4.7:

http://oss.sgi.com/pipermail/xfs/2016-June/049138.html

http://oss.sgi.com/pipermail/xfs/2016-June/049309.html

possibly related:

http://marc.info/?l=linux-kernel&m=146605889024559&w=2


* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-19 14:52 Reported regressions for 4.7 as of Sunday, 2016-06-19 Thorsten Leemhuis
  2016-06-20 10:21 ` Christoph Hellwig
@ 2016-06-21 11:11 ` Josh Boyer
  2016-06-21 20:40   ` Linus Torvalds
  2016-06-22  6:36 ` Kalle Valo
  2 siblings, 1 reply; 20+ messages in thread
From: Josh Boyer @ 2016-06-21 11:11 UTC (permalink / raw)
  To: Thorsten Leemhuis; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Sun, Jun 19, 2016 at 10:52 AM, Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
> Description:    BUG: unable to handle kernel NULL pointer dereference […] qla24xx_process_response_queue+0x49/0x4b0 [qla2xxx]
> Report:         https://bugzilla.kernel.org/show_bug.cgi?id=120201
> Latest status:  n/a
> Date rep/stat:  2016-06-14 / n/a
> Notes:          poked bugzilla, a bit unsure how to proceed

We have two bug reports against 4.5.5 - 4.5.7 of this as well.  So
whatever commit caused this in 4.7 seems to have been pulled into the
4.5.y stable tree.  I suspect it is in the 4.6.y stable tree as well,
but we don't have that pushed out yet.

https://bugzilla.redhat.com/show_bug.cgi?id=1348342
https://bugzilla.redhat.com/show_bug.cgi?id=1346753

josh


* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-21 11:11 ` Josh Boyer
@ 2016-06-21 20:40   ` Linus Torvalds
  2016-06-22  0:55     ` Josh Boyer
  2016-06-22  1:25     ` Martin K. Petersen
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2016-06-21 20:40 UTC (permalink / raw)
  To: Josh Boyer, Martin K. Petersen, Johannes Thumshirn
  Cc: Thorsten Leemhuis, Linux Kernel Mailing List

On Tue, Jun 21, 2016 at 4:11 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Sun, Jun 19, 2016 at 10:52 AM, Thorsten Leemhuis
> <regressions@leemhuis.info> wrote:
>> Description:    BUG: unable to handle kernel NULL pointer dereference […] qla24xx_process_response_queue+0x49/0x4b0 [qla2xxx]
>> Report:         https://bugzilla.kernel.org/show_bug.cgi?id=120201
>> Latest status:  n/a
>> Date rep/stat:  2016-06-14 / n/a
>> Notes:          poked bugzilla, a bit unsure how to proceed
>
> We have two bug reports against 4.5.5 - 4.5.7 of this as well.  So
> whatever commit caused this in 4.7 seems to have been pulled into the
> 4.5.y stable tree.  I suspect it is in the 4.6.y stable tree as well,
> but we don't have that pushed out yet.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1348342
> https://bugzilla.redhat.com/show_bug.cgi?id=1346753

That seems pretty unambiguous - 4.5.5 is fine, and 4.5.6 is bad. So
unless it's specific to whatever patches RH is carrying around, we
should be able to just look at the scsi-related stable tree patches in
that region. That seems simple enough.

But there's really only two (trivial) patches in there:

 - scsi: Add intermediate STARGET_REMOVE state to scsi_target_state
   (f05795d3d771f30a7bdc3a138bf714b06d42aa95 upstream)

 - Revert "scsi: fix soft lockup in scsi_remove_target() on module removal"
   (305c2e71b3d733ec065cb716c76af7d554bd5571 upstream)

as far as I can tell. And neither of them looks very likely, but what
do I know. Adding Martin Petersen and Johannes Thumshirn to the
participants just in case they go "Ahh.."

               Linus


* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-21 20:40   ` Linus Torvalds
@ 2016-06-22  0:55     ` Josh Boyer
  2016-06-22  1:25     ` Martin K. Petersen
  1 sibling, 0 replies; 20+ messages in thread
From: Josh Boyer @ 2016-06-22  0:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin K. Petersen, Johannes Thumshirn, Thorsten Leemhuis,
	Linux Kernel Mailing List

On Tue, Jun 21, 2016 at 4:40 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Jun 21, 2016 at 4:11 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> On Sun, Jun 19, 2016 at 10:52 AM, Thorsten Leemhuis
>> <regressions@leemhuis.info> wrote:
>>> Description:    BUG: unable to handle kernel NULL pointer dereference […] qla24xx_process_response_queue+0x49/0x4b0 [qla2xxx]
>>> Report:         https://bugzilla.kernel.org/show_bug.cgi?id=120201
>>> Latest status:  n/a
>>> Date rep/stat:  2016-06-14 / n/a
>>> Notes:          poked bugzilla, a bit unsure how to proceed
>>
>> We have two bug reports against 4.5.5 - 4.5.7 of this as well.  So
>> whatever commit caused this in 4.7 seems to have been pulled into the
>> 4.5.y stable tree.  I suspect it is in the 4.6.y stable tree as well,
>> but we don't have that pushed out yet.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1348342
>> https://bugzilla.redhat.com/show_bug.cgi?id=1346753
>
> That seems pretty unambiguous - 4.5.5 is fine, and 4.5.6 is bad. So
> unless it's specific to whatever patches RH is carrying around, we
> should be able to just look at the scsi-related stable tree patches in
> that region. That seems simple enough.

I thought the same.  We're only carrying one very very old scsi patch
to revalidate a pointer.  That shouldn't even be involved in this
path, and upstream 4.7-rcX is hitting the same issue anyway.  Thus far
we've only seen reports for qla2xxx devices as far as I'm aware.

> But theres' really only two (trivial) patches in there:
>
>  - scsi: Add intermediate STARGET_REMOVE state to scsi_target_state
>    (f05795d3d771f30a7bdc3a138bf714b06d42aa95 upstream)
>
>  - Revert "scsi: fix soft lockup in scsi_remove_target() on module removal"
>    (305c2e71b3d733ec065cb716c76af7d554bd5571 upstream)
>
> as far as I can tell. And neither of them looks very likely, but what
> do I know. Adding Martin Petersen and Johannes Thumshirn to the
> participants just in case they go "Ahh.."

Right, I had the same head scratching.

josh


* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-21 20:40   ` Linus Torvalds
  2016-06-22  0:55     ` Josh Boyer
@ 2016-06-22  1:25     ` Martin K. Petersen
  2016-06-22  1:29       ` Quinn Tran
  2016-06-22 11:51       ` Johannes Thumshirn
  1 sibling, 2 replies; 20+ messages in thread
From: Martin K. Petersen @ 2016-06-22  1:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Boyer, Martin K. Petersen, Johannes Thumshirn,
	Thorsten Leemhuis, Linux Kernel Mailing List, Quinn Tran

>>>>> "Linus" == Linus Torvalds <torvalds@linux-foundation.org> writes:

>> https://bugzilla.redhat.com/show_bug.cgi?id=1348342

This first one appears to be a crash in a USB sound doodad and not
qla2xxx. Also, this appears to be where the 4.5.5 -> 4.5.6 notion comes
from. So we can probably ignore 4.5.5 as the last good revision.

Linus> as far as I can tell. And neither of them looks very likely, but
Linus> what do I know. Adding Martin Petersen and Johannes Thumshirn to
Linus> the participants just in case they go "Ahh.."

Doubt it's Johannes' tweak. The qla2xxx crash from the two other
bugzilla entries is in:

(gdb) list *qla24xx_process_response_queue+0x49
0x27e09 is in qla24xx_process_response_queue (drivers/scsi/qla2xxx/qla_isr.c:2560).
2555            if (rsp->msix->cpuid != smp_processor_id()) {
2556                    /* if kernel does not notify qla of IRQ's CPU change,
2557                     * then set it here.
2558                     */
2559                    rsp->msix->cpuid = smp_processor_id();
2560                    ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid;
2561            }
2562
2563            while (rsp->ring_ptr->signature != RESPONSE_PROCESSED) {
2564                    pkt = (struct sts_entry_24xx *)rsp->ring_ptr;

That particular code went into 4.5 and comes from:

commit cdb898c52d1dfad4b4800b83a58b3fe5d352edde
Author: Quinn Tran <quinn.tran@qlogic.com>
Date:   Thu Dec 17 14:57:05 2015 -0500

    qla2xxx: Add irq affinity notification

    Register to receive notification of when irq setting change
    occured.

    Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
    Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
    Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Quinn?

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-22  1:25     ` Martin K. Petersen
@ 2016-06-22  1:29       ` Quinn Tran
  2016-06-22 11:51       ` Johannes Thumshirn
  1 sibling, 0 replies; 20+ messages in thread
From: Quinn Tran @ 2016-06-22  1:29 UTC (permalink / raw)
  To: Martin K. Petersen, Linus Torvalds
  Cc: Josh Boyer, Johannes Thumshirn, Thorsten Leemhuis, linux-kernel

Investigating.

Regards,
Quinn Tran









* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-19 14:52 Reported regressions for 4.7 as of Sunday, 2016-06-19 Thorsten Leemhuis
  2016-06-20 10:21 ` Christoph Hellwig
  2016-06-21 11:11 ` Josh Boyer
@ 2016-06-22  6:36 ` Kalle Valo
  2 siblings, 0 replies; 20+ messages in thread
From: Kalle Valo @ 2016-06-22  6:36 UTC (permalink / raw)
  To: Thorsten Leemhuis; +Cc: Linus Torvalds, Linux Kernel Mailing List

Thorsten Leemhuis <regressions@leemhuis.info> writes:

> Description:    ath10k no longer authenticates and freezes system
> Report:         https://bugzilla.kernel.org/show_bug.cgi?id=119151
> Latest status:  http://thread.gmane.org/gmane.linux.kernel.wireless.general/152513/focus=152535
> Date rep/stat:  2016-05-27 / 2016-06-02
> Notes:          forgotten? poked bug report on Friday

Here's the fix:

ath10k: fix deadlock while processing rx_in_ord_ind

https://git.kernel.org/cgit/linux/kernel/git/kvalo/wireless-drivers.git/commit/?id=e50525bef593c3dd0564df676c567d77f7c20322

-- 
Kalle Valo


* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-22  1:25     ` Martin K. Petersen
  2016-06-22  1:29       ` Quinn Tran
@ 2016-06-22 11:51       ` Johannes Thumshirn
  2016-06-22 15:57         ` Quinn Tran
  1 sibling, 1 reply; 20+ messages in thread
From: Johannes Thumshirn @ 2016-06-22 11:51 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Linus Torvalds, Josh Boyer, Thorsten Leemhuis,
	Linux Kernel Mailing List, Quinn Tran

On Tue, Jun 21, 2016 at 09:25:18PM -0400, Martin K. Petersen wrote:
> >>>>> "Linus" == Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1348342
> 
> This first one appears to be a crash in a USB sound doodad and not
> qla2xxx. Also, this appears to be where the 4.5.5 -> 4.5.6 notion comes
> from. So we can probably ignore 4.5.5 as the last good revision.
> 
> Linus> as far as I can tell. And neither of them looks very likely, but
> Linus> what do I know. Adding Martin Petersen and Johannes Thumshirn to
> Linus> the participants just in case they go "Ahh.."
> 
> Doubt it's Johannes' tweak. The qla2xxx crash from the two other
> bugzilla entries is in:
> 
> (gdb) list *qla24xx_process_response_queue+0x49
> 0x27e09 is in qla24xx_process_response_queue (drivers/scsi/qla2xxx/qla_isr.c:2560).
> 2555            if (rsp->msix->cpuid != smp_processor_id()) {
> 2556                    /* if kernel does not notify qla of IRQ's CPU change,
> 2557                     * then set it here.
> 2558                     */
> 2559                    rsp->msix->cpuid = smp_processor_id();
> 2560                    ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid;
> 2561            }
> 2562
> 2563            while (rsp->ring_ptr->signature != RESPONSE_PROCESSED) {
> 2564                    pkt = (struct sts_entry_24xx *)rsp->ring_ptr;
> 
> That particular code went into 4.5 and comes from:
> 
> commit cdb898c52d1dfad4b4800b83a58b3fe5d352edde
> Author: Quinn Tran <quinn.tran@qlogic.com>
> Date:   Thu Dec 17 14:57:05 2015 -0500
> 
>     qla2xxx: Add irq affinity notification
> 
>     Register to receive notification of when irq setting change
>     occured.
> 
>     Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
>     Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
>     Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> Quinn?
> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

Having a quick look at it I _think_ this could be the problem.
We request the IRQ _before_ actually assigning the rsp->msix entry. Now if an
IRQ triggers before the assignment, we touch rsp->msix->cpuid while rsp->msix
is still NULL, which is probably the case here. At least that's what I gather
from Martin's mail.

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 5649c20..20743a3 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -3086,6 +3086,8 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
 	/* Enable MSI-X vectors for the base queue */
 	for (i = 0; i < 2; i++) {
 		qentry = &ha->msix_entries[i];
+		qentry->rsp = rsp;
+		rsp->msix = qentry;
 		if (IS_P3P_TYPE(ha))
 			ret = request_irq(qentry->vector,
 				qla82xx_msix_entries[i].handler,
@@ -3097,8 +3099,6 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
 		if (ret)
 			goto msix_register_fail;
 		qentry->have_irq = 1;
-		qentry->rsp = rsp;
-		rsp->msix = qentry;
 
 		/* Register for CPU affinity notification. */
 		irq_set_affinity_notifier(qentry->vector, &qentry->irq_notify);
@@ -3119,12 +3119,12 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
 	 */
 	if (QLA_TGT_MODE_ENABLED() && IS_ATIO_MSIX_CAPABLE(ha)) {
 		qentry = &ha->msix_entries[ATIO_VECTOR];
+		qentry->rsp = rsp;
+		rsp->msix = qentry;
 		ret = request_irq(qentry->vector,
 			qla83xx_msix_entries[ATIO_VECTOR].handler,
 			0, qla83xx_msix_entries[ATIO_VECTOR].name, rsp);
 		qentry->have_irq = 1;
-		qentry->rsp = rsp;
-		rsp->msix = qentry;
 	}
 
 msix_register_fail:


I'm not sure if we need to move the qentry->have_irq assignment as well, I'm
not deep enough into the qla2xxx driver yet, maybe Quinn can clarify.
Beware that the above change is untested.
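
To illustrate the ordering problem outside the driver, here is a minimal
user-space sketch in C. It is only a toy model under stated assumptions: the
toy_* names are hypothetical and not part of qla2xxx, and toy_request_irq()
simply calls the handler right away to mimic an interrupt that fires
immediately after registration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (not the real ones). */
struct toy_msix_entry { int cpuid; };
struct toy_rsp_que { struct toy_msix_entry *msix; };

typedef int (*toy_handler_t)(struct toy_rsp_que *rsp);

/* Reports whether the handler saw a usable msix pointer (1) or NULL (0). */
int toy_handler(struct toy_rsp_que *rsp)
{
	return rsp->msix != NULL;
}

/* Mimics request_irq() with an interrupt arriving right after registration. */
int toy_request_irq(toy_handler_t handler, struct toy_rsp_que *rsp)
{
	assert(handler != NULL);
	return handler(rsp);
}

/* Current order: register the handler first, publish the pointer afterwards. */
int register_then_assign(struct toy_rsp_que *rsp, struct toy_msix_entry *qentry)
{
	int seen = toy_request_irq(toy_handler, rsp);

	rsp->msix = qentry;
	return seen;
}

/* Proposed order: publish the pointer first, then register the handler. */
int assign_then_register(struct toy_rsp_que *rsp, struct toy_msix_entry *qentry)
{
	rsp->msix = qentry;
	return toy_request_irq(toy_handler, rsp);
}
```

In this model the handler only finds a valid rsp->msix when the pointer is
assigned before the registration call, which is the reordering the diff above
tries to achieve.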

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-22 11:51       ` Johannes Thumshirn
@ 2016-06-22 15:57         ` Quinn Tran
  2016-06-23  7:22           ` Johannes Thumshirn
  2016-07-05 16:30           ` Josh Boyer
  0 siblings, 2 replies; 20+ messages in thread
From: Quinn Tran @ 2016-06-22 15:57 UTC (permalink / raw)
  To: Johannes Thumshirn, Martin K. Petersen
  Cc: Linus Torvalds, Josh Boyer, Thorsten Leemhuis, linux-kernel

Johannes,  Martin,

Based on the screenshot/call trace, it looks like this adapter is not using MSI-X.  It fell back to MSI or INTx interrupts.  The code assumed that MSI-X is available.  There is no point in going through that code segment in this case.

Can you try this workaround?  It’s untested.  Thanks.


diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 5649c20..e033ecb 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
        if (!vha->flags.online)
                return;
 
-       if (rsp->msix->cpuid != smp_processor_id()) {
+       if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
                /* if kernel does not notify qla of IRQ's CPU change,
                 * then set it here.
                 */
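
As a rough user-space illustration of what the added check does, here is a
small C sketch. The guard_* names are hypothetical and only model the idea;
they are not the driver's types. It relies on && short-circuiting, so the
cpuid field is never read when the msix pointer is NULL (the MSI/INTx case).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (not the real ones). */
struct guard_msix_entry { int cpuid; };
struct guard_rsp_que { struct guard_msix_entry *msix; };

/*
 * Updates the recorded CPU if an msix entry exists and its cpuid differs.
 * Returns 1 if the cpuid was updated, 0 otherwise.
 */
int guard_update_cpuid(struct guard_rsp_que *rsp, int current_cpu)
{
	assert(rsp != NULL);

	/* The right-hand side is only evaluated when rsp->msix is non-NULL. */
	if (rsp->msix && rsp->msix->cpuid != current_cpu) {
		rsp->msix->cpuid = current_cpu;
		return 1;
	}
	return 0;
}
```

With a NULL msix pointer the function just returns 0 instead of
dereferencing it; with a valid entry it behaves like the unguarded version.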




Regards,
Quinn Tran









* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-22 15:57         ` Quinn Tran
@ 2016-06-23  7:22           ` Johannes Thumshirn
  2016-06-23 16:13             ` Quinn Tran
  2016-07-05 16:30           ` Josh Boyer
  1 sibling, 1 reply; 20+ messages in thread
From: Johannes Thumshirn @ 2016-06-23  7:22 UTC (permalink / raw)
  To: Quinn Tran
  Cc: Martin K. Petersen, Linus Torvalds, Josh Boyer,
	Thorsten Leemhuis, linux-kernel, linux-scsi

[+ Cc linux-scsi@vger.kernel.org ]

On Wed, Jun 22, 2016 at 03:57:35PM +0000, Quinn Tran wrote:
> Johannes,  Martin,
> 
> Based on the screen shot/call trace,  it looks like this adapter is not using MSIX.  It defaulted back to MSI or INTx interrupt.  The code made an assumption  of MSIX is available.  There is no point in go through that code segment.
> 
> Can you try this work around?  It’s untested.  Thanks.
> 
> 
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index 5649c20..e033ecb 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>         if (!vha->flags.online)
>                 return;
>  
> -       if (rsp->msix->cpuid != smp_processor_id()) {
> +       if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>                 /* if kernel does not notify qla of IRQ's CPU change,
>                  * then set it here.
>                  */
> 

But this still does not fix the race that is possible if the HBA uses
MSI-X and triggers an IRQ early enough.

Have a look at this (admittedly theoretical) path:
qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
{
	[...]
	/* Enable MSI-X vectors for the base queue */
	for (i = 0; i < 2; i++) {
			qentry = &ha->msix_entries[i];
			if (IS_P3P_TYPE(ha))
				ret = request_irq(qentry->vector,
					qla82xx_msix_entries[i].handler,
					0, qla82xx_msix_entries[i].name, rsp);
			else
				ret = request_irq(qentry->vector,
					msix_entries[i].handler,
					0, msix_entries[i].name, rsp);
			if (ret)
				goto msix_register_fail;
							<--- IRQ arrives here
			qentry->have_irq = 1;
			qentry->rsp = rsp;
			rsp->msix = qentry;

			[...]


void qla24xx_process_response_queue(struct scsi_qla_host *vha,
        struct rsp_que *rsp)
{
[...]
	if (rsp->msix->cpuid != smp_processor_id()) {
                  ^
                  \--- rsp->msix == NULL

			/* if kernel does not notify qla of IRQ's CPU change,
			 * then set it here.
			 */
			rsp->msix->cpuid = smp_processor_id();
			ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid;
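
The window described above can be illustrated outside the kernel. The following is a minimal userspace sketch, not driver code: request_irq_model(), handler_model() and the *_model structures are hypothetical stand-ins invented for this illustration, and the model simply assumes that an interrupt is already pending when the handler is registered. It contrasts the ordering quoted above with one that sets the pointers before registration.

```c
#include <stddef.h>

/* Hypothetical, simplified models; not the qla2xxx definitions. */
struct rsp_que_model;

struct msix_entry_model {
	int have_irq;
	int cpuid;
	struct rsp_que_model *rsp;
};

struct rsp_que_model {
	struct msix_entry_model *msix;
};

typedef void (*handler_model_fn)(void *dev_id);

/* Set when the handler ran while rsp->msix was still NULL. */
int saw_null_msix;

/* Models the handler's use of rsp->msix, but records the NULL case
 * instead of crashing on it. */
void handler_model(void *dev_id)
{
	struct rsp_que_model *rsp = dev_id;

	if (rsp->msix == NULL) {
		saw_null_msix = 1;
		return;
	}
	rsp->msix->cpuid = 0;
}

/* Assumption of this model: an interrupt is already pending, so the
 * handler runs once before registration returns. */
int request_irq_model(handler_model_fn handler, void *dev_id)
{
	handler(dev_id);
	return 0;
}

/* Ordering quoted in the thread: pointers are set after registration. */
int setup_register_first(struct rsp_que_model *rsp,
			 struct msix_entry_model *qentry)
{
	int ret = request_irq_model(handler_model, rsp);

	if (ret)
		return ret;
	qentry->have_irq = 1;
	qentry->rsp = rsp;
	rsp->msix = qentry;
	return 0;
}

/* Reordered: pointers are set first and undone if registration fails. */
int setup_publish_first(struct rsp_que_model *rsp,
			struct msix_entry_model *qentry)
{
	int ret;

	qentry->rsp = rsp;
	rsp->msix = qentry;
	ret = request_irq_model(handler_model, rsp);
	if (ret) {
		rsp->msix = NULL;
		qentry->rsp = NULL;
		return ret;
	}
	qentry->have_irq = 1;
	return 0;
}
```

In this model, setup_register_first() lets the handler observe rsp->msix == NULL, while setup_publish_first() does not. Whether have_irq should also move is left open here, as it is in the thread.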

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-23  7:22           ` Johannes Thumshirn
@ 2016-06-23 16:13             ` Quinn Tran
  2016-06-23 16:35               ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Quinn Tran @ 2016-06-23 16:13 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Martin K. Petersen, Linus Torvalds, Josh Boyer,
	Thorsten Leemhuis, linux-kernel, linux-scsi


-----Original Message-----
From: Johannes Thumshirn <jthumshirn@suse.de>
Date: Thursday, June 23, 2016 at 12:22 AM
To: Quinn Tran <quinn.tran@qlogic.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Linus Torvalds <torvalds@linux-foundation.org>, Josh Boyer <jwboyer@fedoraproject.org>, Thorsten Leemhuis <regressions@leemhuis.info>, linux-kernel <linux-kernel@vger.kernel.org>, linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

>[+ Cc linux-scsi@vger.kernel.org ]
>
>On Wed, Jun 22, 2016 at 03:57:35PM +0000, Quinn Tran wrote:
>> Johannes,  Martin,
>> 
>> Based on the screenshot/call trace, it looks like this adapter is not using MSI-X.  It fell back to MSI or INTx interrupts.  The code assumed that MSI-X is available.  There is no point in going through that code segment when it is not.
>> 
>> Can you try this workaround?  It’s untested.  Thanks.
>> 
>> 
>> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
>> index 5649c20..e033ecb 100644
>> --- a/drivers/scsi/qla2xxx/qla_isr.c
>> +++ b/drivers/scsi/qla2xxx/qla_isr.c
>> @@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>>         if (!vha->flags.online)
>>                 return;
>>  
>> -       if (rsp->msix->cpuid != smp_processor_id()) {
>> +       if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>>                 /* if kernel does not notify qla of IRQ's CPU change,
>>                  * then set it here.
>>                  */
>> 
>
>But this still does not fix the race that is possible if the HBA uses
>MSI-X and triggers an IRQ early enough.
>
>Have a look at this (admittedly theoretical) path:
>qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
>{
>	[...]
>	/* Enable MSI-X vectors for the base queue */
>	for (i = 0; i < 2; i++) {
>			qentry = &ha->msix_entries[i];
>			if (IS_P3P_TYPE(ha))
>				ret = request_irq(qentry->vector,
>					qla82xx_msix_entries[i].handler,
>					0, qla82xx_msix_entries[i].name, rsp);
>			else
>				ret = request_irq(qentry->vector,
>					msix_entries[i].handler,
>					0, msix_entries[i].name, rsp);
>			if (ret)
>				goto msix_register_fail;
>							<--- IRQ arrives here

QT: Setting up the interrupt vector does not mean the interrupt starts firing immediately.  Interrupts start firing once the driver is ready to accept them and enables them later on (ha->isp_ops->enable_intrs(ha)).  In addition, that particular code path, qla24xx_process_response_queue, is not executed until the driver feeds commands to the hardware work queue.

If there is a leftover interrupt that happens to trigger the call immediately, there is another check that prevents the code from reaching the point of the "theoretical" race.


>			qentry->have_irq = 1;
>			qentry->rsp = rsp;
>			rsp->msix = qentry;
>
>			[...]
>
>
>void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>        struct rsp_que *rsp)
>{
--->8------
	if (!vha->flags.online)
		return;

---8<------
>	if (rsp->msix->cpuid != smp_processor_id()) {
>                  ^
>                  \--- rsp->msix == NULL
>
>			/* if kernel does not notify qla of IRQ's CPU change,
>			 * then set it here.
>			 */
>			rsp->msix->cpuid = smp_processor_id();
>			ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid;
>
>-- 
>Johannes Thumshirn                                          Storage
>jthumshirn@suse.de                                +49 911 74053 689
>SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
>GF: Felix Imendörffer, Jane Smithard, Graham Norton
>HRB 21284 (AG Nürnberg)
>Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-23 16:13             ` Quinn Tran
@ 2016-06-23 16:35               ` Linus Torvalds
  2016-06-23 20:56                 ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2016-06-23 16:35 UTC (permalink / raw)
  To: Quinn Tran
  Cc: Johannes Thumshirn, Martin K. Petersen, Josh Boyer,
	Thorsten Leemhuis, linux-kernel, linux-scsi

On Thu, Jun 23, 2016 at 9:13 AM, Quinn Tran <quinn.tran@qlogic.com> wrote:
>
>
> QT: setting up the interrupt vector does not mean the interrupt starts firing immediately.

Actually, it very much can mean that. If the interrupt can possibly be
shared, there is a very real possibility of it firing immediately.

Now, with MSI(-X) I guess that isn't a worry, so I suspect your patch
that handles just the legacy INTx case anyway is sufficient, but in
general I would like people to always act as if interrupts can happen
immediately after request_irq().

We have had *tons* of situations where the firmware left a device
active, for example. Or where some random interrupt controller ended
up having stale interrupts pending, even.

So in general, it's just good practice to say "spurious interrupts can
and do happen" - the shared irq case is the most obvious case, but
there have been other sources of unexpected spurious interrupts
firing.
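
A handler written with that rule in mind checks its own state before touching anything. The following is a minimal userspace sketch under stated assumptions: struct dev_model, its fields, and the MODEL_IRQ_* values are hypothetical and only mirror the idea behind the kernel's IRQ_NONE/IRQ_HANDLED return values; this is not code from any driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the "not mine" / "handled" distinction. */
enum irq_result_model {
	MODEL_IRQ_NONE = 0,
	MODEL_IRQ_HANDLED = 1
};

/* Hypothetical device state; not a kernel structure. */
struct dev_model {
	bool initialized;	/* set only once setup is complete */
	bool irq_pending;	/* models the device's interrupt status bit */
	unsigned int handled;	/* number of interrupts actually serviced */
};

/* Tolerates being called at any time after registration, including
 * before setup has finished or when another device on a shared line
 * raised the interrupt. */
enum irq_result_model defensive_handler(void *dev_id)
{
	struct dev_model *dev = dev_id;

	if (dev == NULL || !dev->initialized)
		return MODEL_IRQ_NONE;	/* too early: nothing to service */
	if (!dev->irq_pending)
		return MODEL_IRQ_NONE;	/* spurious, or not our device */

	dev->irq_pending = false;	/* acknowledge */
	dev->handled++;
	return MODEL_IRQ_HANDLED;
}
```

Calling defensive_handler() on a device that is not yet initialized returns MODEL_IRQ_NONE and changes nothing, which is the behaviour a spurious or early interrupt should get.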

                Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-23 16:35               ` Linus Torvalds
@ 2016-06-23 20:56                 ` Eric W. Biederman
  0 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2016-06-23 20:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Quinn Tran, Johannes Thumshirn, Martin K. Petersen, Josh Boyer,
	Thorsten Leemhuis, linux-kernel, linux-scsi

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, Jun 23, 2016 at 9:13 AM, Quinn Tran <quinn.tran@qlogic.com> wrote:
>>
>>
>> QT: setting up the interrupt vector does not mean the interrupt starts firing immediately.
>
> Actually, it very much can mean that. If the interrupt can possibly be
> shared, there is a very real possibility of it firing immediately.
>
> Now, with MSI(-X) I guess that isn't a worry, so I suspect your patch
> that handles just the legacy INTx case anyway is sufficient, but in
> general I would like people to always act as if interrupts can happen
> immediately after request_irq().
>
> We have had *tons* of situations where the firmware left a device
> active, for example. Or where some random interrupt controller ended
> up having stale interrupts pending, even.
>
> So in general, it's just good practice to say "spurious interrupts can
> and do happen" - the shared irq case is the most obvious case, but
> there have been other sources of unexpected spurious interrupts
> firing.

One case that occasionally bites even for MSI-X is kexec on panic, where
the hardware was not shut down before the new kernel starts, and the new
kernel masks the irq early in boot.  Then when the driver initializes
and calls request_irq it is possible for an irq to be pending as soon as
the MSI-X irq is actually enabled to the hardware.
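
The masked-then-pending sequence can be modelled in a few lines. This is a rough userspace sketch, assuming a single interrupt line with one mask bit and one pending latch; struct irq_line_model and the line_* functions are hypothetical names made up for the illustration and do not correspond to kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*line_handler_fn)(void *dev_id);

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct irq_line_model {
	bool masked;
	bool pending;
	line_handler_fn handler;
	void *dev_id;
};

/* The device signals an interrupt: it is latched while the line is
 * masked or has no handler, and delivered directly otherwise. */
void line_raise(struct irq_line_model *line)
{
	if (line->masked || line->handler == NULL) {
		line->pending = true;
		return;
	}
	line->handler(line->dev_id);
}

/* The driver registers its handler and the line is unmasked; an
 * interrupt latched earlier is delivered at once. */
void line_register_and_unmask(struct irq_line_model *line,
			      line_handler_fn handler, void *dev_id)
{
	line->handler = handler;
	line->dev_id = dev_id;
	line->masked = false;
	if (line->pending) {
		line->pending = false;
		line->handler(line->dev_id);
	}
}

/* Example handler: counts deliveries through an unsigned int. */
void count_handler(void *dev_id)
{
	unsigned int *count = dev_id;

	(*count)++;
}
```

If line_raise() is called while the line is masked, nothing is delivered, but the first line_register_and_unmask() call runs the handler immediately. That is the situation the handler has to be ready for.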

And there is always CONFIG_IRQ_DEBUG, which always acts as if an
interrupt happens right after request_irq finishes.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-22 15:57         ` Quinn Tran
  2016-06-23  7:22           ` Johannes Thumshirn
@ 2016-07-05 16:30           ` Josh Boyer
  2016-07-05 17:32             ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Josh Boyer @ 2016-07-05 16:30 UTC (permalink / raw)
  To: Quinn Tran
  Cc: Johannes Thumshirn, Martin K. Petersen, Linus Torvalds,
	Thorsten Leemhuis, linux-kernel

On Wed, Jun 22, 2016 at 11:57 AM, Quinn Tran <quinn.tran@qlogic.com> wrote:
> Johannes,  Martin,
>
> Based on the screenshot/call trace, it looks like this adapter is not using MSI-X.  It fell back to MSI or INTx interrupts.  The code assumed that MSI-X is available.  There is no point in going through that code segment when it is not.
>
> Can you try this workaround?  It’s untested.  Thanks.
>
>
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index 5649c20..e033ecb 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>         if (!vha->flags.online)
>                 return;
>
> -       if (rsp->msix->cpuid != smp_processor_id()) {
> +       if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>                 /* if kernel does not notify qla of IRQ's CPU change,
>                  * then set it here.
>                  */

Did this wind up going into an official commit somewhere?

josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-07-05 16:30           ` Josh Boyer
@ 2016-07-05 17:32             ` Linus Torvalds
  2016-07-05 18:43               ` Thorsten Leemhuis
  2016-07-05 19:40               ` Martin K. Petersen
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2016-07-05 17:32 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Quinn Tran, Johannes Thumshirn, Martin K. Petersen,
	Thorsten Leemhuis, linux-kernel, Linux SCSI List

On Tue, Jul 5, 2016 at 9:30 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Wed, Jun 22, 2016 at 11:57 AM, Quinn Tran <quinn.tran@qlogic.com> wrote:
>>
>> -       if (rsp->msix->cpuid != smp_processor_id()) {
>> +       if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>
> Did this wind up going into an official commit somewhere?

It's not in my tree, at least.

And I don't think I've seen a "yes, that fixes it". Although Johannes
was right that in addition to that, the ordering of the irq setup
should probably _also_ be fixed, but that's a separate patch.

                Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-07-05 17:32             ` Linus Torvalds
@ 2016-07-05 18:43               ` Thorsten Leemhuis
  2016-07-05 19:40               ` Martin K. Petersen
  1 sibling, 0 replies; 20+ messages in thread
From: Thorsten Leemhuis @ 2016-07-05 18:43 UTC (permalink / raw)
  To: Linus Torvalds, Josh Boyer
  Cc: Quinn Tran, Johannes Thumshirn, Martin K. Petersen, linux-kernel,
	Linux SCSI List

On 05.07.2016 19:32, Linus Torvalds wrote:
> On Tue, Jul 5, 2016 at 9:30 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> On Wed, Jun 22, 2016 at 11:57 AM, Quinn Tran <quinn.tran@qlogic.com> wrote:
>>>
>>> -       if (rsp->msix->cpuid != smp_processor_id()) {
>>> +       if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>>
>> Did this wind up going into an official commit somewhere?
> It's not in my tree, at least.
> And I don't think I've seen a "yes, that fixes it".

Quinn Tran ACKed a nearly identical patch from Bruno Prémont in a
different thread:
http://thread.gmane.org/gmane.linux.kernel/2257008/focus=2257139

From what I can see in the initial mail in that thread, it seems Bruno
successfully tested the patch he submitted. But I have no idea if the
patch is in someone's queue to mainline right now. That's why I had it on
my "if nothing happens soon, poke someone" list...

HTH, CU, Thorsten

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-07-05 17:32             ` Linus Torvalds
  2016-07-05 18:43               ` Thorsten Leemhuis
@ 2016-07-05 19:40               ` Martin K. Petersen
  1 sibling, 0 replies; 20+ messages in thread
From: Martin K. Petersen @ 2016-07-05 19:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Boyer, Quinn Tran, Johannes Thumshirn, Martin K. Petersen,
	Thorsten Leemhuis, linux-kernel, Linux SCSI List

>>>>> "Linus" == Linus Torvalds <torvalds@linux-foundation.org> writes:

Linus> It's not in my tree, at least.

Not in scsi-fixes either. I have been waiting for a "real" patch
submission with one or more Tested-by: tags. I generally don't queue
something that comes with a "try this untested workaround" patch
description.

Quinn, please submit a real patch.

Linus> And I don't think I've seen a "yes, that fixes it". Although
Linus> Johannes was right that in addition to that, the ordering of the
Linus> irq setup should probably _also_ be fixed, but that's a separate
Linus> patch.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
  2016-06-26 12:52 ` Thorsten Leemhuis
@ 2016-06-26 15:36   ` Lucas Stach
  0 siblings, 0 replies; 20+ messages in thread
From: Lucas Stach @ 2016-06-26 15:36 UTC (permalink / raw)
  To: Thorsten Leemhuis, George Spelvin
  Cc: airlied, Linux Kernel Mailing List, dri-devel, nouveau

On Sunday, 26.06.2016, at 14:52 +0200, Thorsten Leemhuis wrote:
> On 24.06.2016 16:19, George Spelvin wrote:
> > 
> > Here's a regression you might add.  
> Thx, added.
> 
Probably the same bug as 
https://bugzilla.kernel.org/show_bug.cgi?id=119861 and already fixed in
the last -rc.

Regards,
Lucas

> > 
> > I only reported it to dri-devel,
> > since it's DRI-specific, but since there's been thunderous silence
> > for a few weeks, I'm trying to be a squeakier wheel.
> Added the nouveau developers to CC, maybe it's a bug in the drm driver
> that triggers this problem; and airlied is "Internet challenged" right
> now and Daniel on holidays, so it might be good to get more people into
> the loop anyway.
> 
> > 
> > Given that I bisected it to a single, small, revertable commit, I'd
> > hoped it would be easy to deal with.
> > 
> > [BISECTED: 0955c1250e] 4.7-rc1 oops at
> > drm_connector_cleanup+0x5c/0x1d0 
> > 
> > E-mail report at
> > https://marc.info/?l=dri-devel&m=146577898611849
> > 
> > Bugzilla report at
> > https://bugs.freedesktop.org/show_bug.cgi?id=96532
> FWIW the important detail: Reverting
> https://git.kernel.org/linus/0955c1250e (drm/crtc: take references to
> connectors used in a modeset. (v2)) fixes this.
> 
> Sincerely, your regression tracker for Linux 4.7 (http://bit.ly/28JRmJo)
>  Thorsten

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
       [not found] <20160624141918.4646.qmail@ns.sciencehorizons.net>
@ 2016-06-26 12:52 ` Thorsten Leemhuis
  2016-06-26 15:36   ` Lucas Stach
  0 siblings, 1 reply; 20+ messages in thread
From: Thorsten Leemhuis @ 2016-06-26 12:52 UTC (permalink / raw)
  To: George Spelvin; +Cc: airlied, dri-devel, Linux Kernel Mailing List, nouveau

On 24.06.2016 16:19, George Spelvin wrote:
> Here's a regression you might add.  

Thx, added.

> I only reported it to dri-devel,
> since it's DRI-specific, but since there's been thunderous silence
> for a few weeks, I'm trying to be a squeakier wheel.

Added the nouveau developers to CC, maybe it's a bug in the drm driver
that triggers this problem; and airlied is "Internet challenged" right
now and Daniel on holidays, so it might be good to get more people into
the loop anyway.

> Given that I bisected it to a single, small, revertable commit, I'd
> hoped it would be easy to deal with.
> 
> [BISECTED: 0955c1250e] 4.7-rc1 oops at drm_connector_cleanup+0x5c/0x1d0 
> 
> E-mail report at
> https://marc.info/?l=dri-devel&m=146577898611849
> 
> Bugzilla report at
> https://bugs.freedesktop.org/show_bug.cgi?id=96532

FWIW the important detail: Reverting
https://git.kernel.org/linus/0955c1250e (drm/crtc: take references to
connectors used in a modeset. (v2)) fixes this.

Sincerely, your regression tracker for Linux 4.7 (http://bit.ly/28JRmJo)
 Thorsten

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-07-05 19:40 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-19 14:52 Reported regressions for 4.7 as of Sunday, 2016-06-19 Thorsten Leemhuis
2016-06-20 10:21 ` Christoph Hellwig
2016-06-21 11:11 ` Josh Boyer
2016-06-21 20:40   ` Linus Torvalds
2016-06-22  0:55     ` Josh Boyer
2016-06-22  1:25     ` Martin K. Petersen
2016-06-22  1:29       ` Quinn Tran
2016-06-22 11:51       ` Johannes Thumshirn
2016-06-22 15:57         ` Quinn Tran
2016-06-23  7:22           ` Johannes Thumshirn
2016-06-23 16:13             ` Quinn Tran
2016-06-23 16:35               ` Linus Torvalds
2016-06-23 20:56                 ` Eric W. Biederman
2016-07-05 16:30           ` Josh Boyer
2016-07-05 17:32             ` Linus Torvalds
2016-07-05 18:43               ` Thorsten Leemhuis
2016-07-05 19:40               ` Martin K. Petersen
2016-06-22  6:36 ` Kalle Valo
     [not found] <20160624141918.4646.qmail@ns.sciencehorizons.net>
2016-06-26 12:52 ` Thorsten Leemhuis
2016-06-26 15:36   ` Lucas Stach
