* Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
@ 2015-11-04 17:24 Laine Stump
       [not found] ` <563A3F64.50808-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Laine Stump @ 2015-11-04 17:24 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Last week I upgraded my Fedora 22 AMD 990FX system from kernel 4.1.10 to 
4.2.3 (standard Fedora builds) and multiple devices stopped working:

* 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 
Azalia (Intel HDA) (rev 40)

* 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit 
Network Connection

* 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar 
HDMI Audio [Radeon HD 5400/6300 Series]

(The 1st is integrated on the motherboard, the 2nd & 3rd are behind an 
AMD RD890 pci-pci bridge. There may be other devices failing, but these 
are the ones immediately obvious.)

Whatever is the source of the failure, it ends up that the drivers for 
these devices aren't loaded.

At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS, 
and magically all the devices resumed normal operation (except that I 
can't do vfio device assignment because the IOMMU is disabled).

Reverting to kernel 4.1.10 very definitely eliminates the problem. I've 
also tried kernel 4.2.5 and it has the same problem as 4.2.3 (these 
three are the only pre-built kernels for F22). I can provide dmesg / 
lspci output from each of these, or any other debug info anyone might 
like me to gather.

What can I do to help figure out the cause of this problem and get it 
fixed? (keeping in mind that the last time I built a Linux kernel for 
myself was 2.6.something about 7 years ago :-). I would be willing to 
set things up to build my own and go bisecting with git, but there's 
likely something more expedient...)

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found] ` <563A3F64.50808-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-11-04 21:08   ` Alex Williamson
       [not found]     ` <1446671291.3692.147.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-11-12 17:33   ` Laine Stump
  1 sibling, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2015-11-04 21:08 UTC (permalink / raw)
  To: Laine Stump; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, 2015-11-04 at 12:24 -0500, Laine Stump wrote:
> Last week I upgraded my Fedora 22 AMD 990FX system from kernel 4.1.10 to 
> 4.2.3 (standard Fedora builds) and multiple devices stopped working:
> 
> * 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 
> Azalia (Intel HDA) (rev 40)
> 
> * 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit 
> Network Connection
> 
> * 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar 
> HDMI Audio [Radeon HD 5400/6300 Series]
> 
> (The 1st is integrated on the motherboard, the 2nd & 3rd are behind an 
> AMD RD890 pci-pci bridge. There may be other devices failing, but these 
> are the ones immediately obvious.)
> 
> Whatever is the source of the failure, it ends up that the drivers for 
> these devices aren't loaded.
> 
> At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS, 
> and magically all the devices resumed normal operation (except that I 
> can't do vfio device assignment because the IOMMU is disabled).
> 
> Reverting to kernel 4.1.10 very definitely eliminates the problem. I've 
> also tried kernel 4.2.5 and it has the same problem as 4.2.3 (these 
> three are the only pre-built kernels for F22). I can provide dmesg / 
> lspci output from each of these, or any other debug info anyone might 
> like me to gather.

I built a 4.2.3 kernel for my 990fx system and can't seem to reproduce
it.  Does 'lspci -k' for those devices show any driver?  Does 'lsmod'
show the drivers loaded, igb and snd_hda_intel?  If not, does manually
modprobe'ing either of those drivers change anything?  You haven't
installed a script that writes to driver_override, or set up a
configuration where those devices are claimed by pci-stub, and forgotten
about it, have you? (it's happened to me)  Otherwise, dmesg is probably
a good place to start.  Thanks,

Alex
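
The driver-binding checks suggested above can also be read straight from sysfs. What follows is only a minimal sketch, not something posted in this thread: the helper name binding_status and the choice of device addresses in the loop are assumptions made for illustration. It relies on two standard sysfs entries under /sys/bus/pci/devices/<address>/: the "driver" symlink, which exists only while a driver is bound, and "driver_override", which is assumed here to read "(null)" when nothing has been forced.

```python
import os


def binding_status(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Report the bound driver and any driver_override for one PCI device.

    bdf is a full address such as "0000:02:00.1". sysfs_root is a parameter
    only so the function can be pointed at a test directory.
    """
    dev = os.path.join(sysfs_root, bdf)

    # The "driver" symlink is present only while a driver is bound.
    driver = None
    driver_link = os.path.join(dev, "driver")
    if os.path.islink(driver_link):
        driver = os.path.basename(os.readlink(driver_link))

    # driver_override names a forced driver (for example pci-stub or
    # vfio-pci). Assumption: it reads "(null)" when unset, and the file is
    # missing on kernels that predate the attribute.
    override = None
    override_path = os.path.join(dev, "driver_override")
    if os.path.exists(override_path):
        with open(override_path) as f:
            text = f.read().strip()
        if text and text != "(null)":
            override = text

    return {"driver": driver, "override": override}


if __name__ == "__main__":
    # Addresses of the devices named in the report, with domain 0000 assumed.
    for bdf in ("0000:00:14.2", "0000:01:00.1", "0000:02:00.0", "0000:02:00.1"):
        print(bdf, binding_status(bdf))
```

A device that reports no driver and no override has simply not been bound, which points away from a forgotten pci-stub or driver_override setup and toward a probe failure that dmesg should explain.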

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]     ` <1446671291.3692.147.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-11-05 14:11       ` Mark Hounschell
  2015-11-05 19:05       ` Laine Stump
  1 sibling, 0 replies; 12+ messages in thread
From: Mark Hounschell @ 2015-11-05 14:11 UTC (permalink / raw)
  To: Alex Williamson, Laine Stump
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 11/04/2015 04:08 PM, Alex Williamson wrote:
> On Wed, 2015-11-04 at 12:24 -0500, Laine Stump wrote:
>> Last week I upgraded my Fedora 22 AMD 990FX system from kernel 4.1.10 to
>> 4.2.3 (standard Fedora builds) and multiple devices stopped working:
>>
>> * 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
>> Azalia (Intel HDA) (rev 40)
>>
>> * 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit
>> Network Connection
>>
>> * 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar
>> HDMI Audio [Radeon HD 5400/6300 Series]
>>
>> (The 1st is integrated on the motherboard, the 2nd & 3rd are behind an
>> AMD RD890 pci-pci bridge. There may be other devices failing, but these
>> are the ones immediately obvious.)
>>
>> Whatever is the source of the failure, it ends up that the drivers for
>> these devices aren't loaded.
>>
>> At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS,
>> and magically all the devices resumed normal operation (except that I
>> can't do vfio device assignment because the IOMMU is disabled).
>>
>> Reverting to kernel 4.1.10 very definitely eliminates the problem. I've
>> also tried kernel 4.2.5 and it has the same problem as 4.2.3 (these
>> three are the only pre-built kernels for F22). I can provide dmesg /
>> lspci output from each of these, or any other debug info anyone might
>> like me to gather.
>
> I built a 4.2.3 kernel for my 990fx system and can't seem to reproduce
> it.  Does 'lspci -k' for those devices show any driver?  Does 'lsmod'
> show the drivers loaded, igb and snd_hda_intel?  If not, does manually
> modprobe'ing either of those drivers change anything?  You haven't
> installed a script that writes to driver_override or setup a
> configuration where those devices are claimed by pci-stub and forgotten
> about it, have you? (it's happened to me)  Otherwise, dmesg is probably
> a good place to start.  Thanks,
>
> Alex

I'm also running one of these with a 4.2.5 kernel with no IOMMU issues.

mark

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]     ` <1446671291.3692.147.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-11-05 14:11       ` Mark Hounschell
@ 2015-11-05 19:05       ` Laine Stump
       [not found]         ` <563BA893.4020202-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Laine Stump @ 2015-11-05 19:05 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 11/04/2015 04:08 PM, Alex Williamson wrote:
> On Wed, 2015-11-04 at 12:24 -0500, Laine Stump wrote:
>> Last week I upgraded my Fedora 22 AMD 990FX system from kernel 4.1.10 to
>> 4.2.3 (standard Fedora builds) and multiple devices stopped working:
>>
>> * 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
>> Azalia (Intel HDA) (rev 40)
>>
>> * 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit
>> Network Connection
>>
>> * 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar
>> HDMI Audio [Radeon HD 5400/6300 Series]
>>
>> (The 1st is integrated on the motherboard, the 2nd & 3rd are behind an
>> AMD RD890 pci-pci bridge. There may be other devices failing, but these
>> are the ones immediately obvious.)
>>
>> Whatever is the source of the failure, it ends up that the drivers for
>> these devices aren't loaded.
>>
>> At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS,
>> and magically all the devices resumed normal operation (except that I
>> can't do vfio device assignment because the IOMMU is disabled).
>>
>> Reverting to kernel 4.1.10 very definitely eliminates the problem. I've
>> also tried kernel 4.2.5 and it has the same problem as 4.2.3 (these
>> three are the only pre-built kernels for F22). I can provide dmesg /
>> lspci output from each of these, or any other debug info anyone might
>> like me to gather.
>
> I built a 4.2.3 kernel for my 990fx system and can't seem to reproduce
> it.  Does 'lspci -k' for those devices show any driver?

00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 
Azalia (Intel HDA) (rev 40)
	Subsystem: Gigabyte Technology Co., Ltd Device a132
	Kernel modules: snd_hda_intel
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI 
Audio [Radeon HD 5400/6300 Series]
	Subsystem: Gigabyte Technology Co., Ltd Device aa68
	Kernel modules: snd_hda_intel
02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network 
Connection (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
	Kernel driver in use: igb
	Kernel modules: igb
02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network 
Connection (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
	Kernel modules: igb

/sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0 does show a link from 
driver to ........drivers/igb, but .......:02:00.1 doesn't have a link, 
and neither of them shows up in /sys/class/net.

Similarly for 01:00.[01] (which are behind the PCI to PCI bridge at 
00:02.0), the .0 device does have a link to the radeon driver, but the 
.1 device (which is the sound device on the radeon video card) has no 
driver link.

And 00:14.2 (the motherboard integrated sound device) shows no driver 
link in sysfs either.

> Does 'lsmod'
> show the drivers loaded, igb and snd_hda_intel?  If not, does manually
> modprobe'ing either of those drivers change anything?

Both of those drivers show up in lsmod output.

> You haven't
> installed a script that writes to driver_override or setup a
> configuration where those devices are claimed by pci-stub and forgotten
> about it, have you? (it's happened to me)

Not that I'm aware of. /etc/modules.d/local.conf had a few stray very 
old items that I'd forgotten about, but I removed those and the results 
are the same.

> Otherwise, dmesg is probably
> a good place to start.

Thanks to the uber-verbosity of systemd, this file is about 11MB. Where 
do you want me to put it?
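
The pattern in the 'lspci -k' output above (a "Kernel modules:" line with no "Kernel driver in use:" line) can be picked out mechanically. This is a minimal sketch under stated assumptions: the function name unbound_devices is made up for illustration, and the sample text is the output quoted above with the mail client's line wrapping undone.

```python
import re

# Start of a device record, e.g. "02:00.1 Ethernet controller: ...".
# An optional leading PCI domain ("0000:") is accepted as well.
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")


def unbound_devices(lspci_k_output):
    """Return addresses that list kernel modules but no driver in use."""
    devices = {}  # address -> {"driver": ..., "modules": ...}
    current = None
    for line in lspci_k_output.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            current = match.group(1)
            devices[current] = {"driver": None, "modules": None}
        elif current is not None:
            # Indented detail lines look like "Kernel modules: igb".
            field, _, value = line.strip().partition(": ")
            if field == "Kernel driver in use":
                devices[current]["driver"] = value
            elif field == "Kernel modules":
                devices[current]["modules"] = value
    return [addr for addr, info in devices.items()
            if info["modules"] and not info["driver"]]


SAMPLE = """\
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
\tSubsystem: Gigabyte Technology Co., Ltd Device a132
\tKernel modules: snd_hda_intel
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300 Series]
\tSubsystem: Gigabyte Technology Co., Ltd Device aa68
\tKernel modules: snd_hda_intel
02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
\tSubsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
\tKernel driver in use: igb
\tKernel modules: igb
02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
\tSubsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
\tKernel modules: igb
"""

if __name__ == "__main__":
    print(unbound_devices(SAMPLE))  # ['00:14.2', '01:00.1', '02:00.1']
```

On a live system the same function could be fed the text returned by running 'lspci -k', which makes it easy to compare the list between a working and a failing kernel.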

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]         ` <563BA893.4020202-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-11-08 16:52           ` Laine Stump
  0 siblings, 0 replies; 12+ messages in thread
From: Laine Stump @ 2015-11-08 16:52 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 11/05/2015 02:05 PM, Laine Stump wrote:
> On 11/04/2015 04:08 PM, Alex Williamson wrote:
>>
>> I built a 4.2.3 kernel for my 990fx system and can't seem to reproduce
>> it.  Does 'lspci -k' for those devices show any driver?
>
> 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
> Azalia (Intel HDA) (rev 40)
>      Subsystem: Gigabyte Technology Co., Ltd Device a132
>      Kernel modules: snd_hda_intel
> 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI
> Audio [Radeon HD 5400/6300 Series]
>      Subsystem: Gigabyte Technology Co., Ltd Device aa68
>      Kernel modules: snd_hda_intel
> 02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
> Connection (rev 01)
>      Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
>      Kernel driver in use: igb
>      Kernel modules: igb
> 02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
> Connection (rev 01)
>      Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
>      Kernel modules: igb
>
> /sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0 does show a link from
> driver to ........drivers/igb, but .......:02:00.1 doesn't have a link,
> and neither of them shows up in /sys/class/net.
>
> Similarly for 01:00.[01] (which are behind the PCI to PCI bridge at
> 00:02.0), the .0 device does have a link to the radeon driver, but the
> .1 device (which is the sound device on the radeon video card) has no
> driver link.
>
> And 00:14.2 (the motherboard integrated sound device) shows no driver
> link in sysfs either.
>
>> Does 'lsmod'
>> show the drivers loaded, igb and snd_hda_intel?  If not, does manually
>> modprobe'ing either of those drivers change anything?
>
> Both of those drivers show up in lsmod output.
>
>> You haven't
>> installed a script that writes to driver_override or setup a
>> configuration where those devices are claimed by pci-stub and forgotten
>> about it, have you? (it's happened to me)
>
> Not that I'm aware of. /etc/modules.d/local.conf had a few stray very
> old items that I'd forgotten about, but I removed those and the results
> are the same.
>
>> Otherwise, dmesg is probably
>> a good place to start.
>
> Thanks to the uber-verbosity of systemd, this file is about 11MB. Where
> do you want me to put it?

I figured out that somehow my kernel commandlines had gotten options to 
put systemd logging into kmsg *and* set its logging to debug mode. Now 
that that is fixed, dmesg is much more manageable. Here is the dmesg 
with IOMMU enabled in the BIOS (i.e. the devices *don't* work):

   http://fpaste.org/288181/70011531/

and here it is when IOMMU has been *disabled* in the BIOS (the devices 
*do* work):

   http://fpaste.org/288182/47001302/
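
Two boot logs like these are easier to compare once the timestamps are out of the way, since otherwise nearly every line differs. The sketch below is one possible approach, not something used in this thread: the function names are made up, it assumes each dmesg line starts with the usual "[ seconds.micros ]" prefix, and the sample lines in the demo are placeholders rather than real kernel messages.

```python
import difflib
import re

# dmesg lines usually begin with a timestamp such as "[    1.234567] ".
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(text):
    """Split a dmesg capture into lines with the leading timestamp removed."""
    return [TIMESTAMP_RE.sub("", line) for line in text.splitlines()]


def dmesg_diff(working, failing):
    """Unified diff of two dmesg captures, ignoring timestamps."""
    return list(difflib.unified_diff(
        strip_timestamps(working),
        strip_timestamps(failing),
        fromfile="iommu-disabled",
        tofile="iommu-enabled",
        lineterm="",
    ))


if __name__ == "__main__":
    # Placeholder text only; these are not lines from the pasted logs.
    working = "[    1.000000] common line\n[    2.500000] device ready\n"
    failing = "[    1.100000] common line\n[    2.700000] device failed\n"
    print("\n".join(dmesg_diff(working, failing)))
```

Boot-time messages are not always emitted in the same order, so some reordering noise should be expected in the result; the diff is a starting point for reading, not a verdict.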

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found] ` <563A3F64.50808-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-11-04 21:08   ` Alex Williamson
@ 2015-11-12 17:33   ` Laine Stump
       [not found]     ` <5644CD81.2020304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Laine Stump @ 2015-11-12 17:33 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Joerg Roedel

(Cc'ing Joerg because I have a question for him down towards the bottom...)

On 11/04/2015 12:24 PM, Laine Stump wrote:
> Last week I upgraded my Fedora 22 AMD 990FX system from kernel 4.1.10 to
> 4.2.3 (standard Fedora builds) and multiple devices stopped working:
>
> * 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
> Azalia (Intel HDA) (rev 40)
>
> * 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit
> Network Connection
>
> * 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar
> HDMI Audio [Radeon HD 5400/6300 Series]
>
> (The 1st is integrated on the motherboard, the 2nd & 3rd are behind an
> AMD RD890 pci-pci bridge. There may be other devices failing, but these
> are the ones immediately obvious.)
>
> Whatever is the source of the failure, it ends up that the drivers for
> these devices aren't loaded.

That is actually a bit misleading/incorrect - the igb driver is 
apparently bound to one of the two functions of the 82576 card 
(02:00.0), and one (of the expected 7) VF device entries is created in 
/sys/devices/pci0000:00/*, but then something goes wrong: none of 
these devices appears in /sys/class/net, the other 6 VFs don't get 
entries in /sys/devices/pci0000:00/*, and the igb driver isn't bound to 
the 2nd PF (02:00.1).

> At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS,
> and magically all the devices resumed normal operation (except that I
> can't do vfio device assignment because the IOMMU is disabled).

After a crash course in kernel building from Alex, I bisected down to 
commit aafd8ba - a kernel built without this commit succeeds in setting 
up all the devices mentioned, adding it causes failure (and a very long 
delay during boot). Joerg, do you have any ideas for debugging the 
problem further to see what in the commit causes this problem? (note 
that 2 other people with the same chipset but slightly different 
hardware plugged into it report no failure - see the other replies to 
the parent of this message for more detail). I'm happy to build a kernel 
with any suggested patches and report results...

commit aafd8ba0ca74894b9397e412bbd7f8ea2662ead8
Author: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
Date:   Thu May 28 18:41:39 2015 +0200

     iommu/amd: Implement add_device and remove_device

     Implement these two iommu-ops call-backs to make use of the
     initialization and notifier features of the iommu core.

     Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
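
For readers who have not bisected before, the search that leads to a single commit like this one is a binary search over the history between a known-good and a known-bad kernel. The sketch below illustrates only that idea on a linear list; it is not how 'git bisect' is implemented (real history is a graph), the function name first_bad is made up, and every commit name in the demo other than aafd8ba is a placeholder.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and every commit after the first bad
    one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


if __name__ == "__main__":
    # Placeholder history; only aafd8ba is a name taken from this thread.
    history = ["good-start", "c1", "c2", "aafd8ba", "c4", "c5", "bad-end"]
    bad = set(history[3:])
    # In practice is_bad means: build, boot, and check whether the devices work.
    print(first_bad(history, lambda commit: commit in bad))  # aafd8ba
```

Each step halves the remaining range, so even several thousand commits between two releases need only a dozen or so build-and-boot cycles.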

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]     ` <5644CD81.2020304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-11-18 15:18       ` Joerg Roedel
       [not found]         ` <20151118151841.GA2517-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Joerg Roedel @ 2015-11-18 15:18 UTC (permalink / raw)
  To: Laine Stump; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello Laine,

On Thu, Nov 12, 2015 at 12:33:53PM -0500, Laine Stump wrote:
> After a crash course in kernel building from Alex, I bisected down
> to commit aafd8ba - a kernel built without this commit succeeds in
> setting up all the devices mentioned, adding it causes failure (and
> a very long delay during boot). Joerg, do you have any ideas for
> debugging the problem further to see what in the commit causes this
> problem? (note that 2 other people with the same chipset but
> slightly different hardware plugged into it report no failure - see
> the other replies to the parent of this message for more detail).
> I'm happy to build a kernel with any suggested patches and report
> results...
> 
> commit aafd8ba0ca74894b9397e412bbd7f8ea2662ead8
> Author: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
> Date:   Thu May 28 18:41:39 2015 +0200
> 
>     iommu/amd: Implement add_device and remove_device
> 
>     Implement these two iommu-ops call-backs to make use of the
>     initialization and notifier features of the iommu core.
> 
>     Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>

I have no idea yet how this patch causes your regression. You certainly
already posted it, but since I was not on Cc, can you please give me an
overview of the problem you are seeing with this patch?



	Joerg

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]         ` <20151118151841.GA2517-l3A5Bk7waGM@public.gmane.org>
@ 2015-12-02 19:56           ` Laine Stump
       [not found]             ` <565F4CF5.90107-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Laine Stump @ 2015-12-02 19:56 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 11/18/2015 10:18 AM, Joerg Roedel wrote:
> Hello Laine,
>
> On Thu, Nov 12, 2015 at 12:33:53PM -0500, Laine Stump wrote:
>> After a crash course in kernel building from Alex, I bisected down
>> to commit aafd8ba - a kernel built without this commit succeeds in
>> setting up all the devices mentioned, adding it causes failure (and
>> a very long delay during boot). Joerg, do you have any ideas for
>> debugging the problem further to see what in the commit causes this
>> problem? (note that 2 other people with the same chipset but
>> slightly different hardware plugged into it report no failure - see
>> the other replies to the parent of this message for more detail).
>> I'm happy to build a kernel with any suggested patches and report
>> results...
>>
>> commit aafd8ba0ca74894b9397e412bbd7f8ea2662ead8
>> Author: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
>> Date:   Thu May 28 18:41:39 2015 +0200
>>
>>      iommu/amd: Implement add_device and remove_device
>>
>>      Implement these two iommu-ops call-backs to make use of the
>>      initialization and notifier features of the iommu core.
>>
>>      Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
>
> I have no idea yet how this patch causes your regression. You certainly
> already posted it, but since I was not on Cc, can you please give me an
> overview about the problem you are seeing with this patch?

Sure. Sorry it took so long to get back to you. (My to-do list keeps 
getting longer instead of shorter, and I'm thrashing a bit).

Here's my original description, along with some questions from Alex and 
my responses:

On 11/05/2015 02:05 PM, Laine Stump wrote:
 > On 11/04/2015 04:08 PM, Alex Williamson wrote:
 >> On Wed, 2015-11-04 at 12:24 -0500, Laine Stump wrote:
 >>> Last week I upgraded my Fedora 22 AMD 990FX system from kernel 4.1.10 to
 >>> 4.2.3 (standard Fedora builds) and multiple devices stopped working:
 >>>
 >>> * 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
 >>> Azalia (Intel HDA) (rev 40)
 >>>
 >>> * 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit
 >>> Network Connection
 >>>
 >>> * 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar
 >>> HDMI Audio [Radeon HD 5400/6300 Series]
 >>>
 >>> (The 1st is integrated on the motherboard, the 2nd & 3rd are behind
 >>> an AMD RD890 pci-pci bridge. There may be other devices failing,
 >>> but these are the ones immediately obvious.)
 >>>
 >>> Whatever is the source of the failure, it ends up that the drivers
 >>> for these devices aren't loaded.
 >>>
 >>> At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS,
 >>> and magically all the devices resumed normal operation (except that
 >>> I can't do vfio device assignment because the IOMMU is disabled).
 >>>
 >>> Reverting to kernel 4.1.10 very definitely eliminates the problem. I've
 >>> also tried kernel 4.2.5 and it has the same problem as 4.2.3 (these
 >>> three are the only pre-built kernels for F22). I can provide dmesg /
 >>> lspci output from each of these, or any other debug info anyone
 >>> might like me to gather.
 >>
 >> I built a 4.2.3 kernel for my 990fx system and can't seem to
 >> reproduce it.  Does 'lspci -k' for those devices show any driver?
 >
 > 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
 > Azalia (Intel HDA) (rev 40)
 >      Subsystem: Gigabyte Technology Co., Ltd Device a132
 >      Kernel modules: snd_hda_intel
 > 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI
 > Audio [Radeon HD 5400/6300 Series]
 >      Subsystem: Gigabyte Technology Co., Ltd Device aa68
 >      Kernel modules: snd_hda_intel
 > 02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
 > Connection (rev 01)
 >      Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
 >      Kernel driver in use: igb
 >      Kernel modules: igb
 > 02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
 > Connection (rev 01)
 >      Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
 >      Kernel modules: igb
 >
 > /sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0 does show a link
 > from driver to ........drivers/igb, but .......:02:00.1 doesn't
 > have a link, and neither of them shows up in /sys/class/net.
 >
 > Similarly for 01:00.[01] (which are behind the PCI to PCI bridge at
 > 00:02.0), the .0 device does have a link to the radeon driver, but
 > the .1 device (which is the sound device on the radeon video card)
 > has no driver link.
 >
 > And 00:14.2 (the motherboard integrated sound device) shows no driver
 > link in sysfs either.
 >
 >> Does 'lsmod'
 >> show the drivers loaded, igb and snd_hda_intel?  If not, does
 >> manually modprobe'ing either of those drivers change anything?
 >
 > Both of those drivers show up in lsmod output.
 >
 >> You haven't
 >> installed a script that writes to driver_override or setup a
 >> configuration where those devices are claimed by pci-stub and
 >> forgotten about it, have you? (it's happened to me)
 >
 > Not that I'm aware of. /etc/modules.d/local.conf had a few stray very
 > old items that I'd forgotten about, but I removed those and the
 > results are the same.
 >
 >> Otherwise, dmesg is probably a good place to start.

On 11/08/2015 11:52 AM, Laine Stump wrote:
 > Here is the dmesg
 > with IOMMU enabled in the BIOS (i.e. the devices *don't* work):
 >
 >    http://fpaste.org/296772/14490851/
 >
 > and here it is when IOMMU has been *disabled* in the BIOS (the
 > devices *do* work):
 >
 >    http://fpaste.org/296774/44908550/
 >

(I refreshed those links since they were almost a month old).

It was after getting the above dmesg outputs that I bisected kernel builds 
down to aafd8ba. If it would help, I can provide dmesg from just 
before/after that commit, with any sort of extra debugging you'd like 
turned on, or if you have a patch you'd like tested (or just something 
to add extra debugging) I'm happy to do that too. Since this is my main 
test machine for vfio device assignment, I'm open to doing just about 
anything to help figure out the problem, but I don't really have the 
knowledge to figure it out myself. :-)

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]             ` <565F4CF5.90107-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-01-20 12:39               ` Joerg Roedel
  2016-01-20 14:10               ` Baoquan He
  1 sibling, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2016-01-20 12:39 UTC (permalink / raw)
  To: Laine Stump; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Laine,

On Wed, Dec 02, 2015 at 02:56:37PM -0500, Laine Stump wrote:
> On 11/08/2015 11:52 AM, Laine Stump wrote:
> > Here is the dmesg
> > with IOMMU enabled in the BIOS (i.e. the devices *don't* work):
> >
> >    http://fpaste.org/296772/14490851/
> >
> > and here it is when IOMMU has been *disabled* in the BIOS (the
> > devices *do* work):
> >
> >    http://fpaste.org/296774/44908550/
> >

Sorry it took me so long to look into this problem; the links above have
now expired. Can you please upload a dmesg with IOMMU enabled again?


Thanks,

	Joerg

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]             ` <565F4CF5.90107-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-01-20 12:39               ` Joerg Roedel
@ 2016-01-20 14:10               ` Baoquan He
       [not found]                 ` <20160120141025.GA13677-ejN7fcUYdH/by3iVrkZq2A@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Baoquan He @ 2016-01-20 14:10 UTC (permalink / raw)
  To: Laine Stump
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Joerg Roedel

I found it archived here as well:

https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg10687.html

But the pasted dmesg has been lost. Posting the output of "lspci -tv" and
"lspci -vvv" would be more helpful.

Besides, does it work with the latest kernel?

Thanks
Baoquan

On 12/02/15 at 02:56pm, Laine Stump wrote:
> On 11/18/2015 10:18 AM, Joerg Roedel wrote:
> >Hello Laine,
> >
> >On Thu, Nov 12, 2015 at 12:33:53PM -0500, Laine Stump wrote:
> >>After a crash course in kernel building from Alex, I bisected down
> >>to commit aafd8ba - a kernel built without this commit succeeds in
> >>setting up all the devices mentioned, adding it causes failure (and
> >>a very long delay during boot). Joerg, do you have any ideas for
> >>debugging the problem further to see what in the commit causes this
> >>problem? (note that 2 other people with the same chipset but
> >>slightly different hardware plugged into it report no failure - see
> >>the other replies to the parent of this message for more detail).
> >>I'm happy to build a kernel with any suggested patches and report
> >>results...
> >>
> >>commit aafd8ba0ca74894b9397e412bbd7f8ea2662ead8
> >>Author: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
> >>Date:   Thu May 28 18:41:39 2015 +0200
> >>
> >>     iommu/amd: Implement add_device and remove_device
> >>
> >>     Implement these two iommu-ops call-backs to make use of the
> >>     initialization and notifier features of the iommu core.
> >>
> >>     Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
> >
> >I have no idea yet how this patch causes your regression. You certainly
> >already posted it, but since I was not on Cc, can you please give me an
> >overview about the problem you are seeing with this patch?
> 
> Sure. Sorry it took so long to get back to you. (My to-do list keeps
> getting longer instead of shorter, and I'm thrashing a bit).
> 
> Here's my original description, along with some questions from Alex
> and my responses:
> 
> On 11/05/2015 02:05 PM, Laine Stump wrote:
> > On 11/04/2015 04:08 PM, Alex Williamson wrote:
> >> On Wed, 2015-11-04 at 12:24 -0500, Laine Stump wrote:
> >>> Last week I upgraded my Fedora 22 AMD 990FX system from kernel 4.1.10 to
> >>> 4.2.3 (standard Fedora builds) and multiple devices stopped working:
> >>>
> >>> * 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
> >>> Azalia (Intel HDA) (rev 40)
> >>>
> >>> * 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit
> >>> Network Connection
> >>>
> >>> * 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar
> >>> HDMI Audio [Radeon HD 5400/6300 Series]
> >>>
> >>> (The 1st is integrated on the motherboard, the 2nd & 3rd are behind
> >>> an AMD RD890 pci-pci bridge. There may be other devices failing,
> >>> but these are the ones immediately obvious.)
> >>>
> >>> Whatever is the source of the failure, it ends up that the drivers
> >>> for these devices aren't loaded.
> >>>
> >>> At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS,
> >>> and magically all the devices resumed normal operation (except that
> >>> I can't do vfio device assignment because the IOMMU is disabled).
> >>>
> >>> Reverting to kernel 4.1.10 very definitely eliminates the problem. I've
> >>> also tried kernel 4.2.5 and it has the same problem as 4.2.3 (these
> >>> three are the only pre-built kernels for F22). I can provide dmesg /
> >>> lspci output from each of these, or any other debug info anyone
> >>> might like me to gather.
> >>
> >> I built a 4.2.3 kernel for my 990fx system and can't seem to
> >> reproduce it.  Does 'lspci -k' for those devices show any driver?
> >
> > 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
> > Azalia (Intel HDA) (rev 40)
> >      Subsystem: Gigabyte Technology Co., Ltd Device a132
> >      Kernel modules: snd_hda_intel
> > 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI
> > Audio [Radeon HD 5400/6300 Series]
> >      Subsystem: Gigabyte Technology Co., Ltd Device aa68
> >      Kernel modules: snd_hda_intel
> > 02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
> > Connection (rev 01)
> >      Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
> >      Kernel driver in use: igb
> >      Kernel modules: igb
> > 02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
> > Connection (rev 01)
> >      Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
> >      Kernel modules: igb
> >
> > /sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0 does show a link
> > from driver to ........drivers/igb, but .......:02:00.1 doesn't
> > have a link, and neither of them shows up in /sys/class/net.
> >
> > Similarly for 01:00.[01] (which are behind the PCI to PCI bridge at
> > 00:02.0), the .0 device does have a link to the radeon driver, but
> > the .1 device (which is the sound device on the radeon video card)
> > has no driver link.
> >
> > And 00:14.2 (the motherboard integrated sound device) shows no driver
> > link in sysfs either.
> >
> >> Does 'lsmod'
> >> show the drivers loaded, igb and snd_hda_intel?  If not, does
> >> manually modprobe'ing either of those drivers change anything?
> >
> > Both of those drivers show up in lsmod output.
> >
> >> You haven't
> >> installed a script that writes to driver_override or set up a
> >> configuration where those devices are claimed by pci-stub and
> >> forgotten about it, have you? (it's happened to me)
> >
> > Not that I'm aware of. /etc/modules.d/local.conf had a few stray very
> > old items that I'd forgotten about, but I removed those and the
> > results are the same.
> >
> >> Otherwise, dmesg is probably a good place to start.
> 
> On 11/08/2015 11:52 AM, Laine Stump wrote:
> > Here is the dmesg
> > with IOMMU enabled in the BIOS (i.e. the devices *don't* work):
> >
> >    http://fpaste.org/296772/14490851/
> >
> > and here it is when IOMMU has been *disabled* in the BIOS (the
> > devices *do* work):
> >
> >    http://fpaste.org/296774/44908550/
> >
> 
> (I refreshed those links since they were almost a month old).
> 
> It was after getting the above dmesg's that I bisected kernel builds
> down to aafd8ba. If it would help, I can provide dmesg from just
> before/after that commit, with any sort of extra debugging you'd
> like turned on, or if you have a patch you'd like tested (or just
> something to add extra debugging) I'm happy to do that too. Since
> this is my main test machine for vfio device assignment, I'm open to
> do just about anything to help figure out the problem, but don't
> really have the knowledge to figure it out myself. :-)
> 

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]                 ` <20160120141025.GA13677-ejN7fcUYdH/by3iVrkZq2A@public.gmane.org>
@ 2016-01-20 14:43                   ` Laine Stump
       [not found]                     ` <569F9D0E.20309-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Laine Stump @ 2016-01-20 14:43 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Joerg Roedel

On 01/20/2016 09:10 AM, Baoquan He wrote:
> I found it archived here as well:
>
> https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg10687.html
>
> But the pasted dmesg has been lost. Posting "lspci -tv" and "lspci -vvv"
> output would be more helpful.

Sure, I'll boot it with the two kernels again today and recollect 
everything.


> Besides, does it work with the latest kernel?

I haven't tried the latest upstream recently, but the latest available 
pre-built for Fedora 23 (4.2.8-300.fc23) is even worse - at the place 
where it would previously hang for ~3 minutes, it now hangs "forever" (I 
accidentally rebooted with that kernel and left without checking; 
several hours later when I returned it was still hung).

I'll also grab the latest upstream sources and build/test that today.

> Thanks
> Baoquan
>
> On 12/02/15 at 02:56pm, Laine Stump wrote:
>> On 11/18/2015 10:18 AM, Joerg Roedel wrote:
>>> Hello Laine,
>>>
>>> On Thu, Nov 12, 2015 at 12:33:53PM -0500, Laine Stump wrote:
>>>> After a crash course in kernel building from Alex, I bisected down
>>>> to commit aafd8ba - a kernel built without this commit succeeds in
>>>> setting up all the devices mentioned, adding it causes failure (and
>>>> a very long delay during boot). Joerg, do you have any ideas for
>>>> debugging the problem further to see what in the commit causes this
>>>> problem? (note that 2 other people with the same chipset but
>>>> slightly different hardware plugged into it report no failure - see
>>>> the other replies to the parent of this message for more detail).
>>>> I'm happy to build a kernel with any suggested patches and report
>>>> results...
>>>>
>>>> commit aafd8ba0ca74894b9397e412bbd7f8ea2662ead8
>>>> Author: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
>>>> Date:   Thu May 28 18:41:39 2015 +0200
>>>>
>>>>      iommu/amd: Implement add_device and remove_device
>>>>
>>>>      Implement these two iommu-ops call-backs to make use of the
>>>>      initialization and notifier features of the iommu core.
>>>>
>>>>      Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
>>>
>>> I have no idea yet how this patch causes your regression. You certainly
>>> already posted it, but since I was not on Cc, can you please give me an
>>> overview about the problem you are seeing with this patch?
>>
>> Sure. Sorry it took so long to get back to you. (My to-do list keeps
>> getting longer instead of shorter, and I'm thrashing a bit).
>>
>> Here's my original description, along with some questions from Alex
>> and my responses:
>>
>> On 11/05/2015 02:05 PM, Laine Stump wrote:
>>> On 11/04/2015 04:08 PM, Alex Williamson wrote:
>>>> On Wed, 2015-11-04 at 12:24 -0500, Laine Stump wrote:
>>>>> Last week I upgraded my Fedora 22 AMD 990FX system from kernel
>>>>> 4.1.10 to 4.2.3 (standard Fedora builds) and multiple devices
>>>>> stopped working:
>>>>>
>>>>> * 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
>>>>> Azalia (Intel HDA) (rev 40)
>>>>>
>>>>> * 02:00.[01] Ethernet controller: Intel Corporation 82576 Gigabit
>>>>> Network Connection
>>>>>
>>>>> * 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar
>>>>> HDMI Audio [Radeon HD 5400/6300 Series]
>>>>>
>>>>> (The 1st is integrated on the motherboard, the 2nd & 3rd are behind
>>>>> an AMD RD890 pci-pci bridge. There may be other devices failing,
>>>>> but these are the ones immediately obvious.)
>>>>>
>>>>> Whatever is the source of the failure, it ends up that the drivers
>>>>> for these devices aren't loaded.
>>>>>
>>>>> At Alex Williamson's suggestion, I tried disabling IOMMU in the BIOS,
>>>>> and magically all the devices resumed normal operation (except that
>>>>> I can't do vfio device assignment because the IOMMU is disabled).
>>>>>
>>>>> Reverting to kernel 4.1.10 very definitely eliminates the problem. I've
>>>>> also tried kernel 4.2.5 and it has the same problem as 4.2.3 (these
>>>>> three are the only pre-built kernels for F22). I can provide dmesg /
>>>>> lspci output from each of these, or any other debug info anyone
>>>>> might like me to gather.
>>>>
>>>> I built a 4.2.3 kernel for my 990fx system and can't seem to
>>>> reproduce it.  Does 'lspci -k' for those devices show any driver?
>>>
>>> 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
>>> Azalia (Intel HDA) (rev 40)
>>>       Subsystem: Gigabyte Technology Co., Ltd Device a132
>>>       Kernel modules: snd_hda_intel
>>> 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI
>>> Audio [Radeon HD 5400/6300 Series]
>>>       Subsystem: Gigabyte Technology Co., Ltd Device aa68
>>>       Kernel modules: snd_hda_intel
>>> 02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
>>> Connection (rev 01)
>>>       Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
>>>       Kernel driver in use: igb
>>>       Kernel modules: igb
>>> 02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
>>> Connection (rev 01)
>>>       Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
>>>       Kernel modules: igb
>>>
>>> /sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0 does show a link
>>> from driver to ........drivers/igb, but .......:02:00.1 doesn't
>>> have a link, and neither of them shows up in /sys/class/net.
>>>
>>> Similarly for 01:00.[01] (which are behind the PCI to PCI bridge at
>>> 00:02.0), the .0 device does have a link to the radeon driver, but
>>> the .1 device (which is the sound device on the radeon video card)
>>> has no driver link.
>>>
>>> And 00:14.2 (the motherboard integrated sound device) shows no driver
>>> link in sysfs either.
>>>
>>>> Does 'lsmod'
>>>> show the drivers loaded, igb and snd_hda_intel?  If not, does
>>>> manually modprobe'ing either of those drivers change anything?
>>>
>>> Both of those drivers show up in lsmod output.
>>>
>>>> You haven't
>>>> installed a script that writes to driver_override or set up a
>>>> configuration where those devices are claimed by pci-stub and
>>>> forgotten about it, have you? (it's happened to me)
>>>
>>> Not that I'm aware of. /etc/modules.d/local.conf had a few stray very
>>> old items that I'd forgotten about, but I removed those and the
>>> results are the same.
>>>
>>>> Otherwise, dmesg is probably a good place to start.
>>
>> On 11/08/2015 11:52 AM, Laine Stump wrote:
>>> Here is the dmesg
>>> with IOMMU enabled in the BIOS (i.e. the devices *don't* work):
>>>
>>>     http://fpaste.org/296772/14490851/
>>>
>>> and here it is when IOMMU has been *disabled* in the BIOS (the
>>> devices *do* work):
>>>
>>>     http://fpaste.org/296774/44908550/
>>>
>>
>> (I refreshed those links since they were almost a month old).
>>
>> It was after getting the above dmesg's that I bisected kernel builds
>> down to aafd8ba. If it would help, I can provide dmesg from just
>> before/after that commit, with any sort of extra debugging you'd
>> like turned on, or if you have a patch you'd like tested (or just
>> something to add extra debugging) I'm happy to do that too. Since
>> this is my main test machine for vfio device assignment, I'm open to
>> do just about anything to help figure out the problem, but don't
>> really have the knowledge to figure it out myself. :-)
>>

* Re: Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled
       [not found]                     ` <569F9D0E.20309-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-02-05 21:09                       ` Laine Stump
  0 siblings, 0 replies; 12+ messages in thread
From: Laine Stump @ 2016-02-05 21:09 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Joerg Roedel

On 01/20/2016 09:43 AM, Laine Stump wrote:
> On 01/20/2016 09:10 AM, Baoquan He wrote:
>> I found it archived here as well:
>>
>> https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg10687.html
>>
>>
>> But the pasted dmesg has been lost. Posting "lspci -tv" and "lspci -vvv"
>> output would be more helpful.
>
> Sure, I'll boot it with the two kernels again today and recollect
> everything.
>
>
>> Besides, does it work with the latest kernel?
>
> I haven't tried the latest upstream recently, but the latest available
> pre-built for Fedora 23 (4.2.8-300.fc23) is even worse - at the place
> where it would previously hang for ~3 minutes, it now hangs "forever" (I
> accidentally rebooted with that kernel and left without checking;
> several hours later when I returned it was still hung).
>
> I'll also grab the latest upstream sources and build/test that today.

I finally built a 4.5.0-rc2+ kernel, and found that the problem has 
disappeared. So I also tried a locally built 4.3.0 (broken) and 4.4.0 
(works).

After another day of git bisect between v4.3 and v4.4, here's what I found:

commit 30e2561b95295258890b4e0366ce867e04d34a97 fails to boot
commit cbfe360a1541a32e9e28f8f8ac925d2b7979d767 works
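
For anyone repeating this: the search is a binary search over the
commits between the broken and the working kernel, looking for the
first commit where the result flips from "fails" to "works". That is
the reverse of the usual hunt for a regression, so the good/bad labels
given to git bisect have to be swapped. Below is a minimal Python
sketch of the idea, not what git does internally. The history list is
a made-up linear one (only 30e2561b and cbfe360a are real; c1, c2 and
c5 are placeholders), and works() stands in for "build it, boot it,
and see whether it comes up".

```python
from typing import Callable, Sequence


def first_working(commits: Sequence[str], works: Callable[[str], bool]) -> str:
    """Return the earliest commit for which works() is true.

    Assumes commits are in history order, commits[0] is known broken,
    commits[-1] is known working, and the result flips exactly once.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known working
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix is after mid
    return commits[hi]


# Hypothetical linear history; c1, c2 and c5 are placeholder names.
history = ["v4.3", "c1", "c2", "30e2561b", "cbfe360a", "c5", "v4.4"]


def works(commit: str) -> bool:
    # Stand-in for a real build-and-boot test of the given commit.
    return history.index(commit) >= history.index("cbfe360a")


print(first_working(history, works))
```

Real kernel history is a graph with merges rather than a flat list,
which git bisect takes care of; the sketch only shows that each
build-and-boot cycle halves the range still under suspicion.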

It's notable that cbfe360a is in the igb driver, and I have an 82576 
card (which uses the igb driver) in my system. However:

1) That's not really related to the commit that seems to have caused the 
breakage (aafd8ba0ca74894b9397e412bbd7f8ea2662ead8), is it?

2) If I create a branch off of aafd8ba0c (or even v4.2) and cherry-pick 
commit cbfe360a (and ceee3450 to avoid a merge conflict) the result 
still fails to boot, so it's not a simple thing that just a patch or two 
can fix.

3) The good news: if I cherry-pick commit cbfe360a on top of v4.3 
(Fedora is currently using kernel 4.3) then the problem will be solved 
without needing to constantly switch back to a locally built kernel 
after every update.

My current working theory is that the changes in AMD iommu uncovered a 
latent bug in the igb driver, and that a series of patches to the igb 
driver (ending with cbfe360a) fixed that bug. I can't think of any other 
way to explain it (and I've rebuilt/retested on either side of every 
involved commit multiple times to verify the behavior).
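
On kernels that do boot, the original symptom (a PCI function with no
driver link in sysfs) can be checked with a few lines of Python
instead of by eye. This is only a sketch: unbound_pci_functions is a
name I made up, and it assumes the usual layout where each entry under
/sys/bus/pci/devices has a "driver" symlink when a driver is bound.

```python
import os


def unbound_pci_functions(devices_dir="/sys/bus/pci/devices"):
    """Return the sorted PCI addresses under devices_dir that have no
    'driver' entry, i.e. functions with no driver currently bound."""
    unbound = []
    for addr in sorted(os.listdir(devices_dir)):
        # lexists() also counts a dangling symlink as present
        if not os.path.lexists(os.path.join(devices_dir, addr, "driver")):
            unbound.append(addr)
    return unbound
```

Calling unbound_pci_functions() with no argument reads the live
system. Some functions often have no driver bound even on a healthy
boot (host bridges, for example), so the list is only meaningful when
compared with the same list from a known-good kernel such as 4.1.10.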

So thanks for your interest, but I'm happy to say that this seems to be 
someone else's problem :-)

Thread overview: 12+ messages
2015-11-04 17:24 Regression in kernel 4.2.3+ (relative to 4.1.10) on AMD 990FX system with IOMMU enabled Laine Stump
     [not found] ` <563A3F64.50808-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-11-04 21:08   ` Alex Williamson
     [not found]     ` <1446671291.3692.147.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-11-05 14:11       ` Mark Hounschell
2015-11-05 19:05       ` Laine Stump
     [not found]         ` <563BA893.4020202-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-11-08 16:52           ` Laine Stump
2015-11-12 17:33   ` Laine Stump
     [not found]     ` <5644CD81.2020304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-11-18 15:18       ` Joerg Roedel
     [not found]         ` <20151118151841.GA2517-l3A5Bk7waGM@public.gmane.org>
2015-12-02 19:56           ` Laine Stump
     [not found]             ` <565F4CF5.90107-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-01-20 12:39               ` Joerg Roedel
2016-01-20 14:10               ` Baoquan He
     [not found]                 ` <20160120141025.GA13677-ejN7fcUYdH/by3iVrkZq2A@public.gmane.org>
2016-01-20 14:43                   ` Laine Stump
     [not found]                     ` <569F9D0E.20309-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-02-05 21:09                       ` Laine Stump
