* 3.6.11  AMD-Vi: Completion-Wait loop timed out
@ 2013-01-20 10:33 Udo van den Heuvel
  2013-01-20 10:36 ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-20 10:33 UTC (permalink / raw)
  To: linux-kernel


Hello,

See below for part of the logging on this F2A85X-UP4 with an AMD
A10-5800K. The box was running a RAID check, I guess.


Jan 20 03:42:08 s3 rsyslogd: [origin software="rsyslogd"
swVersion="5.8.10" x-pid="3031" x-info="http://www.rsyslog.com"]
rsyslogd was HUPed
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:18 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:18 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:18 s3 kernel: ------------[ cut here ]------------
Jan 20 04:11:18 s3 kernel: WARNING: at drivers/iommu/amd_iommu.c:1104 __domain_flush_pages+0x1ad/0x1b0()
Jan 20 04:11:18 s3 kernel: Hardware name: To be filled by O.E.M.
Jan 20 04:11:18 s3 kernel: Modules linked in: vfat fat usb_storage pwc
udf crc_itu_t nfsv3 nfs bnep bluetooth fuse cpufreq_userspace
nf_conntrack_netbios_ns eeprom nf_conntrack_broadcast ipt_REJECT
ip6t_REJECT it87 iptable_filter hwmon_vid xt_tcpudp ipt_MASQUERADE
nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat nf_conntrack_ipv4
xt_state nf_defrag_ipv4 nf_conntrack ip6table_filter ip_tables
ip6_tables x_tables dm_mirror dm_region_hash dm_log ext2 snd_usb_audio
snd_usbmidi_lib snd_hwdep snd_rawmidi snd_hda_codec_realtek
videobuf2_vmalloc videobuf2_memops videobuf2_core cdc_ether hid_generic
videodev binfmt_misc radeon cfbfillrect snd_hda_intel cfbimgblt
snd_hda_codec fbcon bitblit cfbcopyarea snd_seq softcursor i2c_algo_bit
snd_seq_device font backlight powernow_k8 mperf drm_kms_helper kvm_amd
ttm snd_pcm kvm drm fb snd_page_alloc snd_timer snd fbdev k10temp
microcode evdev i2c_piix4 xhci_hcd button nfsd exportfs auth_rpcgss
nfs_acl lockd sunrpc autofs4 usbhid ehci_hcd ohci_hcd sr_mod cdrom [last
unloaded
Jan 20 04:11:18 s3 kernel: : pwc]
Jan 20 04:11:18 s3 kernel: Pid: 506, comm: irq/43-ahci Not tainted 3.6.11 #19
Jan 20 04:11:18 s3 kernel: Call Trace:
Jan 20 04:11:18 s3 kernel: [<ffffffff8103c679>] ? warn_slowpath_common+0x79/0xc0
Jan 20 04:11:18 s3 kernel: [<ffffffff8134598d>] ? __domain_flush_pages+0x1ad/0x1b0
Jan 20 04:11:18 s3 kernel: [<ffffffff81345c2b>] ? __unmap_single.isra.24+0xdb/0x110
Jan 20 04:11:18 s3 kernel: [<ffffffff81346615>] ? unmap_sg+0x55/0xb0
Jan 20 04:11:18 s3 kernel: [<ffffffff812a9a61>] ? ata_sg_clean+0x61/0xd0
Jan 20 04:11:18 s3 kernel: [<ffffffff812b039d>] ? ata_scsi_qc_complete+0x5d/0x420
Jan 20 04:11:18 s3 kernel: [<ffffffff812a9cc0>] ? __ata_qc_complete+0x40/0x130
Jan 20 04:11:18 s3 kernel: [<ffffffff812aa05a>] ? ata_qc_complete_multiple+0x7a/0xc0
Jan 20 04:11:30 s3 kernel: [<ffffffff812c1d2f>] ? ahci_interrupt+0xaf/0x710
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c8e0>] ? irq_thread_fn+0x40/0x40
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c903>] ? irq_forced_thread_fn+0x23/0x50
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c67b>] ? irq_thread+0x11b/0x180
Jan 20 04:11:30 s3 kernel: [<ffffffff81060d8c>] ? __wake_up_common+0x4c/0x80
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c7e0>] ? irq_finalize_oneshot+0x100/0x100
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c560>] ? wake_threads_waitq+0x50/0x50
Jan 20 04:11:30 s3 kernel: [<ffffffff81058d75>] ? kthread+0x85/0x90
Jan 20 04:11:30 s3 kernel: [<ffffffff8141ec34>] ? kernel_thread_helper+0x4/0x10
Jan 20 04:11:30 s3 kernel: [<ffffffff81058cf0>] ? kthread_freezable_should_stop+0x50/0x50
Jan 20 04:11:30 s3 kernel: [<ffffffff8141ec30>] ? gs_change+0xb/0xb
Jan 20 04:11:30 s3 kernel: ---[ end trace 73ac82546fadadb1 ]---
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: ------------[ cut here ]------------

And many more of the WARNINGs.
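
To see how the timeouts cluster in time, the log excerpt above can be tallied per timestamp. The following is only a minimal sketch in Python, assuming the syslog line layout shown in the excerpt; the name `count_timeouts` and the regular expression are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
# The layout (month, day, time, host, "kernel:") is assumed from the excerpt.
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"AMD-Vi: Completion-Wait loop timed out\s*$"
)


def count_timeouts(lines):
    """Return a Counter mapping each timestamp to its number of timeout lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group("stamp")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out",
        "Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out",
        "Jan 20 04:11:18 s3 kernel: AMD-Vi: Completion-Wait loop timed out",
        "Jan 20 04:11:18 s3 kernel: ------------[ cut here ]------------",
    ]
    for stamp, number in sorted(count_timeouts(sample).items()):
        print(stamp, number)
```

Feeding the function the lines of the real log file instead of `sample` would show whether the timeouts arrive in bursts, as they appear to here.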

What went wrong? How to fix?


Kind regards,
Udo


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 10:33 3.6.11 AMD-Vi: Completion-Wait loop timed out Udo van den Heuvel
@ 2013-01-20 10:36 ` Borislav Petkov
  2013-01-20 10:40   ` Udo van den Heuvel
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2013-01-20 10:36 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: linux-kernel, Jörg Rödel

I know just the guy, CCed. :-)


-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 10:36 ` Borislav Petkov
@ 2013-01-20 10:40   ` Udo van den Heuvel
  2013-01-20 11:19     ` Jörg Rödel
  0 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-20 10:40 UTC (permalink / raw)
  To: Borislav Petkov, Jörg Rödel; +Cc: linux-kernel

Hello,

On 2013-01-20 11:36, Borislav Petkov wrote:
> I know just the guy, CCed. :-)

Thanks for the quick response!
I found this similar case:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073384


Kind regards,
Udo


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 10:40   ` Udo van den Heuvel
@ 2013-01-20 11:19     ` Jörg Rödel
  2013-01-20 11:25       ` Udo van den Heuvel
  0 siblings, 1 reply; 38+ messages in thread
From: Jörg Rödel @ 2013-01-20 11:19 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: Borislav Petkov, linux-kernel

On Sun, Jan 20, 2013 at 11:40:20AM +0100, Udo van den Heuvel wrote:
> Hello,
> 
> On 2013-01-20 11:36, Borislav Petkov wrote:
> > I know just the guy, CCed. :-)
> 
> Thanks for the quick response!
> I found this similar case:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073384

Yes, this is a hardware issue for which the BIOS does not apply the
workaround. The only solution for now is to disable the IOMMU on
Trinity-based chips. Unfortunately, I no longer have access to the
hardware, so I cannot write a workaround in the AMD IOMMU driver.

The question is what to do now. I am inclined to disable the IOMMU
whenever a Trinity chip is detected. This is not the first report of
this problem I have encountered.


Regards,

	Joerg




* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:19     ` Jörg Rödel
@ 2013-01-20 11:25       ` Udo van den Heuvel
  2013-01-20 11:40         ` Jörg Rödel
  0 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-20 11:25 UTC (permalink / raw)
  To: Jörg Rödel; +Cc: Borislav Petkov, linux-kernel

Hello Jörg,

On 2013-01-20 12:19, Jörg Rödel wrote:
> On Sun, Jan 20, 2013 at 11:40:20AM +0100, Udo van den Heuvel wrote:
>> Hello,
>>
>> On 2013-01-20 11:36, Borislav Petkov wrote:
>>> I know just the guy, CCed. :-)
>>
>> Thanks for the quick response!
>> I found this similar case:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073384
> 
> Yes, this is a Hardware issue for which the BIOS does not apply the
> workaround.

Hardware issue? What is wrong, or rather, what is happening?

I have this:

# dmesg|grep IOMMU
[    0.000000] ACPI: IVRS 000000009dd12420 00070 (v02  AMD   AMDIOMMU 00000001 AMD  00000000)
[    0.000000] Please enable the IOMMU option in the BIOS setup
[    1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40

So the kernel says I have no IOMMU, yet one is still found? (!?)

> The only solution for now is to disable the IOMMU on the
> Trinity based chips. 

In the PC BIOS, I assume?
I have not found an option yet, but this is the first occurrence.
Can the BIOS vendor fix this? If so, please explain, so I can contact
Gigabyte (the motherboard manufacturer).

> The question is what to do now, I tend to disable the IOMMU if a
> Trinity chip is detected. This is not the first report of this problem
> I encountered.

I know, see the URL I posted.
What is the impact of disabling the IOMMU?


Udo


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:25       ` Udo van den Heuvel
@ 2013-01-20 11:40         ` Jörg Rödel
  2013-01-20 11:48           ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: Jörg Rödel @ 2013-01-20 11:40 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: Borislav Petkov, linux-kernel

On Sun, Jan 20, 2013 at 12:25:07PM +0100, Udo van den Heuvel wrote:
> Hello Jörg,
> 
> Hardware issue? What is wrong, or rather, what is happening?

I think it falls under Erratum 455 (which does not mention IOMMU
specifically). Point is, there is a hardware workaround for this to make
the IOMMU work, but your BIOS does not enable it.

> 
> I have this:
> 
> # dmesg|grep IOMMU
> [    0.000000] ACPI: IVRS 000000009dd12420 00070 (v02  AMD   AMDIOMMU
> 00000001 AMD  00000000)
> [    0.000000] Please enable the IOMMU option in the BIOS setup
> [    1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
> 
> So kernel says I have no IOMMU but still one is found? (!?)

The "Please enable IOMMU ..." line refers to the GART, not the AMD
IOMMU. This is a frequent source of confusion; we should probably remove
that line. Trinity has no GART, so there is nothing to find :)
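
The distinction between the two messages can be sketched as a small classifier. This is a minimal Python sketch, assuming only the message texts quoted in this thread; the function name `classify_iommu_line` and the labels `amd-vi`, `gart-hint` and `other` are made up for illustration.

```python
# Text of the GART-related hint exactly as it appears in the dmesg output above.
GART_HINT = "Please enable the IOMMU option in the BIOS setup"


def classify_iommu_line(line):
    """Classify one dmesg line as 'amd-vi', 'gart-hint' or 'other'."""
    if "AMD-Vi:" in line:
        # Messages from the AMD IOMMU (AMD-Vi) driver carry this prefix.
        return "amd-vi"
    if GART_HINT in line:
        # This hint is about the GART, not about the AMD IOMMU.
        return "gart-hint"
    return "other"


if __name__ == "__main__":
    for text in (
        "[    0.000000] Please enable the IOMMU option in the BIOS setup",
        "[    1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40",
    ):
        print(classify_iommu_line(text), "<-", text)
```

Grepping for "IOMMU" alone mixes both kinds of message, which is why the output looked contradictory.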

> In PC-BIOS I assume?
> I did not yet find an option, but this is the first occurrence.
> Can the BIOS vendor fix this? If so: please explain so I can contact
> Gigabyte (motherboard manufacturer)

Yes, the BIOS vendor can fix this issue. They need to disable NB clock
gating for the IOMMU.

> What is the impact of disabling the IOMMU?

Well, it has some security impact, and if you have more than 4GB of RAM
there may also be a slight performance impact due to DMA bounce buffering.
But that's still better than a system that stops working after some time.
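
This thread does not name a specific mechanism for disabling the IOMMU. One possibility, stated here as an assumption, is the `amd_iommu=off` kernel boot parameter. The following minimal Python sketch checks whether a kernel command line carries that parameter; the function name `amd_iommu_disabled` is illustrative.

```python
def amd_iommu_disabled(cmdline):
    """Return True if the kernel command line contains amd_iommu=off.

    Assumption: the IOMMU is disabled through the amd_iommu= boot
    parameter; the thread itself does not say how to disable it.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # The parameter may carry several comma-separated options.
        if key == "amd_iommu" and sep and "off" in value.split(","):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical command line, used only to demonstrate the check.
    example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro amd_iommu=off"
    print(amd_iommu_disabled(example))
```

On a running Linux system the string to check would come from `/proc/cmdline`.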


	Joerg




* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:40         ` Jörg Rödel
@ 2013-01-20 11:48           ` Borislav Petkov
  2013-01-20 11:50             ` Borislav Petkov
                               ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Borislav Petkov @ 2013-01-20 11:48 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Udo van den Heuvel, linux-kernel, Boris Ostrovsky, Jacob Shin

On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote:
> Yes, the BIOS vendor can fix this issue. They need to disable NB clock
> gating for the IOMMU.

Right, Udo, you can try Gigabyte first.

Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and
Jacob could help. CCed.

Guys, the error description is at
http://marc.info/?l=linux-kernel&m=135867802432660

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:48           ` Borislav Petkov
@ 2013-01-20 11:50             ` Borislav Petkov
  2013-01-20 11:59               ` Udo van den Heuvel
  2013-01-20 11:52             ` Udo van den Heuvel
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2013-01-20 11:50 UTC (permalink / raw)
  To: Udo van den Heuvel
  Cc: Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin

On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote:
> On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote:
> > Yes, the BIOS vendor can fix this issue. They need to disable NB clock
> > gating for the IOMMU.
> 
> Right, Udo, you can try Gigabyte first.

Btw, you're running the latest BIOS from them, I assume?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:48           ` Borislav Petkov
  2013-01-20 11:50             ` Borislav Petkov
@ 2013-01-20 11:52             ` Udo van den Heuvel
  2013-01-20 11:57             ` Jörg Rödel
  2013-01-21 16:04             ` Jacob Shin
  3 siblings, 0 replies; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-20 11:52 UTC (permalink / raw)
  To: Borislav Petkov, Jörg Rödel, linux-kernel,
	Boris Ostrovsky, Jacob Shin

On 2013-01-20 12:48, Borislav Petkov wrote:
> On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote:
>> Yes, the BIOS vendor can fix this issue. They need to disable NB clock
>> gating for the IOMMU.
> 
> Right, Udo, you can try Gigabyte first.

I just did so and referred to the kernel.org bugzilla.

> Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and
> Jacob could help. CCed.

That would be most helpful!
Of course I can help with testing, but the issue has happened only once so far.

The person from
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073384 had worse
luck with this issue.

Thanks,
Udo



* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:48           ` Borislav Petkov
  2013-01-20 11:50             ` Borislav Petkov
  2013-01-20 11:52             ` Udo van den Heuvel
@ 2013-01-20 11:57             ` Jörg Rödel
  2013-01-21 13:09               ` Borislav Petkov
  2013-01-21 14:37               ` Boris Ostrovsky
  2013-01-21 16:04             ` Jacob Shin
  3 siblings, 2 replies; 38+ messages in thread
From: Jörg Rödel @ 2013-01-20 11:57 UTC (permalink / raw)
  To: Borislav Petkov, Udo van den Heuvel, linux-kernel,
	Boris Ostrovsky, Jacob Shin

On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote:
> On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote:
> > Yes, the BIOS vendor can fix this issue. They need to disable NB clock
> > gating for the IOMMU.
> 
> Right, Udo, you can try Gigabyte first.
> 
> Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and
> Jacob could help. CCed.

BorisO is no longer with AMD afaik. I wrote an email to Sherry and
Suravee and asked them to either send me hardware to write the fix on my
own or to send a fix for the issue. Let's see what happens...


	Joerg




* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:50             ` Borislav Petkov
@ 2013-01-20 11:59               ` Udo van den Heuvel
  2013-01-20 12:24                 ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-20 11:59 UTC (permalink / raw)
  To: Borislav Petkov, Jörg Rödel, linux-kernel,
	Boris Ostrovsky, Jacob Shin

On 2013-01-20 12:50, Borislav Petkov wrote:
>> Right, Udo, you can try Gigabyte first.
> 
> Btw, you're running the latest BIOS from them, I assume?

Nope. But I am beyond their first released BIOS, I am running one of
their beta BIOSes. I am two beta updates behind current with F3g.
They list as description for BIOS F3k:

1.    Beta BIOS
2.    Modify option of APU and memory voltage
3.    Modify option of CPU PWM switch rate
4.    Modify memory compatibility
5.    Modify ET6 compatibility


Udo


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:59               ` Udo van den Heuvel
@ 2013-01-20 12:24                 ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2013-01-20 12:24 UTC (permalink / raw)
  To: Udo van den Heuvel
  Cc: Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin

On Sun, Jan 20, 2013 at 12:59:59PM +0100, Udo van den Heuvel wrote:
> On 2013-01-20 12:50, Borislav Petkov wrote:
> >> Right, Udo, you can try Gigabyte first.
> > 
> > Btw, you're running the latest BIOS from them, I assume?
> 
> Nope. But I am beyond their first released BIOS, I am running one of
> their beta BIOSes. I am two beta updates behind current with F3g.
> They list as description for BIOS F3k:
> 
> 1.    Beta BIOS
> 2.    Modify option of APU and memory voltage
> 3.    Modify option of CPU PWM switch rate
> 4.    Modify memory compatibility
> 5.    Modify ET6 compatibility

Yeah, fix lists are not always exhaustive, especially for BIOSes. You
could try the latest one, provided you can easily downgrade if something
breaks with F3k.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:57             ` Jörg Rödel
@ 2013-01-21 13:09               ` Borislav Petkov
  2013-01-21 14:10                 ` Udo van den Heuvel
                                   ` (2 more replies)
  2013-01-21 14:37               ` Boris Ostrovsky
  1 sibling, 3 replies; 38+ messages in thread
From: Borislav Petkov @ 2013-01-21 13:09 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Udo van den Heuvel, linux-kernel, Boris Ostrovsky, Jacob Shin

On Sun, Jan 20, 2013 at 12:57:55PM +0100, Jörg Rödel wrote:
> BorisO is no longer with AMD afaik.

Why am I not surprised...

> I wrote an email to Sherry and Suravee and asked them to either send
> me hardware to write the fix on my own or to send a fix for the issue.
> Let's see what happens...

Btw, while we're at it, here's some more h0rkage from my PD box:

[    0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
[    0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
[    0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
[    0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 13:09               ` Borislav Petkov
@ 2013-01-21 14:10                 ` Udo van den Heuvel
  2013-01-21 14:55                   ` Borislav Petkov
  2013-01-21 15:10                 ` Jörg Rödel
  2013-04-21  1:03                 ` Jake
  2 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-21 14:10 UTC (permalink / raw)
  To: Borislav Petkov, Jörg Rödel, linux-kernel,
	Boris Ostrovsky, Jacob Shin

On 2013-01-21 14:09, Borislav Petkov wrote:
>> Let's see what happens...
> 
> Btw, while we're at it, here's some more h0rkage from my PD box:
> 
> [    0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
> [    0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
> [    0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
> [    0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)

I have:

# dmesg|grep -i amd-vi
[    1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    1.125701] AMD-Vi:  Extended features:  PreF PPR GT IA
[    1.131725] AMD-Vi: Lazy IO/TLB flushing enabled


Is that 'OK'?

Udo




* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:57             ` Jörg Rödel
  2013-01-21 13:09               ` Borislav Petkov
@ 2013-01-21 14:37               ` Boris Ostrovsky
  2013-01-21 14:44                 ` Udo van den Heuvel
  2013-01-21 14:47                 ` Jörg Rödel
  1 sibling, 2 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2013-01-21 14:37 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Borislav Petkov, Udo van den Heuvel, linux-kernel, Jacob Shin



On 01/20/2013 06:57 AM, Jörg Rödel wrote:
> On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote:
>> On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote:
>>> Yes, the BIOS vendor can fix this issue. They need to disable NB clock
>>> gating for the IOMMU.
>>
>> Right, Udo, you can try Gigabyte first.
>>
>> Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and
>> Jacob could help. CCed.


Are you talking about erratum 746?

-boris




* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 14:37               ` Boris Ostrovsky
@ 2013-01-21 14:44                 ` Udo van den Heuvel
  2013-01-21 14:47                 ` Jörg Rödel
  1 sibling, 0 replies; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-21 14:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Jörg Rödel, Borislav Petkov, linux-kernel, Jacob Shin

On 2013-01-21 15:37, Boris Ostrovsky wrote:
>>> Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and
>>> Jacob could help. CCed.
> 
> 
> Are you talking about erratum 746?

Link please?
If we have a link I could add that to the Gigabyte case.

Udo



* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 14:37               ` Boris Ostrovsky
  2013-01-21 14:44                 ` Udo van den Heuvel
@ 2013-01-21 14:47                 ` Jörg Rödel
  1 sibling, 0 replies; 38+ messages in thread
From: Jörg Rödel @ 2013-01-21 14:47 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Borislav Petkov, Udo van den Heuvel, linux-kernel, Jacob Shin

Hi Boris,

On Mon, Jan 21, 2013 at 09:37:31AM -0500, Boris Ostrovsky wrote:
> Are you talking about erratum 746?

The problems seen here are not about PPR failures, so it is not
this particular erratum, but the workaround looks similar.


	Joerg




* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 14:10                 ` Udo van den Heuvel
@ 2013-01-21 14:55                   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2013-01-21 14:55 UTC (permalink / raw)
  To: Udo van den Heuvel
  Cc: Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin

On Mon, Jan 21, 2013 at 03:10:19PM +0100, Udo van den Heuvel wrote:
> On 2013-01-21 14:09, Borislav Petkov wrote:
> >> Let's see what happens...
> > 
> > Btw, while we're at it, here's some more h0rkage from my PD box:
> > 
> > [    0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
> > [    0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
> > [    0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
> > [    0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)
> 
> I have:
> 
> # dmesg|grep -i amd-vi
> [    1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
> [    1.125701] AMD-Vi:  Extended features:  PreF PPR GT IA
> [    1.131725] AMD-Vi: Lazy IO/TLB flushing enabled
> 
> Is that 'OK'?

That's simply dumping the IOMMU extended features and yes, it is ok.
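
For reference, the feature names on that line can be pulled out with a few lines of Python. This is a minimal sketch, assuming the "Extended features:" layout shown in the quoted dmesg output; the helper name `extended_features` is illustrative.

```python
def extended_features(line):
    """Return the feature names listed after 'Extended features:' in a dmesg line."""
    _, marker, rest = line.partition("Extended features:")
    # If the marker is absent, the line is not a feature dump.
    return rest.split() if marker else []


if __name__ == "__main__":
    print(extended_features("[    1.125701] AMD-Vi:  Extended features:  PreF PPR GT IA"))
```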

Mine happen when enabling CONFIG_IRQ_REMAP and they're somewhat related.
Anyways, I decided to show them to Joerg so that he's aware.

Btw, you could try enabling that on your machine and see whether IRQ
remapping works there.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 13:09               ` Borislav Petkov
  2013-01-21 14:10                 ` Udo van den Heuvel
@ 2013-01-21 15:10                 ` Jörg Rödel
  2013-01-21 15:32                   ` Borislav Petkov
  2013-04-21  1:03                 ` Jake
  2 siblings, 1 reply; 38+ messages in thread
From: Jörg Rödel @ 2013-01-21 15:10 UTC (permalink / raw)
  To: Borislav Petkov, Udo van den Heuvel, linux-kernel,
	Boris Ostrovsky, Jacob Shin

On Mon, Jan 21, 2013 at 02:09:42PM +0100, Borislav Petkov wrote:
> [    0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
> [    0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
> [    0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
> [    0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)

Yes, those are BIOS bugs too, and they prevent interrupt remapping from
functioning reliably. But the good thing is that these bugs can be
detected easily to enable a workaround :-)
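
From the user side, the affected IOAPIC IDs can be read straight out of the boot log. The sketch below is a minimal Python example that scans dmesg lines for the firmware-bug messages quoted above. It is plain log scanning under the assumption that the message text stays as shown, and it is not the detection logic inside the kernel driver; the name `missing_ioapics` is illustrative.

```python
import re

# Message text taken from the dmesg output quoted in this thread.
IOAPIC_RE = re.compile(
    r"\[Firmware Bug\]: AMD-Vi: IOAPIC\[(\d+)\] not in IVRS table"
)


def missing_ioapics(lines):
    """Return the IOAPIC IDs reported as missing from the IVRS table."""
    ids = []
    for line in lines:
        match = IOAPIC_RE.search(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


if __name__ == "__main__":
    log = [
        "[    0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table",
        "[    0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table",
        "[    0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)",
    ]
    print(missing_ioapics(log))
```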


	Joerg




* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 15:10                 ` Jörg Rödel
@ 2013-01-21 15:32                   ` Borislav Petkov
  2013-01-21 15:34                     ` Udo van den Heuvel
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2013-01-21 15:32 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Udo van den Heuvel, linux-kernel, Boris Ostrovsky, Jacob Shin

On Mon, Jan 21, 2013 at 04:10:00PM +0100, Jörg Rödel wrote:
> On Mon, Jan 21, 2013 at 02:09:42PM +0100, Borislav Petkov wrote:
> > [    0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
> > [    0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
> > [    0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
> > [    0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)
> 
> Yes, those are BIOS bugs too, and they prevent interrupt remapping from
> functioning reliably. But the good thing is that these bugs can be
> detected easily to enable a workaround :-)

Well, I'm all ready to test stuff since this is the latest ASUS BIOS
and I'm not even going to ask them to fix it there based on previous
experience with them :-).

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 15:32                   ` Borislav Petkov
@ 2013-01-21 15:34                     ` Udo van den Heuvel
  0 siblings, 0 replies; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-21 15:34 UTC (permalink / raw)
  To: Borislav Petkov, Jörg Rödel, linux-kernel,
	Boris Ostrovsky, Jacob Shin

On 2013-01-21 16:32, Borislav Petkov wrote:
>>> [    0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)
>>
>> Yes, those are BIOS bugs too, and they prevent interrupt remapping from
>> functioning reliably. But the good thing is that these bugs can be detected
>> easily, so a workaround can be enabled :-)
> 
> Well, I'm all ready to test stuff since this is the latest ASUS BIOS
> and I'm not even going to ask them to fix it there based on previous
> experience with them :-).

I too am ready to test but the Gigabyte case is still open and unanswered.

Udo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:48           ` Borislav Petkov
                               ` (2 preceding siblings ...)
  2013-01-20 11:57             ` Jörg Rödel
@ 2013-01-21 16:04             ` Jacob Shin
  2013-01-21 22:35               ` Suravee Suthikulpanit
  3 siblings, 1 reply; 38+ messages in thread
From: Jacob Shin @ 2013-01-21 16:04 UTC (permalink / raw)
  To: Borislav Petkov, Jörg Rödel, Udo van den Heuvel,
	linux-kernel, Boris Ostrovsky
  Cc: Suravee.Suthikulpanit

On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote:
> On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote:
> > Yes, the BIOS vendor can fix this issue. They need to disable NB clock
> > gating for the IOMMU.
> 
> Right, Udo, you can try Gigabyte first.
> 
> Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and
> Jacob could help. CCed.

Hi, yes we will try and reproduce the NB clock gating issue on our
end and submit a patch ASAP.

And Boris P., we've also recently seen something similar to your
"IOAPIC not in IVRS" issue (on Xen), so we'll attempt to tackle that
one too afterwards.

-Jacob

> 
> Guys, the error description is at
> http://marc.info/?l=linux-kernel&m=135867802432660
> 
> Thanks.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 16:04             ` Jacob Shin
@ 2013-01-21 22:35               ` Suravee Suthikulpanit
  2013-01-22  3:22                 ` Udo van den Heuvel
  2013-01-22 14:13                 ` Udo van den Heuvel
  0 siblings, 2 replies; 38+ messages in thread
From: Suravee Suthikulpanit @ 2013-01-21 22:35 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Borislav Petkov, Jörg Rödel, Udo van den Heuvel,
	linux-kernel, Boris Ostrovsky

Udo, 

I am trying to debug the issue but need to check one thing on your
system.  Would you please try the following and check the output value
on your system?

# setpci -s 00:00.02 F0.w=90
# setpci -s 00:00.02 F4.w


Thank you,

Suravee


On Mon, 2013-01-21 at 10:04 -0600, Jacob Shin wrote:
> On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote:
> > On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote:
> > > Yes, the BIOS vendor can fix this issue. They need to disable NB clock
> > > gating for the IOMMU.
> > 
> > Right, Udo, you can try Gigabyte first.
> > 
> > Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and
> > Jacob could help. CCed.
> 
> Hi, yes we will try and reproduce the NB clock gating issue on our
> end and submit a patch ASAP.
> 
> And Boris P., we've also recently seen something similar to your
> "IOAPIC not in IVRS" issue (on Xen), so we'll attempt to tackle that
> one too afterwards.
> 
> -Jacob
> 
> > 
> > Guys, the error description is at
> > http://marc.info/?l=linux-kernel&m=135867802432660
> > 
> > Thanks.
> > 
> > -- 
> > Regards/Gruss,
> >     Boris.
> > 
> > Sent from a fat crate under my desk. Formatting is fine.
> > --
> > 




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 22:35               ` Suravee Suthikulpanit
@ 2013-01-22  3:22                 ` Udo van den Heuvel
  2013-01-22 14:13                 ` Udo van den Heuvel
  1 sibling, 0 replies; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-22  3:22 UTC (permalink / raw)
  To: suravee.suthikulpanit
  Cc: Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel,
	Boris Ostrovsky

On 2013-01-21 23:35, Suravee Suthikulpanit wrote:
> Would you please try the following and check the output value
> on your system?
> 
> # setpci -s 00:00.02 F0.w=90
> # setpci -s 00:00.02 F4.w

# setpci -s 00:00.02 F0.w=90
# setpci -s 00:00.02 F4.w
0050
#


Udo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 22:35               ` Suravee Suthikulpanit
  2013-01-22  3:22                 ` Udo van den Heuvel
@ 2013-01-22 14:13                 ` Udo van den Heuvel
  2013-01-22 14:36                   ` Boris Ostrovsky
  1 sibling, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-22 14:13 UTC (permalink / raw)
  To: suravee.suthikulpanit
  Cc: Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel,
	Boris Ostrovsky

Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures
attached).

What can we bring against that?

Udo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 14:13                 ` Udo van den Heuvel
@ 2013-01-22 14:36                   ` Boris Ostrovsky
  2013-01-22 15:16                     ` Jörg Rödel
                                       ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2013-01-22 14:36 UTC (permalink / raw)
  To: Udo van den Heuvel
  Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel



On 01/22/2013 09:13 AM, Udo van den Heuvel wrote:
> Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures
> attached).

There are no attachments to your message.

I am not sure that 5i supports IOMMU (but I may well be wrong).

>
> What can we bring against that?

How reproducible is the problem that you are seeing?


-boris


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 14:36                   ` Boris Ostrovsky
@ 2013-01-22 15:16                     ` Jörg Rödel
  2013-01-22 15:27                     ` Udo van den Heuvel
  2013-01-31 15:42                     ` Udo van den Heuvel
  2 siblings, 0 replies; 38+ messages in thread
From: Jörg Rödel @ 2013-01-22 15:16 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Udo van den Heuvel, suravee.suthikulpanit, Jacob Shin,
	Borislav Petkov, linux-kernel

On Tue, Jan 22, 2013 at 09:36:34AM -0500, Boris Ostrovsky wrote:
> 
> 
> On 01/22/2013 09:13 AM, Udo van den Heuvel wrote:
> >Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures
> >attached).
> 
> There are no attachments to your message.
> 
> I am not sure that 5i supports IOMMU (but I may well be wrong).

Virtualization use-cases don't change the page-tables for the IOMMU very
often. So there is less need to flush the IO-TLB and IOMMU command
processing is utilized only from time to time.

In Linux however the page-tables change all the time and there is a much
higher load on the IOMMU command buffer which makes it much more likely
to trigger the hardware problem.


	Joerg



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 14:36                   ` Boris Ostrovsky
  2013-01-22 15:16                     ` Jörg Rödel
@ 2013-01-22 15:27                     ` Udo van den Heuvel
  2013-01-22 16:12                       ` Boris Ostrovsky
  2013-01-31 15:42                     ` Udo van den Heuvel
  2 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-22 15:27 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel

On 2013-01-22 15:36, Boris Ostrovsky wrote:
>> Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures
>> attached).
> 
> There are no attachments to your message.

Correct, Gigabyte sent them via their support web interface.
Do you need to see them? They just show IOMMU enabled or similar.

>> What can we bring against that?
> 
> How reproducible is the problem that you are seeing?

Seen once over here. Correlated with raid-check.

Udo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 15:27                     ` Udo van den Heuvel
@ 2013-01-22 16:12                       ` Boris Ostrovsky
  2013-01-22 16:29                         ` Udo van den Heuvel
  0 siblings, 1 reply; 38+ messages in thread
From: Boris Ostrovsky @ 2013-01-22 16:12 UTC (permalink / raw)
  To: Udo van den Heuvel
  Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel



On 01/22/2013 10:27 AM, Udo van den Heuvel wrote:
> On 2013-01-22 15:36, Boris Ostrovsky wrote:
>>> Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures
>>> attached).
>>
>> There are no attachments to your message.
>
> Correct, Gigabyte sent them via their support web interface.
> Do you need to see them? They just show IOMMU enabled or similar.

No, I thought you ran this yourself.

>
>>> What can we bring against that?
>>
>> How reproducible is the problem that you are seeing?
>
> Seen once over here. Correlated with raid-check.

Then the answer from Gigabyte doesn't prove anything. You can also boot 
Linux without seeing this problem in most cases.

Your BIOS does not have the required erratum workaround. We will provide 
a patch to close that hole but since the problem is not easily 
reproducible (and the erratum is also not easy to trigger) it may be 
difficult to say whether it really helped with your problem.

-boris


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 16:12                       ` Boris Ostrovsky
@ 2013-01-22 16:29                         ` Udo van den Heuvel
  2013-01-22 23:29                           ` Suravee Suthikulanit
  0 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-22 16:29 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel

On 2013-01-22 17:12, Boris Ostrovsky wrote:
>> Seen once over here. Correlated with raid-check.
> 
> Then the answer from Gigabyte doesn't prove anything. You can also boot
> Linux without seeing this problem in most cases.

That was my situation until the first time it hit.

> Your BIOS does not have the required erratum workaround. We will provide
> a patch to close that hole but since the problem is not easily
> reproducible (and the erratum is also not easy to trigger) it may be
> difficult to say whether it really helped with your problem.

Can we think of certain loads/actions/etc that could help trigger the issue?
Then if reproducing is easier we can better say if stuff is actually
fixed after the workaround.

Udo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 16:29                         ` Udo van den Heuvel
@ 2013-01-22 23:29                           ` Suravee Suthikulanit
  2013-01-23 14:19                             ` Udo van den Heuvel
  2013-01-23 14:23                             ` Udo van den Heuvel
  0 siblings, 2 replies; 38+ messages in thread
From: Suravee Suthikulanit @ 2013-01-22 23:29 UTC (permalink / raw)
  To: Udo van den Heuvel
  Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel

On 1/22/2013 10:29 AM, Udo van den Heuvel wrote:

> On 2013-01-22 17:12, Boris Ostrovsky wrote:
>> Your BIOS does not have the required erratum workaround. We will provide
>> a patch to close that hole but since the problem is not easily
>> reproducible (and the erratum is also not easy to trigger) it may be
>> difficult to say whether it really helped with your problem.

Udo,

I sent out a patch (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should implement
the workaround for AMD processor family 15h, models 10h-1Fh, erratum 746 in the IOMMU driver.
In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which tells me that the BIOS doesn't
implement the workaround. After patching, you should see the following message in "dmesg".

"AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2"

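As a rough illustration of the check described above (not the actual patch),
the sketch below models the index/data register pair behind the two setpci
commands in plain user-space C. The fake config space, the helper names, the
choice of bit 2 as the "workaround applied" bit and of bit 8 as a write-enable
bit in the index register are all assumptions made for this example; the
thread itself only says that a read-back of 0050 means the BIOS did not apply
the workaround.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the IOMMU function's PCI config space. */
struct fake_cfg {
    uint32_t index;     /* value last written to offset 0xF0 */
    uint32_t reg_0x90;  /* indirect register selected by index 0x90 */
};

#define IDX_REG      0xF0u
#define DATA_REG     0xF4u
#define IDX_0X90     0x90u
#define IDX_WRITE_EN (1u << 8)  /* assumption: write-enable bit in the index */
#define FIX_BIT      (1u << 2)  /* assumption: "workaround applied" bit */

/* Write to the index register, or to the data register if writes are enabled. */
static void cfg_write(struct fake_cfg *c, uint32_t off, uint32_t val)
{
    assert(off == IDX_REG || off == DATA_REG);
    if (off == IDX_REG)
        c->index = val;
    else if ((c->index & 0xFFu) == IDX_0X90 && (c->index & IDX_WRITE_EN))
        c->reg_0x90 = val;
}

/* Read the data register; only index 0x90 is modelled here. */
static uint32_t cfg_read(const struct fake_cfg *c, uint32_t off)
{
    assert(off == DATA_REG);
    return ((c->index & 0xFFu) == IDX_0X90) ? c->reg_0x90 : 0;
}

/* Returns 1 if the bit had to be set, 0 if firmware had already set it. */
static int apply_workaround(struct fake_cfg *c)
{
    uint32_t val;

    cfg_write(c, IDX_REG, IDX_0X90);    /* like: setpci ... F0.w=90 */
    val = cfg_read(c, DATA_REG);        /* like: setpci ... F4.w    */
    if (val & FIX_BIT)
        return 0;

    cfg_write(c, IDX_REG, IDX_0X90 | IDX_WRITE_EN);
    cfg_write(c, DATA_REG, val | FIX_BIT);
    cfg_write(c, IDX_REG, IDX_0X90);    /* drop the write-enable bit again */
    return 1;
}
```

Under these assumptions, starting from the reported value 0x0050 the first
call sets the bit and a second call finds nothing left to do.
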
> Can we think of certain loads/actions/etc that could help trigger the issue?
> Then if reproducing is easier we can better say if stuff is actually
> fixed after the workaround.
>
> Udo

Looking at the original kernel message, it seems that the kernel timed out while waiting for the IOMMU
to finish executing the "COMPLETION_WAIT" command.   In this particular case, it was issued as part of
"__domain_flush_pages()" while trying to send the "INVALIDATE_IOMMU_PAGE" command to the IOMMU, but the command
buffer was getting full and the kernel needed to wait for the command buffer to free up.  However, the kernel
message does not tell us exactly what caused the IOMMU to lock up in the first place.

According to my observation, a workload with high disk traffic should trigger a large number of "INVALIDATE_IOMMU_PAGE" commands.
However, this doesn't automatically issue a "COMPLETION_WAIT" command.  The following patch slightly modifies
the code to always issue "COMPLETION_WAIT" after every command.  This should help increase the chance of reproducing
the issue.


diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1c74e0..d05b1f9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1016,6 +1016,7 @@ static int iommu_queue_command_sync(struct amd_iommu *iommu,
                                     struct iommu_cmd *cmd,
                                     bool sync)
  {
+#if 0
         u32 left, tail, head, next_tail;
         unsigned long flags;
  
@@ -1052,6 +1053,40 @@ again:
  
         spin_unlock_irqrestore(&iommu->lock, flags);
  
+#else
+       u32 tail;
+       unsigned long flags;
+
+       WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
+       printk (KERN_DEBUG "AMD-Vi: iommu_queue_command_sync: iommu_queue_command_sync"
+               " data[0]:%#x data[1]:%#x data[2]:%#x data[3]:%#x\n",
+               cmd->data[0], cmd->data[1], cmd->data[2], cmd->data[3] );
+
+       spin_lock_irqsave(&iommu->lock, flags);
+
+       tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+       copy_cmd_to_buffer(iommu, cmd, tail);
+
+       spin_unlock_irqrestore(&iommu->lock, flags);
+
+       // Sending completion_wait command
+       {
+               struct iommu_cmd sync_cmd;
+               volatile u64 sem = 0;
+               int ret;
+
+               spin_lock_irqsave(&iommu->lock, flags);
+
+               tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+               build_completion_wait(&sync_cmd, (u64)&sem);
+               copy_cmd_to_buffer(iommu, &sync_cmd, tail);
+
+               spin_unlock_irqrestore(&iommu->lock, flags);
+
+               if ((ret = wait_on_sem(&sem)) != 0)
+                       return ret;
+       }
+#endif
         return 0;
  }
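
For readers unfamiliar with the mechanism, here is a minimal user-space sketch
of the bounded polling idea behind a completion wait: the driver clears a
semaphore word, asks the device to write to it once all earlier commands are
done, and polls with a fixed budget. The struct fake_iommu device model, the
device_tick() helper and the LOOP_TIMEOUT value are hypothetical and exist
only for this illustration; this is not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define LOOP_TIMEOUT 100000  /* assumed poll budget, illustrative only */

/* Hypothetical device model: signals completion after some polls, or never. */
struct fake_iommu {
    volatile uint64_t sem;   /* semaphore word the device writes on completion */
    unsigned polls_needed;   /* polls before the device signals completion */
    int stuck;               /* non-zero: device never completes (hang case) */
};

/* One step of simulated device progress. */
static void device_tick(struct fake_iommu *d)
{
    if (d->stuck)
        return;
    if (d->polls_needed == 0)
        d->sem = 1;
    else
        d->polls_needed--;
}

/* Returns 0 once the semaphore is set, -1 if the poll budget runs out. */
static int wait_for_completion(struct fake_iommu *d)
{
    int i = 0;

    d->sem = 0;
    while (d->sem == 0 && i < LOOP_TIMEOUT) {
        device_tick(d);  /* a real driver would delay briefly here instead */
        i++;
    }
    return d->sem ? 0 : -1;
}
```

In this model a responsive device returns 0 after a few polls, while a stuck
one exhausts the budget and returns -1, which corresponds to the situation
reported as "Completion-Wait loop timed out" in the log at the top of the
thread.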








^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 23:29                           ` Suravee Suthikulanit
@ 2013-01-23 14:19                             ` Udo van den Heuvel
  2013-01-23 15:00                               ` Suravee Suthikulpanit
  2013-01-23 14:23                             ` Udo van den Heuvel
  1 sibling, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-23 14:19 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel

On 2013-01-23 00:29, Suravee Suthikulanit wrote:
> I sent out a patch
> (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should
> implement the workaround for AMD processor family 15h, models 10h-1Fh,
> erratum 746 in the IOMMU driver.
> In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which
> tells me that the BIOS doesn't implement the workaround. After patching,
> you should see the following message in "dmesg".
> 
> "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2"

Thanks!
I'll check for that after these messages.

> The following patch slightly modifies
> the code to always issue "COMPLETION_WAIT" after every command.  This
> should help increase the chance of reproducing
> the issue.

Should I test with these two patches together?
Or should I apply the first one first and then see whether the second helps?


Kind regards,
Udo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 23:29                           ` Suravee Suthikulanit
  2013-01-23 14:19                             ` Udo van den Heuvel
@ 2013-01-23 14:23                             ` Udo van den Heuvel
  2013-01-23 15:01                               ` Suravee Suthikulpanit
  1 sibling, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-23 14:23 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel

On 2013-01-23 00:29, Suravee Suthikulanit wrote:
> message in "dmesg".
> 
> "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2"

[    1.091733] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40

I assume that is correct.

Kind regards,
Udo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-23 14:19                             ` Udo van den Heuvel
@ 2013-01-23 15:00                               ` Suravee Suthikulpanit
  0 siblings, 0 replies; 38+ messages in thread
From: Suravee Suthikulpanit @ 2013-01-23 15:00 UTC (permalink / raw)
  To: Udo van den Heuvel
  Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel

On 1/23/2013 8:19 AM, Udo van den Heuvel wrote:
> On 2013-01-23 00:29, Suravee Suthikulanit wrote:
>> I sent out a patch
>> (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should
>> implement the workaround for AMD processor family 15h, models 10h-1Fh,
>> erratum 746 in the IOMMU driver.
>> In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which
>> tells me that the BIOS doesn't implement the workaround. After patching,
>> you should see the following message in "dmesg".
>>
>> "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2"
> Thanks!
> I'll check for that after these messages.
>
>> The following patch slightly modifies
>> the code to always issue "COMPLETION_WAIT" after every command.  This
>> should help increase the chance of reproducing
>> the issue.
> Should I test with these two patches together?
> Or should I apply the first one first and then see whether the second helps?
Please try the first one first.  If the issue doesn't reproduce, you can 
use the second patch to try to trigger it.

Thank you,

Suravee
>
>
> Kind regards,
> Udo
>



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-23 14:23                             ` Udo van den Heuvel
@ 2013-01-23 15:01                               ` Suravee Suthikulpanit
  0 siblings, 0 replies; 38+ messages in thread
From: Suravee Suthikulpanit @ 2013-01-23 15:01 UTC (permalink / raw)
  To: Udo van den Heuvel
  Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel

On 1/23/2013 8:23 AM, Udo van den Heuvel wrote:

> On 2013-01-23 00:29, Suravee Suthikulanit wrote:
>> message in "dmesg".
>>
>> "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2"

This is expected.

Regards,

Suravee

> [    1.091733] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
>
> I assume that is correct.
>
> Kind regards,
> Udo
>
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-22 14:36                   ` Boris Ostrovsky
  2013-01-22 15:16                     ` Jörg Rödel
  2013-01-22 15:27                     ` Udo van den Heuvel
@ 2013-01-31 15:42                     ` Udo van den Heuvel
  2 siblings, 0 replies; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-31 15:42 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov,
	Jörg Rödel, linux-kernel

On 2013-01-22 15:36, Boris Ostrovsky wrote:
> 
> 
> On 01/22/2013 09:13 AM, Udo van den Heuvel wrote:
>> Gigabyte demonstrate that using ESX 5i IOMMU works fine. 

I forwarded the mailing list links with the patch(es) to Gigabyte support
and they forwarded the info to the BIOS team.

(to be continued)

Udo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-01-21 13:09               ` Borislav Petkov
  2013-01-21 14:10                 ` Udo van den Heuvel
  2013-01-21 15:10                 ` Jörg Rödel
@ 2013-04-21  1:03                 ` Jake
  2013-04-21 21:47                   ` Borislav Petkov
  2 siblings, 1 reply; 38+ messages in thread
From: Jake @ 2013-04-21  1:03 UTC (permalink / raw)
  To: linux-kernel

Borislav Petkov <bp <at> alien8.de> writes:

> 
> On Sun, Jan 20, 2013 at 12:57:55PM +0100, Jörg Rödel wrote:
> > BorisO is no longer with AMD afaik.
> 
> Why am I not surprised...
> 
> > I wrote an email to Sherry and Suravee and asked them to either send
> > me hardware to write the fix on my own or to send a fix for the issue.
> > Let's see what happens...
> 
> Btw, while we're at it, here's some more h0rkage from my PD box:
> 
> [    0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
> [    0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
> [    0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
> [    0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)
> 

hello,

I've never posted to this type of message board before so I hope I'm not out
of order. I found my way here while trying to solve a shutdown problem in a
new linux install (Arch). I've noticed for some time that I have the same
lines as above in my dmesg as well as: 

ACPI BIOS Bug: Warning: Optional FADT field Pm2ControlBlock has no address
or length: 0x0000000000000000/0x1 (20121018/tbfadt-589)

I have no idea whether this is related to my shutdown problem or not and
what I was hoping is if someone would advise me how to find help for my
problem. I have tried all the normal routes for my distro (forum and irc)
repeatedly, and googled my weeping eyes out - but to no avail.

Any advice would be greatly appreciated.

Thanks
Jake




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
  2013-04-21  1:03                 ` Jake
@ 2013-04-21 21:47                   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2013-04-21 21:47 UTC (permalink / raw)
  To: Jake; +Cc: linux-kernel

On Sun, Apr 21, 2013 at 01:03:16AM +0000, Jake wrote:
> ACPI BIOS Bug: Warning: Optional FADT field Pm2ControlBlock has no address or length: 0x0000000000000000/0x1 (20121018/tbfadt-589)

I have the same one:

[    0.000000] ACPI BIOS Bug: Warning: Optional FADT field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20130117/tbfadt-599)

We probably have the same ASUS crap for a board.

> I have no idea whether this is related to my shutdown problem or not and

I don't think so as I can suspend/resume/shutdown my box just fine. :)

> what I was hoping is if someone would advise me how to find help for
> my problem. I have tried all the normal routes for my distro (forum
> and irc) repeatedly, and googled my weeping eyes out - but to no
> avail.

Have you tried the upstream kernel yet? I hear 3.9-rc8 will be out
tomorrow :-)

HTH.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2013-04-21 21:47 UTC | newest]

Thread overview: 38+ messages
2013-01-20 10:33 3.6.11 AMD-Vi: Completion-Wait loop timed out Udo van den Heuvel
2013-01-20 10:36 ` Borislav Petkov
2013-01-20 10:40   ` Udo van den Heuvel
2013-01-20 11:19     ` Jörg Rödel
2013-01-20 11:25       ` Udo van den Heuvel
2013-01-20 11:40         ` Jörg Rödel
2013-01-20 11:48           ` Borislav Petkov
2013-01-20 11:50             ` Borislav Petkov
2013-01-20 11:59               ` Udo van den Heuvel
2013-01-20 12:24                 ` Borislav Petkov
2013-01-20 11:52             ` Udo van den Heuvel
2013-01-20 11:57             ` Jörg Rödel
2013-01-21 13:09               ` Borislav Petkov
2013-01-21 14:10                 ` Udo van den Heuvel
2013-01-21 14:55                   ` Borislav Petkov
2013-01-21 15:10                 ` Jörg Rödel
2013-01-21 15:32                   ` Borislav Petkov
2013-01-21 15:34                     ` Udo van den Heuvel
2013-04-21  1:03                 ` Jake
2013-04-21 21:47                   ` Borislav Petkov
2013-01-21 14:37               ` Boris Ostrovsky
2013-01-21 14:44                 ` Udo van den Heuvel
2013-01-21 14:47                 ` Jörg Rödel
2013-01-21 16:04             ` Jacob Shin
2013-01-21 22:35               ` Suravee Suthikulpanit
2013-01-22  3:22                 ` Udo van den Heuvel
2013-01-22 14:13                 ` Udo van den Heuvel
2013-01-22 14:36                   ` Boris Ostrovsky
2013-01-22 15:16                     ` Jörg Rödel
2013-01-22 15:27                     ` Udo van den Heuvel
2013-01-22 16:12                       ` Boris Ostrovsky
2013-01-22 16:29                         ` Udo van den Heuvel
2013-01-22 23:29                           ` Suravee Suthikulanit
2013-01-23 14:19                             ` Udo van den Heuvel
2013-01-23 15:00                               ` Suravee Suthikulpanit
2013-01-23 14:23                             ` Udo van den Heuvel
2013-01-23 15:01                               ` Suravee Suthikulpanit
2013-01-31 15:42                     ` Udo van den Heuvel
