* [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
From: bugzilla-daemon @ 2014-08-07 14:45 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

            Bug ID: 81841
           Summary: amd-iommu: kernel BUG & lockup after shutting down KVM
                    guest using PCI passthrough/PCIe bridge
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.13 (Ubuntu: 3.13.0-32-generic)
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: marti@juffo.org
        Regression: No

I have a Windows XP virtual machine in libvirt and I'm trying to use PCI
passthrough to provide access to a legacy Dialogic ISDN card (0000:01:05.0).
Since it's an old PCI device, there's also a PCIe-to-PCI bridge (0000:00:14.4).
With some manual tinkering, the virtual machine starts up and passthrough works
fine, but when I stop or shut down the virtual machine, I immediately get an
oops in dmesg and after some time passes, the whole machine freezes.

I'm using the ASRock FM2A88X Extreme6+ motherboard, tried with the latest BIOS
version 2.90 as well as beta version L3.16. AMD A10-7850K processor.

The same symptoms have also been reported before:
* 3.2.0: http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/85138
* 3.0.6: https://www.mail-archive.com/kvm@vger.kernel.org/msg64854.html
* 2.6.37-rc6: http://marc.info/?l=kvm&m=129867567106942 (slightly different traceback)

In order for the VM to successfully start up, I need to run the following
commands manually, to bind the PCI bridge to pci-stub and then unbind:

modprobe pci-stub
echo '1022 780f' > /sys/bus/pci/drivers/pci-stub/new_id
echo 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/bind
echo 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/unbind
echo '1022 780f' > /sys/bus/pci/drivers/pci-stub/remove_id

(If I don't do this, I get the kernel message:
    pci-stub 0000:01:05.0: kvm assign device failed ret -16)
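
The '1022 780f' string written to new_id above is presumably the bridge's PCI
vendor and device ID. A minimal sketch, assuming the standard sysfs "vendor"
and "device" attributes, that derives this string for a given device instead
of hard-coding it. The function name pci_new_id_string and the SYSFS_ROOT
override are hypothetical, introduced here only for illustration and testing:

```shell
# Print "<vendor> <device>" for a PCI device in the form pci-stub's new_id
# expects (hex without the 0x prefix).
# SYSFS_ROOT is a hypothetical override for testing; on a real host it
# defaults to /sys.
pci_new_id_string() {
    dev="$1"
    root="${SYSFS_ROOT:-/sys}"
    # sysfs stores these as e.g. "0x1022" and "0x780f"
    vendor=$(cat "$root/bus/pci/devices/$dev/vendor") || return 1
    device=$(cat "$root/bus/pci/devices/$dev/device") || return 1
    echo "${vendor#0x} ${device#0x}"
}

# Possible usage (as root), mirroring the manual commands above:
#   pci_new_id_string 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/new_id
```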

lspci -vt
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1422
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1423
           +-01.0  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 200 Series]
           +-01.1  Advanced Micro Devices, Inc. [AMD/ATI] Device 1308
           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-10.0  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
           +-10.1  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
           +-11.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           +-12.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-12.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
           +-13.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-13.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
           +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-14.4-[01]----05.0  Dialogic Corporation PRI
           +-14.5  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-15.0-[02]--
           +-15.2-[03]----00.0  ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
           +-15.3-[04]----00.0  Qualcomm Atheros QCA8171 Gigabit Ethernet
           +-18.0  Advanced Micro Devices, Inc. [AMD] Device 141a
           +-18.1  Advanced Micro Devices, Inc. [AMD] Device 141b
           +-18.2  Advanced Micro Devices, Inc. [AMD] Device 141c
           +-18.3  Advanced Micro Devices, Inc. [AMD] Device 141d
           +-18.4  Advanced Micro Devices, Inc. [AMD] Device 141e
           \-18.5  Advanced Micro Devices, Inc. [AMD] Device 141f

After shutting down, I get lots of oops messages; these are captured via
netconsole.

[ 1949.942276] ------------[ cut here ]------------
[ 1949.942311] kernel BUG at
/build/buildd/linux-3.13.0/drivers/iommu/amd_iommu.c:2382!
[ 1949.942342] invalid opcode: 0000 [#1] SMP
[ 1949.942359] Modules linked in: pci_stub ipt_MASQUERADE iptable_nat
nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter
ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nct6775
hwmon_vid snd_hda_codec_realtek netconsole kvm_amd snd_timer drm_kms_helper snd
drm soundcore mac_hid i2c_algo_bit
[ 1949.942716] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 1949.942745] task: ffff8804284497f0 ti: ffff8800361a2000 task.ti:
ffff8800361a2000
[ 1949.942767] RIP: 0010:[<ffffffff815eec8d>]  [<ffffffff815eec8d>]
__detach_device+0xad/0xb0
[ 1949.942798] RSP: 0018:ffff8800361a3d00  EFLAGS: 00010046
[ 1949.942814] RAX: 0000000000000000 RBX: ffff880427210660 RCX:
ffff8800361a3cb0
[ 1949.942834] RDX: dead000000100100 RSI: 0000000000000086 RDI:
ffff880427210660
[ 1949.942855] RBP: ffff8800361a3d20 R08: 0000000000000046 R09:
ffff8804299b5240
[ 1949.942875] R10: ffff88043ebf2f60 R11: 000ffffffffff000 R12:
0000000000000000
[ 1949.942895] R13: ffff880428596e10 R14: ffff880036019a80 R15:
ffff880427210660
[ 1949.942916] FS:  00007fca76222980(0000) GS:ffff88043ec00000(0000)
knlGS:0000000000000000
[ 1949.942939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1949.942956] CR2: 00007fca65bc5e74 CR3: 0000000001c0e000 CR4:
00000000000407f0
[ 1949.942978] DR0: 0000000000000003 DR1: 00000000000000b0 DR2:
0000000000000001
[ 1949.942998] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 1949.943018] Stack:
[ 1949.943025]  dead000000100100 ffff880428596e00 ffff880428596e10
ffff880036019a80
[ 1949.943055]  ffff8800361a3d60 ffffffff815eed2e 0000000000000202
ffff880036019a80
[ 1949.943082]  ffff880036019a80 ffff880427cb0008 ffff88042752a300
ffff880424f3ad48
[ 1949.943110] Call Trace:
[ 1949.943123]  [<ffffffff815eed2e>] amd_iommu_domain_destroy+0x9e/0x160
[ 1949.943144]  [<ffffffff815eb3bb>] iommu_domain_free+0x1b/0x30
[ 1949.943176]  [<ffffffffa044fa63>] kvm_iommu_unmap_guest+0x53/0x60 [kvm]
[ 1949.943205]  [<ffffffffa0460299>] kvm_arch_destroy_vm+0x39/0x1f0 [kvm]
[ 1949.943227]  [<ffffffff810c696d>] ? synchronize_srcu+0x1d/0x20
[ 1949.943250]  [<ffffffffa04486ee>] kvm_put_kvm+0x10e/0x1c0 [kvm]
[ 1949.943273]  [<ffffffffa04487d8>] kvm_vcpu_release+0x18/0x20 [kvm]
[ 1949.943293]  [<ffffffff811be6d4>] __fput+0xe4/0x260
[ 1949.943309]  [<ffffffff811be89e>] ____fput+0xe/0x10
[ 1949.943326]  [<ffffffff81088174>] task_work_run+0xc4/0xe0
[ 1949.943344]  [<ffffffff81069c18>] do_exit+0x2b8/0xa50
[ 1949.943362]  [<ffffffff8109a7f0>] ? wake_up_state+0x10/0x20
[ 1949.943380]  [<ffffffff81077b5e>] ? signal_wake_up_state+0x1e/0x30
[ 1949.943400]  [<ffffffff81078ed2>] ? zap_other_threads+0x82/0xa0
[ 1949.943418]  [<ffffffff8106a42f>] do_group_exit+0x3f/0xa0
[ 1949.943435]  [<ffffffff8106a4a4>] SyS_exit_group+0x14/0x20
[ 1949.943455]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6
[ 1949.943470] Code: fe ff ff eb b8 66 0f 1f 84 00 00 00 00 00 48 8b 35 39 f4
9d 00 49 39 f4
[ 1949.947837] ---[ end trace e6893b1ed79451c3 ]---
[ 1949.947853] Fixing recursive fault but reboot is needed!
[ 1950.189137] usb 10-1: reset high-speed USB device number 2 using xhci_hcd
[ 1950.240587] xhci_hcd 0000:03:00.0: xHCI xhci_drop_endpoint called with
disabled ep ffff8804249eee00
[ 1950.240666] xhci_hcd 0000:03:00.0: xHCI xhci_drop_endpoint called with
disabled ep ffff8804249eee40
[ 2045.294007] ------------[ cut here ]------------
[ 2045.294067] WARNING: CPU: 3 PID: 1083 at
/build/buildd/linux-3.13.0/kernel/watchdog.c:245
watchdog_overflow_callback+0x9c/0xd0()
[ 2045.294142] Watchdog detected hard LOCKUP on cpu 3
[ 2045.294176] Modules linked in: pci_stub ipt_MASQUERADE iptable_nat
nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter
ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nct6775
hwmon_vid snd_hda_codec_realtek netconsole kvm_amd kvm crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper cryptd configfs serio_raw edac_core edac_mce_amd
k10temp snd_hda_codec_hdmi radeon snd_hda_intel snd_hda_codec snd_hwdep video
snd_pcm ttm snd_page_alloc i2c_piix4 snd_timer drm_kms_helper snd drm soundcore
mac_hid i2c_algo_bit lp parport usb_storage pata_acpi hid_generic usbhid hid
alx psmouse mdio ahci pata_atiixp libahci
[ 2045.299884] CPU: 3 PID: 1083 Comm: irqbalance Tainted: G      D     
3.13.0-32-generic #57-Ubuntu
[ 2045.302472] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2045.305088]  0000000000000009 ffff88043ed86c38 ffffffff8171bcb4
ffff88043ed86c80
[ 2045.307720]  ffff88043ed86c70 ffffffff810676cd ffff8804295c4000
0000000000000000
[ 2045.310342]  ffff88043ed86d88 0000000000000000 ffff88043ed86ef8
ffff88043ed86cd0
[ 2045.312921] Call Trace:
[ 2045.315411]  <NMI>  [<ffffffff8171bcb4>] dump_stack+0x45/0x56
[ 2045.317892]  [<ffffffff810676cd>] warn_slowpath_common+0x7d/0xa0
[ 2045.320309]  [<ffffffff8106773c>] warn_slowpath_fmt+0x4c/0x50
[ 2045.322665]  [<ffffffff8110d590>] ? restart_watchdog_hrtimer+0x50/0x50
[ 2045.324974]  [<ffffffff8110d62c>] watchdog_overflow_callback+0x9c/0xd0
[ 2045.327255]  [<ffffffff81144dae>] __perf_event_overflow+0x8e/0x240
[ 2045.329506]  [<ffffffff811458c4>] perf_event_overflow+0x14/0x20
[ 2045.331741]  [<ffffffff81029414>] x86_pmu_handle_irq+0x144/0x190
[ 2045.333961]  [<ffffffff81725b2b>] perf_event_nmi_handler+0x2b/0x50
[ 2045.336168]  [<ffffffff81725348>] nmi_handle.isra.3+0x88/0x180
[ 2045.338374]  [<ffffffff81725510>] do_nmi+0xd0/0x340
[ 2045.340574]  [<ffffffff817247b1>] end_repeat_nmi+0x1e/0x2e
[ 2045.342771]  [<ffffffff815f0450>] ? compose_msi_msg+0x90/0x90
[ 2045.344967]  [<ffffffff8136e9eb>] ? __write_lock_failed+0xb/0x20
[ 2045.347144]  [<ffffffff8136e9eb>] ? __write_lock_failed+0xb/0x20
[ 2045.349290]  [<ffffffff8136e9eb>] ? __write_lock_failed+0xb/0x20
[ 2045.351409]  <<EOE>>  [<ffffffff81723bb8>] _raw_write_lock_irqsave+0x28/0x30
[ 2045.353560]  [<ffffffff815efddc>] get_irq_table+0x2c/0x370
[ 2045.355714]  [<ffffffff8115c95e>] ? lru_cache_add+0xe/0x10
[ 2045.357852]  [<ffffffff81183638>] ? page_add_new_anon_rmap+0xd8/0x170
[ 2045.359983]  [<ffffffff815f04a8>] set_affinity+0x58/0x180
[ 2045.362109]  [<ffffffff815f9e75>] set_remapped_irq_affinity+0x25/0x40
[ 2045.364238]  [<ffffffff810c084c>] irq_do_set_affinity+0x1c/0x70
[ 2045.366350]  [<ffffffff810c0a38>] irq_set_affinity_locked+0xb8/0xf0
[ 2045.368454]  [<ffffffff810c0ab6>] __irq_set_affinity+0x46/0x70
[ 2045.370563]  [<ffffffff810c51f5>] write_irq_affinity.isra.6+0xd5/0x100
[ 2045.372663]  [<ffffffff810c5259>] irq_affinity_proc_write+0x19/0x20
[ 2045.374763]  [<ffffffff81222c4d>] proc_reg_write+0x3d/0x80
[ 2045.376854]  [<ffffffff811bcb64>] vfs_write+0xb4/0x1f0
[ 2045.378941]  [<ffffffff811bd599>] SyS_write+0x49/0xa0
[ 2045.381020]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6
[ 2045.383092] ---[ end trace e6893b1ed79451c4 ]---
[ 2045.385173] perf samples too long (712292 > 2500), lowering
kernel.perf_event_max_sample_rate to 50000
[ 2045.387310] INFO: NMI handler (perf_event_nmi_handler) took too long to run:
93.305 msecs
[ 2060.824098] INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by
1, t=15002 jiffies, g=5056, c=5055, q=0)
[ 2060.826366] sending NMI to all CPUs:
[ 2060.828573] NMI backtrace for cpu 1
[ 2060.828577] perf samples too long (706733 > 5000), lowering
kernel.perf_event_max_sample_rate to 25000
[ 2060.833588] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D W   
3.13.0-32-generic #57-Ubuntu
[ 2060.836158] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2060.838776] task: ffff8804295147d0 ti: ffff880429520000 task.ti:
ffff880429520000
[ 2060.841361] RIP: 0010:[<ffffffff8136d5c2>]  [<ffffffff8136d5c2>]
__const_udelay+0x12/0x30
[ 2060.843929] RSP: 0018:ffff88043ec83df0  EFLAGS: 00000046
[ 2060.846469] RAX: 0000000001062560 RBX: 0000000000002710 RCX:
0000000000000004
[ 2060.849029] RDX: 0000000000e16bb4 RSI: 0000000000000100 RDI:
0000000000418958
[ 2060.851586] RBP: ffff88043ec83e08 R08: 0000000000000082 R09:
00000000000004ba
[ 2060.854135] R10: ffff880036b8a000 R11: 00000003937e3000 R12:
ffffffff81c4e1c0
[ 2060.856674] R13: ffffffff81d137a0 R14: ffffffff81c4e1c0 R15:
0000000000000001
[ 2060.859194] FS:  00007f8d6a2ac840(0000) GS:ffff88043ec80000(0000)
knlGS:0000000000000000
[ 2060.861752] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2060.864281] CR2: 00007f43403ce000 CR3: 0000000424883000 CR4:
00000000000407e0
[ 2060.866791] Stack:
[ 2060.869242]  ffff88043ec83e08 ffffffff81044c7f ffff88043ec8e800
ffff88043ec83e60
[ 2060.871711]  ffffffff810cac21 ffffffff81c4e1c0 ffffffff00000003
0000000000000000
[ 2060.874122]  0000000000000001 ffff8804295147d0 0000000000000000
0000000000000001
[ 2060.876470] Call Trace:
[ 2060.878724]  <IRQ> 
[ 2060.878744]  [<ffffffff81044c7f>] ? arch_trigger_all_cpu_backtrace+0x8f/0xb0
[ 2060.883075]  [<ffffffff810cac21>] rcu_check_callbacks+0x631/0x650
[ 2060.885203]  [<ffffffff81076227>] update_process_times+0x47/0x70
[ 2060.887296]  [<ffffffff810d5cf5>] tick_sched_handle.isra.17+0x25/0x60
[ 2060.889379]  [<ffffffff810d5d71>] tick_sched_timer+0x41/0x60
[ 2060.891449]  [<ffffffff8108e5e7>] __run_hrtimer+0x77/0x1d0
[ 2060.893499]  [<ffffffff810d5d30>] ? tick_sched_handle.isra.17+0x60/0x60
[ 2060.895553]  [<ffffffff8108edaf>] hrtimer_interrupt+0xef/0x230
[ 2060.897596]  [<ffffffff81043097>] local_apic_timer_interrupt+0x37/0x60
[ 2060.899655]  [<ffffffff8172ea3f>] smp_apic_timer_interrupt+0x3f/0x60
[ 2060.901714]  [<ffffffff8172d3dd>] apic_timer_interrupt+0x6d/0x80
[ 2060.903768]  <EOI> 
[ 2060.903787]  [<ffffffff815ceaff>] ? cpuidle_enter_state+0x4f/0xc0
[ 2060.907889]  [<ffffffff815cec29>] cpuidle_idle_call+0xb9/0x1f0
[ 2060.909958]  [<ffffffff8101cebe>] arch_cpu_idle+0xe/0x30
[ 2060.912022]  [<ffffffff810bec65>] cpu_startup_entry+0xc5/0x290
[ 2060.914089]  [<ffffffff81040fd8>] start_secondary+0x218/0x2c0
[ 2060.916154] Code: 89 e5 ff 15 b9 07 92 00 5d c3 66 66 66 66 66 66 2e 0f 1f
84 00 00 00 00 00 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 60 8d 0c 12 48 c1
e2 06 48 89 e5 48 29 ca f7 e2 48 8d 7a 01 ff 
[ 2060.920807] NMI backtrace for cpu 2
[ 2060.923013] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D W   
3.13.0-32-generic #57-Ubuntu
[ 2060.925257] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2060.927566] task: ffff880429515fc0 ti: ffff880429522000 task.ti:
ffff880429522000
[ 2060.929877] RIP: 0010:[<ffffffff8141666e>]  [<ffffffff8141666e>]
acpi_idle_do_entry+0x21/0x2b
[ 2060.932215] RSP: 0018:ffff880429523e28  EFLAGS: 00000093
[ 2060.934532] RAX: 000001dfd2d31300 RBX: ffff8804297548a8 RCX:
000000000000c710
[ 2060.936878] RDX: 0000000000001771 RSI: ffff88043ed00000 RDI:
ffff8804297548a8
[ 2060.939233] RBP: ffff880429523e28 R08: ffff88043ed112d4 R09:
0000000000000018
[ 2060.941591] R10: 0000000000052971 R11: 000000000005f98e R12:
ffff880429754800
[ 2060.943958] R13: 0000000000000002 R14: 0000000000000002 R15:
ffffffff81c96ea8
[ 2060.946314] FS:  00007f43403bf880(0000) GS:ffff88043ed00000(0000)
knlGS:0000000000000000
[ 2060.948692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2060.951083] CR2: 00007fe029130f64 CR3: 00000004263d9000 CR4:
00000000000407e0
[ 2060.953495] Stack:
[ 2060.955883]  ffff880429523e50 ffffffff814166f3 ffff880424c28000
ffffffff81c96de0
[ 2060.958357]  000001e034cff146 ffff880429523e88 ffffffff815ceaf0
ffff880429523f38
[ 2060.960854]  ffff880424c28000 0000000000000002 0000000000000002
ffffffff81c96de0
[ 2060.963368] Call Trace:
[ 2060.965849]  [<ffffffff814166f3>] acpi_idle_enter_simple+0x7b/0x99
[ 2060.968336]  [<ffffffff815ceaf0>] cpuidle_enter_state+0x40/0xc0
[ 2060.970793]  [<ffffffff815cec29>] cpuidle_idle_call+0xb9/0x1f0
[ 2060.973208]  [<ffffffff8101cebe>] arch_cpu_idle+0xe/0x30
[ 2060.975582]  [<ffffffff810bec65>] cpu_startup_entry+0xc5/0x290
[ 2060.977965]  [<ffffffff81040fd8>] start_secondary+0x218/0x2c0
[ 2060.980317] Code: ff fa 66 66 90 66 66 90 5d c3 8a 47 08 55 48 89 e5 3c 01
75 07 e8 b3 94 c2 ff eb 17 3c 02 75 07 e8 ba ff ff ff eb 0c 8b 57 04 ec <48> 8b
15 cf a7 ba 00 ed 5d c3 66 66 66 66 90 55 48 63 c2 48 8d 
[ 2060.985600] NMI backtrace for cpu 0
[ 2060.985606] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took
too long to run: 157.026 msecs
[ 2060.989906] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D W   
3.13.0-32-generic #57-Ubuntu
[ 2060.992076] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2060.994272] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti:
ffffffff81c00000
[ 2060.996487] RIP: 0010:[<ffffffff8141666e>]  [<ffffffff8141666e>]
acpi_idle_do_entry+0x21/0x2b
[ 2060.998740] RSP: 0018:ffffffff81c01e38  EFLAGS: 00000093
[ 2061.000974] RAX: 000001dfca387700 RBX: ffff8804297540a8 RCX:
000000000000c710
[ 2061.003228] RDX: 0000000000001771 RSI: ffff88043ec00000 RDI:
ffff8804297540a8
[ 2061.005471] RBP: ffffffff81c01e38 R08: ffff88043ec112d0 R09:
0000000000000018
[ 2061.007703] R10: 0000000000030cfc R11: 00000000000ce668 R12:
ffff880429754000
[ 2061.009933] R13: 0000000000000002 R14: 0000000000000002 R15:
ffffffff81c96ea8
[ 2061.012148] FS:  00007f6bd1547740(0000) GS:ffff88043ec00000(0000)
knlGS:0000000000000000
[ 2061.014369] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2061.016586] CR2: 00007f43403ce000 CR3: 0000000426388000 CR4:
00000000000407f0
[ 2061.018821] DR0: 0000000000000003 DR1: 00000000000000b0 DR2:
0000000000000001
[ 2061.021040] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 2061.023264] Stack:
[ 2061.025459]  ffffffff81c01e60 ffffffff814166f3 ffff880425310200
ffffffff81c96de0
[ 2061.027719]  000001e02c3342df ffffffff81c01e98 ffffffff815ceaf0
ffffffffffffffff
[ 2061.029977]  ffff880425310200 0000000000000002 0000000000000000
ffffffff81c96de0
[ 2061.032246] Call Trace:
[ 2061.034488]  [<ffffffff814166f3>] acpi_idle_enter_simple+0x7b/0x99
[ 2061.036763]  [<ffffffff815ceaf0>] cpuidle_enter_state+0x40/0xc0
[ 2061.039035]  [<ffffffff815cec29>] cpuidle_idle_call+0xb9/0x1f0
[ 2061.041301]  [<ffffffff8101cebe>] arch_cpu_idle+0xe/0x30
[ 2061.043556]  [<ffffffff810bec65>] cpu_startup_entry+0xc5/0x290
[ 2061.045812]  [<ffffffff8170a187>] rest_init+0x77/0x80
[ 2061.048062]  [<ffffffff81d35f70>] start_kernel+0x438/0x443
[ 2061.050303]  [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c
[ 2061.052543]  [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120
[ 2061.054782]  [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c
[ 2061.057023]  [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152
[ 2061.059256] Code: ff fa 66 66 90 66 66 90 5d c3 8a 47 08 55 48 89 e5 3c 01
75 07 e8 b3 94 c2 ff eb 17 3c 02 75 07 e8 ba ff ff ff eb 0c 8b 57 <48> 8b 15 cf
a7 ba 00 ed 5d c3 66 66 66 66 90 55 48 63 c2 48 8d 
[ 2061.064162] NMI backtrace for cpu 3
[ 2061.064167] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took
too long to run: 235.588 msecs
[ 2061.068651] CPU: 3 PID: 1083 Comm: irqbalance Tainted: G      D W   
3.13.0-32-generic #57-Ubuntu
[ 2061.070912] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2061.073181] task: ffff880427595fc0 ti: ffff88042772a000 task.ti:
ffff88042772a000
[ 2061.075433] RIP: 0010:[<ffffffff8136e9ed>]  [<ffffffff8136e9ed>]
__write_lock_failed+0xd/0x20
[ 2061.077704] RSP: 0018:ffff88042772bd30  EFLAGS: 00000087
[ 2061.079944] RAX: 0000000000000086 RBX: ffff8804295ecc60 RCX:
ffffffff815f0450
[ 2061.082193] RDX: 0000000000000086 RSI: 0000000000000000 RDI:
ffffffff81cd6cf0
[ 2061.084434] RBP: ffff88042772bd30 R08: 0000000000000286 R09:
0000000000000004
[ 2061.086675] R10: 000000000000000e R11: 0000000000000001 R12:
0000000000000080
[ 2061.088903] R13: 0000000000000080 R14: 00000000ffffffea R15:
0000000000000000
[ 2061.091127] FS:  00007f6340128780(0000) GS:ffff88043ed80000(0000)
knlGS:0000000000000000
[ 2061.093353] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2061.095565] CR2: 00007f634012d000 CR3: 00000004284d2000 CR4:
00000000000407e0
[ 2061.097784] Stack:
[ 2061.099975]  ffff88042772bd40 ffffffff81723bb8 ffff88042772bda8
ffffffff815efddc
[ 2061.102216]  ffff88042772bd60 ffffffff8115c95e ffff88042772bd88
ffffffff81183638
[ 2061.104441]  00007f634012d000 ffff880428b4a968 ffff8804295ecc60
ffff88042772be88
[ 2061.106657] Call Trace:
[ 2061.108834]  [<ffffffff81723bb8>] _raw_write_lock_irqsave+0x28/0x30
[ 2061.111030]  [<ffffffff815efddc>] get_irq_table+0x2c/0x370
[ 2061.113224]  [<ffffffff8115c95e>] ? lru_cache_add+0xe/0x10
[ 2061.115420]  [<ffffffff81183638>] ? page_add_new_anon_rmap+0xd8/0x170
[ 2061.117621]  [<ffffffff815f04a8>] set_affinity+0x58/0x180
[ 2061.119821]  [<ffffffff815f9e75>] set_remapped_irq_affinity+0x25/0x40
[ 2061.122027]  [<ffffffff810c084c>] irq_do_set_affinity+0x1c/0x70
[ 2061.124187]  [<ffffffff810c0a38>] irq_set_affinity_locked+0xb8/0xf0
[ 2061.126295]  [<ffffffff810c0ab6>] __irq_set_affinity+0x46/0x70
[ 2061.128396]  [<ffffffff810c51f5>] write_irq_affinity.isra.6+0xd5/0x100
[ 2061.130475]  [<ffffffff810c5259>] irq_affinity_proc_write+0x19/0x20
[ 2061.132527]  [<ffffffff81222c4d>] proc_reg_write+0x3d/0x80
[ 2061.134549]  [<ffffffff811bcb64>] vfs_write+0xb4/0x1f0
[ 2061.136535]  [<ffffffff811bd599>] SyS_write+0x49/0xa0
[ 2061.138492]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6
[ 2061.140423] Code: 01 31 c0 66 66 90 c3 b8 f2 ff ff ff 66 66 90 c3 90 90 90
90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 f0 81 07 f3 90 <81> 3f 00 00 10 00
75 f6 f0 81 2f 00 00 10 00 75 e6 5d c3 55 48 
[ 2061.144744] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took
too long to run: 316.161 msecs
[ 2067.038359] perf samples too long (701229 > 10000), lowering
kernel.perf_event_max_sample_rate to 12500
[ 2088.782705] perf samples too long (695767 > 20000), lowering
kernel.perf_event_max_sample_rate to 6250
[ 2110.527053] perf samples too long (690349 > 40000), lowering
kernel.perf_event_max_sample_rate to 3250
[ 2132.271401] perf samples too long (684971 > 76923), lowering
kernel.perf_event_max_sample_rate to 1750
[ 2154.015748] perf samples too long (679634 > 142857), lowering
kernel.perf_event_max_sample_rate to 1000

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
From: bugzilla-daemon @ 2014-08-07 14:46 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #1 from Marti Raudsepp <marti@juffo.org> ---
Created attachment 145421
  --> https://bugzilla.kernel.org/attachment.cgi?id=145421&action=edit
crash_netconsole.txt


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
From: bugzilla-daemon @ 2014-08-07 14:47 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #2 from Marti Raudsepp <marti@juffo.org> ---
Created attachment 145431
  --> https://bugzilla.kernel.org/attachment.cgi?id=145431&action=edit
startup_dmesg.txt


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
From: bugzilla-daemon @ 2014-08-07 14:47 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #3 from Marti Raudsepp <marti@juffo.org> ---
Created attachment 145441
  --> https://bugzilla.kernel.org/attachment.cgi?id=145441&action=edit
dmidecode.txt


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
From: bugzilla-daemon @ 2014-08-07 15:30 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

Marti Raudsepp <marti@juffo.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|3.13 (Ubuntu:               |3.16.0 (originally Ubuntu
                   |3.13.0-32-generic)          |3.13.0-32-generic)

--- Comment #4 from Marti Raudsepp <marti@juffo.org> ---
Also occurs with freshly built mainline kernel version 3.16.0.

[   87.327457] ------------[ cut here ]------------
[   87.327488] kernel BUG at drivers/iommu/amd_iommu.c:2382!
[   87.327505] invalid opcode: 0000 [#1] SMP 
[   87.327526] Modules linked in: pci_stub(E) ipt_MASQUERADE(E) iptable_nat(E)
nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E)
nf_conntrack(E) ipt_REJECT(E) xt_CHECKSUM(E) iptable_mangle(E) xt_tcpudp(E)
bridge(E) stp(E) llc(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E)
ip_tables(E) ebtable_nat(E) ebtables(E) x_tables(E) nct6775(E) hwmon_vid(E)
radeon(E) kvm_amd(E) kvm(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E)
snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E)
i2c_algo_bit(E) crct10dif_pclmul(E) drm_kms_helper(E) crc32_pclmul(E)
ghash_clmulni_intel(E) snd_hwdep(E) aesni_intel(E) snd_pcm(E) ttm(E)
aes_x86_64(E) glue_helper(E) netconsole(E) drm(E) lrw(E) snd_timer(E)
configfs(E) snd(E) gf128mul(E) ablk_helper(E) cryptd(E) soundcore(E) lp(E)
serio_raw(E) k10temp(E) i2c_piix4(E) mac_hid(E) video(E) parport(E)
usb_storage(E) pata_acpi(E) hid_generic(E) usbhid(E) hid(E) alx(E) psmouse(E)
mdio(E) pata_atiixp(E) ahci(E) libahci(E)
[   87.327963] CPU: 0 PID: 1452 Comm: qemu-system-x86 Tainted: G            E
3.16.0 #1
[   87.327986] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[   87.328016] task: ffff880427a18000 ti: ffff880421280000 task.ti:
ffff880421280000
[   87.328039] RIP: 0010:[<ffffffff816059dd>]  [<ffffffff816059dd>]
__detach_device+0xad/0xb0
[   87.328071] RSP: 0018:ffff880421283b38  EFLAGS: 00010046
[   87.328088] RAX: 0000000000000000 RBX: ffff8804286e5240 RCX:
ffff880421283ae0
[   87.328110] RDX: dead000000100100 RSI: 0000000000000086 RDI:
ffff8804286e5240
[   87.328132] RBP: ffff880421283b58 R08: 0000000000000046 R09:
ffff8804299b8900
[   87.328154] R10: ffff880000000000 R11: 000ffffffffff000 R12:
0000000000000000
[   87.328175] R13: ffff88042127a610 R14: ffff88042744c040 R15:
ffff8804286e5240
[   87.328197] FS:  00007f1d03857700(0000) GS:ffff88043ec00000(0000)
knlGS:0000000000000000
[   87.328221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   87.328239] CR2: 00007f1d03dc63a0 CR3: 0000000001c13000 CR4:
00000000000407f0
[   87.328260] Stack:
[   87.328268]  dead000000100100 ffff88042127a600 ffff88042127a610
ffff88042744c040
[   87.328299]  ffff880421283b98 ffffffff81605a7e 0000000000000202
ffff88042744c040
[   87.328333]  ffff88042744c040 ffff880420a3c008 ffff8804242b0a80
ffff88007786dfd8
[   87.328365] Call Trace:
[   87.328378]  [<ffffffff81605a7e>] amd_iommu_domain_destroy+0x9e/0x160
[   87.328400]  [<ffffffff816022db>] iommu_domain_free+0x1b/0x30
[   87.328432]  [<ffffffffa03628a3>] kvm_iommu_unmap_guest+0x53/0x60 [kvm]
[   87.328461]  [<ffffffffa0373059>] kvm_arch_destroy_vm+0x39/0x1f0 [kvm]
[   87.328484]  [<ffffffff810cfebd>] ? synchronize_srcu+0x1d/0x20
[   87.328509]  [<ffffffffa035b26e>] kvm_put_kvm+0x10e/0x220 [kvm]
[   87.328535]  [<ffffffffa035b3b8>] kvm_vcpu_release+0x18/0x20 [kvm]
[   87.328556]  [<ffffffff811d0a04>] __fput+0xe4/0x220
[   87.328573]  [<ffffffff811d0b8e>] ____fput+0xe/0x10
[   87.328591]  [<ffffffff8108cd74>] task_work_run+0xc4/0xe0
[   87.328609]  [<ffffffff8106ef18>] do_exit+0x2b8/0xa60
[   87.328627]  [<ffffffff8106f73f>] do_group_exit+0x3f/0xa0
[   87.328645]  [<ffffffff8107f100>] get_signal_to_deliver+0x1d0/0x6f0
[   87.328668]  [<ffffffff81012548>] do_signal+0x48/0x9d0
[   87.328687]  [<ffffffff8111d1bc>] ? acct_account_cputime+0x1c/0x20
[   87.328708]  [<ffffffff810a372b>] ? account_user_time+0x8b/0xa0
[   87.329791]  [<ffffffff810a3cf4>] ? vtime_account_user+0x54/0x60
[   87.330869]  [<ffffffff81012f39>] do_notify_resume+0x69/0xb0
[   87.331950]  [<ffffffff8172b32a>] int_signal+0x12/0x17
[   87.333016] Code: fe ff ff eb b8 66 0f 1f 84 00 00 00 00 00 48 8b 35 69 b0
9a 00 49 39 f4 74 c1 48 89 df e8 8c fd ff ff 5b 41 5c 41 5d 41 5e 5d c3 <0f> 0b
90 66 66 66 66 90 55 48 89 e5 41 57 41 56 49 89 fe 41 55 
[   87.335373] RIP  [<ffffffff816059dd>] __detach_device+0xad/0xb0
[   87.336475]  RSP <ffff880421283b38>
[   87.337562] ---[ end trace bee5733468f37c81 ]---


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
From: bugzilla-daemon @ 2014-08-07 16:25 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

Alex Williamson <alex.williamson@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alex.williamson@redhat.com

--- Comment #5 from Alex Williamson <alex.williamson@redhat.com> ---
What if you use vfio-pci instead of pci-assign?  The BUG happens when the
kernel tries to detach a device from the domain, but the device doesn't
actually belong to a domain.  VFIO likely already avoids this because the
bridge and device will both be in the same IOMMU group and therefore attached
to the same domain.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (4 preceding siblings ...)
  2014-08-07 16:25 ` bugzilla-daemon
@ 2014-08-07 17:56 ` bugzilla-daemon
  2014-08-07 18:11 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-07 17:56 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #6 from Marti Raudsepp <marti@juffo.org> ---
(In reply to Alex Williamson from comment #5)
> What if you use vfio-pci instead of pci-assign?

I run into the dreaded error:
  vfio: error, group 9 is not viable, please ensure all devices within the
  iommu_group are bound to their vfio bus driver

There are some proposed workarounds on the web, like passing
vfio_iommu_type1.allow_unsafe_interrupts=1 or pci=realloc, but these seem to
change nothing for me.

So I tried adding all the PCI devices in the IOMMU group as passthrough devices
(including IDE, SMBus, audio and OHCI controllers). But then QEMU's SeaBIOS
gets so confused it can no longer find a hard drive to boot off.

But you're right. At least I can stop the non-functional virtual machine now,
so I've got that going for me, which is nice.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (5 preceding siblings ...)
  2014-08-07 17:56 ` bugzilla-daemon
@ 2014-08-07 18:11 ` bugzilla-daemon
  2014-08-07 18:53 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-07 18:11 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #7 from Alex Williamson <alex.williamson@redhat.com> ---
(In reply to Marti Raudsepp from comment #6)
> (In reply to Alex Williamson from comment #5)
> > What if you use vfio-pci instead of pci-assign?
> 
> I run into the dreaded error:
>   vfio: error, group 9 is not viable, please ensure all devices within the
>   iommu_group are bound to their vfio bus driver
> 
> There are some proposed workarounds on the web, like passing
> vfio_iommu_type1.allow_unsafe_interrupts=1 or pci=realloc, but these seem to
> change nothing for me.

None of these remotely address the issue.  If you're running at least 3.12
there are quirks for the following AMD southbridge components:

 * 1002:4385 SBx00 SMBus Controller
 * 1002:439c SB7x0/SB8x0/SB9x0 IDE Controller
 * 1002:4383 SBx00 Azalia (Intel HDA)
 * 1002:439d SB7x0/SB8x0/SB9x0 LPC host controller
 * 1002:4384 SBx00 PCI to PCI Bridge
 * 1002:4399 SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
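
A rough way to compare a system against the list above is sketched below in Python. The IDs are copied from the list; the function name and the case normalisation are assumptions made for this illustration only, and the kernel does its own matching in C.

```python
# IDs copied from the list above; everything else here is illustrative.
QUIRKED_AMD_SB_IDS = {
    "1002:4385": "SBx00 SMBus Controller",
    "1002:439c": "SB7x0/SB8x0/SB9x0 IDE Controller",
    "1002:4383": "SBx00 Azalia (Intel HDA)",
    "1002:439d": "SB7x0/SB8x0/SB9x0 LPC host controller",
    "1002:4384": "SBx00 PCI to PCI Bridge",
    "1002:4399": "SB7x0/SB8x0/SB9x0 USB OHCI2 Controller",
}


def missing_quirks(ids):
    """Return the vendor:device IDs (in input order) not found in the list above."""
    return [i for i in ids if i.strip().lower() not in QUIRKED_AMD_SB_IDS]
```

The vendor:device pairs to feed in can be read from `lspci -nn`, which prints them in square brackets at the end of each line.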

If your bridge does not match these, then AMD will need to confirm whether
isolation is provided between your devices.  There is an ACS override patch
floating around which allows assuming device isolation, but this is generally a
bad idea, can introduce obscure bugs, and will not be merged upstream.

> So I tried adding all the PCI devices in the IOMMU group as passthrough
> devices (including IDE, SMBus, audio and OHCI controllers). But then QEMU's
> SeaBIOS gets so confused it can no longer find a hard drive to boot off.

Note that it's not required to assign all the devices; they simply need to be
detached from host drivers (i.e. bound to pci-stub or vfio-pci).
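
To see which devices of a group are still attached to a host driver, something like the following Python sketch could be used. It relies on the `driver` symlink that sysfs exposes under each PCI device directory; the function name and the idea of passing the address list by hand are assumptions for this example. It does not special-case bridges, which may need different handling.

```python
import os

# Drivers named in the note above as acceptable placeholders.
STUB_DRIVERS = {"pci-stub", "vfio-pci"}


def host_bound(addresses, sysfs_devices="/sys/bus/pci/devices"):
    """Return {address: driver} for devices still bound to a host driver."""
    result = {}
    for addr in addresses:
        link = os.path.join(sysfs_devices, addr, "driver")
        if not os.path.islink(link):
            continue  # no driver bound at all
        driver = os.path.basename(os.readlink(link))
        if driver not in STUB_DRIVERS:
            result[addr] = driver
    return result
```

An empty result means every listed device is either unbound or held by pci-stub or vfio-pci.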


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (6 preceding siblings ...)
  2014-08-07 18:11 ` bugzilla-daemon
@ 2014-08-07 18:53 ` bugzilla-daemon
  2014-08-07 19:47 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-07 18:53 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #8 from Marti Raudsepp <marti@juffo.org> ---
(In reply to Alex Williamson from comment #7)
> > There are some proposed workarounds on the web
> None of these remotely address the issue.

I see. This page claims so: http://www.ovirt.org/Features/hostdev_passthrough

> there are quirks for the following AMD southbridge components

Nope, mine are 1022:780b, 1022:780c, 1022:780d, 1022:780e, 1022:780f, 1022:7809

> If your bridge does not match these, then AMD will need to confirm whether
> isolation is provided between your devices.

How would I go about confirming that? What are the chances that they care, and
provide accurate information to a random person?

> There is an ACS override patch

I already ran across it... https://bugzilla.redhat.com/show_bug.cgi?id=1113399
Would I be any worse off using this, compared to the old kvm pci-assign method?

> Note that it's not required to assign all the devices, they simply need to
> be detached from host drivers (ie. bound to pci-stub or vfio-pci).

Thanks, I will give it a shot tomorrow.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (7 preceding siblings ...)
  2014-08-07 18:53 ` bugzilla-daemon
@ 2014-08-07 19:47 ` bugzilla-daemon
  2014-08-08  2:48 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-07 19:47 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #9 from Alex Williamson <alex.williamson@redhat.com> ---
(In reply to Marti Raudsepp from comment #8)
> (In reply to Alex Williamson from comment #7)
> > > There are some proposed workarounds on the web
> > None of these remotely address the issue.
> 
> I see. This page claims so: http://www.ovirt.org/Features/hostdev_passthrough

Sorry, it's wrong.

> > there are quirks for the following AMD southbridge components
> 
> Nope, mine are 1022:780b, 1022:780c, 1022:780d, 1022:780e, 1022:780f,
> 1022:7809
> 
> > If your bridge does not match these, then AMD will need to confirm whether
> > isolation is provided between your devices.
> 
> How would I go about confirming that? What are the chances that they care,
> and provide accurate information to a random person?

AMD would need to confirm it.  IOMMU groups are based on hardware advertised
isolation via the PCIe ACS capability.  Without this, or a device specific
quirk to take its place, IOMMU groups must assume that peer-to-peer between
functions of a multi-function device is possible and therefore that the devices
are not isolated.  Chances are that this new chipset in your system is taking
the exact same ASICs that were deemed not to do peer-to-peer on previous
chipsets, but we need that confirmation from AMD.  Alex Deucher (see
MAINTAINERS) may have contacts available that can make that statement.
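
The grouping rule described above can be pictured with a deliberately simplified Python model. This is a toy, not the kernel's algorithm: it ignores bridges, downstream devices and DMA aliases, and the function name and the `isolated` parameter are invented for the illustration.

```python
from collections import defaultdict


def toy_iommu_groups(functions, isolated=frozenset()):
    """Group PCI functions ("DDDD:BB:SS.F") under the assumption described above.

    A function listed in `isolated` (ACS or a quirk vouches for it) gets its
    own group; all remaining functions of the same slot share one group,
    because peer-to-peer between them cannot be ruled out.
    """
    shared = defaultdict(list)
    groups = []
    for fn in sorted(functions):
        if fn in isolated:
            groups.append([fn])
        else:
            slot = fn.rsplit(".", 1)[0]  # strip the function number
            shared[slot].append(fn)
    groups.extend(shared.values())
    return sorted(groups)
```

In this model, marking a function as isolated is the moral equivalent of adding a quirk for it: it drops out of the shared group while its unvouched siblings stay together.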

> > There is an ACS override patch
> 
> I already ran across it...
> https://bugzilla.redhat.com/show_bug.cgi?id=1113399
> Would I be any worse off using this, compared to the old kvm pci-assign
> method?

I think the path forward is to get confirmation from AMD that these functions
are isolated from each other and add quirks to the kernel.  Then you won't have
the device dependencies in vfio-pci.  The override patch allows you to do that
with just a kernel boot parameter.  There's no guarantee that pci-assign will
ever be fixed since it's being phased out.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (8 preceding siblings ...)
  2014-08-07 19:47 ` bugzilla-daemon
@ 2014-08-08  2:48 ` bugzilla-daemon
  2014-08-08 10:19 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-08  2:48 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

Joel Schopp <joel.schopp@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joel.schopp@amd.com

--- Comment #10 from Joel Schopp <joel.schopp@amd.com> ---
(In reply to Alex Williamson from comment #9)
> (In reply to Marti Raudsepp from comment #8)
> > (In reply to Alex Williamson from comment #7)
> > > > There are some proposed workarounds on the web
> > > None of these remotely address the issue.
> > 
> > I see. This page claims so: http://www.ovirt.org/Features/hostdev_passthrough
> 
> Sorry, it's wrong.
> 
> > > there are quirks for the following AMD southbridge components
> > 
> > Nope, mine are 1022:780b, 1022:780c, 1022:780d, 1022:780e, 1022:780f,
> > 1022:7809
> > 
> > > If your bridge does not match these, then AMD will need to confirm whether
> > > isolation is provided between your devices.
> > 
> > How would I go about confirming that? What are the chances that they care,
> > and provide accurate information to a random person?


Are you suggesting we'd provide inaccurate information to a random person?


> 
> AMD would need to confirm it.  IOMMU groups are based on hardware advertised
> isolation via the PCIe ACS capability.  Without this, or a device specific
> quirk to take its place, IOMMU groups must assume that peer-to-peer between
> functions of a multi-function device is possible and therefore that the
> devices are not isolated.  Chances are that this new chipset in your system
> is taking the exact same ASICs that were deemed not to do peer-to-peer on
> previous chipsets, but we need that confirmation from AMD.  Alex Deucher
> (see MAINTAINERS) may have contacts available that can make that statement.

I don't have an answer for you offhand.  Let me do some digging and get you an
answer.

> 
> > > There is an ACS override patch
> > 
> > I already ran across it...
> > https://bugzilla.redhat.com/show_bug.cgi?id=1113399
> > Would I be any worse off using this, compared to the old kvm pci-assign
> > method?
> 
> I think the path forward is to get confirmation from AMD that these functions
> are isolated from each other and add quirks to the kernel.  Then you won't
> have the device dependencies in vfio-pci.  The override patch allows you to
> do that with just a kernel boot parameter.  There's no guarantee that
> pci-assign will ever be fixed since it's being phased out.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (9 preceding siblings ...)
  2014-08-08  2:48 ` bugzilla-daemon
@ 2014-08-08 10:19 ` bugzilla-daemon
  2014-08-12  9:36 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-08 10:19 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #11 from Marti Raudsepp <marti@juffo.org> ---
(In reply to Joel Schopp from comment #10)
> > How would I go about confirming that? What are the chances that they care,
> > and provide accurate information to a random person?
> 
> Are you suggesting we'd provide inaccurate information to a random person?

Yes, that's my experience with the "customer support" for desktop hardware.

Of course, cutting support out of the equation and asking engineers directly is
likely to give better results; that didn't occur to me at first.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (10 preceding siblings ...)
  2014-08-08 10:19 ` bugzilla-daemon
@ 2014-08-12  9:36 ` bugzilla-daemon
  2014-08-12 10:30 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-12  9:36 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

Joerg Roedel <joro@8bytes.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joro@8bytes.org

--- Comment #12 from Joerg Roedel <joro@8bytes.org> ---
Created attachment 146311
  --> https://bugzilla.kernel.org/attachment.cgi?id=146311&action=edit
Possible fix as a patch against v3.13

Hi Marti,

Can you please test this patch? I think it should fix the issue.

Thanks, Joerg


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (11 preceding siblings ...)
  2014-08-12  9:36 ` bugzilla-daemon
@ 2014-08-12 10:30 ` bugzilla-daemon
  2014-08-12 10:42 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-12 10:30 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #13 from Marti Raudsepp <marti@juffo.org> ---
(In reply to Joerg Roedel from comment #12)
> Thanks, Joerg

Indeed. Thanks, Joerg. And thanks everyone else too, you have been very
helpful!

I didn't have v3.13 sources handy, but I applied the attachment 146311 patch to
3.16.0 and it fixes the problem. (I verified that unpatched 3.16.0 also
crashes).

I can start & shut down the VM multiple times without crashing the host and PCI
passthrough works as expected.

Feel free to add
    Tested-by: Marti Raudsepp <marti@juffo.org>

(In reply to Alex Williamson from comment #7)
> Note that it's not required to assign all the devices, they simply need to
> be detached from host drivers (ie. bound to pci-stub or vfio-pci).

This approach also works; I think I will go this route for the production
setup. Seems that we don't actually need any of the devices in the same IOMMU
group.

(In reply to Joel Schopp from comment #10)
> > AMD would need to confirm it.
>
> I don't have an answer for you offhand.  Let me do some digging and get you
> an answer.

I am sorry if I sounded frustrated or arrogant earlier. Any update on this?


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (12 preceding siblings ...)
  2014-08-12 10:30 ` bugzilla-daemon
@ 2014-08-12 10:42 ` bugzilla-daemon
  2014-08-12 14:53 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-12 10:42 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #14 from Joerg Roedel <joro@8bytes.org> ---
Hi Marti,

> Indeed. Thanks, Joerg. And thanks everyone else too, you have been very
> helpful!
> 
> I didn't have v3.13 sources handy, but I applied the attachment 146311

> I can start & shut down the VM multiple times without crashing the host and
> PCI passthrough works as expected.
> 
> Feel free to add
>     Tested-by: Marti Raudsepp <marti@juffo.org>

Thanks for testing the fix. I will send it upstream once the merge window is
over and -rc1 is released. I also added a stable tag so it gets backported.

Joerg


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (13 preceding siblings ...)
  2014-08-12 10:42 ` bugzilla-daemon
@ 2014-08-12 14:53 ` bugzilla-daemon
  2014-08-12 15:09 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-12 14:53 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #15 from Joel Schopp <joel.schopp@amd.com> ---

> (In reply to Joel Schopp from comment #10)
> > > AMD would need to confirm it.
> >
> > I don't have an answer for you offhand.  Let me do some digging and get you
> > an answer.
> 
> I am sorry if I sounded frustrated or arrogant earlier. Any update on this?

It's not clear to me which devices were being put in the same group.  Here's
some of my notes on your lspci output

lspci -vt
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1422
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1423
           +-01.0  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 200 Series]
           +-01.1  Advanced Micro Devices, Inc. [AMD/ATI] Device 1308
           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-10.0  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
           +-10.1  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller

These xhci controllers are isolated from the other devices.  I would need
some more detail on which variant you are running to determine if they are
isolated from each other; they probably aren't.

           +-11.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
The sata controller is isolated from the other devices

           +-12.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-12.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
This pair of OHCI/EHCI controllers are together isolated from the other devices

           +-13.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-13.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
This pair of OHCI/EHCI controllers are together isolated from the other devices

           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
           +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
I do not think the SMBus/IDE/Azalia/LPC are isolated from each other, but they
are isolated from the other devices I have identified.


           +-14.4-[01]----05.0  Dialogic Corporation PRI
The legacy PCI should be isolated from the other devices identified.  Not sure
what is going on here.

           +-14.5  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
This OHCI Controller should also be isolated from the other devices.

           +-15.0-[02]--
           +-15.2-[03]----00.0  ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
Is this in a PCI-e slot or otherwise attached to the PCI-e?

           +-15.3-[04]----00.0  Qualcomm Atheros QCA8171 Gigabit Ethernet
Is this in a PCI-e slot or otherwise attached to the PCI-e?

           +-18.0  Advanced Micro Devices, Inc. [AMD] Device 141a
           +-18.1  Advanced Micro Devices, Inc. [AMD] Device 141b
           +-18.2  Advanced Micro Devices, Inc. [AMD] Device 141c
           +-18.3  Advanced Micro Devices, Inc. [AMD] Device 141d
           +-18.4  Advanced Micro Devices, Inc. [AMD] Device 141e
           \-18.5  Advanced Micro Devices, Inc. [AMD] Device 141f


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (14 preceding siblings ...)
  2014-08-12 14:53 ` bugzilla-daemon
@ 2014-08-12 15:09 ` bugzilla-daemon
  2014-08-12 15:20 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-12 15:09 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #16 from Alex Williamson <alex.williamson@redhat.com> ---
(In reply to Joel Schopp from comment #15)
> > (In reply to Joel Schopp from comment #10)
> > > > AMD would need to confirm it.
> > >
> > > I don't have an answer for you offhand.  Let me do some digging and get you
> > > an answer.
> > 
> > I am sorry if I sounded frustrated or arrogant earlier. Any update on this?
> 
> It's not clear to me which devices were being put in the same group.  Here's
> some of my notes on your lspci output

Marti, the output of 'find /sys/kernel/iommu_groups' would be useful here. 
I'll try to help based on what I think is happening...

> lspci -vt
> -[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1422
>            +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1423
>            +-01.0  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7
> 200 Series]
>            +-01.1  Advanced Micro Devices, Inc. [AMD/ATI] Device 1308
>            +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1424
>            +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1424
>            +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1424
>            +-10.0  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
>            +-10.1  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
> 
> These xhci controllers are isolated from the other devices, I would
> need some more detail on which variant you are running to determine if they
> are isolated from each other, they probably aren't.

10.0 & 10.1 will typically be grouped together due to lack of ACS.  This is
usually not a problem.

>            +-11.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller
> [AHCI mode]
> The sata controller is isolated from the other devices

Yep, and it's a single function device so IOMMU groups should be ok.

>            +-12.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
>            +-12.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
> This pair of OHCI/EHCI controllers are together isolated from the other
> devices

Yep, same as above.

>            +-13.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
>            +-13.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
> This pair of OHCI/EHCI controllers are together isolated from the other
> devices

Yep

>            +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
>            +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
>            +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
>            +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
> I do not think the SMBus/IDE/Azalia/LPC are isolated from each other, but
> they are isolated from the other devices I have identified.
> 
> 
>            +-14.4-[01]----05.0  Dialogic Corporation PRI
> The legacy PCI should be isolated from the other devices identified.  Not
> sure what is going on here.
> 
>            +-14.5  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
> This OHCI Controller should also be isolated from the other devices.

All of the above will be grouped together, this is the problem.  Since none of
these functions support ACS, IOMMU groups assume that peer-to-peer between
functions is possible.  If 14.4 and 14.5 are truly isolated from the rest of
the package then we should have quirks to support that.  This whole block is an
update of the quirk already shown in comment 7.

>            +-15.0-[02]--
>            +-15.2-[03]----00.0  ASMedia Technology Inc. ASM1042 SuperSpeed
> USB Host Controller
> Is this in a PCI-e slot or otherwise attached to the PCI-e?
> 
>            +-15.3-[04]----00.0  Qualcomm Atheros QCA8171 Gigabit Ethernet
> Is this in a PCI-e slot or otherwise attached to the PCI-e?

I would guess 15.x are all PCIe root ports, hopefully with ACS support.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (15 preceding siblings ...)
  2014-08-12 15:09 ` bugzilla-daemon
@ 2014-08-12 15:20 ` bugzilla-daemon
  2014-09-01  9:30 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-08-12 15:20 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #17 from Marti Raudsepp <marti@juffo.org> ---
It's an ASRock FM2A88X Extreme6+ motherboard with the AMD A88X (Bolton-D4)
chipset.

There are 12 IOMMU groups on the system. The problematic group for me is number
9 because the legacy PCI bridge (14.4) gets mixed in with other southbridge
devices (all 14.*).

/sys/kernel/iommu_groups/0/devices:
0000:00:00.0 -> ../../../../devices/pci0000:00/0000:00:00.0

/sys/kernel/iommu_groups/1/devices:
0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1

/sys/kernel/iommu_groups/2/devices:
0000:00:02.0 -> ../../../../devices/pci0000:00/0000:00:02.0

/sys/kernel/iommu_groups/3/devices:
0000:00:03.0 -> ../../../../devices/pci0000:00/0000:00:03.0

/sys/kernel/iommu_groups/4/devices:
0000:00:04.0 -> ../../../../devices/pci0000:00/0000:00:04.0

/sys/kernel/iommu_groups/5/devices:
0000:00:10.0 -> ../../../../devices/pci0000:00/0000:00:10.0
0000:00:10.1 -> ../../../../devices/pci0000:00/0000:00:10.1

/sys/kernel/iommu_groups/6/devices:
0000:00:11.0 -> ../../../../devices/pci0000:00/0000:00:11.0

/sys/kernel/iommu_groups/7/devices:
0000:00:12.0 -> ../../../../devices/pci0000:00/0000:00:12.0
0000:00:12.2 -> ../../../../devices/pci0000:00/0000:00:12.2

/sys/kernel/iommu_groups/8/devices:
0000:00:13.0 -> ../../../../devices/pci0000:00/0000:00:13.0
0000:00:13.2 -> ../../../../devices/pci0000:00/0000:00:13.2

/sys/kernel/iommu_groups/9/devices:
0000:00:14.0 -> ../../../../devices/pci0000:00/0000:00:14.0
0000:00:14.1 -> ../../../../devices/pci0000:00/0000:00:14.1
0000:00:14.2 -> ../../../../devices/pci0000:00/0000:00:14.2
0000:00:14.3 -> ../../../../devices/pci0000:00/0000:00:14.3
0000:00:14.4 -> ../../../../devices/pci0000:00/0000:00:14.4
0000:00:14.5 -> ../../../../devices/pci0000:00/0000:00:14.5
0000:01:05.0 -> ../../../../devices/pci0000:00/0000:00:14.4/0000:01:05.0

    [When I plug in a card to the other legacy PCI slot, it also appears here as
     pci0000:00/0000:00:14.4/0000:01:06.0]

/sys/kernel/iommu_groups/10/devices:
0000:00:15.0 -> ../../../../devices/pci0000:00/0000:00:15.0
0000:00:15.2 -> ../../../../devices/pci0000:00/0000:00:15.2
0000:00:15.3 -> ../../../../devices/pci0000:00/0000:00:15.3
0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:15.2/0000:03:00.0
0000:04:00.0 -> ../../../../devices/pci0000:00/0000:00:15.3/0000:04:00.0

/sys/kernel/iommu_groups/11/devices:
0000:00:18.0 -> ../../../../devices/pci0000:00/0000:00:18.0
0000:00:18.1 -> ../../../../devices/pci0000:00/0000:00:18.1
0000:00:18.2 -> ../../../../devices/pci0000:00/0000:00:18.2
0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
0000:00:18.4 -> ../../../../devices/pci0000:00/0000:00:18.4
0000:00:18.5 -> ../../../../devices/pci0000:00/0000:00:18.5
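
For anyone who wants the same information programmatically, the listing above can be collected with a short Python sketch. It only walks the `/sys/kernel/iommu_groups/<n>/devices` layout shown above; the function names are made up for this example.

```python
import os


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [device addresses]} from the sysfs layout shown above."""
    groups = {}
    for name in os.listdir(root):
        devdir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devdir):
            groups[int(name)] = sorted(os.listdir(devdir))
    return groups


def group_mates(groups, address):
    """Return the other devices that share an IOMMU group with `address`."""
    for members in groups.values():
        if address in members:
            return [m for m in members if m != address]
    return []
```

Calling `group_mates(read_iommu_groups(), "0000:01:05.0")` on the host would list everything that has to be detached from host drivers along with the card.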

(In reply to Joel Schopp from comment #15)
> It's not clear to me which devices were being put in the same group.  Here's
> some of my notes on your lspci output

Other than the 14.* devices everything seems to be as you describe.

>            +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
>            +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
>            +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
>            +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
> I do not think the SMBus/IDE/Azalia/LPC are isolated from each other, but
> they are isolated from the other devices I have identified.

Ok, that's not a problem.

>            +-14.4-[01]----05.0  Dialogic Corporation PRI
> The legacy PCI should be isolated from the other devices identified.  Not
> sure what is going on here.

Yep, currently shares group 9.

>            +-14.5  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
> This OHCI Controller should also be isolated from the other devices.

Also shares group 9.

>            +-15.0-[02]--
>            +-15.2-[03]----00.0  ASMedia Technology Inc. ASM1042 SuperSpeed
> USB Host Controller
> Is this in a PCI-e slot or otherwise attached to the PCI-e?

Nope, this is integrated on the motherboard. The only used PCI slot is the
Dialogic card.

>            +-15.3-[04]----00.0  Qualcomm Atheros QCA8171 Gigabit Ethernet
> Is this in a PCI-e slot or otherwise attached to the PCI-e?

Integrated Ethernet.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (16 preceding siblings ...)
  2014-08-12 15:20 ` bugzilla-daemon
@ 2014-09-01  9:30 ` bugzilla-daemon
  2014-09-09  7:39 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-09-01  9:30 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #18 from Joerg Roedel <joro@8bytes.org> ---
The fix is now upstream and part of Linux v3.17-rc2.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (17 preceding siblings ...)
  2014-09-01  9:30 ` bugzilla-daemon
@ 2014-09-09  7:39 ` bugzilla-daemon
  2014-09-09 14:58 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-09-09  7:39 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #19 from Marti Raudsepp <marti@juffo.org> ---
(In reply to Joel Schopp from comment #15)
> It's not clear to me which devices were being put in the same group.

Hi Joel, any updates on this? I posted my IOMMU groups in comment #17 in case
you missed it.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (18 preceding siblings ...)
  2014-09-09  7:39 ` bugzilla-daemon
@ 2014-09-09 14:58 ` bugzilla-daemon
  2014-09-09 15:25 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-09-09 14:58 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #20 from Joel Schopp <joel.schopp@amd.com> ---
What updates are you looking for?  Joerg's fix is now upstream.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (19 preceding siblings ...)
  2014-09-09 14:58 ` bugzilla-daemon
@ 2014-09-09 15:25 ` bugzilla-daemon
  2014-10-02 13:30 ` bugzilla-daemon
  2014-10-10 15:58 ` bugzilla-daemon
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-09-09 15:25 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #21 from Marti Raudsepp <marti@juffo.org> ---
(In reply to Joel Schopp from comment #20)
> What updates are you looking for?  Joerg's fix is now upstream.

Yes, but there's still the issue with southbridge component isolation. You
requested more information from me in comment #15 that I provided in comment
#17.

For background see comment #9 from Alex Williamson:
> AMD would need to confirm it.  IOMMU groups are based on hardware advertised
> isolation via the PCIe ACS capability.  Without this, or a device specific
> quirk to take its place, IOMMU groups must assume that peer-to-peer between
> functions of a multi-function device is possible and therefore that the
> devices are not isolated. [...]

> I think the path forward is to get confirmation from AMD that these functions
> are isolated from each other and add quirks to the kernel.  Then you won't
> have the device dependencies in vfio-pci.  The override patch allows you to
> do that with just a kernel boot parameter.  There's no guarantee that
> pci-assign will ever be fixed since it's being phased out.
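
To check which devices end up sharing a group, the sysfs layout from the
listing in comment #17 (/sys/kernel/iommu_groups/<N>/devices/<address>) can be
walked with a short script. This is only a rough sketch: the function names
read_iommu_groups and group_peers are made up for illustration, and it assumes
nothing beyond that directory layout.

```python
import os


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    Assumes the layout <root>/<group number>/devices/<PCI address>.
    """
    groups = {}
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        # Skip anything that is not a numbered group directory.
        if not name.isdigit() or not os.path.isdir(devices_dir):
            continue
        groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def group_peers(groups, device):
    """Return the other devices in the same group as `device`.

    Returns None when the device is not a member of any group.
    (Hypothetical helper, not part of any kernel tooling.)
    """
    for members in groups.values():
        if device in members:
            return [d for d in members if d != device]
    return None


if __name__ == "__main__":
    # Only meaningful on a host that exposes IOMMU groups in sysfs.
    if os.path.isdir("/sys/kernel/iommu_groups"):
        groups = read_iommu_groups()
        for number in sorted(groups):
            print(number, " ".join(groups[number]))
```

A device whose peer list is empty sits alone in its group. A non-empty list
names the devices that the kernel currently assumes are not isolated from it.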


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (20 preceding siblings ...)
  2014-09-09 15:25 ` bugzilla-daemon
@ 2014-10-02 13:30 ` bugzilla-daemon
  2014-10-10 15:58 ` bugzilla-daemon
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-10-02 13:30 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #22 from Marti Raudsepp <marti@juffo.org> ---
Since I did not get further confirmation from Mr. Schopp, I decided to push it
and submit a patch: https://lkml.org/lkml/2014/10/2/223

The phrases "Not sure what is going on here" and "should also be isolated" in
comment #15 don't inspire much confidence, but I have not managed to obtain
more concrete statements.


* [Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
  2014-08-07 14:45 [Bug 81841] New: amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge bugzilla-daemon
                   ` (21 preceding siblings ...)
  2014-10-02 13:30 ` bugzilla-daemon
@ 2014-10-10 15:58 ` bugzilla-daemon
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2014-10-10 15:58 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=81841

Marti Raudsepp <marti@juffo.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #23 from Marti Raudsepp <marti@juffo.org> ---
Closing bug, ACS patch merged to mainline Linux:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3587e625fe24a2d1cd1891fc660c3313151a368c

Thanks Joerg.


