* kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-07 19:10 ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2019-04-07 19:10 UTC (permalink / raw)
  To: Jiang Liu; +Cc: iommu, Joerg Roedel, James Smart

Hi Jiang,

If I tell qemu to use PCI pass-through for a PCI adapter and next load the
lpfc driver for an lpfc adapter that has not been passed through to any VM,
a kernel bug is hit. Do you perhaps know whether it should be possible to
load a kernel driver in this scenario? If so, do you know what should change
to avoid hitting this kernel bug? Should the iommu code be modified or
should the lpfc code be modified? I'm asking you because I think that you
introduced the BUG() statement that was hit. See also commit ab8dfe251571
("iommu/vt-d: Introduce helper functions to improve code readability"; v3.17).

Thank you,

Bart.

------------[ cut here ]------------
kernel BUG at drivers/iommu/intel-iommu.c:608!
invalid opcode: 0000 [#1] SMP
CPU: 7 PID: 7842 Comm: modprobe Not tainted 5.0.7+ #2
Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F10 08/03/2015
RIP: 0010:domain_get_iommu+0x50/0x60
Code: c2 01 eb 0b 48 83 c0 01 8b 34 87 85 f6 75 0b 48 39 c2 48 63 c8 75 ed 31 c0 c3 48 c1 e1 03 48 8b 05 15 9b cd 00 48 8b 04 08 c3 <0f> 0b 31 c9 eb ee 66 2e 0f 1f 84 00 00 00 00 00 41 55 8b 05 d0 9a
RSP: 0018:ffffa7884024ba60 EFLAGS: 00010202
RAX: ffff96c8897c60c0 RBX: 00000004046c2000 RCX: ffff96c88edab000
RDX: 00000000fffffff0 RSI: ffff96c88e08de80 RDI: ffff96c8897c60c0
RBP: 0000000000000000 R08: ffff96c88b806b40 R09: ffff96c88f802f50
R10: 0000000000000000 R11: 0000000000000001 R12: ffff96c88edab0b0
R13: ffffffffffffffff R14: 0000000000001000 R15: ffff96c8897c60c0
FS:  00007fec74c05b80(0000) GS:ffff96c89fbc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fec74c5dc80 CR3: 0000000404686004 CR4: 00000000001626e0
Call Trace:
  __intel_map_page+0x7e/0x150
  intel_alloc_coherent+0xa7/0x130
  dma_alloc_attrs+0x6b/0xc0
  dma_pool_alloc+0xb8/0x1a0
  lpfc_mem_alloc+0x109/0x3e0 [lpfc]
  lpfc_pci_probe_one+0xdac/0x2060 [lpfc]
  pci_device_probe+0xc3/0x140
  really_probe+0xd2/0x380
  driver_probe_device+0xae/0xf0
  __driver_attach+0xd5/0x100
  ? driver_probe_device+0xf0/0xf0
  bus_for_each_dev+0x5b/0x90
  bus_add_driver+0x208/0x220
  ? 0xffffffffc0ad9000
  driver_register+0x66/0xb0
  ? 0xffffffffc0ad9000
  lpfc_init+0xd5/0x1000 [lpfc]
  do_one_initcall+0x2e/0x181
  ? __vunmap+0x75/0xb0
  do_init_module+0x55/0x1e0
  load_module+0x2438/0x2560
  ? __do_sys_finit_module+0x8f/0xd0
  __do_sys_finit_module+0x8f/0xd0
  do_syscall_64+0x44/0xf0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fec74d212f9
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6f 4b 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffea26c1458 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055eeaf1669c0 RCX: 00007fec74d212f9
RDX: 0000000000000000 RSI: 000055eeaf0ee3c8 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000000 R09: 000055eeaf166400
R10: 0000000000000004 R11: 0000000000000246 R12: 000055eeaf0ee3c8
R13: 0000000000040000 R14: 000055eeaf16eb90 R15: 000055eeaf1669c0
Modules linked in: lpfc(+) scsi_transport_fc mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core pci_stub af_packet vhost_net vhost tun vfio_pci vfio_virqfd vfio_iommu_type1 vfio fuse dm_crypt algif_skcipher af_alg loop devlink bridge stp llc xt_tcpudp ip6t_rpfilter ip6t_REJECT 
nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
scsi_transport_iscsi ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul snd_hda_codec_hdmi crc32_pclmul 
ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic aesni_intel aes_x86_64 crypto_simd cryptd snd_hda_intel iTCO_wdt mei_me iTCO_vendor_support joydev mxm_wmi glue_helper e1000e alx
  mdio intel_rapl_perf snd_hda_codec mei ptp lpc_ich pcspkr i2c_i801 snd_hda_core pps_core mfd_core fan thermal wmi pcc_cpufreq acpi_pad button snd_usb_audio snd_usbmidi_lib snd_hwdep snd_rawmidi snd_seq_device snd_pcm snd_timer snd soundcore ext4 crc16 mbcache jbd2 
hid_generic usbhid sd_mod i915 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm drm_panel_orientation_quirks xhci_pci libata ehci_pci agpgart ehci_hcd xhci_hcd i2c_core video usbcore usb_common sg dm_multipath 
dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod unix ipv6 autofs4 [last unloaded: scsi_transport_fc]
---[ end trace cc531c0d02c790cc ]---
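For reference, the scenario described above can be reproduced with a sequence
roughly like the one below. The PCI addresses and the QLogic/lpfc roles are taken
from the group listing later in this thread; the vendor/device IDs and the QEMU
invocation are assumptions, not Bart's actual commands:

   # give one of the QLogic ports to vfio-pci for assignment to a guest
   modprobe vfio-pci
   echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind
   echo 1077 2532 > /sys/bus/pci/drivers/vfio-pci/new_id   # assumed IDs for 02:00.0

   # start a guest with that port passed through (options abridged)
   qemu-system-x86_64 -enable-kvm -m 4G \
       -device vfio-pci,host=02:00.0 ...

   # then, on the host, load the driver for the other adapter in the same group
   modprobe lpfc          # probes 01:00.0/01:00.1 and hits the BUG() above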


* Re: kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-07 21:06     ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2019-04-07 21:06 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: iommu, Joerg Roedel, Jiang Liu, James Smart

On Sun, 7 Apr 2019 12:10:38 -0700
Bart Van Assche <bvanassche@acm.org> wrote:

> Hi Jiang,
> 
> If I tell qemu to use PCI pass-through for a PCI adapter and next load the
> lpfc driver for an lpfc adapter that has not been passed through to any VM,
> a kernel bug is hit. Do you perhaps know whether it should be possible to
> load a kernel driver in this scenario? If so, do you know what should change
> to avoid hitting this kernel bug? Should the iommu code be modified or
> should the lpfc code be modified? I'm asking you because I think that you
> introduced the BUG() statement that was hit. See also commit ab8dfe251571
> ("iommu/vt-d: Introduce helper functions to improve code readability"; v3.17).

Do both of these lpfc devices belong to the same IOMMU group?
(/sys/kernel/iommu_groups/)  Thanks,

Alex
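The group membership can be read straight from sysfs; every device managed by an
IOMMU has an iommu_group symlink, so for one of the lpfc ports (address taken from
Bart's follow-up below) the check looks like this:

   basename "$(readlink -f /sys/bus/pci/devices/0000:01:00.0/iommu_group)"
   # prints the group number; compare it with the number printed for 02:00.0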

> ------------[ cut here ]------------
> kernel BUG at drivers/iommu/intel-iommu.c:608!
> invalid opcode: 0000 [#1] SMP
> CPU: 7 PID: 7842 Comm: modprobe Not tainted 5.0.7+ #2
> Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F10 08/03/2015
> RIP: 0010:domain_get_iommu+0x50/0x60
> Code: c2 01 eb 0b 48 83 c0 01 8b 34 87 85 f6 75 0b 48 39 c2 48 63 c8 75 ed 31 c0 c3 48 c1 e1 03 48 8b 05 15 9b cd 00 48 8b 04 08 c3 <0f> 0b 31 c9 eb ee 66 2e 0f 1f 84 00 00 00 00 00 41 55 8b 05 d0 9a
> RSP: 0018:ffffa7884024ba60 EFLAGS: 00010202
> RAX: ffff96c8897c60c0 RBX: 00000004046c2000 RCX: ffff96c88edab000
> RDX: 00000000fffffff0 RSI: ffff96c88e08de80 RDI: ffff96c8897c60c0
> RBP: 0000000000000000 R08: ffff96c88b806b40 R09: ffff96c88f802f50
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff96c88edab0b0
> R13: ffffffffffffffff R14: 0000000000001000 R15: ffff96c8897c60c0
> FS:  00007fec74c05b80(0000) GS:ffff96c89fbc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fec74c5dc80 CR3: 0000000404686004 CR4: 00000000001626e0
> Call Trace:
>   __intel_map_page+0x7e/0x150
>   intel_alloc_coherent+0xa7/0x130
>   dma_alloc_attrs+0x6b/0xc0
>   dma_pool_alloc+0xb8/0x1a0
>   lpfc_mem_alloc+0x109/0x3e0 [lpfc]
>   lpfc_pci_probe_one+0xdac/0x2060 [lpfc]
>   pci_device_probe+0xc3/0x140
>   really_probe+0xd2/0x380
>   driver_probe_device+0xae/0xf0
>   __driver_attach+0xd5/0x100
>   ? driver_probe_device+0xf0/0xf0
>   bus_for_each_dev+0x5b/0x90
>   bus_add_driver+0x208/0x220
>   ? 0xffffffffc0ad9000
>   driver_register+0x66/0xb0
>   ? 0xffffffffc0ad9000
>   lpfc_init+0xd5/0x1000 [lpfc]
>   do_one_initcall+0x2e/0x181
>   ? __vunmap+0x75/0xb0
>   do_init_module+0x55/0x1e0
>   load_module+0x2438/0x2560
>   ? __do_sys_finit_module+0x8f/0xd0
>   __do_sys_finit_module+0x8f/0xd0
>   do_syscall_64+0x44/0xf0
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7fec74d212f9
> Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6f 4b 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffea26c1458 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> RAX: ffffffffffffffda RBX: 000055eeaf1669c0 RCX: 00007fec74d212f9
> RDX: 0000000000000000 RSI: 000055eeaf0ee3c8 RDI: 0000000000000004
> RBP: 0000000000000000 R08: 0000000000000000 R09: 000055eeaf166400
> R10: 0000000000000004 R11: 0000000000000246 R12: 000055eeaf0ee3c8
> R13: 0000000000040000 R14: 000055eeaf16eb90 R15: 000055eeaf1669c0
> Modules linked in: lpfc(+) scsi_transport_fc mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core pci_stub af_packet vhost_net vhost tun vfio_pci vfio_virqfd vfio_iommu_type1 vfio fuse dm_crypt algif_skcipher af_alg loop devlink bridge stp llc xt_tcpudp ip6t_rpfilter ip6t_REJECT 
> nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
> scsi_transport_iscsi ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul snd_hda_codec_hdmi crc32_pclmul 
> ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic aesni_intel aes_x86_64 crypto_simd cryptd snd_hda_intel iTCO_wdt mei_me iTCO_vendor_support joydev mxm_wmi glue_helper e1000e alx
>   mdio intel_rapl_perf snd_hda_codec mei ptp lpc_ich pcspkr i2c_i801 snd_hda_core pps_core mfd_core fan thermal wmi pcc_cpufreq acpi_pad button snd_usb_audio snd_usbmidi_lib snd_hwdep snd_rawmidi snd_seq_device snd_pcm snd_timer snd soundcore ext4 crc16 mbcache jbd2 
> hid_generic usbhid sd_mod i915 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm drm_panel_orientation_quirks xhci_pci libata ehci_pci agpgart ehci_hcd xhci_hcd i2c_core video usbcore usb_common sg dm_multipath 
> dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod unix ipv6 autofs4 [last unloaded: scsi_transport_fc]
> ---[ end trace cc531c0d02c790cc ]---



* Re: kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-07 23:02         ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2019-04-07 23:02 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, Joerg Roedel, Jiang Liu, James Smart

On 4/7/19 2:06 PM, Alex Williamson wrote:
> On Sun, 7 Apr 2019 12:10:38 -0700
> Bart Van Assche <bvanassche@acm.org> wrote:
>> If I tell qemu to use PCI pass-through for a PCI adapter and next load the
>> lpfc driver for an lpfc adapter that has not been passed through to any VM,
>> a kernel bug is hit. Do you perhaps know whether it should be possible to
>> load a kernel driver in this scenario? If so, do you know what should change
>> to avoid hitting this kernel bug? Should the iommu code be modified or
>> should the lpfc code be modified? I'm asking you because I think that you
>> introduced the BUG() statement that was hit. See also commit ab8dfe251571
>> ("iommu/vt-d: Introduce helper functions to improve code readability"; v3.17).
> 
> Do both of these lpfc devices belong to the same IOMMU group?
> (/sys/kernel/iommu_groups/)  Thanks,

Hi Alex,

Apparently the two Emulex (lpfc) ports and the two QLogic ports are in the same IOMMU group:

# lspci | grep -E 'QLogic|Emulex'
01:00.0 Fibre Channel: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter (rev 01)
01:00.1 Fibre Channel: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter (rev 01)
02:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
02:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)

# ls -d /sys/kernel/iommu_groups/*/devices/0000:0[12]:00.*
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/1/devices/0000:02:00.1

Thanks,

Bart.


* Re: kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-07 23:31             ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2019-04-07 23:31 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: iommu, Joerg Roedel, Jiang Liu, James Smart

On Sun, 7 Apr 2019 16:02:31 -0700
Bart Van Assche <bvanassche@acm.org> wrote:

> On 4/7/19 2:06 PM, Alex Williamson wrote:
> > On Sun, 7 Apr 2019 12:10:38 -0700
> > Bart Van Assche <bvanassche@acm.org> wrote:  
> >> If I tell qemu to use PCI pass-through for a PCI adapter and next load the
> >> lpfc driver for an lpfc adapter that has not been passed through to any VM,
> >> a kernel bug is hit. Do you perhaps know whether it should be possible to
> >> load a kernel driver in this scenario? If so, do you know what should change
> >> to avoid hitting this kernel bug? Should the iommu code be modified or
> >> should the lpfc code be modified? I'm asking you because I think that you
> >> introduced the BUG() statement that was hit. See also commit ab8dfe251571
> >> ("iommu/vt-d: Introduce helper functions to improve code readability"; v3.17).
> > 
> > Do both of these lpfc devices belong to the same IOMMU group?
> > (/sys/kernel/iommu_groups/)  Thanks,  
> 
> Hi Alex,
> 
> Apparently the two Emulex (lpfc) ports and the two QLogic ports are in the same IOMMU group:
> 
> # lspci | grep -E 'QLogic|Emulex'
> 01:00.0 Fibre Channel: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter (rev 01)
> 01:00.1 Fibre Channel: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter (rev 01)
> 02:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
> 02:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
> 
> # ls -d /sys/kernel/iommu_groups/*/devices/0000:0[12]:00.*
> /sys/kernel/iommu_groups/1/devices/0000:01:00.0
> /sys/kernel/iommu_groups/1/devices/0000:01:00.1
> /sys/kernel/iommu_groups/1/devices/0000:02:00.0
> /sys/kernel/iommu_groups/1/devices/0000:02:00.1

It's not possible to do what you want with this configuration.  An IOMMU
group represents the smallest set of devices that are isolated from
other sets of devices and is also therefore the minimum granularity we
can assign devices to userspace (ex. QEMU).  The kernel reacts to
breaking the isolation of the group with a BUG_ON.  If you managed not
to hit the BUG_ON here, you'd hit the BUG_ON in vfio code when the loss
of isolation is detected there. IOMMU groups are formed at the highest
point in the topology which guarantees isolation.  This can be
indicated either via native PCIe ACS support or ACS-equivalent quirks
in the code.  If the root port provides neither of these, then all
devices downstream are grouped together as well as all peer root ports
in the same PCI slot and all devices downstream of those.  If a
multifunction endpoint does not provide ACS or equivalent quirks, the
functions will be grouped together. Not all endpoint devices or systems
are designed for minimum possible granularity.  You can learn more
here[1].  Thanks,

Alex

[1] http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
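A quick way to confirm Alex's explanation on a given system is to list the group
members and then look for the ACS capability on the endpoints and on the root ports
above them. The group path matches Bart's listing above; the root-port addresses
00:01.0 and 00:01.1 are a guess at the usual layout for this kind of board, not
something taken from Bart's output:

   # everything the kernel placed in IOMMU group 1
   ls /sys/kernel/iommu_groups/1/devices/

   # check endpoints and (assumed) root ports for the PCIe ACS capability
   for dev in 0000:01:00.0 0000:02:00.0 0000:00:01.0 0000:00:01.1; do
       echo "== $dev"
       sudo lspci -s "$dev" -vvv | grep -i 'Access Control Services' \
           || echo "   no ACS capability"
   done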


* Re: kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-08 15:13                 ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2019-04-08 15:13 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, Joerg Roedel, Jiang Liu, James Smart

On Sun, 2019-04-07 at 17:31 -0600, Alex Williamson wrote:
> It's not possible to do what you want with this configuration.  An IOMMU
> group represents the smallest set of devices that are isolated from
> other sets of devices and is also therefore the minimum granularity we
> can assign devices to userspace (ex. QEMU).  The kernel reacts to
> breaking the isolation of the group with a BUG_ON.  If you managed not
> to hit the BUG_ON here, you'd hit the BUG_ON in vfio code when the loss
> of isolation is detected there. IOMMU groups are formed at the highest
> point in the topology which guarantees isolation.  This can be
> indicated either via native PCIe ACS support or ACS-equivalent quirks
> in the code.  If the root port provides neither of these, then all
> devices downstream are grouped together as well as all peer root ports
> in the same PCI slot and all devices downstream of those.  If a
> multifunction endpoint does not provide ACS or equivalent quirks, the
> functions will be grouped together. Not all endpoint devices or systems
> are designed for minimum possible granularity.  You can learn more
> here[1].  Thanks,
> 
> Alex
> 
> [1] http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html

Hi Alex,

Thank you for the detailed reply. The background information you provided
makes it very clear why the devices I mentioned in my e-mail ended up in the
same IOMMU group.

But it seems that I was not clear enough in my original e-mail. My concern is
that a user space action (modprobe) should never trigger a kernel BUG(). Is
there any way to make sure that the sequence of actions I performed causes
modprobe to fail with an error code instead of triggering a kernel BUG()?

Thanks,

Bart.


* Re: kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-08 15:23                     ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2019-04-08 15:23 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: iommu, Joerg Roedel, Jiang Liu, James Smart

On Mon, 08 Apr 2019 08:13:34 -0700
Bart Van Assche <bvanassche@acm.org> wrote:

> On Sun, 2019-04-07 at 17:31 -0600, Alex Williamson wrote:
> > It's not possible to do what you want with this configuration.  An IOMMU
> > group represents the smallest set of devices that are isolated from
> > other sets of devices and is also therefore the minimum granularity we
> > can assign devices to userspace (ex. QEMU).  The kernel reacts to
> > breaking the isolation of the group with a BUG_ON.  If you managed not
> > to hit the BUG_ON here, you'd hit the BUG_ON in vfio code when the loss
> > of isolation is detected there. IOMMU groups are formed at the highest
> > point in the topology which guarantees isolation.  This can be
> > indicated either via native PCIe ACS support or ACS-equivalent quirks
> > in the code.  If the root port provides neither of these, then all
> > devices downstream are grouped together as well as all peer root ports
> > in the same PCI slot and all devices downstream of those.  If a
> > multifunction endpoint does not provide ACS or equivalent quirks, the
> > functions will be grouped together. Not all endpoint devices or systems
> > are designed for minimum possible granularity.  You can learn more
> > here[1].  Thanks,
> > 
> > Alex
> > 
> > [1] http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html  
> 
> Hi Alex,
> 
> Thank you for the detailed reply. The background information you provided
> makes it very clear why the devices I mentioned in my e-mail ended up in the
> same IOMMU group.
> 
> But it seems that I was not clear enough in my original e-mail. My concern is
> that a user space action (modprobe) should never trigger a kernel BUG(). Is
> there any way to make sure that the sequence of actions I performed causes
> modprobe to fail with an error code instead of triggering a kernel BUG()?

Loading modules is privileged:

$ modprobe vfio-pci
modprobe: ERROR: could not insert 'vfio_pci': Operation not permitted

Granting a device to a user for device assignment purposes is also a
privileged operation.  Can you describe a scenario where this is
reachable without elevated privileges?  The driver core maintainer has
indicated previously that manipulation of driver binding is effectively
at your own risk.  It's entirely possible to bind devices to the wrong
driver creating all sorts of bad behavior.  In this case, it appears
that the system has been improperly configured if devices from a user
owned group can accidentally be bound to host drivers.  Thanks,

Alex
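The driver-binding manipulation Alex refers to is done entirely through sysfs,
which is also why it is easy to get wrong. As an illustration (using one of the
lpfc ports from this thread; driver_override is the standard PCI sysfs knob),
forcing and then clearing a binding looks like:

   # restrict the device so that only vfio-pci will bind to it
   echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
   echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

   # clearing the override re-opens the device to any matching host driver
   echo > /sys/bus/pci/devices/0000:01:00.0/driver_override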


* Re: kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-08 15:30                         ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2019-04-08 15:30 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, Joerg Roedel, Jiang Liu, James Smart

On Mon, 2019-04-08 at 09:23 -0600, Alex Williamson wrote:
> Loading modules is privileged:
> 
> $ modprobe vfio-pci
> modprobe: ERROR: could not insert 'vfio_pci': Operation not permitted
> 
> Granting a device to a user for device assignment purposes is also a
> privileged operation.  Can you describe a scenario where this is
> reachable without elevated privileges?  The driver core maintainer has
> indicated previously that manipulation of driver binding is effectively
> at your own risk.  It's entirely possible to bind devices to the wrong
> driver creating all sorts of bad behavior.  In this case, it appears
> that the system has been improperly configured if devices from a user
> owned group can accidentally be bound to host drivers. 

No user space action should ever crash the kernel, whether or not it
is a privileged action and whether or not a configuration mistake is
involved. The only exceptions are actions that are intended to crash
the kernel, e.g. SysRq-c. I'm surprised that I have to explain this.

Bart.


* Re: kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-08 15:55                         ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2019-04-08 15:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: James Smart, Bart Van Assche, iommu, Joerg Roedel,
	Linus Torvalds, Jiang Liu

On Mon, Apr 08, 2019 at 09:23:45AM -0600, Alex Williamson wrote:
> Loading modules is privileged:
> 
> $ modprobe vfio-pci
> modprobe: ERROR: could not insert 'vfio_pci': Operation not permitted
> 
> Granting a device to a user for device assignment purposes is also a
> privileged operation.  Can you describe a scenario where this is
> reachable without elevated privileges?  The driver core maintainer has
> indicated previously that manipulation of driver binding is effectively
> at your own risk.  It's entirely possible to bind devices to the wrong
> driver creating all sorts of bad behavior.  In this case, it appears
> that the system has been improperly configured if devices from a user
> owned group can accidentally be bound to host drivers.  Thanks,

Sorry, but BUG()ing in case of invalid user input is simply not
acceptable.  Please fix this code to have proper error handling.


* Re: kernel BUG at drivers/iommu/intel-iommu.c:608
@ 2019-04-08 18:05                             ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2019-04-08 18:05 UTC (permalink / raw)
  To: Robin Murphy; +Cc: James Smart, iommu, Joerg Roedel, Bart Van Assche, Jiang Liu

On Mon, 8 Apr 2019 18:10:05 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> On 08/04/2019 16:23, Alex Williamson wrote:
> > On Mon, 08 Apr 2019 08:13:34 -0700
> > Bart Van Assche <bvanassche@acm.org> wrote:
> >   
> >> On Sun, 2019-04-07 at 17:31 -0600, Alex Williamson wrote:  
> >>> It's not possible to do what you want with this configuration.  An IOMMU
> >>> group represents the smallest set of devices that are isolated from
> >>> other sets of devices and is also therefore the minimum granularity we
> >>> can assign devices to userspace (ex. QEMU).  The kernel reacts to
> >>> breaking the isolation of the group with a BUG_ON.  If you managed not
> >>> to hit the BUG_ON here, you'd hit the BUG_ON in vfio code when the loss
> >>> of isolation is detected there. IOMMU groups are formed at the highest
> >>> point in the topology which guarantees isolation.  This can be
> >>> indicated either via native PCIe ACS support or ACS-equivalent quirks
> >>> in the code.  If the root port provides neither of these, then all
> >>> devices downstream are grouped together as well as all peer root ports
> >>> in the same PCI slot and all devices downstream of those.  If a
> >>> multifunction endpoint does not provide ACS or equivalent quirks, the
> >>> functions will be grouped together. Not all endpoint devices or systems
> >>> are designed for minimum possible granularity.  You can learn more
> >>> here[1].  Thanks,
> >>>
> >>> Alex
> >>>
> >>> [1] http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html  
> >>
> >> Hi Alex,
> >>
> >> Thank you for the detailed reply. The background information you provided
> >> makes it very clear why the devices I mentioned in my e-mail ended up in the
> >> same IOMMU group.
> >>
> >> But it seems that I was not clear enough in my original e-mail. My concern is
> >> that a user space action (modprobe) should never trigger a kernel BUG(). Is
> >> there any way to make sure that the sequence of actions I performed causes
> >> modprobe to fail with an error code instead of triggering a kernel BUG()?  
> > 
> > Loading modules is privileged:
> > 
> > $ modprobe vfio-pci
> > modprobe: ERROR: could not insert 'vfio_pci': Operation not permitted
> > 
> > Granting a device to a user for device assignment purposes is also a
> > privileged operation.  Can you describe a scenario where this is
> > reachable without elevated privileges?  The driver core maintainer has
> > indicated previously that manipulation of driver binding is effectively
> > at your own risk.  It's entirely possible to bind devices to the wrong
> > driver creating all sorts of bad behavior.  In this case, it appears
> > that the system has been improperly configured if devices from a user
> > owned group can accidentally be bound to host drivers.  Thanks,  
> 
> The fundamental problem seems to be that VFIO is checking the viability 
> of a group a bit too late, or not being strict enough to begin with. 
> I've just reproduced much the equivalent thing on an arm64 system where 
> I have a single group containing 2 real devices (plus a bunch of bridges):
> 
>    echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
>    echo 0000:08:00.0 > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
> 
>    echo $VID $DID > /sys/bus/pci/drivers/vfio-pci/new_id #IDs for 08:00.0
> 
>    lkvm run Image --vfio-pci 08:00.0 ...
>    # guest runs...
> 
> Then back on the host,
> 
>    echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> 
> and bang:
> 
>    [ 1091.768165] ------------[ cut here ]------------
>    [ 1091.772732] kernel BUG at drivers/vfio/vfio.c:759!
>    [ 1091.777472] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>    [ 1091.782898] Modules linked in:
>    [ 1091.785920] CPU: 1 PID: 1090 Comm: sh Not tainted 5.1.0-rc1+ #77
>    [ 1091.791862] Hardware name: ARM LTD ARM Juno Development 
> Platform/ARM Juno Development Platform, BIOS EDK II Feb 25 2019
>    [ 1091.802535] pstate: 60000005 (nZCv daif -PAN -UAO)
>    [ 1091.807279] pc : vfio_iommu_group_notifier+0x1c8/0x360
>    [ 1091.812361] lr : vfio_iommu_group_notifier+0x1c4/0x360
>    ...
> 
> Yes, they're all privileged operations so you can say "well don't do 
> that then", but it's still a rather unexpected behaviour. This is 
> actually slightly worse than Bart's case, since arm-smmu doesn't have an 
> equivalent to that BUG_ON() check in __intel_map_page(), so the host 
> driver for 03:00.0 may have successfully started DMA during probing and 
> potentially corrupted guest memory by that point. AFAICS, ideally in 
> this situation vfio_iommu_group_notifier() could catch 
> IOMMU_GROUP_NOTIFY_BIND_DRIVER and prevent any new drivers from binding 
> while the rest of the group is assigned, but at a glance there seems to 
> be some core plumbing missing to allow that to happen :/
> 
> Alternatively, maybe we could just tighten up and stop treating unbound 
> devices as viable - that certainly seems easier to implement, but 
> whether it impacts real use-cases I don't know.
> 
> I guess this comes down to that "TBD - interface for disabling driver 
> probing/locking a device." in Documentation/vfio.txt.

I've tried to fix this previously...

https://patchwork.kernel.org/patch/9799841/
https://lore.kernel.org/patchwork/patch/803695/

You can see in the first link that I was advised that users mucking with
driver binding, and things breaking as a result, is par for the course.
It's clearly not ideal to crash the kernel, but once the isolation has
already been broken, our options are limited.  At that point it's not
enough to kill the user process.  I tried a couple of approaches to
prevent the situation and didn't get traction.  New ideas welcome.
Thanks,

Alex
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-04-08 18:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-07 19:10 kernel BUG at drivers/iommu/intel-iommu.c:608 Bart Van Assche
2019-04-07 21:06   ` Alex Williamson
2019-04-07 23:02       ` Bart Van Assche
2019-04-07 23:31           ` Alex Williamson
2019-04-08 15:13               ` Bart Van Assche
2019-04-08 15:23                   ` Alex Williamson
2019-04-08 15:30                       ` Bart Van Assche
2019-04-08 15:55                       ` Christoph Hellwig
2019-04-08 17:10                       ` Robin Murphy
2019-04-08 18:05                           ` Alex Williamson
