linux-kernel.vger.kernel.org archive mirror
* Failure to recreate virtual functions
@ 2019-07-26 16:30 Vlad Buslov
  2019-07-27  2:15 ` Lu Baolu
  0 siblings, 1 reply; 8+ messages in thread
From: Vlad Buslov @ 2019-07-26 16:30 UTC (permalink / raw)
  To: Lu Baolu; +Cc: Joerg Roedel, Maor Gottlieb, Ran Rozenstein, iommu, linux-kernel

Hi Lu Baolu,

Our mlx5 driver fails to recreate VFs when the kernel command line
includes "intel_iommu=on iommu=pt" after the recent merge of the patch
set "iommu/vt-d: Delegate DMA domain to generic iommu". I've bisected
the failure to patch b7297783c2bb ("iommu/vt-d: Remove duplicated code
for device hotplug"). Here is the dmesg log for the following case:
enable switchdev mode, set the number of VFs to 0, then set it back to
any value greater than 0.

[  223.525282] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV: nvfs(2) mode (1)
[  223.562027] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active vports(3)
[  223.663766] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
[  223.663864] pci 0000:81:00.2: enabling Extended Tags
[  223.665143] pci 0000:81:00.2: Adding to iommu group 52
[  223.665215] pci 0000:81:00.2: Using iommu direct mapping
[  223.665771] mlx5_core 0000:81:00.2: enabling device (0000 -> 0002)
[  223.665890] mlx5_core 0000:81:00.2: firmware version: 16.26.148
[  223.889908] mlx5_core 0000:81:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  223.896438] mlx5_core 0000:81:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  223.896636] mlx5_core 0000:81:00.2: Assigned random MAC address 56:1f:95:e0:51:d6
[  224.012905] mlx5_core 0000:81:00.2 ens1f0v0: renamed from eth0
[  224.041651] pci 0000:81:00.3: [15b3:101a] type 00 class 0x020000
[  224.041711] pci 0000:81:00.3: enabling Extended Tags
[  224.043660] pci 0000:81:00.3: Adding to iommu group 53
[  224.043738] pci 0000:81:00.3: Using iommu direct mapping
[  224.044196] mlx5_core 0000:81:00.3: enabling device (0000 -> 0002)
[  224.044298] mlx5_core 0000:81:00.3: firmware version: 16.26.148
[  224.268099] mlx5_core 0000:81:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  224.274983] mlx5_core 0000:81:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  224.275195] mlx5_core 0000:81:00.3: Assigned random MAC address a6:1e:56:0a:d9:f2
[  224.388359] mlx5_core 0000:81:00.3 ens1f0v1: renamed from eth0
[  236.325027] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: active vports(3) mode(1)
[  236.362766] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV: nvfs(2) mode (2)
[  237.290066] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  237.350215] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  237.373052] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
[  237.390768] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  237.447846] ens1f0_0: renamed from eth0
[  237.460399] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active vports(3)
[  237.526880] ens1f0_1: renamed from eth1
[  248.953873] pci 0000:81:00.2: Removing from iommu group 52
[  248.954114] pci 0000:81:00.3: Removing from iommu group 53
[  249.960570] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: active vports(3) mode(2)
[  250.319135] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  250.559431] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
[  258.819162] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV: nvfs(2) mode (1)
[  258.831625] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active vports(3)
[  258.936160] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
[  258.936258] pci 0000:81:00.2: enabling Extended Tags
[  258.937438] pci 0000:81:00.2: Failed to add to iommu group 52: -16
[  258.938053] mlx5_core 0000:81:00.2: enabling device (0000 -> 0002)
[  258.938196] mlx5_core 0000:81:00.2: firmware version: 16.26.148
[  258.938229] mlx5_core 0000:81:00.2: mlx5_function_setup:923:(pid 265): Failed initializing command interface, aborting
[  258.938315] mlx5_core 0000:81:00.2: init_one:1308:(pid 265): mlx5_load_one failed with error code -12
[  258.938540] mlx5_core: probe of 0000:81:00.2 failed with error -12
[  258.938597] pci 0000:81:00.3: [15b3:101a] type 00 class 0x020000
[  258.938657] pci 0000:81:00.3: enabling Extended Tags
[  258.939431] pci 0000:81:00.3: Failed to add to iommu group 52: -16
[  258.939928] mlx5_core 0000:81:00.3: enabling device (0000 -> 0002)
[  258.940039] mlx5_core 0000:81:00.3: firmware version: 16.26.148
[  258.940071] mlx5_core 0000:81:00.3: mlx5_function_setup:923:(pid 265): Failed initializing command interface, aborting
[  258.940158] mlx5_core 0000:81:00.3: init_one:1308:(pid 265): mlx5_load_one failed with error code -12
[  258.940400] mlx5_core: probe of 0000:81:00.3 failed with error -12
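
For readers decoding the log above: the trailing numbers are negative
kernel errno values. A minimal userspace sketch, assuming the standard
Linux errno numbering (the helper name kernel_err_name() is made up for
illustration and is not a kernel function), maps the two codes that
appear here:

```c
#include <errno.h>

/*
 * Map a negative kernel-style return value, as printed in dmesg, to its
 * symbolic errno name. Only the two codes seen in the log are covered;
 * anything else is reported as "unknown".
 */
static const char *kernel_err_name(int ret)
{
	switch (-ret) {
	case EBUSY:
		return "EBUSY";		/* 16 on Linux */
	case ENOMEM:
		return "ENOMEM";	/* 12 on Linux */
	default:
		return "unknown";
	}
}
```

Read this way, the -16 in "Failed to add to iommu group 52" is EBUSY,
and the -12 in the mlx5_core probe failure is ENOMEM.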


With the previous patch in the series, 0e31a7266508 ("iommu/vt-d: Remove
startup parameter from device_def_domain_type()"), the same sequence of
actions doesn't trigger any iommu errors:

[  164.252254] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV: nvfs(2) mode (1)
[  164.288724] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active vports(3)
[  164.394839] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
[  164.394938] pci 0000:81:00.2: enabling Extended Tags
[  164.396087] pci 0000:81:00.2: Adding to iommu group 52
[  164.396154] pci 0000:81:00.2: Using iommu direct mapping
[  164.396679] mlx5_core 0000:81:00.2: enabling device (0000 -> 0002)
[  164.396803] mlx5_core 0000:81:00.2: firmware version: 16.26.148
[  164.619320] mlx5_core 0000:81:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  164.625754] mlx5_core 0000:81:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  164.625922] mlx5_core 0000:81:00.2: Assigned random MAC address 5e:1e:9b:ca:c8:e5
[  164.739694] mlx5_core 0000:81:00.2 ens1f0v0: renamed from eth0
[  164.774637] pci 0000:81:00.3: [15b3:101a] type 00 class 0x020000
[  164.774709] pci 0000:81:00.3: enabling Extended Tags
[  164.775816] pci 0000:81:00.3: Adding to iommu group 53
[  164.775886] pci 0000:81:00.3: Using iommu direct mapping
[  164.776610] mlx5_core 0000:81:00.3: enabling device (0000 -> 0002)
[  164.776734] mlx5_core 0000:81:00.3: firmware version: 16.26.148
[  164.999360] mlx5_core 0000:81:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  165.007118] mlx5_core 0000:81:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  165.007327] mlx5_core 0000:81:00.3: Assigned random MAC address 82:4a:7a:5f:81:55
[  165.123927] mlx5_core 0000:81:00.3 ens1f0v1: renamed from eth0
[  172.063665] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: active vports(3) mode(1)
[  172.103306] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV: nvfs(2) mode (2)
[  173.033033] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  173.091605] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  173.129258] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
[  173.129863] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  173.203879] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active vports(3)
[  173.204002] ens1f0_0: renamed from eth1
[  173.289454] ens1f0_1: renamed from eth0
[  186.720692] pci 0000:81:00.2: Removing from iommu group 52
[  186.720994] pci 0000:81:00.3: Removing from iommu group 53
[  187.771549] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: active vports(3) mode(2)
[  188.141758] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  188.394072] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
[  191.116400] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV: nvfs(2) mode (1)
[  191.128965] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active vports(3)
[  191.235151] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
[  191.235250] pci 0000:81:00.2: enabling Extended Tags
[  191.236463] pci 0000:81:00.2: Adding to iommu group 52
[  191.236531] pci 0000:81:00.2: Using iommu direct mapping
[  191.237037] mlx5_core 0000:81:00.2: enabling device (0000 -> 0002)
[  191.237161] mlx5_core 0000:81:00.2: firmware version: 16.26.148
[  191.457369] mlx5_core 0000:81:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  191.463355] mlx5_core 0000:81:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  191.463509] mlx5_core 0000:81:00.2: Assigned random MAC address e6:f2:0c:b4:e3:2e
[  191.572884] mlx5_core 0000:81:00.2 ens1f0v0: renamed from eth0
[  191.608592] pci 0000:81:00.3: [15b3:101a] type 00 class 0x020000
[  191.608664] pci 0000:81:00.3: enabling Extended Tags
[  191.609434] pci 0000:81:00.3: Adding to iommu group 53
[  191.609466] pci 0000:81:00.3: Using iommu direct mapping
[  191.609760] mlx5_core 0000:81:00.3: enabling device (0000 -> 0002)
[  191.609862] mlx5_core 0000:81:00.3: firmware version: 16.26.148
[  191.826324] mlx5_core 0000:81:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  191.832558] mlx5_core 0000:81:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  191.832730] mlx5_core 0000:81:00.3: Assigned random MAC address a2:dc:76:30:18:6c
[  191.949625] mlx5_core 0000:81:00.3 ens1f0v1: renamed from eth0

Thanks,
Vlad


* Re: Failure to recreate virtual functions
  2019-07-26 16:30 Failure to recreate virtual functions Vlad Buslov
@ 2019-07-27  2:15 ` Lu Baolu
  2019-07-29 10:05   ` Vlad Buslov
  0 siblings, 1 reply; 8+ messages in thread
From: Lu Baolu @ 2019-07-27  2:15 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: baolu.lu, Joerg Roedel, Maor Gottlieb, Ran Rozenstein, iommu,
	linux-kernel

Hi Vlad,

On 7/27/19 12:30 AM, Vlad Buslov wrote:
> Hi Lu Baolu,
> 
> Our mlx5 driver fails to recreate VFs when the kernel command line
> includes "intel_iommu=on iommu=pt" after the recent merge of the patch
> set "iommu/vt-d: Delegate DMA domain to generic iommu". I've bisected
> the failure to patch b7297783c2bb ("iommu/vt-d: Remove duplicated code
> for device hotplug"). Here is the dmesg log for the following case:
> enable switchdev mode, set the number of VFs to 0, then set it back to
> any value greater than 0.
> 
> [  258.937438] pci 0000:81:00.2: Failed to add to iommu group 52: -16

It seems that an EBUSY error was returned from iommu_group_add_device().
Can you please add some debug messages to iommu_group_add_device() so
that we can see where the EBUSY is returned?

Best regards,
Baolu




* Re: Failure to recreate virtual functions
  2019-07-27  2:15 ` Lu Baolu
@ 2019-07-29 10:05   ` Vlad Buslov
  2019-07-30  4:28     ` Lu Baolu
  0 siblings, 1 reply; 8+ messages in thread
From: Vlad Buslov @ 2019-07-29 10:05 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Vlad Buslov, Joerg Roedel, Maor Gottlieb, Ran Rozenstein, iommu,
	linux-kernel


On Sat 27 Jul 2019 at 05:15, Lu Baolu <baolu.lu@linux.intel.com> wrote:
> Hi Vlad,
>
> On 7/27/19 12:30 AM, Vlad Buslov wrote:
>> Hi Lu Baolu,
>>
>> Our mlx5 driver fails to recreate VFs when the kernel command line
>> includes "intel_iommu=on iommu=pt" after the recent merge of the patch
>> set "iommu/vt-d: Delegate DMA domain to generic iommu". I've bisected
>> the failure to patch b7297783c2bb ("iommu/vt-d: Remove duplicated code
>> for device hotplug"). Here is the dmesg log for the following case:
>> enable switchdev mode, set the number of VFs to 0, then set it back to
>> any value greater than 0.
>>
>> [  258.937438] pci 0000:81:00.2: Failed to add to iommu group 52: -16
>
> It seems that an EBUSY error was returned from iommu_group_add_device().
> Can you please add some debug messages to iommu_group_add_device() so
> that we can see where the EBUSY is returned?
>
> Best regards,
> Baolu

The error code is returned by __iommu_attach_device().
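
One way to arrive at an answer like this is to log the call site on
every error return in the function under suspicion. The following is a
userspace sketch of that pattern only, not the actual iommu code:
TRACE_ERR(), step_create_link(), step_attach_domain() and
add_device_sketch() are hypothetical names, and the choice of which step
fails is hard-coded for illustration. In the kernel itself one would
print with pr_err() or dev_err() instead of fprintf().

```c
#include <errno.h>
#include <stdio.h>

/* Log the function, line and error value, then yield the error value. */
#define TRACE_ERR(ret) \
	(fprintf(stderr, "%s:%d: ret=%d\n", __func__, __LINE__, (ret)), (ret))

/* Hypothetical stand-ins for the steps inside the function being debugged. */
static int step_create_link(void)
{
	return 0;
}

static int step_attach_domain(void)
{
	return -EBUSY;	/* hard-coded failure, for illustration only */
}

/* Each early return is wrapped so the log shows which step failed. */
static int add_device_sketch(void)
{
	int ret;

	ret = step_create_link();
	if (ret)
		return TRACE_ERR(ret);

	ret = step_attach_domain();
	if (ret)
		return TRACE_ERR(ret);

	return 0;
}
```

Calling add_device_sketch() prints a single line naming the second
error path, which is the kind of information asked for above.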




* Re: Failure to recreate virtual functions
  2019-07-29 10:05   ` Vlad Buslov
@ 2019-07-30  4:28     ` Lu Baolu
  2019-07-30 11:22       ` Robin Murphy
  0 siblings, 1 reply; 8+ messages in thread
From: Lu Baolu @ 2019-07-30  4:28 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: baolu.lu, Joerg Roedel, Maor Gottlieb, Ran Rozenstein, iommu,
	linux-kernel

Hi,

On 7/29/19 6:05 PM, Vlad Buslov wrote:
> On Sat 27 Jul 2019 at 05:15, Lu Baolu <baolu.lu@linux.intel.com> wrote:
>> Hi Vlad,
>>
>> On 7/27/19 12:30 AM, Vlad Buslov wrote:
>>> Hi Lu Baolu,
>>>
>>> Our mlx5 driver fails to recreate VFs when the kernel command line
>>> includes "intel_iommu=on iommu=pt" after the recent merge of the patch
>>> set "iommu/vt-d: Delegate DMA domain to generic iommu". I've bisected
>>> the failure to patch b7297783c2bb ("iommu/vt-d: Remove duplicated code
>>> for device hotplug"). Here is the dmesg log for the following case:
>>> enable switchdev mode, set the number of VFs to 0, then set it back to
>>> any value greater than 0.
>>> [  258.937438] pci 0000:81:00.2: Failed to add to iommu group 52: -16
>> It seems that an EBUSY error returned from iommu_group_add_device(). Can
>> you please hack some debug messages in iommu_group_add_device() so that
>> we can know where the EBUSY returns?
>>
>> Best regards,
>> Baolu
> The error code is returned by __iommu_attach_device().
> 

Thanks!

It looks like the system already has a domain for this specific PCI BDF
(bus/device/function). Does this VF share its BDF with other devices? Or
was it created previously, and the system did not get a chance to remove it?

Best regards,
Baolu
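
The "-16" in the failing dmesg line is a negated errno value, and on
Linux 16 is EBUSY, as noted earlier in the thread. A small user-space
sketch of decoding such a return code follows. The helper names
is_ebusy() and kernel_err_str() are made up for illustration and are not
kernel APIs.

```c
#include <assert.h>   /* static_assert (C11) */
#include <errno.h>
#include <string.h>

/* On Linux, EBUSY is errno 16, which is what "-16" in the log refers to. */
static_assert(EBUSY == 16, "EBUSY is expected to be 16 on Linux");

/* Hypothetical helper: does a kernel-style negative return code mean EBUSY? */
int is_ebusy(int ret)
{
    return ret == -EBUSY;
}

/* Hypothetical helper: human-readable text for a kernel-style return code. */
const char *kernel_err_str(int ret)
{
    return strerror(ret < 0 ? -ret : ret);
}
```

Calling is_ebusy(-16) returns non-zero, which matches the error reported
for 0000:81:00.2.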

* Re: Failure to recreate virtual functions
  2019-07-30  4:28     ` Lu Baolu
@ 2019-07-30 11:22       ` Robin Murphy
  2019-07-31  7:29         ` Lu Baolu
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2019-07-30 11:22 UTC (permalink / raw)
  To: Lu Baolu, Vlad Buslov
  Cc: Joerg Roedel, Ran Rozenstein, linux-kernel, iommu, Maor Gottlieb

On 30/07/2019 05:28, Lu Baolu wrote:
> Hi,
> 
> On 7/29/19 6:05 PM, Vlad Buslov wrote:
>> On Sat 27 Jul 2019 at 05:15, Lu Baolu<baolu.lu@linux.intel.com>  wrote:
>>> Hi Vlad,
>>>
>>> On 7/27/19 12:30 AM, Vlad Buslov wrote:
>>>> Hi Lu Baolu,
>>>>
>>>> Our mlx5 driver fails to recreate VFs when cmdline includes
>>>> "intel_iommu=on iommu=pt" after recent merge of patch set "iommu/vt-d:
>>>> Delegate DMA domain to generic iommu". I've bisected the failure to
>>>> patch b7297783c2bb ("iommu/vt-d: Remove duplicated code for device
>>>> hotplug"). Here is the dmesg log for following case: enable switchdev
>>>> mode, set number of VFs to 0, then set it back to any value > 0.
>>>> [  223.525282] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable 
>>>> SRIOV: nvfs(2) mode (1)
>>>> [  223.562027] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: 
>>>> active vports(3)
>>>> [  223.663766] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
>>>> [  223.663864] pci 0000:81:00.2: enabling Extended Tags
>>>> [  223.665143] pci 0000:81:00.2: Adding to iommu group 52
>>>> [  223.665215] pci 0000:81:00.2: Using iommu direct mapping
>>>> [  223.665771] mlx5_core 0000:81:00.2: enabling device (0000 -> 0002)
>>>> [  223.665890] mlx5_core 0000:81:00.2: firmware version: 16.26.148
>>>> [  223.889908] mlx5_core 0000:81:00.2: Rate limit: 127 rates are 
>>>> supported, range: 0Mbps to 97656Mbps
>>>> [  223.896438] mlx5_core 0000:81:00.2: MLX5E: StrdRq(1) RqSz(8) 
>>>> StrdSz(2048) RxCqeCmprss(0)
>>>> [  223.896636] mlx5_core 0000:81:00.2: Assigned random MAC address 
>>>> 56:1f:95:e0:51:d6
>>>> [  224.012905] mlx5_core 0000:81:00.2 ens1f0v0: renamed from eth0
>>>> [  224.041651] pci 0000:81:00.3: [15b3:101a] type 00 class 0x020000
>>>> [  224.041711] pci 0000:81:00.3: enabling Extended Tags
>>>> [  224.043660] pci 0000:81:00.3: Adding to iommu group 53
>>>> [  224.043738] pci 0000:81:00.3: Using iommu direct mapping
>>>> [  224.044196] mlx5_core 0000:81:00.3: enabling device (0000 -> 0002)
>>>> [  224.044298] mlx5_core 0000:81:00.3: firmware version: 16.26.148
>>>> [  224.268099] mlx5_core 0000:81:00.3: Rate limit: 127 rates are 
>>>> supported, range: 0Mbps to 97656Mbps
>>>> [  224.274983] mlx5_core 0000:81:00.3: MLX5E: StrdRq(1) RqSz(8) 
>>>> StrdSz(2048) RxCqeCmprss(0)
>>>> [  224.275195] mlx5_core 0000:81:00.3: Assigned random MAC address 
>>>> a6:1e:56:0a:d9:f2
>>>> [  224.388359] mlx5_core 0000:81:00.3 ens1f0v1: renamed from eth0
>>>> [  236.325027] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: 
>>>> active vports(3) mode(1)
>>>> [  236.362766] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable 
>>>> SRIOV: nvfs(2) mode (2)
>>>> [  237.290066] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) 
>>>> StrdSz(2048) RxCqeCmprss(0)
>>>> [  237.350215] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) 
>>>> StrdSz(2048) RxCqeCmprss(0)
>>>> [  237.373052] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
>>>> [  237.390768] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) 
>>>> StrdSz(2048) RxCqeCmprss(0)
>>>> [  237.447846] ens1f0_0: renamed from eth0
>>>> [  237.460399] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: 
>>>> active vports(3)
>>>> [  237.526880] ens1f0_1: renamed from eth1
>>>> [  248.953873] pci 0000:81:00.2: Removing from iommu group 52
>>>> [  248.954114] pci 0000:81:00.3: Removing from iommu group 53
>>>> [  249.960570] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: 
>>>> active vports(3) mode(2)
>>>> [  250.319135] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) 
>>>> StrdSz(2048) RxCqeCmprss(0)
>>>> [  250.559431] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
>>>> [  258.819162] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable 
>>>> SRIOV: nvfs(2) mode (1)
>>>> [  258.831625] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: 
>>>> active vports(3)
>>>> [  258.936160] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
>>>> [  258.936258] pci 0000:81:00.2: enabling Extended Tags
>>>> [  258.937438] pci 0000:81:00.2: Failed to add to iommu group 52: -16
>>> It seems that an EBUSY error returned from iommu_group_add_device(). Can
>>> you please hack some debug messages in iommu_group_add_device() so that
>>> we can know where the EBUSY returns?
>>>
>>> Best regards,
>>> Baolu
>> The error code is returned by __iommu_attach_device().
>>
> 
> Thanks!
> 
> It looks like the system has already a domain for specific pci bdf
> device. Does this VF share the bdf with other devices? Or has been
> previously created, and system failed to get chance to remove it?

At a glance, it looks like it might be down to 
intel_iommu_remove_device() not calling dmar_remove_one_dev_info() like 
the old notifier did. If the group is getting torn down and recreated, 
but the driver still has a stale pointer to the old default domain 
cached, which dmar_insert_one_dev_info() finds and returns, that would 
seem to explain the observed behaviour.

Robin.
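
The stale-pointer hypothesis above can be illustrated with a toy
user-space model. This is only a sketch under stated assumptions: the
toy_* types and functions are invented for illustration and are not the
kernel's data structures. It assumes that the driver keeps a per-device
cached domain pointer and that an attach to a different domain is
rejected with -EBUSY while that cache is populated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_domain {
    int id;
};

struct toy_device {
    struct toy_domain *cached_domain; /* driver-private cache (assumed) */
    struct toy_domain *group_domain;  /* what the generic layer tracks */
};

/* Attach fails with -EBUSY if the driver still caches another domain. */
int toy_attach(struct toy_device *dev, struct toy_domain *dom)
{
    if (dev->cached_domain && dev->cached_domain != dom)
        return -EBUSY;
    dev->cached_domain = dom;
    dev->group_domain = dom;
    return 0;
}

/* Remove that tears down the group state but leaves the cache stale. */
void toy_remove_leaky(struct toy_device *dev)
{
    dev->group_domain = NULL;
    /* dev->cached_domain is intentionally left pointing at the old domain */
}

/* Create a VF, remove it, then recreate it with a fresh default domain. */
int toy_recreate_leaky(void)
{
    struct toy_domain old_dom = { 1 };
    struct toy_domain new_dom = { 2 };
    struct toy_device vf = { NULL, NULL };
    int ret;

    ret = toy_attach(&vf, &old_dom);
    assert(ret == 0);

    toy_remove_leaky(&vf);

    /* The second attach sees the stale cached domain and is rejected. */
    return toy_attach(&vf, &new_dom);
}
```

In this model toy_recreate_leaky() returns -EBUSY, which mirrors the
"Failed to add to iommu group 52: -16" message when the VF is recreated.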

* Re: Failure to recreate virtual functions
  2019-07-30 11:22       ` Robin Murphy
@ 2019-07-31  7:29         ` Lu Baolu
  2019-07-31 11:19           ` Vlad Buslov
  0 siblings, 1 reply; 8+ messages in thread
From: Lu Baolu @ 2019-07-31  7:29 UTC (permalink / raw)
  To: Robin Murphy, Vlad Buslov
  Cc: baolu.lu, Joerg Roedel, Ran Rozenstein, linux-kernel, iommu,
	Maor Gottlieb

Hi,

On 7/30/19 7:22 PM, Robin Murphy wrote:
> On 30/07/2019 05:28, Lu Baolu wrote:
>> Hi,
>>
>> On 7/29/19 6:05 PM, Vlad Buslov wrote:
>>> On Sat 27 Jul 2019 at 05:15, Lu Baolu<baolu.lu@linux.intel.com>  wrote:
>>>> Hi Vlad,
>>>>
>>>> On 7/27/19 12:30 AM, Vlad Buslov wrote:
>>>>> Hi Lu Baolu,
>>>>>
>>>>> Our mlx5 driver fails to recreate VFs when cmdline includes
>>>>> "intel_iommu=on iommu=pt" after recent merge of patch set "iommu/vt-d:
>>>>> Delegate DMA domain to generic iommu". I've bisected the failure to
>>>>> patch b7297783c2bb ("iommu/vt-d: Remove duplicated code for device
>>>>> hotplug"). Here is the dmesg log for following case: enable switchdev
>>>>> mode, set number of VFs to 0, then set it back to any value > 0.
>>>>> [  223.525282] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable 
>>>>> SRIOV: nvfs(2) mode (1)
>>>>> [  223.562027] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: 
>>>>> active vports(3)
>>>>> [  223.663766] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
>>>>> [  223.663864] pci 0000:81:00.2: enabling Extended Tags
>>>>> [  223.665143] pci 0000:81:00.2: Adding to iommu group 52
>>>>> [  223.665215] pci 0000:81:00.2: Using iommu direct mapping
>>>>> [  223.665771] mlx5_core 0000:81:00.2: enabling device (0000 -> 0002)
>>>>> [  223.665890] mlx5_core 0000:81:00.2: firmware version: 16.26.148
>>>>> [  223.889908] mlx5_core 0000:81:00.2: Rate limit: 127 rates are 
>>>>> supported, range: 0Mbps to 97656Mbps
>>>>> [  223.896438] mlx5_core 0000:81:00.2: MLX5E: StrdRq(1) RqSz(8) 
>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>> [  223.896636] mlx5_core 0000:81:00.2: Assigned random MAC address 
>>>>> 56:1f:95:e0:51:d6
>>>>> [  224.012905] mlx5_core 0000:81:00.2 ens1f0v0: renamed from eth0
>>>>> [  224.041651] pci 0000:81:00.3: [15b3:101a] type 00 class 0x020000
>>>>> [  224.041711] pci 0000:81:00.3: enabling Extended Tags
>>>>> [  224.043660] pci 0000:81:00.3: Adding to iommu group 53
>>>>> [  224.043738] pci 0000:81:00.3: Using iommu direct mapping
>>>>> [  224.044196] mlx5_core 0000:81:00.3: enabling device (0000 -> 0002)
>>>>> [  224.044298] mlx5_core 0000:81:00.3: firmware version: 16.26.148
>>>>> [  224.268099] mlx5_core 0000:81:00.3: Rate limit: 127 rates are 
>>>>> supported, range: 0Mbps to 97656Mbps
>>>>> [  224.274983] mlx5_core 0000:81:00.3: MLX5E: StrdRq(1) RqSz(8) 
>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>> [  224.275195] mlx5_core 0000:81:00.3: Assigned random MAC address 
>>>>> a6:1e:56:0a:d9:f2
>>>>> [  224.388359] mlx5_core 0000:81:00.3 ens1f0v1: renamed from eth0
>>>>> [  236.325027] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: 
>>>>> active vports(3) mode(1)
>>>>> [  236.362766] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable 
>>>>> SRIOV: nvfs(2) mode (2)
>>>>> [  237.290066] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) 
>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>> [  237.350215] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) 
>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>> [  237.373052] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
>>>>> [  237.390768] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) 
>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>> [  237.447846] ens1f0_0: renamed from eth0
>>>>> [  237.460399] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: 
>>>>> active vports(3)
>>>>> [  237.526880] ens1f0_1: renamed from eth1
>>>>> [  248.953873] pci 0000:81:00.2: Removing from iommu group 52
>>>>> [  248.954114] pci 0000:81:00.3: Removing from iommu group 53
>>>>> [  249.960570] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: 
>>>>> active vports(3) mode(2)
>>>>> [  250.319135] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8) 
>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>> [  250.559431] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
>>>>> [  258.819162] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable 
>>>>> SRIOV: nvfs(2) mode (1)
>>>>> [  258.831625] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: 
>>>>> active vports(3)
>>>>> [  258.936160] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
>>>>> [  258.936258] pci 0000:81:00.2: enabling Extended Tags
>>>>> [  258.937438] pci 0000:81:00.2: Failed to add to iommu group 52: -16
>>>> It seems that an EBUSY error returned from iommu_group_add_device(). 
>>>> Can
>>>> you please hack some debug messages in iommu_group_add_device() so that
>>>> we can know where the EBUSY returns?
>>>>
>>>> Best regards,
>>>> Baolu
>>> The error code is returned by __iommu_attach_device().
>>>
>>
>> Thanks!
>>
>> It looks like the system has already a domain for specific pci bdf
>> device. Does this VF share the bdf with other devices? Or has been
>> previously created, and system failed to get chance to remove it?
> 
> At a glance, it looks like it might be down to 
> intel_iommu_remove_device() not calling dmar_remove_one_dev_info() like 
> the old notifier did. If the group is getting torn down and recreated, 
> but the driver still has a stale pointer to the old default domain 
> cached, which dmar_insert_one_dev_info() finds and returns, that would 
> seem to explain the observed behaviour.

Yes, agreed.

Vlad,

Can you please try the change below?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index baf21001c339..abffc520fe05 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5575,6 +5575,8 @@ static void intel_iommu_remove_device(struct device *dev)
         if (!iommu)
                 return;

+       dmar_remove_one_dev_info(dev);
+
         iommu_group_remove_device(dev);

         iommu_device_unlink(&iommu->iommu, dev);

Best regards,
Baolu
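
The intent of the change above can be sketched with a toy user-space
model. This is an illustration under stated assumptions, not kernel
code: the toy_* names are invented, and clearing cached_domain merely
stands in for what dmar_remove_one_dev_info() is expected to do for the
driver-private state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_domain {
    int id;
};

struct toy_device {
    struct toy_domain *cached_domain; /* driver-private cache (assumed) */
};

/* Attach fails with -EBUSY if the driver still caches another domain. */
int toy_attach(struct toy_device *dev, struct toy_domain *dom)
{
    if (dev->cached_domain && dev->cached_domain != dom)
        return -EBUSY;
    dev->cached_domain = dom;
    return 0;
}

/* Remove that also drops the cached domain before the device goes away. */
void toy_remove_fixed(struct toy_device *dev)
{
    dev->cached_domain = NULL; /* stands in for dmar_remove_one_dev_info(dev) */
}

/* Create a VF, remove it, then recreate it with a fresh default domain. */
int toy_recreate_fixed(void)
{
    struct toy_domain old_dom = { 1 };
    struct toy_domain new_dom = { 2 };
    struct toy_device vf = { NULL };
    int ret;

    ret = toy_attach(&vf, &old_dom);
    assert(ret == 0);

    toy_remove_fixed(&vf);

    /* No stale pointer is left, so the second attach succeeds. */
    return toy_attach(&vf, &new_dom);
}
```

In this model toy_recreate_fixed() returns 0, because nothing stale is
left behind for the second attach to trip over.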

* Re: Failure to recreate virtual functions
  2019-07-31  7:29         ` Lu Baolu
@ 2019-07-31 11:19           ` Vlad Buslov
  2019-08-01  1:49             ` Lu Baolu
  0 siblings, 1 reply; 8+ messages in thread
From: Vlad Buslov @ 2019-07-31 11:19 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Robin Murphy, Vlad Buslov, Joerg Roedel, Ran Rozenstein,
	linux-kernel, iommu, Maor Gottlieb


On Wed 31 Jul 2019 at 10:29, Lu Baolu <baolu.lu@linux.intel.com> wrote:
> Hi,
>
> On 7/30/19 7:22 PM, Robin Murphy wrote:
>> On 30/07/2019 05:28, Lu Baolu wrote:
>>> Hi,
>>>
>>> On 7/29/19 6:05 PM, Vlad Buslov wrote:
>>>> On Sat 27 Jul 2019 at 05:15, Lu Baolu<baolu.lu@linux.intel.com>  wrote:
>>>>> Hi Vlad,
>>>>>
>>>>> On 7/27/19 12:30 AM, Vlad Buslov wrote:
>>>>>> Hi Lu Baolu,
>>>>>>
>>>>>> Our mlx5 driver fails to recreate VFs when cmdline includes
>>>>>> "intel_iommu=on iommu=pt" after recent merge of patch set "iommu/vt-d:
>>>>>> Delegate DMA domain to generic iommu". I've bisected the failure to
>>>>>> patch b7297783c2bb ("iommu/vt-d: Remove duplicated code for device
>>>>>> hotplug"). Here is the dmesg log for following case: enable switchdev
>>>>>> mode, set number of VFs to 0, then set it back to any value > 0.
>>>>>> [  223.525282] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV:
>>>>>> nvfs(2) mode (1)
>>>>>> [  223.562027] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active
>>>>>> vports(3)
>>>>>> [  223.663766] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
>>>>>> [  223.663864] pci 0000:81:00.2: enabling Extended Tags
>>>>>> [  223.665143] pci 0000:81:00.2: Adding to iommu group 52
>>>>>> [  223.665215] pci 0000:81:00.2: Using iommu direct mapping
>>>>>> [  223.665771] mlx5_core 0000:81:00.2: enabling device (0000 -> 0002)
>>>>>> [  223.665890] mlx5_core 0000:81:00.2: firmware version: 16.26.148
>>>>>> [  223.889908] mlx5_core 0000:81:00.2: Rate limit: 127 rates are
>>>>>> supported, range: 0Mbps to 97656Mbps
>>>>>> [  223.896438] mlx5_core 0000:81:00.2: MLX5E: StrdRq(1) RqSz(8)
>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>> [  223.896636] mlx5_core 0000:81:00.2: Assigned random MAC address
>>>>>> 56:1f:95:e0:51:d6
>>>>>> [  224.012905] mlx5_core 0000:81:00.2 ens1f0v0: renamed from eth0
>>>>>> [  224.041651] pci 0000:81:00.3: [15b3:101a] type 00 class 0x020000
>>>>>> [  224.041711] pci 0000:81:00.3: enabling Extended Tags
>>>>>> [  224.043660] pci 0000:81:00.3: Adding to iommu group 53
>>>>>> [  224.043738] pci 0000:81:00.3: Using iommu direct mapping
>>>>>> [  224.044196] mlx5_core 0000:81:00.3: enabling device (0000 -> 0002)
>>>>>> [  224.044298] mlx5_core 0000:81:00.3: firmware version: 16.26.148
>>>>>> [  224.268099] mlx5_core 0000:81:00.3: Rate limit: 127 rates are
>>>>>> supported, range: 0Mbps to 97656Mbps
>>>>>> [  224.274983] mlx5_core 0000:81:00.3: MLX5E: StrdRq(1) RqSz(8)
>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>> [  224.275195] mlx5_core 0000:81:00.3: Assigned random MAC address
>>>>>> a6:1e:56:0a:d9:f2
>>>>>> [  224.388359] mlx5_core 0000:81:00.3 ens1f0v1: renamed from eth0
>>>>>> [  236.325027] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: active
>>>>>> vports(3) mode(1)
>>>>>> [  236.362766] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV:
>>>>>> nvfs(2) mode (2)
>>>>>> [  237.290066] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8)
>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>> [  237.350215] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8)
>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>> [  237.373052] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
>>>>>> [  237.390768] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8)
>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>> [  237.447846] ens1f0_0: renamed from eth0
>>>>>> [  237.460399] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active
>>>>>> vports(3)
>>>>>> [  237.526880] ens1f0_1: renamed from eth1
>>>>>> [  248.953873] pci 0000:81:00.2: Removing from iommu group 52
>>>>>> [  248.954114] pci 0000:81:00.3: Removing from iommu group 53
>>>>>> [  249.960570] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: active
>>>>>> vports(3) mode(2)
>>>>>> [  250.319135] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8)
>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>> [  250.559431] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
>>>>>> [  258.819162] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV:
>>>>>> nvfs(2) mode (1)
>>>>>> [  258.831625] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active
>>>>>> vports(3)
>>>>>> [  258.936160] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
>>>>>> [  258.936258] pci 0000:81:00.2: enabling Extended Tags
>>>>>> [  258.937438] pci 0000:81:00.2: Failed to add to iommu group 52: -16
>>>>> It seems that an EBUSY error returned from iommu_group_add_device(). Can
>>>>> you please hack some debug messages in iommu_group_add_device() so that
>>>>> we can know where the EBUSY returns?
>>>>>
>>>>> Best regards,
>>>>> Baolu
>>>> The error code is returned by __iommu_attach_device().
>>>>
>>>
>>> Thanks!
>>>
>>> It looks like the system has already a domain for specific pci bdf
>>> device. Does this VF share the bdf with other devices? Or has been
>>> previously created, and system failed to get chance to remove it?
>>
>> At a glance, it looks like it might be down to intel_iommu_remove_device() not
>> calling dmar_remove_one_dev_info() like the old notifier did. If the group is
>> getting torn down and recreated, but the driver still has a stale pointer to
>> the old default domain cached, which dmar_insert_one_dev_info() finds and
>> returns, that would seem to explain the observed behaviour.
>
> Yes agreed.
>
> Vlad,
>
> Can you please try below change?
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index baf21001c339..abffc520fe05 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5575,6 +5575,8 @@ static void intel_iommu_remove_device(struct device *dev)
>         if (!iommu)
>                 return;
>
> +       dmar_remove_one_dev_info(dev);
> +
>         iommu_group_remove_device(dev);
>
>         iommu_device_unlink(&iommu->iommu, dev);
>
> Best regards,
> Baolu

Hi Baolu,

This patch fixes the issue for me.

Thanks,
Vlad

* Re: Failure to recreate virtual functions
  2019-07-31 11:19           ` Vlad Buslov
@ 2019-08-01  1:49             ` Lu Baolu
  0 siblings, 0 replies; 8+ messages in thread
From: Lu Baolu @ 2019-08-01  1:49 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: baolu.lu, Robin Murphy, Joerg Roedel, Ran Rozenstein,
	linux-kernel, iommu, Maor Gottlieb

Hi,

On 7/31/19 7:19 PM, Vlad Buslov wrote:
> 
> On Wed 31 Jul 2019 at 10:29, Lu Baolu <baolu.lu@linux.intel.com> wrote:
>> Hi,
>>
>> On 7/30/19 7:22 PM, Robin Murphy wrote:
>>> On 30/07/2019 05:28, Lu Baolu wrote:
>>>> Hi,
>>>>
>>>> On 7/29/19 6:05 PM, Vlad Buslov wrote:
>>>>> On Sat 27 Jul 2019 at 05:15, Lu Baolu<baolu.lu@linux.intel.com>  wrote:
>>>>>> Hi Vlad,
>>>>>>
>>>>>> On 7/27/19 12:30 AM, Vlad Buslov wrote:
>>>>>>> Hi Lu Baolu,
>>>>>>>
>>>>>>> Our mlx5 driver fails to recreate VFs when cmdline includes
>>>>>>> "intel_iommu=on iommu=pt" after recent merge of patch set "iommu/vt-d:
>>>>>>> Delegate DMA domain to generic iommu". I've bisected the failure to
>>>>>>> patch b7297783c2bb ("iommu/vt-d: Remove duplicated code for device
>>>>>>> hotplug"). Here is the dmesg log for following case: enable switchdev
>>>>>>> mode, set number of VFs to 0, then set it back to any value > 0.
>>>>>>> [  223.525282] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV:
>>>>>>> nvfs(2) mode (1)
>>>>>>> [  223.562027] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active
>>>>>>> vports(3)
>>>>>>> [  223.663766] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
>>>>>>> [  223.663864] pci 0000:81:00.2: enabling Extended Tags
>>>>>>> [  223.665143] pci 0000:81:00.2: Adding to iommu group 52
>>>>>>> [  223.665215] pci 0000:81:00.2: Using iommu direct mapping
>>>>>>> [  223.665771] mlx5_core 0000:81:00.2: enabling device (0000 -> 0002)
>>>>>>> [  223.665890] mlx5_core 0000:81:00.2: firmware version: 16.26.148
>>>>>>> [  223.889908] mlx5_core 0000:81:00.2: Rate limit: 127 rates are
>>>>>>> supported, range: 0Mbps to 97656Mbps
>>>>>>> [  223.896438] mlx5_core 0000:81:00.2: MLX5E: StrdRq(1) RqSz(8)
>>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>>> [  223.896636] mlx5_core 0000:81:00.2: Assigned random MAC address
>>>>>>> 56:1f:95:e0:51:d6
>>>>>>> [  224.012905] mlx5_core 0000:81:00.2 ens1f0v0: renamed from eth0
>>>>>>> [  224.041651] pci 0000:81:00.3: [15b3:101a] type 00 class 0x020000
>>>>>>> [  224.041711] pci 0000:81:00.3: enabling Extended Tags
>>>>>>> [  224.043660] pci 0000:81:00.3: Adding to iommu group 53
>>>>>>> [  224.043738] pci 0000:81:00.3: Using iommu direct mapping
>>>>>>> [  224.044196] mlx5_core 0000:81:00.3: enabling device (0000 -> 0002)
>>>>>>> [  224.044298] mlx5_core 0000:81:00.3: firmware version: 16.26.148
>>>>>>> [  224.268099] mlx5_core 0000:81:00.3: Rate limit: 127 rates are
>>>>>>> supported, range: 0Mbps to 97656Mbps
>>>>>>> [  224.274983] mlx5_core 0000:81:00.3: MLX5E: StrdRq(1) RqSz(8)
>>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>>> [  224.275195] mlx5_core 0000:81:00.3: Assigned random MAC address
>>>>>>> a6:1e:56:0a:d9:f2
>>>>>>> [  224.388359] mlx5_core 0000:81:00.3 ens1f0v1: renamed from eth0
>>>>>>> [  236.325027] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: active
>>>>>>> vports(3) mode(1)
>>>>>>> [  236.362766] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV:
>>>>>>> nvfs(2) mode (2)
>>>>>>> [  237.290066] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8)
>>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>>> [  237.350215] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8)
>>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>>> [  237.373052] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
>>>>>>> [  237.390768] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8)
>>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>>> [  237.447846] ens1f0_0: renamed from eth0
>>>>>>> [  237.460399] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active
>>>>>>> vports(3)
>>>>>>> [  237.526880] ens1f0_1: renamed from eth1
>>>>>>> [  248.953873] pci 0000:81:00.2: Removing from iommu group 52
>>>>>>> [  248.954114] pci 0000:81:00.3: Removing from iommu group 53
>>>>>>> [  249.960570] mlx5_core 0000:81:00.0: E-Switch: disable SRIOV: active
>>>>>>> vports(3) mode(2)
>>>>>>> [  250.319135] mlx5_core 0000:81:00.0: MLX5E: StrdRq(1) RqSz(8)
>>>>>>> StrdSz(2048) RxCqeCmprss(0)
>>>>>>> [  250.559431] mlx5_core 0000:81:00.0 ens1f0: renamed from eth0
>>>>>>> [  258.819162] mlx5_core 0000:81:00.0: E-Switch: E-Switch enable SRIOV:
>>>>>>> nvfs(2) mode (1)
>>>>>>> [  258.831625] mlx5_core 0000:81:00.0: E-Switch: SRIOV enabled: active
>>>>>>> vports(3)
>>>>>>> [  258.936160] pci 0000:81:00.2: [15b3:101a] type 00 class 0x020000
>>>>>>> [  258.936258] pci 0000:81:00.2: enabling Extended Tags
>>>>>>> [  258.937438] pci 0000:81:00.2: Failed to add to iommu group 52: -16
>>>>>> It seems that an EBUSY error returned from iommu_group_add_device(). Can
>>>>>> you please hack some debug messages in iommu_group_add_device() so that
>>>>>> we can know where the EBUSY returns?
>>>>>>
>>>>>> Best regards,
>>>>>> Baolu
>>>>> The error code is returned by __iommu_attach_device().
>>>>>
>>>>
>>>> Thanks!
>>>>
>>>> It looks like the system has already a domain for specific pci bdf
>>>> device. Does this VF share the bdf with other devices? Or has been
>>>> previously created, and system failed to get chance to remove it?
>>>
>>> At a glance, it looks like it might be down to intel_iommu_remove_device() not
>>> calling dmar_remove_one_dev_info() like the old notifier did. If the group is
>>> getting torn down and recreated, but the driver still has a stale pointer to
>>> the old default domain cached, which dmar_insert_one_dev_info() finds and
>>> returns, that would seem to explain the observed behaviour.
>>
>> Yes agreed.
>>
>> Vlad,
>>
>> Can you please try below change?
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index baf21001c339..abffc520fe05 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5575,6 +5575,8 @@ static void intel_iommu_remove_device(struct device *dev)
>>          if (!iommu)
>>                  return;
>>
>> +       dmar_remove_one_dev_info(dev);
>> +
>>          iommu_group_remove_device(dev);
>>
>>          iommu_device_unlink(&iommu->iommu, dev);
>>
>> Best regards,
>> Baolu
> 
> Hi Baolu,
> 
> This patch fixes the issue for me.
> 

Great! Thanks for testing. I will submit a fix soon.

Best regards,
Baolu

Thread overview: 8+ messages
2019-07-26 16:30 Failure to recreate virtual functions Vlad Buslov
2019-07-27  2:15 ` Lu Baolu
2019-07-29 10:05   ` Vlad Buslov
2019-07-30  4:28     ` Lu Baolu
2019-07-30 11:22       ` Robin Murphy
2019-07-31  7:29         ` Lu Baolu
2019-07-31 11:19           ` Vlad Buslov
2019-08-01  1:49             ` Lu Baolu
