* Oops in during sriov_enable with ixgbe driver
@ 2021-09-28 11:56 Niklas Schnelle
  2021-09-30 17:31   ` [Intel-wired-lan] " Jesse Brandeburg
  0 siblings, 1 reply; 13+ messages in thread
From: Niklas Schnelle @ 2021-09-28 11:56 UTC (permalink / raw)
  To: Jesse Brandeburg, Tony Nguyen; +Cc: linux-acpi, netdev

Hi Jesse, Hi Tony,

Since v5.15-rc1 I've been having problems with enabling SR-IOV VFs on
my private workstation with an Intel 82599 NIC with the ixgbe driver. I
haven't had time to bisect or look closer but since it still happens on
v5.15-rc3 I wanted to at least check if you're aware of the problem as
I couldn't find anything on the web.

I get the Oops below when trying "echo 2 > /sys/bus/pci/.../sriov_numvfs"
and suspect that the earlier ACPI messages could have something to do
with it, though I'm absolutely not an ACPI expert. If needed, I can do
a bisect.
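
For anyone who wants to trigger the same step from a test program rather
than a shell, a minimal C sketch of that write could look like the one
below. The name write_numvfs and its return convention are assumptions
made for illustration, not an existing API, and the sysfs path is left to
the caller because it is elided above.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write a VF count to an sriov_numvfs-style attribute file.
 * Hypothetical helper: returns 0 on success or a negative errno value.
 */
static int write_numvfs(const char *attr_path, unsigned int num_vfs)
{
    FILE *f = fopen(attr_path, "w");

    if (!f)
        return -errno;

    if (fprintf(f, "%u\n", num_vfs) < 0) {
        int err = errno ? errno : EIO;

        fclose(f);
        return -err;
    }

    /* Buffered data is flushed here, so a rejected write surfaces now. */
    if (fclose(f) != 0)
        return -errno;

    return 0;
}
```

Because the stream is buffered, an error reported by the kernel for the
write would normally show up at fclose() time, which is why its return
value is checked.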

Thanks,
Niklas

dmesg output:

[    6.112738] ixgbe 0000:03:00.0: registered PHC device on enp3s0
[    6.286041] ixgbe 0000:03:00.0 enp3s0: detected SFP+: 9
[    6.954994] ACPI: \: failed to evaluate _DSM (0x1001)
[    6.955000] ACPI: \: failed to evaluate _DSM (0x1001)
[    6.955002] ACPI: \: failed to evaluate _DSM (0x1001)
[    6.955003] ACPI: \: failed to evaluate _DSM (0x1001)
[    7.155246] ACPI: \: failed to evaluate _DSM (0x1001)
[    7.155251] ACPI: \: failed to evaluate _DSM (0x1001)
[    7.155253] ACPI: \: failed to evaluate _DSM (0x1001)
[    7.155254] ACPI: \: failed to evaluate _DSM (0x1001)
...
[  136.883365] ixgbe 0000:03:00.0 enp3s0: SR-IOV enabled with 2 VFs
[  136.883489] ixgbe 0000:03:00.0: removed PHC on enp3s0
[  136.983130] ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue count = 4, Tx Queue count = 4 XDP Queue count = 0
[  137.003753] ixgbe 0000:03:00.0: registered PHC device on enp3s0
[  137.179126] ixgbe 0000:03:00.0 enp3s0: detected SFP+: 9
[  137.217508] BUG: kernel NULL pointer dereference, address: 0000000000000298
[  137.217515] #PF: supervisor read access in kernel mode
[  137.217518] #PF: error_code(0x0000) - not-present page
[  137.217520] PGD 0 P4D 0 
[  137.217523] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  137.217527] CPU: 19 PID: 1058 Comm: zsh Not tainted 5.15.0-rc3-niklas #25 ad1c778d4b5b0053fcbb87077df466d6ee92e65b
[  137.217532] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Creator, BIOS P3.40 01/28/2021
[  137.217534] RIP: 0010:acpi_pci_find_companion+0x9a/0x100
[  137.217540] Code: 83 e0 07 41 c1 e5 0d 41 81 e5 00 00 1f 00 41 09 c5 0f b6 83 79 ff ff ff 45 89 ee 83 e8 01 3c 01 48 8b 43 40 41 0f 96 c7 31 ed <4c> 8b a0 98 02 00 00 4c 89 e7 e8 97 57 04 00 49 8d 7c 24 f0 44 89
[  137.217543] RSP: 0018:ffffbb978392bb88 EFLAGS: 00010246
[  137.217545] RAX: 0000000000000000 RBX: ffff9cb20c0b80d0 RCX: 00000000000000a4
[  137.217548] RDX: ffff9cb1cfdddf40 RSI: 0000000000000100 RDI: ffffffff897474e0
[  137.217549] RBP: 0000000000000000 R08: 0000000000000004 R09: ffffbb978392bb94
[  137.217551] R10: ffff9cb1d3588900 R11: 0000000000000004 R12: ffff9cb20c0b8000
[  137.217552] R13: 0000000000100000 R14: 0000000000100000 R15: 0000000000000000
[  137.217555] FS:  00007f0f792de140(0000) GS:ffff9cc0eeec0000(0000) knlGS:0000000000000000
[  137.217557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.217559] CR2: 0000000000000298 CR3: 0000000108492000 CR4: 0000000000350ee0
[  137.217561] Call Trace:
[  137.217564]  pci_set_acpi_fwnode+0x34/0x60
[  137.217567]  pci_setup_device+0xe1/0x270
[  137.217572]  pci_iov_add_virtfn+0x27e/0x330
[  137.217577]  sriov_enable+0x219/0x3e0
[  137.217580]  ixgbe_pci_sriov_configure+0xf3/0x170 [ixgbe 7655574dbcea556149b0aede65e6825fd4dfd120]
[  137.217599]  sriov_numvfs_store+0xbe/0x130
[  137.217603]  kernfs_fop_write_iter+0x11c/0x1b0
[  137.217607]  new_sync_write+0x15c/0x1f0
[  137.217612]  vfs_write+0x1eb/0x280
[  137.217615]  ksys_write+0x67/0xe0
[  137.217618]  do_syscall_64+0x5c/0x80
[  137.217622]  ? syscall_exit_to_user_mode+0x23/0x40
[  137.217626]  ? do_syscall_64+0x69/0x80
[  137.217629]  ? syscall_exit_to_user_mode+0x23/0x40
[  137.217632]  ? do_syscall_64+0x69/0x80
[  137.217635]  ? syscall_exit_to_user_mode+0x23/0x40
[  137.217638]  ? do_syscall_64+0x69/0x80
[  137.217640]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  137.217645] RIP: 0033:0x7f0f793ce907
[  137.217648] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  137.217650] RSP: 002b:00007ffdb6a8e7e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  137.217652] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f0f793ce907
[  137.217654] RDX: 0000000000000002 RSI: 0000564d342b2240 RDI: 0000000000000001
[  137.217656] RBP: 0000564d342b2240 R08: 000000000000000a R09: 0000000000000000
[  137.217657] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[  137.217659] R13: 00007f0f794a0520 R14: 0000000000000002 R15: 00007f0f794a0700
[  137.217662] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc cmac algif_hash algif_skcipher af_alg bnep nct6683 lm92 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua iwlmvm snd_hda_codec_realtek intel_rapl_msr snd_hda_codec_generic intel_rapl_common amdgpu ledtrig_audio snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_intel_dspcfg gpu_sched snd_intel_sdw_acpi kvm_amd drm_ttm_helper mac80211 ttm snd_hda_codec snd_usb_audio btusb snd_usbmidi_lib kvm drm_kms_helper btrtl snd_rawmidi btbcm snd_seq_device snd_hda_core btintel mc snd_hwdep wmi_bmof mxm_wmi intel_wmi_thunderbolt libarc4 snd_pcm irqbypass cec crct10dif_pclmul crc32_pclmul iwlwifi ghash_clmulni_intel bluetooth agpgart snd_timer aesni_intel syscopyarea sp5100_tco sysfillrect crypto_simd atlantic ecdh_generic snd joydev
[  137.217722]  cryptd ecc sysimgblt rapl dm_mod pcspkr k10temp ccp i2c_piix4 fb_sys_fops mousedev soundcore crc16 cfg80211 ixgbe macsec igb mdio_devres rfkill i2c_algo_bit libphy thunderbolt mdio wireguard dca curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 tpm_crb libblake2s blake2s_x86_64 libcurve25519_generic tpm_tis wmi tpm_tis_core libchacha libblake2s_generic tpm ip6_udp_tunnel rng_core udp_tunnel pinctrl_amd mac_hid acpi_cpufreq usbip_host usbip_core sg drm crypto_user fuse bpf_preload ip_tables x_tables hid_logitech_hidpp hid_logitech_dj usbhid btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq crc32c_intel sr_mod cdrom xhci_pci xhci_pci_renesas vfat fat
[  137.217772] CR2: 0000000000000298
[  137.217774] ---[ end trace b5fd99c7b5c7e77b ]---
[  137.217776] RIP: 0010:acpi_pci_find_companion+0x9a/0x100
[  137.217779] Code: 83 e0 07 41 c1 e5 0d 41 81 e5 00 00 1f 00 41 09 c5 0f b6 83 79 ff ff ff 45 89 ee 83 e8 01 3c 01 48 8b 43 40 41 0f 96 c7 31 ed <4c> 8b a0 98 02 00 00 4c 89 e7 e8 97 57 04 00 49 8d 7c 24 f0 44 89
[  137.217781] RSP: 0018:ffffbb978392bb88 EFLAGS: 00010246
[  137.217783] RAX: 0000000000000000 RBX: ffff9cb20c0b80d0 RCX: 00000000000000a4
[  137.217784] RDX: ffff9cb1cfdddf40 RSI: 0000000000000100 RDI: ffffffff897474e0
[  137.217786] RBP: 0000000000000000 R08: 0000000000000004 R09: ffffbb978392bb94
[  137.217788] R10: ffff9cb1d3588900 R11: 0000000000000004 R12: ffff9cb20c0b8000
[  137.217789] R13: 0000000000100000 R14: 0000000000100000 R15: 0000000000000000
[  137.217791] FS:  00007f0f792de140(0000) GS:ffff9cc0eeec0000(0000) knlGS:0000000000000000
[  137.217793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.217795] CR2: 0000000000000298 CR3: 0000000108492000 CR4: 0000000000350ee0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Oops in during sriov_enable with ixgbe driver
  2021-09-28 11:56 Oops in during sriov_enable with ixgbe driver Niklas Schnelle
@ 2021-09-30 17:31   ` Jesse Brandeburg
  0 siblings, 0 replies; 13+ messages in thread
From: Jesse Brandeburg @ 2021-09-30 17:31 UTC (permalink / raw)
  To: Niklas Schnelle, Tony Nguyen, Rafael J. Wysocki
  Cc: linux-acpi, netdev, intel-wired-lan

On 9/28/2021 4:56 AM, Niklas Schnelle wrote:
> Hi Jesse, Hi Tony,
> 
> Since v5.15-rc1 I've been having problems with enabling SR-IOV VFs on
> my private workstation with an Intel 82599 NIC with the ixgbe driver. I
> haven't had time to bisect or look closer but since it still happens on
> v5.15-rc3 I wanted to at least check if you're aware of the problem as
> I couldn't find anything on the web.

We haven't heard anything of this problem.


> I get below Oops when trying "echo 2 > /sys/bus/pci/.../sriov_numvfs"
> and suspect that the earlier ACPI messages could have something to do
> with that, absolutely not an ACPI expert though. If there is a need I
> could do a bisect.

Hi Niklas, thanks for the report, I added the Intel Driver's list for
more exposure.

I asked the developers working on that driver to take a look and they
tried to reproduce, and were unable to do so. This might be related to
your platform, which strongly suggests that the ACPI stuff may be related.

We have tried to reproduce, but everything works fine: there is no call
trace in the scenario that creates VFs.

This is good in that it doesn't seem to be a general failure. You may
want to file a kernel bugzilla (bugzilla.kernel.org) to track the issue,
and I hope that @Rafael might have some insight.

This issue may be related to changes in acpi_pci_find_companion,
but as I say, we are not able to reproduce this.

commit 59dc33252ee777e02332774fbdf3381b1d5d5f5d
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue Aug 24 16:43:55 2021 +0200
    PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus

At this point maybe a bisect would be helpful, since this seems to be a
corner case that we used to handle but no longer do.

Thanks!
Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Oops in during sriov_enable with ixgbe driver
  2021-09-30 17:31   ` [Intel-wired-lan] " Jesse Brandeburg
@ 2021-09-30 17:38     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2021-09-30 17:38 UTC (permalink / raw)
  To: Jesse Brandeburg, Niklas Schnelle, Tony Nguyen
  Cc: linux-acpi, netdev, intel-wired-lan

On 9/30/2021 7:31 PM, Jesse Brandeburg wrote:
> On 9/28/2021 4:56 AM, Niklas Schnelle wrote:
>> Hi Jesse, Hi Tony,
>>
>> Since v5.15-rc1 I've been having problems with enabling SR-IOV VFs on
>> my private workstation with an Intel 82599 NIC with the ixgbe driver. I
>> haven't had time to bisect or look closer but since it still happens on
>> v5.15-rc3 I wanted to at least check if you're aware of the problem as
>> I couldn't find anything on the web.
> We haven't heard anything of this problem.
>
>
>> I get below Oops when trying "echo 2 > /sys/bus/pci/.../sriov_numvfs"
>> and suspect that the earlier ACPI messages could have something to do
>> with that, absolutely not an ACPI expert though. If there is a need I
>> could do a bisect.
> Hi Niklas, thanks for the report, I added the Intel Driver's list for
> more exposure.
>
> I asked the developers working on that driver to take a look and they
> tried to reproduce, and were unable to do so. This might be related to
> your platform, which strongly suggests that the ACPI stuff may be related.
>
> We have tried to reproduce but everything works fine no call trace in
> scenario with creating VF.
>
> This is good in that it doesn't seem to be a general failure, you may
> want to file a kernel bugzilla (bugzilla.kernel.org) to track the issue,
> and I hope that @Rafael might have some insight.
>
> This issue may be related to changes in acpi_pci_find_companion,
> but as I say, we are not able to reproduce this.
>
> commit 59dc33252ee777e02332774fbdf3381b1d5d5f5d
> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Date:   Tue Aug 24 16:43:55 2021 +0200
>      PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus

This change doesn't affect any devices beyond the ones on the VMD bus.


> At this point maybe a bisect would be helpful, since this seems to be a
> corner case that we used to handle but no longer do.
>
> Thanks!
> Jesse



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Oops in during sriov_enable with ixgbe driver
  2021-09-30 17:38     ` [Intel-wired-lan] " Rafael J. Wysocki
@ 2021-09-30 18:20       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2021-09-30 18:20 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Niklas Schnelle, Tony Nguyen, ACPI Devel Maling List, netdev,
	intel-wired-lan, Rafael J. Wysocki

On Thu, Sep 30, 2021 at 7:38 PM Rafael J. Wysocki
<rafael.j.wysocki@intel.com> wrote:
>
> On 9/30/2021 7:31 PM, Jesse Brandeburg wrote:
> > On 9/28/2021 4:56 AM, Niklas Schnelle wrote:
> >> Hi Jesse, Hi Tony,
> >>
> >> Since v5.15-rc1 I've been having problems with enabling SR-IOV VFs on
> >> my private workstation with an Intel 82599 NIC with the ixgbe driver. I
> >> haven't had time to bisect or look closer but since it still happens on
> >> v5.15-rc3 I wanted to at least check if you're aware of the problem as
> >> I couldn't find anything on the web.
> > We haven't heard anything of this problem.
> >
> >
> >> I get below Oops when trying "echo 2 > /sys/bus/pci/.../sriov_numvfs"
> >> and suspect that the earlier ACPI messages could have something to do
> >> with that, absolutely not an ACPI expert though. If there is a need I
> >> could do a bisect.
> > Hi Niklas, thanks for the report, I added the Intel Driver's list for
> > more exposure.
> >
> > I asked the developers working on that driver to take a look and they
> > tried to reproduce, and were unable to do so. This might be related to
> > your platform, which strongly suggests that the ACPI stuff may be related.
> >
> > We have tried to reproduce but everything works fine no call trace in
> > scenario with creating VF.
> >
> > This is good in that it doesn't seem to be a general failure, you may
> > want to file a kernel bugzilla (bugzilla.kernel.org) to track the issue,
> > and I hope that @Rafael might have some insight.
> >
> > This issue may be related to changes in acpi_pci_find_companion,
> > but as I say, we are not able to reproduce this.
> >
> > commit 59dc33252ee777e02332774fbdf3381b1d5d5f5d
> > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Date:   Tue Aug 24 16:43:55 2021 +0200
> >      PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus
>
> This change doesn't affect any devices beyond the ones on the VMD bus.

The only failing case I can see is when the device is on the VMD bus
and its bus pointer is NULL, so the dereference in
vmd_acpi_find_companion() crashes.

Can anything like that happen?

> > At this point maybe a bisect would be helpful, since this seems to be a
> > corner case that we used to handle but no longer do.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Oops in during sriov_enable with ixgbe driver
  2021-09-30 18:20       ` [Intel-wired-lan] " Rafael J. Wysocki
@ 2021-09-30 18:37         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2021-09-30 18:37 UTC (permalink / raw)
  To: Jesse Brandeburg, Niklas Schnelle
  Cc: Tony Nguyen, ACPI Devel Maling List, netdev, intel-wired-lan,
	Rafael J. Wysocki

On Thu, Sep 30, 2021 at 8:20 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Sep 30, 2021 at 7:38 PM Rafael J. Wysocki
> <rafael.j.wysocki@intel.com> wrote:
> >
> > On 9/30/2021 7:31 PM, Jesse Brandeburg wrote:
> > > On 9/28/2021 4:56 AM, Niklas Schnelle wrote:
> > >> Hi Jesse, Hi Tony,
> > >>
> > >> Since v5.15-rc1 I've been having problems with enabling SR-IOV VFs on
> > >> my private workstation with an Intel 82599 NIC with the ixgbe driver. I
> > >> haven't had time to bisect or look closer but since it still happens on
> > >> v5.15-rc3 I wanted to at least check if you're aware of the problem as
> > >> I couldn't find anything on the web.
> > > We haven't heard anything of this problem.
> > >
> > >
> > >> I get below Oops when trying "echo 2 > /sys/bus/pci/.../sriov_numvfs"
> > >> and suspect that the earlier ACPI messages could have something to do
> > >> with that, absolutely not an ACPI expert though. If there is a need I
> > >> could do a bisect.
> > > Hi Niklas, thanks for the report, I added the Intel Driver's list for
> > > more exposure.
> > >
> > > I asked the developers working on that driver to take a look and they
> > > tried to reproduce, and were unable to do so. This might be related to
> > > your platform, which strongly suggests that the ACPI stuff may be related.
> > >
> > > We have tried to reproduce but everything works fine no call trace in
> > > scenario with creating VF.
> > >
> > > This is good in that it doesn't seem to be a general failure, you may
> > > want to file a kernel bugzilla (bugzilla.kernel.org) to track the issue,
> > > and I hope that @Rafael might have some insight.
> > >
> > > This issue may be related to changes in acpi_pci_find_companion,
> > > but as I say, we are not able to reproduce this.
> > >
> > > commit 59dc33252ee777e02332774fbdf3381b1d5d5f5d
> > > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Date:   Tue Aug 24 16:43:55 2021 +0200
> > >      PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus
> >
> > This change doesn't affect any devices beyond the ones on the VMD bus.
>
> The only failing case I can see is when the device is on the VMD bus
> and its bus pointer is NULL, so the dereference in
> vmd_acpi_find_companion() crashes.
>
> Can anything like that happen?

Not really, because pci_iov_add_virtfn() sets virtfn->bus.

However, it doesn't set virtfn->dev.parent AFAICS, so when that gets
dereferenced by ACPI_COMPANION(dev->parent) in
acpi_pci_find_companion(), the crash occurs.
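
The register dump in the report is consistent with that reading: RAX is 0
and CR2 is 0x298, and the marked instruction bytes (4c 8b a0 98 02 00 00)
decode to a 64-bit load from 0x298(%rax), i.e. a member at offset 0x298
being read through a NULL pointer. A small userspace sketch of the
arithmetic follows. The struct is a mock whose padding was picked only to
reproduce that offset; it is not the kernel's struct device layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Mock stand-in for the parent device; NOT the real struct device.
 * The padding is an assumption chosen so that 'companion' sits at
 * offset 0x298, matching the fault address in the Oops.
 */
struct mock_parent {
    char pad[0x298];
    void *companion;
};

/*
 * Address that a load of base->companion would touch, computed
 * without actually dereferencing base.
 */
static uintptr_t companion_load_address(const struct mock_parent *base)
{
    return (uintptr_t)base + offsetof(struct mock_parent, companion);
}
```

With base == NULL the result is just the member offset, which is why the
fault address is a small number rather than a kernel pointer.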

We need a !dev->parent check in acpi_pci_find_companion() I suppose:

Does the following change help?

Index: linux-pm/drivers/pci/pci-acpi.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-acpi.c
+++ linux-pm/drivers/pci/pci-acpi.c
@@ -1243,6 +1243,9 @@ static struct acpi_device *acpi_pci_find
     bool check_children;
     u64 addr;

+    if (!dev->parent)
+        return NULL;
+
     down_read(&pci_acpi_companion_lookup_sem);

     adev = pci_acpi_find_companion_hook ?
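
The effect of the added check can be sketched outside the kernel as well.
The following is a simplified illustration in plain C, assuming mock
types (mock_device, mock_companion) and a hypothetical helper name; it
only shows the early-return pattern and does not reproduce the real
lookup in acpi_pci_find_companion().

```c
#include <stddef.h>

/* Mock types for illustration only; not kernel structures. */
struct mock_companion {
    int id;
};

struct mock_device {
    struct mock_device *parent;
    struct mock_companion *companion;
};

/*
 * Return the companion attached to dev's parent. A device without a
 * parent has nothing to look up, so bail out before touching
 * dev->parent, mirroring the !dev->parent check in the patch.
 */
static struct mock_companion *
find_parent_companion(const struct mock_device *dev)
{
    if (!dev->parent)
        return NULL;

    return dev->parent->companion;
}
```

Returning NULL here means "no companion found", which callers of such a
lookup already have to handle for devices that simply have none.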

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Oops in during sriov_enable with ixgbe driver
  2021-09-30 18:37         ` [Intel-wired-lan] " Rafael J. Wysocki
@ 2021-10-01  8:23           ` Niklas Schnelle
  -1 siblings, 0 replies; 13+ messages in thread
From: Niklas Schnelle @ 2021-10-01  8:23 UTC (permalink / raw)
  To: Rafael J. Wysocki, Jesse Brandeburg
  Cc: Tony Nguyen, ACPI Devel Maling List, netdev, intel-wired-lan,
	Rafael J. Wysocki

On Thu, 2021-09-30 at 20:37 +0200, Rafael J. Wysocki wrote:
> On Thu, Sep 30, 2021 at 8:20 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Thu, Sep 30, 2021 at 7:38 PM Rafael J. Wysocki
> > <rafael.j.wysocki@intel.com> wrote:
> > > On 9/30/2021 7:31 PM, Jesse Brandeburg wrote:
> > > > On 9/28/2021 4:56 AM, Niklas Schnelle wrote:
> > > > > Hi Jesse, Hi Tony,
> > > > > 
> > > > > Since v5.15-rc1 I've been having problems with enabling SR-IOV VFs on
> > > > > my private workstation with an Intel 82599 NIC with the ixgbe driver. I
> > > > > haven't had time to bisect or look closer but since it still happens on
> > > > > v5.15-rc3 I wanted to at least check if you're aware of the problem as
> > > > > I couldn't find anything on the web.
> > > > We haven't heard anything of this problem.
> > > > 
> > > > 
> > > > > I get below Oops when trying "echo 2 > /sys/bus/pci/.../sriov_numvfs"
> > > > > and suspect that the earlier ACPI messages could have something to do
> > > > > with that, absolutely not an ACPI expert though. If there is a need I
> > > > > could do a bisect.
> > > > Hi Niklas, thanks for the report, I added the Intel Driver's list for
> > > > more exposure.
> > > > 
> > > > I asked the developers working on that driver to take a look and they
> > > > tried to reproduce, and were unable to do so. This might be related to
> > > > your platform, which strongly suggests that the ACPI stuff may be related.
> > > > 
> > > > We have tried to reproduce but everything works fine no call trace in
> > > > scenario with creating VF.
> > > > 
> > > > This is good in that it doesn't seem to be a general failure, you may
> > > > want to file a kernel bugzilla (bugzilla.kernel.org) to track the issue,
> > > > and I hope that @Rafael might have some insight.
> > > > 
> > > > This issue may be related to changes in acpi_pci_find_companion,
> > > > but as I say, we are not able to reproduce this.
> > > > 
> > > > commit 59dc33252ee777e02332774fbdf3381b1d5d5f5d
> > > > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > Date:   Tue Aug 24 16:43:55 2021 +0200
> > > >      PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus
> > > 
> > > This change doesn't affect any devices beyond the ones on the VMD bus.
> > 
> > The only failing case I can see is when the device is on the VMD bus
> > and its bus pointer is NULL, so the dereference in
> > vmd_acpi_find_companion() crashes.
> > 
> > Can anything like that happen?
> 
> Not really, because pci_iov_add_virtfn() sets virtfn->bus.
> 
> However, it doesn't set virtfn->dev.parent AFAICS, so when that gets
> dereferenced by ACPI_COMPANION(dev->parent) in
> acpi_pci_find_companion(), the crash occurs.
> 
> We need a !dev->parent check in acpi_pci_find_companion() I suppose:
> 
> Does the following change help?
> 
> Index: linux-pm/drivers/pci/pci-acpi.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-acpi.c
> +++ linux-pm/drivers/pci/pci-acpi.c
> @@ -1243,6 +1243,9 @@ static struct acpi_device *acpi_pci_find
>      bool check_children;
>      u64 addr;
> 
> +    if (!dev->parent)
> +        return NULL;
> +
>      down_read(&pci_acpi_companion_lookup_sem);
> 
>      adev = pci_acpi_find_companion_hook ?


Yes the above change fixes the problem for me. SR-IOV enables
successfully and the VFs are fully usable. Thanks!

Just out of curiosity, and because I use this system to test common code
PCI changes: do you have an idea what makes my system special here?

The call to pci_set_acpi_fwnode() in pci_setup_device() is
unconditional and should do the same on any ACPI enabled system.
Also nothing in your explanation sounds specific to my system.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Oops in during sriov_enable with ixgbe driver
  2021-10-01  8:23           ` [Intel-wired-lan] " Niklas Schnelle
@ 2021-10-01 13:21             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2021-10-01 13:21 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Rafael J. Wysocki, Jesse Brandeburg, Tony Nguyen,
	ACPI Devel Maling List, netdev, intel-wired-lan,
	Rafael J. Wysocki

On Fri, Oct 1, 2021 at 10:23 AM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
>
> On Thu, 2021-09-30 at 20:37 +0200, Rafael J. Wysocki wrote:
> > On Thu, Sep 30, 2021 at 8:20 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Thu, Sep 30, 2021 at 7:38 PM Rafael J. Wysocki
> > > <rafael.j.wysocki@intel.com> wrote:
> > > > On 9/30/2021 7:31 PM, Jesse Brandeburg wrote:
> > > > > On 9/28/2021 4:56 AM, Niklas Schnelle wrote:
> > > > > > Hi Jesse, Hi Tony,
> > > > > >
> > > > > > Since v5.15-rc1 I've been having problems with enabling SR-IOV VFs on
> > > > > > my private workstation with an Intel 82599 NIC with the ixgbe driver. I
> > > > > > haven't had time to bisect or look closer but since it still happens on
> > > > > > v5.15-rc3 I wanted to at least check if you're aware of the problem as
> > > > > > I couldn't find anything on the web.
> > > > > We haven't heard anything of this problem.
> > > > >
> > > > >
> > > > > > I get below Oops when trying "echo 2 > /sys/bus/pci/.../sriov_numvfs"
> > > > > > and suspect that the earlier ACPI messages could have something to do
> > > > > > with that, absolutely not an ACPI expert though. If there is a need I
> > > > > > could do a bisect.
> > > > > Hi Niklas, thanks for the report, I added the Intel Driver's list for
> > > > > more exposure.
> > > > >
> > > > > I asked the developers working on that driver to take a look and they
> > > > > tried to reproduce, and were unable to do so. This might be related to
> > > > > your platform, which strongly suggests that the ACPI stuff may be related.
> > > > >
> > > > > > We have tried to reproduce, but everything works fine: no call trace
> > > > > > in the scenario with VF creation.
> > > > >
> > > > > This is good in that it doesn't seem to be a general failure, you may
> > > > > want to file a kernel bugzilla (bugzilla.kernel.org) to track the issue,
> > > > > and I hope that @Rafael might have some insight.
> > > > >
> > > > > This issue may be related to changes in acpi_pci_find_companion,
> > > > > but as I say, we are not able to reproduce this.
> > > > >
> > > > > commit 59dc33252ee777e02332774fbdf3381b1d5d5f5d
> > > > > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > Date:   Tue Aug 24 16:43:55 2021 +0200
> > > > >      PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus
> > > >
> > > > This change doesn't affect any devices beyond the ones on the VMD bus.
> > >
> > > The only failing case I can see is when the device is on the VMD bus
> > > and its bus pointer is NULL, so the dereference in
> > > vmd_acpi_find_companion() crashes.
> > >
> > > Can anything like that happen?
> >
> > Not really, because pci_iov_add_virtfn() sets virtfn->bus.
> >
> > > However, it doesn't set virtfn->dev.parent AFAICS, so when that gets
> > > dereferenced by ACPI_COMPANION(dev->parent) in
> > acpi_pci_find_companion(), the crash occurs.
> >
> > We need a !dev->parent check in acpi_pci_find_companion() I suppose:
> >
> > Does the following change help?
> >
> > Index: linux-pm/drivers/pci/pci-acpi.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pci-acpi.c
> > +++ linux-pm/drivers/pci/pci-acpi.c
> > @@ -1243,6 +1243,9 @@ static struct acpi_device *acpi_pci_find
> >      bool check_children;
> >      u64 addr;
> >
> > +    if (!dev->parent)
> > +        return NULL;
> > +
> >      down_read(&pci_acpi_companion_lookup_sem);
> >
> >      adev = pci_acpi_find_companion_hook ?
>
>
> Yes the above change fixes the problem for me. SR-IOV enables
> successfully and the VFs are fully usable. Thanks!

Thanks for the confirmation!

> Just out of curiosity, and because I use this system to test common code
> PCI changes: do you have an idea what makes my system special here?
>
> The call to pci_set_acpi_fwnode() in pci_setup_device() is
> unconditional and should do the same on any ACPI enabled system.
> Also nothing in your explanation sounds specific to my system.

Right, it is not special and I'm not really sure why others don't see
this breakage.

That's one of the reasons why it is key to report problems early: this
may help to protect others from being hit by those problems.

Let me post an "official" patch for this.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-10-01 13:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-28 11:56 Oops in during sriov_enable with ixgbe driver Niklas Schnelle
2021-09-30 17:31 ` Jesse Brandeburg
2021-09-30 17:31   ` [Intel-wired-lan] " Jesse Brandeburg
2021-09-30 17:38   ` Rafael J. Wysocki
2021-09-30 17:38     ` [Intel-wired-lan] " Rafael J. Wysocki
2021-09-30 18:20     ` Rafael J. Wysocki
2021-09-30 18:20       ` [Intel-wired-lan] " Rafael J. Wysocki
2021-09-30 18:37       ` Rafael J. Wysocki
2021-09-30 18:37         ` [Intel-wired-lan] " Rafael J. Wysocki
2021-10-01  8:23         ` Niklas Schnelle
2021-10-01  8:23           ` [Intel-wired-lan] " Niklas Schnelle
2021-10-01 13:21           ` Rafael J. Wysocki
2021-10-01 13:21             ` [Intel-wired-lan] " Rafael J. Wysocki
