linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yicong Yang <yangyicong@huawei.com>
To: Lukasz Maniak <lukasz.maniak@linux.intel.com>
Cc: yangyicong@hisilicon.com, "Bjorn Helgaas" <helgaas@kernel.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Łukasz Gieryk" <lukasz.gieryk@linux.intel.com>,
	"Alex Williamson" <alex.williamson@redhat.com>
Subject: Re: [PATCH] PCI: Reset IOV state on FLR to PF
Date: Wed, 19 Jan 2022 10:47:35 +0800	[thread overview]
Message-ID: <b1ad6220-cdc0-1058-6885-9c5b48441837@huawei.com> (raw)
In-Reply-To: <20220118163054.GA8392@lmaniak-dev.igk.intel.com>

On 2022/1/19 0:30, Lukasz Maniak wrote:
> On Tue, Jan 18, 2022 at 07:07:23PM +0800, Yicong Yang wrote:
>> On 2022/1/18 6:55, Bjorn Helgaas wrote:
>>> [+cc Alex in case he has comments on how FLR should work on
>>> non-conforming hns3 devices]
>>>
>>> On Sat, Jan 15, 2022 at 05:22:19PM +0800, Yicong Yang wrote:
>>>> On 2022/1/15 0:37, Bjorn Helgaas wrote:
>>>>> On Fri, Jan 14, 2022 at 05:42:48PM +0800, Yicong Yang wrote:
>>>>>> On 2022/1/14 0:45, Lukasz Maniak wrote:
>>>>>>> On Wed, Jan 12, 2022 at 08:49:03AM -0600, Bjorn Helgaas wrote:
>>>>>>>> On Wed, Dec 22, 2021 at 08:19:57PM +0100, Lukasz Maniak wrote:
>>>>>>>>> As per PCI Express specification, FLR to a PF resets the PF state as
>>>>>>>>> well as the SR-IOV extended capability including VF Enable which means
>>>>>>>>> that VFs no longer exist.
>>>>>>>>
>>>>>>>> Can you add a specific reference to the spec, please?
>>>>>>>>
>>>>>>> Following the Single Root I/O Virtualization and Sharing Specification:
>>>>>>> 2.2.3. FLR That Targets a PF
>>>>>>> PFs must support FLR.
>>>>>>> FLR to a PF resets the PF state as well as the SR-IOV extended
>>>>>>> capability including VF Enable which means that VFs no longer exist.
>>>>>>>
>>>>>>> For PCI Express Base Specification Revision 5.0 and later, this is
>>>>>>> section 9.2.2.3.
>>>>>
>>>>> This is also the section in the new PCIe r6.0.  Let's use that.
>>>>>
>>>>>>>>> Currently, the IOV state is not updated during FLR, resulting in
>>>>>>>>> non-compliant PCI driver behavior.
>>>>>>>>
>>>>>>>> And include a little detail about what problem is observed?  How would
>>>>>>>> a user know this problem is occurring?
>>>>>>>>
>>>>>>> The problem is that the state of the kernel and HW as to the number of
>>>>>>> VFs gets out of sync after FLR.
>>>>>>>
>>>>>>> This results in further listing, after the FLR is performed by the HW,
>>>>>>> of VFs that actually no longer exist and should no longer be reported on
>>>>>>> the PCI bus. lspci return FFs for these VFs.
>>>>>>
>>>>>> There're some exceptions. Take HiSilicon's hns3 and sec device as an
>>>>>> example, the VF won't be destroyed after the FLR reset.
>>>>>
>>>>> If FLR on an hns3 PF does *not* clear VF Enable, and the VFs still
>>>>> exist after FLR, isn't that a violation of sec 9.2.2.3?
>>>>
>>>> yes I think it's a violation to the spec.
>>>
>>> Thanks for confirming that.
>>>
>>>>> If hns3 and sec don't conform to the spec, we should have some sort of
>>>>> quirk that serves to document and work around this.
>>>>
>>>> ok I think it'll help. Do you mean something like this based on this patch:
>>>>
>>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>>> index 69ee321027b4..0e4976c669b2 100644
>>>> --- a/drivers/pci/iov.c
>>>> +++ b/drivers/pci/iov.c
>>>> @@ -1025,6 +1025,8 @@ void pci_reset_iov_state(struct pci_dev *dev)
>>>>  		return;
>>>>  	if (!iov->num_VFs)
>>>>  		return;
>>>> +	if (dev->flr_no_vf_reset)
>>>> +		return;
>>>>
>>>>  	sriov_del_vfs(dev);
>>>>
>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>> index 003950c738d2..c8ffcb0ac612 100644
>>>> --- a/drivers/pci/quirks.c
>>>> +++ b/drivers/pci/quirks.c
>>>> @@ -1860,6 +1860,17 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa256, quirk_huawei_pcie_sva);
>>>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa258, quirk_huawei_pcie_sva);
>>>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa259, quirk_huawei_pcie_sva);
>>>>
>>>> +/*
>>>> + * Some HiSilicon PCIe devices' VF won't be destroyed after a FLR reset.
>>>> + * Don't reset these devices' IOV state when doing FLR.
>>>> + */
>>>> +static void quirk_huawei_pcie_flr(struct pci_dev *pdev)
>>>> +{
>>>> +	pdev->flr_no_vf_reset = 1;
>>>> +}
>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa255, quirk_huawei_pcie_flr);
>>>> +/* ...some other devices have this quirk */
>>>
>>> Yes, I think something along this line will help.
>>>
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 18a75c8e615c..e62f9fa4d48f 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -454,6 +454,7 @@ struct pci_dev {
>>>>  	unsigned int	is_probed:1;		/* Device probing in progress */
>>>>  	unsigned int	link_active_reporting:1;/* Device capable of reporting link active */
>>>>  	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
>>>> +	unsigned int	flr_no_vf_reset:1;	/* VF won't be destroyed after PF's FLR */
>>>>
>>>>>> Currently the transactions with the VF will be restored after the
>>>>>> FLR. But this patch will break that, the VF is fully disabled and
>>>>>> the transaction cannot be restored. User needs to reconfigure it,
>>>>>> which is unnecessary before this patch.
>>>>>
>>>>> What does it mean for a "transaction to be restored"?  Maybe you mean
>>>>> this patch removes the *VFs* via sriov_del_vfs(), and whoever
>>>>> initiated the FLR would need to re-enable VFs via pci_enable_sriov()
>>>>> or something similar?
>>>>
>>>> Partly. It'll also terminate the VF users.
>>>> Think that I attach the VF of hns to a VM by vfio and ping the network
>>>> in the VM, when doing FLR the 'ping' will pause and after FLR it'll
>>>> resume. Currenlty The driver handle this in the ->reset_{prepare, done}()
>>>> methods. The user of VM may not realize there is a FLR of the PF as the
>>>> VF always exists and the 'ping' is never terminated.
>>>>
>>>> If we remove the VF when doing FLR, then 1) we'll block in the VF->remove()
>>>> until no one is using the device, for example the 'ping' is finished.
>>>> 2) the VF in the VM no longer exists and we have to re-enable VF and hotplug
>>>> it into the VM and restart the ping. That's a big difference.
>>>>
>>>>> If FLR disables VFs, it seems like we should expect to have to
>>>>> re-enable them if we want them.
>>>>
>>>> It involves a remove()/probe() process of the VF driver and the user
>>>> of the VF will be terminated, just like the situation illustrated
>>>> above.
>>>
>>> I think users of FLR should be able to rely on it working per spec,
>>> i.e., that VFs will be destroyed.  If hardware like hns3 doesn't do
>>> that, the quirk should work around that in software by doing it
>>> explicitly.
>>>
>>> I don't think the non-standard behavior should be exposed to the
>>> users.  The user should not have to know about this hns3 issue.
>>>
>>> If FLR on a standard NIC terminates a ping on a VF, FLR on an hns3 NIC
>>> should also terminate a ping on a VF.
>>>
>>
>> ok thanks for the discussion, agree on that. According to the spec, after
>> the FLR to the PF the VF does not exist anymore, so the ping will be terminated.
>> Our hns3 and sec team are still evaluating it before coming to a solution of
>> whether using a quirk or comform to the spec.
>>
>> For this patch it looks reasonable to me, but some questions about the code below.
>>
>>>>>> Can we handle this problem in another way? Maybe test the VF's
>>>>>> vendor device ID after the FLR reset to see whether it has really
>>>>>> gone or not?
>>>>>>
>>>>>>> sriov_numvfs in sysfs returns old invalid value and does not allow
>>>>>>> setting a new value before explicitly setting 0 in the first place.
>>>>>>>
>>>>>>>>> This patch introduces a simple function, called on the FLR path, that
>>>>>>>>> removes the virtual function devices from the PCI bus and their
>>>>>>>>> corresponding sysfs links with a final clear of the num_vfs value in IOV
>>>>>>>>> state.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com>
>>>>>>>>> ---
>>>>>>>>>  drivers/pci/iov.c | 21 +++++++++++++++++++++
>>>>>>>>>  drivers/pci/pci.c |  2 ++
>>>>>>>>>  drivers/pci/pci.h |  4 ++++
>>>>>>>>>  3 files changed, 27 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>>>>>>>> index 0267977c9f17..69ee321027b4 100644
>>>>>>>>> --- a/drivers/pci/iov.c
>>>>>>>>> +++ b/drivers/pci/iov.c
>>>>>>>>> @@ -1013,6 +1013,27 @@ int pci_iov_bus_range(struct pci_bus *bus)
>>>>>>>>>  	return max ? max - bus->number : 0;
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> +/**
>>>>>>>>> + * pci_reset_iov_state - reset the state of the IOV capability
>>>>>>>>> + * @dev: the PCI device
>>>>>>>>> + */
>>>>>>>>> +void pci_reset_iov_state(struct pci_dev *dev)
>>>>>>>>> +{
>>>>>>>>> +	struct pci_sriov *iov = dev->sriov;
>>>>>>>>> +
>>>>>>>>> +	if (!dev->is_physfn)
>>>>>>>>> +		return;
>>>>>>>>> +	if (!iov->num_VFs)
>>>>>>>>> +		return;
>>>>>>>>> +
>>>>>>>>> +	sriov_del_vfs(dev);
>>>>>>>>> +
>>>>>>>>> +	if (iov->link != dev->devfn)
>>>>>>>>> +		sysfs_remove_link(&dev->dev.kobj, "dep_link");
>>>>>>>>> +
>>>>>>>>> +	iov->num_VFs = 0;
>>>>>>>>> +}
>>>>>>>>> +
>>
>> Any reason for not using pci_disable_sriov()?
> 
> The issue with pci_disable_sriov() is that it calls sriov_disable(),
> which directly uses pci_cfg_access_lock(), leading to deadlock on the
> FLR path.
> 

That'll be a problem. Well my main concern is whether the VFs will be reset
correctly through pci_reset_iov_state() as it lacks the participant of
PF driver and bios (seems may needed only on powerpc, not sure), which is
necessary in the enable/disable routine through $pci_dev/sriov_numvfs.

>>
>> With the spec the related registers in the SRIOV cap will be reset so
>> it's ok in general. But for some devices not following the spec like hns3,
>> some fields like VF enable won't be reset and keep enabled after the FLR.
>> In this case after the FLR the VF devices in the system has gone but
>> the state of the PF SRIOV cap leaves uncleared. pci_disable_sriov()
>> will reset the whole SRIOV cap. It'll also call pcibios_sriov_disable()
>> to correct handle the VF disabling on some platforms, IIUC.
>>
>> Or is it better to use pdev->driver->sriov_configure(pdev,0)?
>> PF drivers must implement ->sriov_configure() for enabling/disabling
>> the VF but we totally skip the PF driver here.
>>
>> Thanks,
>> Yicong
>>
>>>>>>>>>  /**
>>>>>>>>>   * pci_enable_sriov - enable the SR-IOV capability
>>>>>>>>>   * @dev: the PCI device
>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>> index 3d2fb394986a..535f19d37e8d 100644
>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>> @@ -4694,6 +4694,8 @@ EXPORT_SYMBOL(pci_wait_for_pending_transaction);
>>>>>>>>>   */
>>>>>>>>>  int pcie_flr(struct pci_dev *dev)
>>>>>>>>>  {
>>>>>>>>> +	pci_reset_iov_state(dev);
>>>>>>>>> +
>>>>>>>>>  	if (!pci_wait_for_pending_transaction(dev))
>>>>>>>>>  		pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n");
>>>>>>>>>  
>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>>>> index 3d60cabde1a1..7bb144fbec76 100644
>>>>>>>>> --- a/drivers/pci/pci.h
>>>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>>>> @@ -480,6 +480,7 @@ void pci_iov_update_resource(struct pci_dev *dev, int resno);
>>>>>>>>>  resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
>>>>>>>>>  void pci_restore_iov_state(struct pci_dev *dev);
>>>>>>>>>  int pci_iov_bus_range(struct pci_bus *bus);
>>>>>>>>> +void pci_reset_iov_state(struct pci_dev *dev);
>>>>>>>>>  extern const struct attribute_group sriov_pf_dev_attr_group;
>>>>>>>>>  extern const struct attribute_group sriov_vf_dev_attr_group;
>>>>>>>>>  #else
>>>>>>>>> @@ -501,6 +502,9 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
>>>>>>>>>  {
>>>>>>>>>  	return 0;
>>>>>>>>>  }
>>>>>>>>> +static inline void pci_reset_iov_state(struct pci_dev *dev)
>>>>>>>>> +{
>>>>>>>>> +}
>>>>>>>>>  
>>>>>>>>>  #endif /* CONFIG_PCI_IOV */
>>> .
>>>
> .
> 

  reply	other threads:[~2022-01-19  2:47 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-22 19:19 [PATCH] PCI: Reset IOV state on FLR to PF Lukasz Maniak
2022-01-12 14:49 ` Bjorn Helgaas
2022-01-13 16:45   ` Lukasz Maniak
2022-01-14  9:42     ` Yicong Yang
2022-01-14 16:37       ` Bjorn Helgaas
2022-01-15  9:22         ` Yicong Yang
2022-01-17 22:55           ` Bjorn Helgaas
2022-01-18 11:07             ` Yicong Yang
2022-01-18 16:30               ` Lukasz Maniak
2022-01-19  2:47                 ` Yicong Yang [this message]
2022-01-19 10:22                   ` Yicong Yang
2022-01-19 16:06                     ` Lukasz Maniak
2022-01-19 17:09                       ` Lukasz Maniak
2022-01-20 13:16                         ` Yicong Yang
2022-01-21 21:40                           ` Saeed Mahameed
2022-01-29  9:41                             ` Yicong Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b1ad6220-cdc0-1058-6885-9c5b48441837@huawei.com \
    --to=yangyicong@huawei.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukasz.gieryk@linux.intel.com \
    --cc=lukasz.maniak@linux.intel.com \
    --cc=yangyicong@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).