netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Jakub Kicinski <kuba@kernel.org>,
	linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, Don Dutile <ddutile@redhat.com>
Subject: Re: [PATCH mlx5-next v1 1/5] PCI: Add sysfs callback to allow MSI-X table size change of SR-IOV VFs
Date: Wed, 13 Jan 2021 10:50:52 -0700	[thread overview]
Message-ID: <20210113105052.43cf3c15@omen.home.shazbot.org> (raw)
In-Reply-To: <20210110150727.1965295-2-leon@kernel.org>

On Sun, 10 Jan 2021 17:07:23 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Extend PCI sysfs interface with a new callback that allows configure
> the number of MSI-X vectors for specific SR-IO VF. This is needed
> to optimize the performance of newly bound devices by allocating
> the number of vectors based on the administrator knowledge of targeted VM.
> 
> This function is applicable for SR-IOV VF because such devices allocate
> their MSI-X table before they will run on the VMs and HW can't guess the
> right number of vectors, so the HW allocates them statically and equally.
> 
> The newly added /sys/bus/pci/devices/.../vf_msix_vec file will be seen
> for the VFs and it is writable as long as a driver is not bounded to the VF.
> 
> The values accepted are:
>  * > 0 - this will be number reported by the VF's MSI-X capability
>  * < 0 - not valid
>  * = 0 - will reset to the device default value
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-pci | 20 ++++++++
>  drivers/pci/iov.c                       | 62 +++++++++++++++++++++++++
>  drivers/pci/msi.c                       | 29 ++++++++++++
>  drivers/pci/pci-sysfs.c                 |  1 +
>  drivers/pci/pci.h                       |  2 +
>  include/linux/pci.h                     |  8 +++-
>  6 files changed, 121 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index 25c9c39770c6..05e26e5da54e 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -375,3 +375,23 @@ Description:
>  		The value comes from the PCI kernel device state and can be one
>  		of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
>  		The file is read only.
> +
> +What:		/sys/bus/pci/devices/.../vf_msix_vec
> +Date:		December 2020
> +Contact:	Leon Romanovsky <leonro@nvidia.com>
> +Description:
> +		This file is associated with the SR-IOV VFs.
> +		It allows configuration of the number of MSI-X vectors for
> +		the VF. This is needed to optimize performance of newly bound
> +		devices by allocating the number of vectors based on the
> +		administrator knowledge of targeted VM.
> +
> +		The values accepted are:
> +		 * > 0 - this will be number reported by the VF's MSI-X
> +			 capability
> +		 * < 0 - not valid
> +		 * = 0 - will reset to the device default value
> +
> +		The file is writable if the PF is bound to a driver that
> +		supports the ->sriov_set_msix_vec_count() callback and there
> +		is no driver bound to the VF.
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 4afd4ee4f7f0..42c0df4158d1 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -31,6 +31,7 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
>  	return (dev->devfn + dev->sriov->offset +
>  		dev->sriov->stride * vf_id) & 0xff;
>  }
> +EXPORT_SYMBOL(pci_iov_virtfn_devfn);
> 
>  /*
>   * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
> @@ -426,6 +427,67 @@ const struct attribute_group sriov_dev_attr_group = {
>  	.is_visible = sriov_attrs_are_visible,
>  };
> 
> +#ifdef CONFIG_PCI_MSI
> +static ssize_t vf_msix_vec_show(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	int numb = pci_msix_vec_count(pdev);
> +	struct pci_dev *pfdev;
> +
> +	if (numb < 0)
> +		return numb;
> +
> +	pfdev = pci_physfn(pdev);
> +	if (!pfdev->driver || !pfdev->driver->sriov_set_msix_vec_count)
> +		return -EOPNOTSUPP;
> +
> +	return sprintf(buf, "%d\n", numb);
> +}
> +
> +static ssize_t vf_msix_vec_store(struct device *dev,
> +				 struct device_attribute *attr, const char *buf,
> +				 size_t count)
> +{
> +	struct pci_dev *vf_dev = to_pci_dev(dev);
> +	int val, ret;
> +
> +	ret = kstrtoint(buf, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	ret = pci_set_msix_vec_count(vf_dev, val);
> +	if (ret)
> +		return ret;
> +
> +	return count;
> +}
> +static DEVICE_ATTR_RW(vf_msix_vec);
> +#endif
> +
> +static struct attribute *sriov_vf_dev_attrs[] = {
> +#ifdef CONFIG_PCI_MSI
> +	&dev_attr_vf_msix_vec.attr,
> +#endif
> +	NULL,
> +};
> +
> +static umode_t sriov_vf_attrs_are_visible(struct kobject *kobj,
> +					  struct attribute *a, int n)
> +{
> +	struct device *dev = kobj_to_dev(kobj);
> +
> +	if (dev_is_pf(dev))
> +		return 0;
> +
> +	return a->mode;
> +}
> +
> +const struct attribute_group sriov_vf_dev_attr_group = {
> +	.attrs = sriov_vf_dev_attrs,
> +	.is_visible = sriov_vf_attrs_are_visible,
> +};
> +
>  int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>  {
>  	return 0;
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 3162f88fe940..20705ca94666 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -991,6 +991,35 @@ int pci_msix_vec_count(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL(pci_msix_vec_count);
> 
> +/**
> + * pci_set_msix_vec_count - change the reported number of MSI-X vectors
> + * This function is applicable for SR-IOV VF because such devices allocate
> + * their MSI-X table before they will run on the VMs and HW can't guess the
> + * right number of vectors, so the HW allocates them statically and equally.
> + * @dev: VF device that is going to be changed
> + * @numb: amount of MSI-X vectors
> + **/
> +int pci_set_msix_vec_count(struct pci_dev *dev, int numb)
> +{
> +	struct pci_dev *pdev = pci_physfn(dev);
> +
> +	if (!dev->msix_cap || !pdev->msix_cap)
> +		return -EINVAL;
> +
> +	if (dev->driver || !pdev->driver ||
> +	    !pdev->driver->sriov_set_msix_vec_count)
> +		return -EOPNOTSUPP;


This seems racy, don't we need to hold device_lock on both the VF and
PF to avoid driver {un}binding races?  Does that happen implicitly
somewhere?  Thanks,

Alex

> +
> +	if (numb < 0)
> +		/*
> +		 * We don't support negative numbers for now,
> +		 * but maybe in the future it will make sense.
> +		 */
> +		return -EINVAL;
> +
> +	return pdev->driver->sriov_set_msix_vec_count(dev, numb);
> +}
> +
>  static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
>  			     int nvec, struct irq_affinity *affd, int flags)
>  {
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index fb072f4b3176..0af2222643c2 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1557,6 +1557,7 @@ static const struct attribute_group *pci_dev_attr_groups[] = {
>  	&pci_dev_hp_attr_group,
>  #ifdef CONFIG_PCI_IOV
>  	&sriov_dev_attr_group,
> +	&sriov_vf_dev_attr_group,
>  #endif
>  	&pci_bridge_attr_group,
>  	&pcie_dev_attr_group,
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 5c59365092fa..1fd273077637 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -183,6 +183,7 @@ extern unsigned int pci_pm_d3hot_delay;
> 
>  #ifdef CONFIG_PCI_MSI
>  void pci_no_msi(void);
> +int pci_set_msix_vec_count(struct pci_dev *dev, int numb);
>  #else
>  static inline void pci_no_msi(void) { }
>  #endif
> @@ -502,6 +503,7 @@ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
>  void pci_restore_iov_state(struct pci_dev *dev);
>  int pci_iov_bus_range(struct pci_bus *bus);
>  extern const struct attribute_group sriov_dev_attr_group;
> +extern const struct attribute_group sriov_vf_dev_attr_group;
>  #else
>  static inline int pci_iov_init(struct pci_dev *dev)
>  {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index b32126d26997..a17cfc28eb66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -856,6 +856,8 @@ struct module;
>   *		e.g. drivers/net/e100.c.
>   * @sriov_configure: Optional driver callback to allow configuration of
>   *		number of VFs to enable via sysfs "sriov_numvfs" file.
> + * @sriov_set_msix_vec_count: Driver callback to change number of MSI-X vectors
> + *              exposed by the sysfs "vf_msix_vec" entry.
>   * @err_handler: See Documentation/PCI/pci-error-recovery.rst
>   * @groups:	Sysfs attribute groups.
>   * @driver:	Driver model structure.
> @@ -871,6 +873,7 @@ struct pci_driver {
>  	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
>  	void (*shutdown)(struct pci_dev *dev);
>  	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
> +	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
>  	const struct pci_error_handlers *err_handler;
>  	const struct attribute_group **groups;
>  	struct device_driver	driver;
> @@ -2057,7 +2060,6 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
> 
>  #ifdef CONFIG_PCI_IOV
>  int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
> -int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
> 
>  int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
>  void pci_disable_sriov(struct pci_dev *dev);
> @@ -2402,6 +2404,10 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
>  void pci_uevent_ers(struct pci_dev *pdev, enum  pci_ers_result err_type);
>  #endif
> 
> +#ifdef CONFIG_PCI_IOV
> +int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id);
> +#endif
> +
>  /* Provide the legacy pci_dma_* API */
>  #include <linux/pci-dma-compat.h>
> 
> --
> 2.29.2
> 


  parent reply	other threads:[~2021-01-13 17:52 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-10 15:07 [PATCH mlx5-next v1 0/5] Dynamically assign MSI-X vectors count Leon Romanovsky
2021-01-10 15:07 ` [PATCH mlx5-next v1 1/5] PCI: Add sysfs callback to allow MSI-X table size change of SR-IOV VFs Leon Romanovsky
2021-01-11 19:30   ` Alexander Duyck
2021-01-12  3:25     ` Don Dutile
2021-01-12  6:15       ` Leon Romanovsky
2021-01-12  6:39     ` Leon Romanovsky
2021-01-12 21:59       ` Alexander Duyck
2021-01-13  6:09         ` Leon Romanovsky
2021-01-13 20:00           ` Alexander Duyck
2021-01-14  7:16             ` Leon Romanovsky
2021-01-14 16:51               ` Alexander Duyck
2021-01-14 17:12                 ` Leon Romanovsky
2021-01-13 17:50   ` Alex Williamson [this message]
2021-01-13 19:49     ` Leon Romanovsky
2021-01-10 15:07 ` [PATCH mlx5-next v1 2/5] PCI: Add SR-IOV sysfs entry to read number of MSI-X vectors Leon Romanovsky
2021-01-11 19:30   ` Alexander Duyck
2021-01-12  6:56     ` Leon Romanovsky
2021-01-12 21:34       ` Alexander Duyck
2021-01-13  6:19         ` Leon Romanovsky
2021-01-13 22:44           ` Alexander Duyck
2021-01-14  6:50             ` Leon Romanovsky
2021-01-14 16:40               ` Alexander Duyck
2021-01-14 16:48                 ` Jason Gunthorpe
2021-01-14 17:55                   ` Alexander Duyck
2021-01-14 18:29                     ` Jason Gunthorpe
2021-01-14 19:24                       ` Alexander Duyck
2021-01-14 19:56                         ` Leon Romanovsky
2021-01-14 20:08                         ` Jason Gunthorpe
2021-01-14 21:43                           ` Alexander Duyck
2021-01-14 23:28                             ` Alex Williamson
2021-01-15  1:56                               ` Alexander Duyck
2021-01-15 14:06                                 ` Jason Gunthorpe
2021-01-15 15:53                                   ` Leon Romanovsky
2021-01-16  1:48                                     ` Alexander Duyck
2021-01-16  8:20                                       ` Leon Romanovsky
2021-01-18  3:16                                         ` Alexander Duyck
2021-01-18  7:20                                           ` Leon Romanovsky
2021-01-18 13:28                                             ` Leon Romanovsky
2021-01-18 18:21                                               ` Alexander Duyck
2021-01-19  5:43                                                 ` Leon Romanovsky
2021-01-16  4:32                                   ` Alexander Duyck
2021-01-18 15:47                                     ` Jason Gunthorpe
2021-01-15 13:51                             ` Jason Gunthorpe
2021-01-14 17:26                 ` Leon Romanovsky
2021-01-18 17:03   ` Greg KH
2021-01-19  5:38     ` Leon Romanovsky
2021-01-10 15:07 ` [PATCH mlx5-next v1 3/5] net/mlx5: Add dynamic MSI-X capabilities bits Leon Romanovsky
2021-01-10 15:07 ` [PATCH mlx5-next v1 4/5] net/mlx5: Dynamically assign MSI-X vectors count Leon Romanovsky
2021-01-10 15:07 ` [PATCH mlx5-next v1 5/5] net/mlx5: Allow to the users to configure number of MSI-X vectors Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210113105052.43cf3c15@omen.home.shazbot.org \
    --to=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=ddutile@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).