From: Bjorn Helgaas <helgaas@kernel.org>
To: Leon Romanovsky <leon@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
Alexander Duyck <alexander.duyck@gmail.com>,
Keith Busch <kbusch@kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Jakub Kicinski <kuba@kernel.org>,
linux-pci <linux-pci@vger.kernel.org>,
linux-rdma@vger.kernel.org, Netdev <netdev@vger.kernel.org>,
Don Dutile <ddutile@redhat.com>,
Alex Williamson <alex.williamson@redhat.com>,
"David S . Miller" <davem@davemloft.net>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count
Date: Wed, 31 Mar 2021 20:23:40 -0500
Message-ID: <20210401012340.GA1423690@bjorn-Precision-5520>
In-Reply-To: <YGP1p7KH+/gL4NAU@unreal>

[+cc Rafael, in case you're interested in the driver core issue here]

On Wed, Mar 31, 2021 at 07:08:07AM +0300, Leon Romanovsky wrote:
> On Tue, Mar 30, 2021 at 03:41:41PM -0500, Bjorn Helgaas wrote:
> > On Tue, Mar 30, 2021 at 04:47:16PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Mar 30, 2021 at 10:00:19AM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Mar 30, 2021 at 10:57:38AM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, Mar 29, 2021 at 08:29:49PM -0500, Bjorn Helgaas wrote:
> > > > >
> > > > > > I think I misunderstood Greg's subdirectory comment. We already have
> > > > > > directories like this:
> > > > >
> > > > > Yes, IIRC, Greg's remark applies if you have to start creating
> > > > > directories with manual kobjects.
> > > > >
> > > > > > and aspm_ctrl_attr_group (for "link") is nicely done with static
> > > > > > attributes. So I think we could do something like this:
> > > > > >
> > > > > > >   /sys/bus/pci/devices/0000:01:00.0/   # PF directory
> > > > > > >     sriov/                             # SR-IOV related stuff
> > > > > > >       vf_total_msix
> > > > > > >       vf_msix_count_BB:DD.F        # includes bus/dev/fn of first VF
> > > > > > >       ...
> > > > > > >       vf_msix_count_BB:DD.F        # includes bus/dev/fn of last VF
> > > > >
> > > > > It looks a bit odd that it isn't a subdirectory, but this seems
> > > > > reasonable.
> > > >
> > > > Sorry, I missed your point; you'll have to lay it out more explicitly.
> > > > I did intend that "sriov" *is* a subdirectory of the 0000:01:00.0
> > > > directory. The full paths would be:
> > > >
> > > > /sys/bus/pci/devices/0000:01:00.0/sriov/vf_total_msix
> > > > /sys/bus/pci/devices/0000:01:00.0/sriov/vf_msix_count_BB:DD.F
> > > > ...
> > >
> > > Sorry, I was meaning what you first proposed:
> > >
> > > /sys/bus/pci/devices/0000:01:00.0/sriov/BB:DD.F/vf_msix_count
> > >
> > > Which has the extra sub directory to organize the child VFs.
> > >
> > > Keep in mind there is going to be alot of VFs here, > 1k - so this
> > > will be a huge directory.
> >
> > With 0000:01:00.0/sriov/vf_msix_count_BB:DD.F, sriov/ will contain
> > 1 + 1K files ("vf_total_msix" + 1 per VF).
> >
> > With 0000:01:00.0/sriov/BB:DD.F/vf_msix_count, sriov/ will contain
> > 1 file and 1K subdirectories.
>
> This is racy by design, in order to add new file and create BB:DD.F
> directory, the VF will need to do it after or during it's creation.
> During PF creation it is unknown to PF those BB:DD.F values.
>
> The race here is due to the events of PF,VF directory already sent
> but new directory structure is not ready yet.
>
> From code perspective, we will need to add something similar to
> pci_iov_sysfs_link() with the code that you didn't like in previous
> variants (the one that messes with sysfs_create_file API).
>
> It looks not good for large SR-IOV systems with >1K VFs with
> gazillion subdirectories inside PF, while the more natural is to see
> them in VF.
>
> So I'm completely puzzled why you want to do these files on PF and
> not on VF as v0, v7 and v8 proposed.

On both mlx5 and NVMe, the "assign vectors to VF" functionality is
implemented by the PF, so I think it's reasonable to explore the idea
of "associate the vector assignment sysfs file with the PF."

Assume 1K VFs.  Either way we have >1K subdirectories of
/sys/devices/pci0000:00/.  I think we should avoid an extra
subdirectory level, so I think the choices on the table are:

Associate "vf_msix_count" with the PF:

  - /sys/.../<PF>/sriov/vf_total_msix    # all on PF

  - /sys/.../<PF>/sriov/vf_msix_count_BB:DD.F (1K of these).  Greg
    says the number of these is not a problem.

  - The "vf_total_msix" and "vf_msix_count_*" files are all related
    and are grouped together in PF/sriov/.

  - The "vf_msix_count_*" files operate directly on the PF.  Lock the
    PF for serialization, look up and lock the VF to ensure no VF
    driver is bound, call PF driver callback to assign vectors.

  - Requires special sysfs code to create/remove "vf_msix_count_*"
    files when setting/clearing VF Enable.  This code could create
    them only when the PF driver actually supports vector assignment.
    Unavoidable sysfs/uevent race, see below.

Associate "vf_msix_count" with the VF:

  - /sys/.../<PF>/sriov_vf_total_msix    # on PF

  - /sys/.../<VF>/sriov_vf_msix_count    # on each VF

  - The "sriov_vf_msix_count" files enter via the VF.  Lock the VF to
    ensure no VF driver is bound, look up and lock the PF for
    serialization, call PF driver callback to assign vectors.

  - Can be done with static sysfs attributes.  This means creating
    "sriov_vf_msix_count" *always*, even if PF driver doesn't support
    vector assignment.
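
To make the two layouts concrete, here is a small userspace sketch that
only builds the path names each option would expose.  It does not touch
sysfs.  The helper names (short_bdf, files_on_pf, files_on_vf) and the VF
addresses are made up for illustration; only the file names come from the
lists above.

```python
SYSFS_PCI = "/sys/bus/pci/devices"


def short_bdf(addr):
    """Drop the PCI domain: '0000:01:00.1' -> '01:00.1'."""
    return addr.split(":", 1)[1]


def files_on_pf(pf, vfs):
    """Option 1: everything grouped under <PF>/sriov/."""
    base = f"{SYSFS_PCI}/{pf}/sriov"
    return [f"{base}/vf_total_msix"] + [
        f"{base}/vf_msix_count_{short_bdf(vf)}" for vf in vfs
    ]


def files_on_vf(pf, vfs):
    """Option 2: total on the PF, one count file in each VF directory."""
    return [f"{SYSFS_PCI}/{pf}/sriov_vf_total_msix"] + [
        f"{SYSFS_PCI}/{vf}/sriov_vf_msix_count" for vf in vfs
    ]


if __name__ == "__main__":
    pf = "0000:01:00.0"
    vfs = ["0000:01:00.1", "0000:01:00.2"]  # hypothetical VF addresses
    for path in files_on_pf(pf, vfs) + files_on_vf(pf, vfs):
        print(path)
```

Either way the number of new files is 1 + the number of VFs; the options
differ only in which directory they land in.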

IIUC, putting "vf_msix_count_*" under the PF involves a race.  When we
call device_add() for each new VF, it creates the VF sysfs directory
and emits the KOBJ_ADD uevent, but the "vf_msix_count_*" file doesn't
exist yet.  It can't be created before device_add() because the sysfs
directory doesn't exist.  If we create it after device_add(), the "add
VF" uevent has already been emitted, so userspace may consume it
before "vf_msix_count_*" is created.

  sriov_enable
    <set VF Enable>                     <-- VFs created on PCI
    sriov_add_vfs
      for (i = 0; i < num_vfs; i++) {
        pci_iov_add_virtfn
          pci_device_add
            device_initialize
            device_add
              device_add_attrs          <-- add VF sysfs attrs
              kobject_uevent(KOBJ_ADD)  <-- emit uevent
                                        <-- add "vf_msix_count_*" sysfs attr
          pci_iov_sysfs_link
          pci_bus_add_device
            pci_create_sysfs_dev_files
            device_attach
      }
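
A toy userspace model of that ordering, in case it helps.  This is not
kernel code: ToySysfs and the two add_vf_* functions are invented for
illustration, and the listener is called synchronously to stand in for
userspace reacting to the uevent as early as it possibly can.

```python
class ToySysfs:
    """Minimal stand-in for sysfs plus a uevent channel (not a kernel API)."""

    def __init__(self):
        self.paths = set()
        self.listeners = []

    def create(self, path):
        self.paths.add(path)

    def uevent(self, action, path):
        # Deliver immediately: the earliest point userspace could react.
        for listener in self.listeners:
            listener(action, path)


def short_bdf(addr):
    """Drop the PCI domain: '0000:01:00.1' -> '01:00.1'."""
    return addr.split(":", 1)[1]


def add_vf_file_on_pf(sysfs, pf, vf):
    """Per-VF file lives under the PF, so it is created after device_add()."""
    sysfs.create(vf)                     # device_add(): VF directory
    sysfs.uevent("add", vf)              # kobject_uevent(KOBJ_ADD)
    sysfs.create(f"{pf}/sriov/vf_msix_count_{short_bdf(vf)}")


def add_vf_static_attr(sysfs, vf):
    """Static attribute on the VF, created before the uevent is emitted."""
    sysfs.create(vf)                     # device_add(): VF directory
    sysfs.create(f"{vf}/sriov_vf_msix_count")   # device_add_attrs()
    sysfs.uevent("add", vf)              # kobject_uevent(KOBJ_ADD)


def attr_visible_at_uevent(variant, pf="0000:01:00.0", vf="0000:01:00.1"):
    """Return whether the count file exists when the 'add' uevent arrives."""
    sysfs = ToySysfs()
    if variant == "pf":
        attr = f"{pf}/sriov/vf_msix_count_{short_bdf(vf)}"
    else:
        attr = f"{vf}/sriov_vf_msix_count"
    seen = []
    sysfs.listeners.append(lambda action, path: seen.append(attr in sysfs.paths))
    if variant == "pf":
        add_vf_file_on_pf(sysfs, pf, vf)
    else:
        add_vf_static_attr(sysfs, vf)
    return seen[0]


if __name__ == "__main__":
    print("file on PF visible at uevent:", attr_visible_at_uevent("pf"))
    print("static VF attr visible at uevent:", attr_visible_at_uevent("vf"))
```

In the model, a listener that looks for the file as soon as it sees the
"add" event misses it in the PF variant and finds it in the static
attribute variant.  Real userspace may or may not lose the race; the
model only shows the window exists.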

Conceptually, I like having the "vf_total_msix" and "vf_msix_count_*"
files associated directly with the PF.  I think that's more natural
because they both operate directly on the PF.

But I don't like the race, and using static attributes seems much
cleaner implementation-wise.

Bjorn