linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Leon Romanovsky <leon@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Keith Busch <kbusch@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Jakub Kicinski <kuba@kernel.org>,
	linux-pci <linux-pci@vger.kernel.org>,
	linux-rdma@vger.kernel.org, Netdev <netdev@vger.kernel.org>,
	Don Dutile <ddutile@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	"David S . Miller" <davem@davemloft.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count
Date: Wed, 31 Mar 2021 20:23:40 -0500	[thread overview]
Message-ID: <20210401012340.GA1423690@bjorn-Precision-5520> (raw)
In-Reply-To: <YGP1p7KH+/gL4NAU@unreal>

[+cc Rafael, in case you're interested in the driver core issue here]

On Wed, Mar 31, 2021 at 07:08:07AM +0300, Leon Romanovsky wrote:
> On Tue, Mar 30, 2021 at 03:41:41PM -0500, Bjorn Helgaas wrote:
> > On Tue, Mar 30, 2021 at 04:47:16PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Mar 30, 2021 at 10:00:19AM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Mar 30, 2021 at 10:57:38AM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, Mar 29, 2021 at 08:29:49PM -0500, Bjorn Helgaas wrote:
> > > > > 
> > > > > > I think I misunderstood Greg's subdirectory comment.  We already have
> > > > > > directories like this:
> > > > > 
> > > > > Yes, IIRC, Greg's remark applies if you have to start creating
> > > > > directories with manual kobjects.
> > > > > 
> > > > > > and aspm_ctrl_attr_group (for "link") is nicely done with static
> > > > > > attributes.  So I think we could do something like this:
> > > > > > 
> > > > > >   /sys/bus/pci/devices/0000:01:00.0/   # PF directory
> > > > > >     sriov/                             # SR-IOV related stuff
> > > > > >       vf_total_msix
> > > > > >       vf_msix_count_BB:DD.F        # includes bus/dev/fn of first VF
> > > > > >       ...
> > > > > >       vf_msix_count_BB:DD.F        # includes bus/dev/fn of last VF
> > > > > 
> > > > > It looks a bit odd that it isn't a subdirectory, but this seems
> > > > > reasonable.
> > > > 
> > > > Sorry, I missed your point; you'll have to lay it out more explicitly.
> > > > I did intend that "sriov" *is* a subdirectory of the 0000:01:00.0
> > > > directory.  The full paths would be:
> > > >
> > > >   /sys/bus/pci/devices/0000:01:00.0/sriov/vf_total_msix
> > > >   /sys/bus/pci/devices/0000:01:00.0/sriov/vf_msix_count_BB:DD.F
> > > >   ...
> > > 
> > > Sorry, I was meaning what you first proposed:
> > > 
> > >    /sys/bus/pci/devices/0000:01:00.0/sriov/BB:DD.F/vf_msix_count
> > > 
> > > Which has the extra sub directory to organize the child VFs.
> > > 
> > > Keep in mind there is going to be alot of VFs here, > 1k - so this
> > > will be a huge directory.
> > 
> > With 0000:01:00.0/sriov/vf_msix_count_BB:DD.F, sriov/ will contain
> > 1 + 1K files ("vf_total_msix" + 1 per VF).
> > 
> > With 0000:01:00.0/sriov/BB:DD.F/vf_msix_count, sriov/ will contain
> > 1 file and 1K subdirectories.
> 
> This is racy by design, in order to add new file and create BB:DD.F
> directory, the VF will need to do it after or during it's creation.
> During PF creation it is unknown to PF those BB:DD.F values.
> 
> The race here is due to the events of PF,VF directory already sent
> but new directory structure is not ready yet.
>
> From code perspective, we will need to add something similar to
> pci_iov_sysfs_link() with the code that you didn't like in previous
> variants (the one that messes with sysfs_create_file API).
> 
> It looks not good for large SR-IOV systems with >1K VFs with
> gazillion subdirectories inside PF, while the more natural is to see
> them in VF.
> 
> So I'm completely puzzled why you want to do these files on PF and
> not on VF as v0, v7 and v8 proposed.

On both mlx5 and NVMe, the "assign vectors to VF" functionality is
implemented by the PF, so I think it's reasonable to explore the idea
of "associate the vector assignment sysfs file with the PF."

Assume 1K VFs.  Either way we have >1K subdirectories of
/sys/devices/pci0000:00/.  I think we should avoid an extra
subdirectory level, so I think the choices on the table are:

Associate "vf_msix_count" with the PF:

  - /sys/.../<PF>/sriov/vf_total_msix    # all on PF

  - /sys/.../<PF>/sriov/vf_msix_count_BB:DD.F (1K of these).  Greg
    says the number of these is not a problem.

  - The "vf_total_msix" and "vf_msix_count_*" files are all related
    and are grouped together in PF/sriov/.

  - The "vf_msix_count_*" files operate directly on the PF.  Lock the
    PF for serialization, lookup and lock the VF to ensure no VF
    driver, call PF driver callback to assign vectors.

  - Requires special sysfs code to create/remove "vf_msix_count_*"
    files when setting/clearing VF Enable.  This code could create
    them only when the PF driver actually supports vector assignment.
    Unavoidable sysfs/uevent race, see below.

Associate "vf_msix_count" with the VF:

  - /sys/.../<PF>/sriov_vf_total_msix    # on PF

  - /sys/.../<VF>/sriov_vf_msix_count    # on each VF

  - The "sriov_vf_msix_count" files enter via the VF.  Lock the VF to
    ensure no VF driver, lookup and lock the PF for serialization,
    call PF driver callback to assign vectors.

  - Can be done with static sysfs attributes.  This means creating
    "sriov_vf_msix_count" *always*, even if PF driver doesn't support
    vector assignment.

IIUC, putting "vf_msix_count_*" under the PF involves a race.  When we
call device_add() for each new VF, it creates the VF sysfs directory
and emits the KOBJ_ADD uevent, but the "vf_msix_count_*" file doesn't
exist yet.  It can't be created before device_add() because the sysfs
directory doesn't exist.  If we create it after device_add(), the "add
VF" uevent has already been emitted, so userspace may consume it
before "vf_msix_count_*" is created.

  sriov_enable
    <set VF Enable>                     <-- VFs created on PCI
    sriov_add_vfs
      for (i = 0; i < num_vfs; i++) {
        pci_iov_add_virtfn
          pci_device_add
            device_initialize
            device_add
              device_add_attrs          <-- add VF sysfs attrs
              kobject_uevent(KOBJ_ADD)  <-- emit uevent
                                        <-- add "vf_msix_count_*" sysfs attr
          pci_iov_sysfs_link
          pci_bus_add_device
            pci_create_sysfs_dev_files
            device_attach
      }

Conceptually, I like having the "vf_total_msix" and "vf_msix_count_*"
files associated directly with the PF.  I think that's more natural
because they both operate directly on the PF.

But I don't like the race, and using static attributes seems much
cleaner implementation-wise.

Bjorn

  reply	other threads:[~2021-04-01  1:24 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-01  7:55 [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count Leon Romanovsky
2021-03-01  7:55 ` [PATCH mlx5-next v7 1/4] PCI: Add a sysfs file to change the MSI-X table size of SR-IOV VFs Leon Romanovsky
2021-03-01  8:14   ` Greg Kroah-Hartman
2021-03-01  8:32     ` Leon Romanovsky
2021-03-01  8:37       ` Greg Kroah-Hartman
2021-03-01  8:53         ` Leon Romanovsky
2021-03-01  7:55 ` [PATCH mlx5-next v7 2/4] net/mlx5: Add dynamic MSI-X capabilities bits Leon Romanovsky
2021-03-01  7:55 ` [PATCH mlx5-next v7 3/4] net/mlx5: Dynamically assign MSI-X vectors count Leon Romanovsky
2021-03-01  7:55 ` [PATCH mlx5-next v7 4/4] net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks Leon Romanovsky
2021-03-07  8:11 ` [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count Leon Romanovsky
2021-03-07 18:55 ` Alexander Duyck
2021-03-07 19:19   ` Leon Romanovsky
2021-03-08 16:33     ` Alexander Duyck
2021-03-08 19:20       ` Leon Romanovsky
2021-03-10 19:09   ` Bjorn Helgaas
2021-03-10 20:10     ` Leon Romanovsky
2021-03-10 20:21       ` Greg Kroah-Hartman
2021-03-11  8:37         ` Leon Romanovsky
2021-03-10 23:34     ` Alexander Duyck
2021-03-11 18:17       ` Bjorn Helgaas
2021-03-11 19:16         ` Keith Busch
2021-03-11 19:21           ` Leon Romanovsky
2021-03-11 20:22           ` Jason Gunthorpe
2021-03-11 20:50             ` Keith Busch
2021-03-11 21:44               ` Jason Gunthorpe
2021-03-25 17:21                 ` Bjorn Helgaas
2021-03-25 17:36                   ` Jason Gunthorpe
2021-03-25 18:20                     ` Bjorn Helgaas
2021-03-25 18:28                       ` Jason Gunthorpe
2021-03-26  6:44                         ` Leon Romanovsky
2021-03-26 16:00                           ` Alexander Duyck
2021-03-26 16:56                             ` Jason Gunthorpe
2021-03-26 17:08                             ` Bjorn Helgaas
2021-03-26 17:12                               ` Jason Gunthorpe
2021-03-27  6:00                                 ` Leon Romanovsky
2021-03-26 17:29                               ` Keith Busch
2021-03-26 17:31                                 ` Jason Gunthorpe
2021-03-26 18:50                               ` Alexander Duyck
2021-03-26 19:01                                 ` Jason Gunthorpe
2021-03-30  1:29                                   ` Bjorn Helgaas
2021-03-30 13:57                                     ` Jason Gunthorpe
2021-03-30 15:00                                       ` Bjorn Helgaas
2021-03-30 19:47                                         ` Jason Gunthorpe
2021-03-30 20:41                                           ` Bjorn Helgaas
2021-03-30 22:43                                             ` Jason Gunthorpe
2021-03-31  6:38                                               ` Greg Kroah-Hartman
2021-03-31 12:19                                                 ` Jason Gunthorpe
2021-03-31 15:03                                                   ` Greg Kroah-Hartman
2021-03-31 17:07                                                     ` Jason Gunthorpe
2021-03-31  4:08                                             ` Leon Romanovsky
2021-04-01  1:23                                               ` Bjorn Helgaas [this message]
2021-04-01 11:49                                                 ` Leon Romanovsky
2021-03-30 18:10                                     ` Keith Busch
2021-03-26 19:36                                 ` Bjorn Helgaas
2021-03-27 12:38                                   ` Greg Kroah-Hartman
2021-03-25 18:31                     ` Keith Busch
2021-03-25 18:36                       ` Jason Gunthorpe
2021-03-11 19:17         ` Leon Romanovsky
2021-03-11 19:37         ` Alexander Duyck
2021-03-11 19:51           ` Leon Romanovsky
2021-03-11 20:11             ` Alexander Duyck
2021-03-11 20:19           ` Jason Gunthorpe
2021-03-11 21:49             ` Alexander Duyck
2021-03-11 23:20               ` Jason Gunthorpe
2021-03-12  2:53                 ` Alexander Duyck
2021-03-12  6:32                   ` Leon Romanovsky
2021-03-12 16:59                     ` Alexander Duyck
2021-03-12 17:03                       ` Jason Gunthorpe
2021-03-12 18:34                         ` Leon Romanovsky
2021-03-12 18:41                       ` Leon Romanovsky
2021-03-12 13:00                   ` Jason Gunthorpe
2021-03-12 13:36                     ` Keith Busch
2021-03-11 20:31         ` Jason Gunthorpe
2021-03-10  5:58 ` Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210401012340.GA1423690@bjorn-Precision-5520 \
    --to=helgaas@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=davem@davemloft.net \
    --cc=ddutile@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jgg@ziepe.ca \
    --cc=kbusch@kernel.org \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=saeedm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).