linux-pci.vger.kernel.org archive mirror
* [Question] Any plan to support enable PCI SRIOV concurrently in kernel side?
@ 2022-08-17  7:43 Zhoujian (jay)
  2022-08-17 19:49 ` Bjorn Helgaas
  0 siblings, 1 reply; 6+ messages in thread
From: Zhoujian (jay) @ 2022-08-17  7:43 UTC (permalink / raw)
  To: linux-pci, linux-kernel, bhelgaas; +Cc: Gonglei (Arei), zhuangshengen

Hi,

Enabling SR-IOV concurrently for many different PFs from userspace seems workable.
I'm trying to do it with 8 PFs (each with 240+ VFs), but I get some warnings;
here are the backtraces:

Warning 1:
---
sysfs: cannot create duplicate filename '/devices/pci0000:30/0000:30:02.0/pci_bus/0000:32'
Call Trace:
 dump_stack+0x6f/0xab
 sysfs_warn_dup+0x56/0x70
 sysfs_create_dir_ns+0x80/0x90
 kobject_add_internal+0xa0/0x2b0
 kobject_add+0x71/0xd0
 device_add+0x126/0x630
 pci_add_new_bus+0x17c/0x4b0
 pci_iov_add_virtfn+0x336/0x390
 sriov_enable+0x26e/0x450
 virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
---
The reason is that different VFs may compute the same PCI bus number
and try to add the new bus concurrently in virtfn_add_bus().

Warning 2:
---
proc_dir_entry 'pci/33' already registered
WARNING: CPU: 71 PID: 893 at fs/proc/generic.c:360 proc_register+0xf8/0x130
Call Trace:
 proc_mkdir_data+0x5d/0x80
 pci_proc_attach_device+0xe9/0x120
 pci_bus_add_device+0x33/0x90
 pci_iov_add_virtfn+0x375/0x390
 sriov_enable+0x26e/0x450
 virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
---
The reason is that different VFs may concurrently create the
'/proc/bus/pci/<bus_number>' directory for the same bus number in
pci_proc_attach_device().
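
For illustration (this is not part of the patch below), the unlocked
check-then-act window in virtfn_add_bus() looks roughly like this;
pci_proc_attach_device() has the same pattern around bus->procdir:

	/* Two PFs whose VFs land on the same new bus number can both
	 * miss in pci_find_bus() and then both call pci_add_new_bus(),
	 * which produces the duplicate sysfs name above. */
	child = pci_find_bus(pci_domain_nr(bus), busnr); /* both see NULL */
	if (child)
		return child;
	child = pci_add_new_bus(bus, NULL, busnr);	/* both add the same bus */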

A mutex lock can avoid the potential conflict. With the patch below, the
warnings above no longer appear.

My question is: is there any plan to support enabling PCI SR-IOV concurrently on the kernel side?

Thanks

---
drivers/pci/iov.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 952217572113..6a8a849298c4 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -16,6 +16,12 @@
 
 #define VIRTFN_ID_LEN    16
 
+static struct mutex add_bus_mutex;
+static int add_bus_mutex_initialized;
+
+static struct mutex add_device_mutex;
+static int add_device_mutex_initialized;
+
 int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
 {
     if (!dev->is_physfn)
@@ -127,13 +133,24 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
     if (bus->number == busnr)
         return bus;
 
+    if (!add_bus_mutex_initialized) {
+        mutex_init(&add_bus_mutex);
+        add_bus_mutex_initialized = 1;
+    }
+    mutex_lock(&add_bus_mutex);
+
     child = pci_find_bus(pci_domain_nr(bus), busnr);
-    if (child)
+    if (child) {
+        mutex_unlock(&add_bus_mutex);
         return child;
+    }
 
     child = pci_add_new_bus(bus, NULL, busnr);
-    if (!child)
+    if (!child) {
+        mutex_unlock(&add_bus_mutex);
         return NULL;
+    }
+    mutex_unlock(&add_bus_mutex);
 
     pci_bus_insert_busn_res(child, busnr, busnr);
 
@@ -339,8 +356,16 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
     if (rc)
         goto failed1;
 
+    if (!add_device_mutex_initialized) {
+        mutex_init(&add_device_mutex);
+        add_device_mutex_initialized = 1;
+    }
+    mutex_lock(&add_device_mutex);
+
     pci_bus_add_device(virtfn);
 
+    mutex_unlock(&add_device_mutex);
+
     return 0;
 
 failed1:
---


* Re: [Question] Any plan to support enable PCI SRIOV concurrently in kernel side?
  2022-08-17  7:43 [Question] Any plan to support enable PCI SRIOV concurrently in kernel side? Zhoujian (jay)
@ 2022-08-17 19:49 ` Bjorn Helgaas
  2022-08-21 11:25   ` RE: " zhuangshengen
  2022-08-27  8:01   ` Zhoujian (jay)
  0 siblings, 2 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2022-08-17 19:49 UTC (permalink / raw)
  To: Zhoujian (jay)
  Cc: linux-pci, linux-kernel, bhelgaas, Gonglei (Arei), zhuangshengen

On Wed, Aug 17, 2022 at 07:43:34AM +0000, Zhoujian (jay) wrote:
> Hi,
> 
> Enabling SR-IOV concurrently for many different PFs from userspace seems workable.
> I'm trying to do it with 8 PFs (each with 240+ VFs), but I get some warnings;
> here are the backtraces:

This definitely seems like a problem that should be fixed.  If you
have a script that can reproduce it, that might help people work on
it.  If you can reproduce it in qemu, that would be even better.

Some comments on the patch below.

> Warning 1:
> ---
> sysfs: cannot create duplicate filename '/devices/pci0000:30/0000:30:02.0/pci_bus/0000:32'
> Call Trace:
>  dump_stack+0x6f/0xab
>  sysfs_warn_dup+0x56/0x70
>  sysfs_create_dir_ns+0x80/0x90
>  kobject_add_internal+0xa0/0x2b0
>  kobject_add+0x71/0xd0
>  device_add+0x126/0x630
>  pci_add_new_bus+0x17c/0x4b0
>  pci_iov_add_virtfn+0x336/0x390
>  sriov_enable+0x26e/0x450
>  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> ---
> The reason is that different VFs may compute the same PCI bus number
> and try to add the new bus concurrently in virtfn_add_bus().
> 
> Warning 2:
> ---
> proc_dir_entry 'pci/33' already registered
> WARNING: CPU: 71 PID: 893 at fs/proc/generic.c:360 proc_register+0xf8/0x130
> Call Trace:
>  proc_mkdir_data+0x5d/0x80
>  pci_proc_attach_device+0xe9/0x120
>  pci_bus_add_device+0x33/0x90
>  pci_iov_add_virtfn+0x375/0x390
>  sriov_enable+0x26e/0x450
>  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> ---
> The reason is that different VFs may concurrently create the
> '/proc/bus/pci/<bus_number>' directory for the same bus number in
> pci_proc_attach_device().
> 
> A mutex lock can avoid the potential conflict. With the patch below, the
> warnings above no longer appear.
> 
> My question is: is there any plan to support enabling PCI SR-IOV concurrently on the kernel side?
> 
> Thanks
> 
> ---
> drivers/pci/iov.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 952217572113..6a8a849298c4 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -16,6 +16,12 @@
>  
>  #define VIRTFN_ID_LEN    16
>  
> +static struct mutex add_bus_mutex;
> +static int add_bus_mutex_initialized;
> +
> +static struct mutex add_device_mutex;
> +static int add_device_mutex_initialized;
> +
>  int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
>  {
>      if (!dev->is_physfn)
> @@ -127,13 +133,24 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
>      if (bus->number == busnr)
>          return bus;
>  
> +    if (!add_bus_mutex_initialized) {
> +        mutex_init(&add_bus_mutex);
> +        add_bus_mutex_initialized = 1;
> +    }

I assume this patch works around the warning.  I see the intent here,
but I think it would need some rework before merging it.  These locks
protect pci_add_new_bus() and pci_bus_add_device(), but only for the
callers in iov.c.  These interfaces are both called from places other
than iov.c, and any mutual exclusion should cover all of them.

I'm actually not sure how the other callers are protected.  I assume
we're holding a device_lock for some device farther up the chain.  Or,
I see that acpi_pci_root_add() and rescan_store() hold
pci_rescan_remove_lock while calling these.  We don't seem to hold
that uniformly though, which bothers me, because I think there are
many other paths, e.g., pci_host_probe() and its callers.

> +    mutex_lock(&add_bus_mutex);
> +
>      child = pci_find_bus(pci_domain_nr(bus), busnr);
> -    if (child)
> +    if (child) {
> +        mutex_unlock(&add_bus_mutex);
>          return child;
> +    }
>  
>      child = pci_add_new_bus(bus, NULL, busnr);
> -    if (!child)
> +    if (!child) {
> +        mutex_unlock(&add_bus_mutex);
>          return NULL;
> +    }
> +    mutex_unlock(&add_bus_mutex);
>  
>      pci_bus_insert_busn_res(child, busnr, busnr);
>  
> @@ -339,8 +356,16 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
>      if (rc)
>          goto failed1;
>  
> +    if (!add_device_mutex_initialized) {
> +        mutex_init(&add_device_mutex);
> +        add_device_mutex_initialized = 1;
> +    }
> +    mutex_lock(&add_device_mutex);
> +
>      pci_bus_add_device(virtfn);
>  
> +    mutex_unlock(&add_device_mutex);
> +
>      return 0;
>  
>  failed1:
> ---


* RE: [Question] Any plan to support enable PCI SRIOV concurrently in kernel side?
  2022-08-17 19:49 ` Bjorn Helgaas
@ 2022-08-21 11:25   ` zhuangshengen
  2022-09-23 21:26     ` Bjorn Helgaas
  2022-08-27  8:01   ` Zhoujian (jay)
  1 sibling, 1 reply; 6+ messages in thread
From: zhuangshengen @ 2022-08-21 11:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, linux-kernel
  Cc: Zhoujian (jay), Gonglei (Arei), Wangjing(Hogan)

[-- Attachment #1: Type: text/plain, Size: 9327 bytes --]

> On Wed, Aug 17, 2022 at 07:43:34AM +0000, Zhoujian (jay) wrote:
> > Hi,
> > 
> > Enabling SR-IOV concurrently for many different PFs from userspace seems workable.
> > I'm trying to do it with 8 PFs (each with 240+ VFs), but I get some
> > warnings; here are the backtraces:
> 
> This definitely seems like a problem that should be fixed.  If you have a script that can reproduce it, that might help people work on it.  If you can reproduce it in qemu, that would be even better.
>

  I am enclosing a demo that concurrently echoes sriov_totalvfs into /sys/bus/pci/devices/$PF_BDF/sriov_numvfs for every SR-IOV-capable PF, which can help reproduce the problem; please find it attached.
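
  (For reference: the attached program takes no arguments; assuming a stock
  gcc, it should build with something like "gcc -pthread enable_sriov_demo.c
  -o enable_sriov_demo", and it must run as root since it writes to
  sriov_numvfs.)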

> Some comments on the patch below.
> 
> > Warning 1:
> > ---
> > sysfs: cannot create duplicate filename '/devices/pci0000:30/0000:30:02.0/pci_bus/0000:32'
> > Call Trace:
> >  dump_stack+0x6f/0xab
> >  sysfs_warn_dup+0x56/0x70
> >  sysfs_create_dir_ns+0x80/0x90
> >  kobject_add_internal+0xa0/0x2b0
> >  kobject_add+0x71/0xd0
> >  device_add+0x126/0x630
> >  pci_add_new_bus+0x17c/0x4b0
> >  pci_iov_add_virtfn+0x336/0x390
> >  sriov_enable+0x26e/0x450
> >  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > ---
> > The reason is that different VFs may compute the same PCI bus number
> > and try to add the new bus concurrently in virtfn_add_bus().
> > 
> > Warning 2:
> > ---
> > proc_dir_entry 'pci/33' already registered
> > WARNING: CPU: 71 PID: 893 at fs/proc/generic.c:360 proc_register+0xf8/0x130
> > Call Trace:
> >  proc_mkdir_data+0x5d/0x80
> >  pci_proc_attach_device+0xe9/0x120
> >  pci_bus_add_device+0x33/0x90
> >  pci_iov_add_virtfn+0x375/0x390
> >  sriov_enable+0x26e/0x450
> >  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > ---
> > The reason is that different VFs may concurrently create the
> > '/proc/bus/pci/<bus_number>' directory for the same bus number in
> > pci_proc_attach_device().
> > 
> > A mutex lock can avoid the potential conflict. With the patch below, the
> > warnings above no longer appear.
> > 
> > My question is: is there any plan to support enabling PCI SR-IOV concurrently on the kernel side?
> > 
> > Thanks
> > 
> > ---
> > drivers/pci/iov.c | 29 +++++++++++++++++++++++++++--
> >  1 file changed, 27 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 952217572113..6a8a849298c4 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -16,6 +16,12 @@
> >  
> >  #define VIRTFN_ID_LEN    16
> >  
> > +static struct mutex add_bus_mutex;
> > +static int add_bus_mutex_initialized;
> > +
> > +static struct mutex add_device_mutex;
> > +static int add_device_mutex_initialized;
> > +
> >  int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
> >  {
> >      if (!dev->is_physfn)
> > @@ -127,13 +133,24 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
> >      if (bus->number == busnr)
> >          return bus;
> >  
> > +    if (!add_bus_mutex_initialized) {
> > +        mutex_init(&add_bus_mutex);
> > +        add_bus_mutex_initialized = 1;
> > +    }
> 
> I assume this patch works around the warning.  I see the intent here, but I think it would need some rework before merging it.  These locks protect pci_add_new_bus() and pci_bus_add_device(), but only for the callers in iov.c.  These interfaces are both called from places other than iov.c, and any mutual exclusion should cover all of them.
> 
> I'm actually not sure how the other callers are protected.  I assume we're holding a device_lock for some device farther up the chain.  Or, I see that acpi_pci_root_add() and rescan_store() hold pci_rescan_remove_lock while calling these.  We don't seem to hold that uniformly though, which bothers me, because I think there are many other paths, e.g., pci_host_probe() and its callers.
> 
> > +    mutex_lock(&add_bus_mutex);
> > +
> >      child = pci_find_bus(pci_domain_nr(bus), busnr);
> > -    if (child)
> > +    if (child) {
> > +        mutex_unlock(&add_bus_mutex);
> >          return child;
> > +    }
> >  
> >      child = pci_add_new_bus(bus, NULL, busnr);
> > -    if (!child)
> > +    if (!child) {
> > +        mutex_unlock(&add_bus_mutex);
> >          return NULL;
> > +    }
> > +    mutex_unlock(&add_bus_mutex);
> >  
> >      pci_bus_insert_busn_res(child, busnr, busnr);
> >  
> > @@ -339,8 +356,16 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> >      if (rc)
> >          goto failed1;
> >  
> > +    if (!add_device_mutex_initialized) {
> > +        mutex_init(&add_device_mutex);
> > +        add_device_mutex_initialized = 1;
> > +    }
> > +    mutex_lock(&add_device_mutex);
> > +
> >      pci_bus_add_device(virtfn);
> >  
> > +    mutex_unlock(&add_device_mutex);
> > +
> >      return 0;
> >  
> >  failed1:
> > ---

I wrote a new patch that can fix the problems above. It adds a new function, pci_add_new_bus_locked(), which does the find-bus and add-new-bus steps with a mutex held. In most cases in virtfn_add_bus() the VF will find an existing bus and return; this is the fast path and needs no mutex protection. If the bus does not exist yet and different VFs race to add the same bus, they call pci_add_new_bus_locked(); this is the slower path, but it is safe.

I also added a device_lock() in pci_proc_attach_device() around the creation of bus->procdir; this fixes the conflict when different VFs concurrently create the '/proc/bus/pci/<bus_number>' directory for the same bus number.

---
 drivers/pci/iov.c   |  7 ++++++-
 drivers/pci/probe.c | 26 ++++++++++++++++++++++++++
 drivers/pci/proc.c  |  6 +++++-
 include/linux/pci.h |  2 ++
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 952217572113..cde0155749c5 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -127,11 +127,16 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 	if (bus->number == busnr)
 		return bus;
 
+	/*
+	 * In most cases the VF finds an existing bus here. If not, go
+	 * through the locked slow path to create the new bus, so SR-IOV
+	 * can be enabled concurrently for many different PFs from userspace.
+	 */
 	child = pci_find_bus(pci_domain_nr(bus), busnr);
 	if (child)
 		return child;
 
-	child = pci_add_new_bus(bus, NULL, busnr);
+	child = pci_add_new_bus_locked(bus, NULL, busnr);
 	if (!child)
 		return NULL;
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c5286b027f00..5dc2a6774fa9 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -25,6 +25,8 @@
 #define CARDBUS_LATENCY_TIMER	176	/* secondary latency timer */
 #define CARDBUS_RESERVE_BUSNR	3
 
+DEFINE_MUTEX(add_bus_mutex);
+
 static struct resource busn_resource = {
 	.name	= "PCI busn",
 	.start	= 0,
@@ -1170,6 +1172,30 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
 }
 EXPORT_SYMBOL(pci_add_new_bus);
 
+struct pci_bus *pci_add_new_bus_locked(struct pci_bus *parent, struct pci_dev *dev,
+                                int busnr)
+{
+        struct pci_bus *child;
+
+        mutex_lock(&add_bus_mutex);
+        child = pci_find_bus(pci_domain_nr(parent), busnr);
+        if (child) {
+                mutex_unlock(&add_bus_mutex);
+                return child;
+        }
+
+        child = pci_alloc_child_bus(parent, dev, busnr);
+        if (child) {
+                down_write(&pci_bus_sem);
+                list_add_tail(&child->node, &parent->children);
+                up_write(&pci_bus_sem);
+        }
+        mutex_unlock(&add_bus_mutex);
+
+        return child;
+}
+EXPORT_SYMBOL(pci_add_new_bus_locked);
+
 static void pci_enable_crs(struct pci_dev *pdev)
 {
 	u16 root_cap = 0;
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index f967709082d6..f927263c2fe0 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -421,6 +421,7 @@ int pci_proc_attach_device(struct pci_dev *dev)
 	if (!proc_initialized)
 		return -EACCES;
 
+	device_lock(&bus->dev);
 	if (!bus->procdir) {
 		if (pci_proc_domain(bus)) {
 			sprintf(name, "%04x:%02x", pci_domain_nr(bus),
@@ -429,9 +430,12 @@ int pci_proc_attach_device(struct pci_dev *dev)
 			sprintf(name, "%02x", bus->number);
 		}
 		bus->procdir = proc_mkdir(name, proc_bus_pci_dir);
-		if (!bus->procdir)
+		if (!bus->procdir) {
+			device_unlock(&bus->dev);
 			return -ENOMEM;
+		}
 	}
+	device_unlock(&bus->dev);
 
 	sprintf(name, "%02x.%x", PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
 	e = proc_create_data(name, S_IFREG | S_IRUGO | S_IWUSR, bus->procdir,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 060af91bafcd..ec5b68b4c63d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1112,6 +1112,8 @@ struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
 int pci_scan_root_bus_bridge(struct pci_host_bridge *bridge);
 struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
 				int busnr);
+struct pci_bus *pci_add_new_bus_locked(struct pci_bus *parent, struct pci_dev *dev,
+				int busnr);
 struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 				 const char *name,
 				 struct hotplug_slot *hotplug);
--

[-- Attachment #2: enable_sriov_demo.c --]
[-- Type: text/plain, Size: 4386 bytes --]

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <dirent.h>
#include <pthread.h>
#include <sys/stat.h>

#include <fcntl.h>
#include <string.h>

#define MAX_PATH_SIZE  1024
#define MAX_PF_NUM     20
#define VF_PATH   64
#define VF_LEN    10
#define DEVICE_DIR     "/sys/bus/pci/devices"
#define STREQ(a, b) (strcmp(a, b) == 0)

typedef struct SriovEnableThread {
    pthread_t thread;
    char bdf[VF_PATH];
    char totalVfs[VF_LEN];
} SriovEnableThread;

static int FileWriteString(const char *path, const char *str)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1) {
        return -1;
    }

    if (write(fd, str, strlen(str)) < 0) {
        close(fd);
        return -1;
    }

    if (close(fd) != 0) {
        return -1;
    }

    return 0;
}

static int FileReadString(const char *file, char *buf, int len)
{
    int fd;
    ssize_t sz;

    fd = open(file, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    sz = read(fd, buf, len);
    close(fd);
    if (sz < 0) {
        return -1;
    }

    if (sz == len) {
        printf("String len '%d' is too short to save %s\n", len, file);
        buf[len - 1] = '\0';
        return -1;
    }

    buf[sz] = '\0';
    return sz;
}

static void *DeviceSriovEnable(void *arg)
{
    char numVfsPath[VF_PATH] = { 0 };
    SriovEnableThread *sriovThread = (SriovEnableThread *)arg;
    const char *bdf = sriovThread->bdf;
    const char *totalVfs = sriovThread->totalVfs;

    printf("echo %s > /sys/bus/pci/devices/%s/sriov_numvfs\n", totalVfs, bdf);

    if (snprintf(numVfsPath, sizeof(numVfsPath) - 1,
                   "%s/%s/sriov_numvfs", DEVICE_DIR, bdf) < 0) {
        printf("Get device sriov_numvfs fail\n");
        pthread_exit(NULL);
    }

    if (FileWriteString(numVfsPath, totalVfs)) {
        printf("enable '%s' sriov fail\n", bdf);
        pthread_exit(NULL);
    }
    pthread_exit(NULL);
}

static int DeviceCheckSriovEnable(const char *bdf)
{
    char path[VF_PATH] = { 0 };
    int ret;

    ret = snprintf(path, sizeof(path) - 1,
                     "%s/%s/sriov_totalvfs", DEVICE_DIR, bdf);
    if (ret < 0) {
        return ret;
    }

    if (access(path, R_OK) != 0) {
        return 0;
    }

    return 1;
}

static int DeviceReadValue(const char *bdf, const char *pattern, char *buffer, size_t len)
{
    char path[VF_PATH] = {0};
    int ret;

    ret = snprintf(path, sizeof(path) - 1, "%s/%s/%s", DEVICE_DIR,
                     bdf, pattern);
    if (ret < 0) {
        printf("Set devicePath fail\n");
        return ret;
    }

    if (FileReadString(path, buffer, len) < 0) {
        printf("Read data from devicePath %s fail\n", path);
        return -1;
    }

    return 0;
}

int main(void)
{
    int ret;
    DIR* dirp;
    int i = 0;
    struct dirent *dp = NULL;
    char buffer[VF_LEN] = { 0 };
    int sriovThreadLen = sizeof(SriovEnableThread) * MAX_PF_NUM;

    dirp = opendir(DEVICE_DIR);
    if (!dirp) {
        printf("Can not find %s\n", DEVICE_DIR);
        return 1;
    }

    SriovEnableThread *sriovThread = (SriovEnableThread *)malloc(sriovThreadLen);
    if (!sriovThread) {
        printf("Can not alloc sriovThread\n");
        closedir(dirp);
        return 1;
    }
    memset(sriovThread, 0, sriovThreadLen);

    while ((dp = readdir(dirp)) != NULL) {
        if (STREQ(dp->d_name, ".") || STREQ(dp->d_name, "..")) {
            continue;
        }

        if (DeviceCheckSriovEnable(dp->d_name) <= 0) {
            continue;
        }

        if (DeviceReadValue(dp->d_name, "sriov_totalvfs", buffer, sizeof(buffer)) < 0) {
            continue;
        }

        if (i >= MAX_PF_NUM) {
            printf("pf num is exceed max %d\n", MAX_PF_NUM);
            break;
        }

        strcpy(sriovThread[i].bdf, dp->d_name);
        strcpy(sriovThread[i].totalVfs, buffer);
        ret = pthread_create(&sriovThread[i].thread, NULL, DeviceSriovEnable,
                             (void *)&sriovThread[i]);
        if (ret) {
            printf("create sriov thread %d for %s failed, ret : %d\n", i, sriovThread[i].bdf, ret);
            break;
        }
        i++;
    }

    closedir(dirp);
    /* wait until all sriov enable thread finish */
    for (int j = 0; j < i; j++) {
        pthread_join(sriovThread[j].thread, NULL);
    }

    printf("sriov enable finish\n");
    free(sriovThread);
    return 0;
}


* RE: [Question] Any plan to support enable PCI SRIOV concurrently in kernel side?
  2022-08-17 19:49 ` Bjorn Helgaas
  2022-08-21 11:25   ` RE: " zhuangshengen
@ 2022-08-27  8:01   ` Zhoujian (jay)
  1 sibling, 0 replies; 6+ messages in thread
From: Zhoujian (jay) @ 2022-08-27  8:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, bhelgaas, Gonglei (Arei),
	zhuangshengen, Wangjing(Hogan)



> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Thursday, August 18, 2022 3:50 AM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; bhelgaas@google.com;
> Gonglei (Arei) <arei.gonglei@huawei.com>; zhuangshengen
> <zhuangshengen@huawei.com>
> Subject: Re: [Question] Any plan to support enable PCI SRIOV concurrently in
> kernel side?
> 
> On Wed, Aug 17, 2022 at 07:43:34AM +0000, Zhoujian (jay) wrote:
> > Hi,
> >
> > Enabling SR-IOV concurrently for many different PFs from userspace seems workable.
> > I'm trying to do it with 8 PFs (each with 240+ VFs), but I get some
> > warnings; here are the backtraces:
> 
> This definitely seems like a problem that should be fixed.  If you have a script that
> can reproduce it, that might help people work on it.  If you can reproduce it in
> qemu, that would be even better.

The script has been posted by zhuangshengen.

> 
> Some comments on the patch below.
> 
> > Warning 1:
> > ---
> > sysfs: cannot create duplicate filename '/devices/pci0000:30/0000:30:02.0/pci_bus/0000:32'
> > Call Trace:
> >  dump_stack+0x6f/0xab
> >  sysfs_warn_dup+0x56/0x70
> >  sysfs_create_dir_ns+0x80/0x90
> >  kobject_add_internal+0xa0/0x2b0
> >  kobject_add+0x71/0xd0
> >  device_add+0x126/0x630
> >  pci_add_new_bus+0x17c/0x4b0
> >  pci_iov_add_virtfn+0x336/0x390
> >  sriov_enable+0x26e/0x450
> >  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > ---
> > The reason is that different VFs may compute the same PCI bus number
> > and try to add the new bus concurrently in virtfn_add_bus().
> >
> > Warning 2:
> > ---
> > proc_dir_entry 'pci/33' already registered
> > WARNING: CPU: 71 PID: 893 at fs/proc/generic.c:360 proc_register+0xf8/0x130
> > Call Trace:
> >  proc_mkdir_data+0x5d/0x80
> >  pci_proc_attach_device+0xe9/0x120
> >  pci_bus_add_device+0x33/0x90
> >  pci_iov_add_virtfn+0x375/0x390
> >  sriov_enable+0x26e/0x450
> >  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > ---
> > The reason is that different VFs may concurrently create the
> > '/proc/bus/pci/<bus_number>' directory for the same bus number in
> > pci_proc_attach_device().
> >
> > A mutex lock can avoid the potential conflict. With the patch below, the
> > warnings above no longer appear.
> >
> > My question is: is there any plan to support enabling PCI SR-IOV concurrently
> > on the kernel side?
> >
> > Thanks
> >
> > ---
> > drivers/pci/iov.c | 29 +++++++++++++++++++++++++++--
> >  1 file changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 952217572113..6a8a849298c4 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -16,6 +16,12 @@
> >
> >  #define VIRTFN_ID_LEN    16
> >
> > +static struct mutex add_bus_mutex;
> > +static int add_bus_mutex_initialized;
> > +
> > +static struct mutex add_device_mutex;
> > +static int add_device_mutex_initialized;
> > +
> >  int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
> >  {
> >      if (!dev->is_physfn)
> > @@ -127,13 +133,24 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
> >      if (bus->number == busnr)
> >          return bus;
> >
> > +    if (!add_bus_mutex_initialized) {
> > +        mutex_init(&add_bus_mutex);
> > +        add_bus_mutex_initialized = 1;
> > +    }
> 
> I assume this patch works around the warning.  I see the intent here, but I think

Yes, it is. We have tested for a few weeks and no new problems have been found.

> it would need some rework before merging it.  These locks protect
> pci_add_new_bus() and pci_bus_add_device(), but only for the callers in iov.c.
> These interfaces are both called from places other than iov.c, and any mutual
> exclusion should cover all of them.

I agree.

> 
> I'm actually not sure how the other callers are protected.  I assume we're holding a
> device_lock for some device farther up the chain.  Or, I see that
> acpi_pci_root_add() and rescan_store() hold pci_rescan_remove_lock while calling
> these.  

pci_rescan_remove_lock was added in 2014; see the commit below.

commit 9d16947b75831acd317ab9a53e0e94d160731d33
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Fri Jan 10 15:22:18 2014 +0100

    PCI: Add global pci_lock_rescan_remove()
    
    There are multiple PCI device addition and removal code paths that may be
    run concurrently with the generic PCI bus rescan and device removal that
    can be triggered via sysfs.  If that happens, it may lead to multiple
    different, potentially dangerous race conditions.
    
    The most straightforward way to address those problems is to run
    the code in question under the same lock that is used by the
    generic rescan/remove code in pci-sysfs.c.  To prepare for those
    changes, move the definition of the global PCI remove/rescan lock
    to probe.c and provide global wrappers, pci_lock_rescan_remove()
    and pci_unlock_rescan_remove(), allowing drivers to manipulate
    that lock.  Also provide pci_stop_and_remove_bus_device_locked()
    for the callers of pci_stop_and_remove_bus_device() who only need
    to hold the rescan/remove lock around it.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Its purpose is to prevent PCI device addition and removal from happening
concurrently. IMHO it is suitable to use this lock for enabling SR-IOV here,
to avoid creating a new lock.

> We don't seem to hold that uniformly though, which bothers me, because
> I think there are many other paths, e.g., pci_host_probe() and its callers.

I have checked the other code paths. Most of the callers hold
pci_rescan_remove_lock; some are missing it. In order to hold the lock
uniformly, we may need to add pci_rescan_remove_lock to each missing
place one by one (wherever concurrency is possible). So what should we
do now? (1) fix all of them, or (2) only add pci_rescan_remove_lock in
iov.c as a start?
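
For illustration, a minimal sketch of what option (2) could look like in
virtfn_add_bus() (untested; it assumes pci_lock_rescan_remove() can be taken
safely on the sriov_enable() path, which still needs review):

	static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
	{
		struct pci_bus *child;

		if (bus->number == busnr)
			return bus;

		/* Serialize the find/add pair with the existing global
		 * rescan/remove lock instead of introducing a new mutex. */
		pci_lock_rescan_remove();
		child = pci_find_bus(pci_domain_nr(bus), busnr);
		if (child) {
			pci_unlock_rescan_remove();
			return child;
		}

		child = pci_add_new_bus(bus, NULL, busnr);
		pci_unlock_rescan_remove();
		if (!child)
			return NULL;

		pci_bus_insert_busn_res(child, busnr, busnr);

		return child;
	}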

Thanks

> 
> > +    mutex_lock(&add_bus_mutex);
> > +
> >      child = pci_find_bus(pci_domain_nr(bus), busnr);
> > -    if (child)
> > +    if (child) {
> > +        mutex_unlock(&add_bus_mutex);
> >          return child;
> > +    }
> >
> >      child = pci_add_new_bus(bus, NULL, busnr);
> > -    if (!child)
> > +    if (!child) {
> > +        mutex_unlock(&add_bus_mutex);
> >          return NULL;
> > +    }
> > +    mutex_unlock(&add_bus_mutex);
> >
> >      pci_bus_insert_busn_res(child, busnr, busnr);
> >
> > @@ -339,8 +356,16 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> >      if (rc)
> >          goto failed1;
> >
> > +    if (!add_device_mutex_initialized) {
> > +        mutex_init(&add_device_mutex);
> > +        add_device_mutex_initialized = 1;
> > +    }
> > +    mutex_lock(&add_device_mutex);
> > +
> >      pci_bus_add_device(virtfn);
> >
> > +    mutex_unlock(&add_device_mutex);
> > +
> >      return 0;
> >
> >  failed1:
> > ---


* Re: RE: [Question] Any plan to support enable PCI SRIOV concurrently in kernel side?
  2022-08-21 11:25   ` RE: " zhuangshengen
@ 2022-09-23 21:26     ` Bjorn Helgaas
  0 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2022-09-23 21:26 UTC (permalink / raw)
  To: zhuangshengen
  Cc: linux-pci, linux-kernel, Zhoujian (jay), Gonglei (Arei), Wangjing(Hogan)

On Sun, Aug 21, 2022 at 11:25:34AM +0000, zhuangshengen wrote:
> > On Wed, Aug 17, 2022 at 07:43:34AM +0000, Zhoujian (jay) wrote:
> > > Enabling SR-IOV concurrently for many different PFs from userspace seems workable.
> > > I'm trying to do it with 8 PFs (each with 240+ VFs), but I get some
> > > warnings; here are the backtraces:
> > 
> > This definitely seems like a problem that should be fixed.  If you have a script that can reproduce it, that might help people work on it.  If you can reproduce it in qemu, that would be even better.
> >
> 
>   I am enclosing a demo that concurrently echoes sriov_totalvfs into /sys/bus/pci/devices/$PF_BDF/sriov_numvfs for every SR-IOV-capable PF, which can help reproduce the problem; please find it attached.

I see the patch below, but it needs to be tweaked into the usual
format for kernel patches.  See
https://www.kernel.org/doc/html/latest/process/submitting-patches.html
for more details.  You can also browse the mailing list at
https://lore.kernel.org/linux-pci/ to see how other patches have been
posted.

> > Some comments on the patch below.
> > 
> > > Warning 1:
> > > ---
> > > sysfs: cannot create duplicate filename '/devices/pci0000:30/0000:30:02.0/pci_bus/0000:32'
> > > Call Trace:
> > >  dump_stack+0x6f/0xab
> > >  sysfs_warn_dup+0x56/0x70
> > >  sysfs_create_dir_ns+0x80/0x90
> > >  kobject_add_internal+0xa0/0x2b0
> > >  kobject_add+0x71/0xd0
> > >  device_add+0x126/0x630
> > >  pci_add_new_bus+0x17c/0x4b0
> > >  pci_iov_add_virtfn+0x336/0x390
> > >  sriov_enable+0x26e/0x450
> > >  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > > ---
> > > The reason is that different VFs may compute the same PCI bus number
> > > and try to add the new bus concurrently in virtfn_add_bus().
> > > 
> > > Warning 2:
> > > ---
> > > proc_dir_entry 'pci/33' already registered
> > > WARNING: CPU: 71 PID: 893 at fs/proc/generic.c:360 proc_register+0xf8/0x130
> > > Call Trace:
> > >  proc_mkdir_data+0x5d/0x80
> > >  pci_proc_attach_device+0xe9/0x120
> > >  pci_bus_add_device+0x33/0x90
> > >  pci_iov_add_virtfn+0x375/0x390
> > >  sriov_enable+0x26e/0x450
> > >  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > > ---
> > > The reason is that different VFs may concurrently create the
> > > '/proc/bus/pci/<bus_number>' directory for the same bus number in
> > > pci_proc_attach_device().
> > > 
> > > A mutex lock can avoid the potential conflict. With the patch below, the
> > > warnings above no longer appear.
> > > 
> > > My question is: is there any plan to support enabling PCI SR-IOV concurrently on the kernel side?
> > > 
> > > Thanks
> > > 
> > > ---
> > > drivers/pci/iov.c | 29 +++++++++++++++++++++++++++--
> > >  1 file changed, 27 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > > index 952217572113..6a8a849298c4 100644
> > > --- a/drivers/pci/iov.c
> > > +++ b/drivers/pci/iov.c
> > > @@ -16,6 +16,12 @@
> > >  
> > >  #define VIRTFN_ID_LEN    16
> > >  
> > > +static struct mutex add_bus_mutex;
> > > +static int add_bus_mutex_initialized;
> > > +
> > > +static struct mutex add_device_mutex;
> > > +static int add_device_mutex_initialized;
> > > +
> > >  int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
> > >  {
> > >      if (!dev->is_physfn)
> > > @@ -127,13 +133,24 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
> > >      if (bus->number == busnr)
> > >          return bus;
> > >  
> > > +    if (!add_bus_mutex_initialized) {
> > > +        mutex_init(&add_bus_mutex);
> > > +        add_bus_mutex_initialized = 1;
> > > +    }
> > 
> > I assume this patch works around the warning.  I see the intent here, but I think it would need some rework before merging it.  These locks protect pci_add_new_bus() and pci_bus_add_device(), but only for the callers in iov.c.  These interfaces are both called from places other than iov.c, and any mutual exclusion should cover all of them.
> > 
> > I'm actually not sure how the other callers are protected.  I assume we're holding a device_lock for some device farther up the chain.  Or, I see that acpi_pci_root_add() and rescan_store() hold pci_rescan_remove_lock while calling these.  We don't seem to hold that uniformly though, which bothers me, because I think there are many other paths, e.g., pci_host_probe() and its callers.
> > 
> > > +    mutex_lock(&add_bus_mutex);
> > > +
> > >      child = pci_find_bus(pci_domain_nr(bus), busnr);
> > > -    if (child)
> > > +    if (child) {
> > > +        mutex_unlock(&add_bus_mutex);
> > >          return child;
> > > +    }
> > >  
> > >      child = pci_add_new_bus(bus, NULL, busnr);
> > > -    if (!child)
> > > +    if (!child) {
> > > +        mutex_unlock(&add_bus_mutex);
> > >          return NULL;
> > > +    }
> > > +    mutex_unlock(&add_bus_mutex);
> > >  
> > >      pci_bus_insert_busn_res(child, busnr, busnr);
> > >  
> > > @@ -339,8 +356,16 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> > >      if (rc)
> > >          goto failed1;
> > >  
> > > +    if (!add_device_mutex_initialized) {
> > > +        mutex_init(&add_device_mutex);
> > > +        add_device_mutex_initialized = 1;
> > > +    }
> > > +    mutex_lock(&add_device_mutex);
> > > +
> > >      pci_bus_add_device(virtfn);
> > >  
> > > +    mutex_unlock(&add_device_mutex);
> > > +
> > >      return 0;
> > >  
> > >  failed1:
> > > ---
> 
> I wrote a new patch that can fix the problems above. It adds a new function, pci_add_new_bus_locked(), which does the find-bus and add-new-bus steps with a mutex held. In most cases in virtfn_add_bus() the VF will find an existing bus and return; this is the fast path and needs no mutex protection. If the bus does not exist yet and different VFs race to add the same bus, they call pci_add_new_bus_locked(); this is the slower path, but it is safe.
> 
> I also added a device_lock() in pci_proc_attach_device() around the creation of bus->procdir; this fixes the conflict when different VFs concurrently create the '/proc/bus/pci/<bus_number>' directory for the same bus number.
> 
> ---
>  drivers/pci/iov.c   |  7 ++++++-
>  drivers/pci/probe.c | 26 ++++++++++++++++++++++++++
>  drivers/pci/proc.c  |  6 +++++-
>  include/linux/pci.h |  2 ++
>  4 files changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 952217572113..cde0155749c5 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -127,11 +127,16 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
>  	if (bus->number == busnr)
>  		return bus;
>  
> +	/*
> +	 * In most cases the VF finds an existing bus here. If not, go
> +	 * through the locked slow path to create the new bus, so SR-IOV
> +	 * can be enabled concurrently for many different PFs from userspace.
> +	 */
>  	child = pci_find_bus(pci_domain_nr(bus), busnr);
>  	if (child)
>  		return child;
>  
> -	child = pci_add_new_bus(bus, NULL, busnr);
> +	child = pci_add_new_bus_locked(bus, NULL, busnr);
>  	if (!child)
>  		return NULL;
>  
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c5286b027f00..5dc2a6774fa9 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -25,6 +25,8 @@
>  #define CARDBUS_LATENCY_TIMER	176	/* secondary latency timer */
>  #define CARDBUS_RESERVE_BUSNR	3
>  
> +DEFINE_MUTEX(add_bus_mutex);
> +
>  static struct resource busn_resource = {
>  	.name	= "PCI busn",
>  	.start	= 0,
> @@ -1170,6 +1172,30 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
>  }
>  EXPORT_SYMBOL(pci_add_new_bus);
>  
> +struct pci_bus *pci_add_new_bus_locked(struct pci_bus *parent, struct pci_dev *dev,
> +                                int busnr)
> +{
> +        struct pci_bus *child;
> +
> +        mutex_lock(&add_bus_mutex);
> +        child = pci_find_bus(pci_domain_nr(parent), busnr);
> +        if (child) {
> +                mutex_unlock(&add_bus_mutex);
> +                return child;
> +        }
> +
> +        child = pci_alloc_child_bus(parent, dev, busnr);
> +        if (child) {
> +                down_write(&pci_bus_sem);
> +                list_add_tail(&child->node, &parent->children);
> +                up_write(&pci_bus_sem);
> +        }
> +        mutex_unlock(&add_bus_mutex);
> +
> +        return child;
> +}
> +EXPORT_SYMBOL(pci_add_new_bus_locked);
> +
>  static void pci_enable_crs(struct pci_dev *pdev)
>  {
>  	u16 root_cap = 0;
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index f967709082d6..f927263c2fe0 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -421,6 +421,7 @@ int pci_proc_attach_device(struct pci_dev *dev)
>  	if (!proc_initialized)
>  		return -EACCES;
>  
> +	device_lock(&bus->dev);
>  	if (!bus->procdir) {
>  		if (pci_proc_domain(bus)) {
>  			sprintf(name, "%04x:%02x", pci_domain_nr(bus),
> @@ -429,9 +430,12 @@ int pci_proc_attach_device(struct pci_dev *dev)
>  			sprintf(name, "%02x", bus->number);
>  		}
>  		bus->procdir = proc_mkdir(name, proc_bus_pci_dir);
> -		if (!bus->procdir)
> +		if (!bus->procdir) {
> +			device_unlock(&bus->dev);
>  			return -ENOMEM;
> +		}
>  	}
> +	device_unlock(&bus->dev);
>  
>  	sprintf(name, "%02x.%x", PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
>  	e = proc_create_data(name, S_IFREG | S_IRUGO | S_IWUSR, bus->procdir,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 060af91bafcd..ec5b68b4c63d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1112,6 +1112,8 @@ struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
>  int pci_scan_root_bus_bridge(struct pci_host_bridge *bridge);
>  struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
>  				int busnr);
> +struct pci_bus *pci_add_new_bus_locked(struct pci_bus *parent, struct pci_dev *dev,
> +				int busnr);
>  struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
>  				 const char *name,
>  				 struct hotplug_slot *hotplug);
> --

> #include <stdio.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <dirent.h>
> #include <pthread.h>
> #include <sys/stat.h>
> 
> #include <fcntl.h>
> #include <string.h>
> 
> #define MAX_PATH_SIZE  1024
> #define MAX_PF_NUM     20
> #define VF_PATH   64
> #define VF_LEN    10
> #define DEVICE_DIR     "/sys/bus/pci/devices"
> #define STREQ(a, b) (strcmp(a, b) == 0)
> 
> typedef struct SriovEnableThread {
>     pthread_t thread;
>     char bdf[VF_PATH];
>     char totalVfs[VF_LEN];
> } SriovEnableThread;
> 
> static int FileWriteString(const char *path, const char *str)
> {
>     int fd = open(path, O_WRONLY | O_TRUNC);
>     if (fd == -1) {
>         return -1;
>     }
> 
>     if (write(fd, str, strlen(str)) < 0) {
>         close(fd);
>         return -1;
>     }
> 
>     if (close(fd) != 0) {
>         return -1;
>     }
> 
>     return 0;
> }
> 
> static int FileReadString(const char *file, char *buf, int len)
> {
>     int fd;
>     ssize_t sz;
> 
>     fd = open(file, O_RDONLY);
>     if (fd < 0) {
>         return -1;
>     }
> 
>     sz = read(fd, buf, len);
>     close(fd);
>     if (sz < 0) {
>         return -1;
>     }
> 
>     if (sz == len) {
>         printf("String len '%d' is too short to save %s\n", len, file);
>         buf[len - 1] = '\0';
>         return -1;
>     }
> 
>     buf[sz] = '\0';
>     return sz;
> }
> 
> static void *DeviceSriovEnable(void *arg)
> {
>     char numVfsPath[VF_PATH] = { 0 };
>     SriovEnableThread *sriovThread = (SriovEnableThread *)arg;
>     const char *bdf = sriovThread->bdf;
>     const char *totalVfs = sriovThread->totalVfs;
> 
>     printf("echo %s > /sys/bus/pci/devices/%s/sriov_numvfs\n", totalVfs, bdf);
> 
>     if (snprintf(numVfsPath, sizeof(numVfsPath) - 1,
>                    "%s/%s/sriov_numvfs", DEVICE_DIR, bdf) < 0) {
>         printf("Get device sriov_numvfs fail\n");
>         pthread_exit(NULL);
>     }
> 
>     if (FileWriteString(numVfsPath, totalVfs)) {
>         printf("enable '%s' sriov fail\n", bdf);
>         pthread_exit(NULL);
>     }
>     pthread_exit(NULL);
> }
> 
> static int DeviceCheckSriovEnable(const char *bdf)
> {
>     char path[VF_PATH] = { 0 };
>     int ret;
> 
>     ret = snprintf(path, sizeof(path) - 1,
>                      "%s/%s/sriov_totalvfs", DEVICE_DIR, bdf);
>     if (ret < 0) {
>         return ret;
>     }
> 
>     if (access(path, R_OK) != 0) {
>         return 0;
>     }
> 
>     return 1;
> }
> 
> static int DeviceReadValue(const char *bdf, const char *pattern, char *buffer, size_t len)
> {
>     char path[VF_PATH] = {0};
>     int ret;
> 
>     ret = snprintf(path, sizeof(path) - 1, "%s/%s/%s", DEVICE_DIR,
>                      bdf, pattern);
>     if (ret < 0) {
>         printf("Set devicePath fail\n");
>         return ret;
>     }
> 
>     if (FileReadString(path, buffer, len) < 0) {
>         printf("Read data from devicePath %s fail\n", path);
>         return -1;
>     }
> 
>     return 0;
> }
> 
> int main(void)
> {
>     int ret;
>     DIR* dirp;
>     int i = 0;
>     struct dirent *dp = NULL;
>     char buffer[VF_LEN] = { 0 };
>     int sriovThreadLen = sizeof(SriovEnableThread) * MAX_PF_NUM;
> 
>     dirp = opendir(DEVICE_DIR);
>     if (!dirp) {
>         printf("Can not find %s\n", DEVICE_DIR);
>         return 1;
>     }
> 
>     SriovEnableThread *sriovThread = (SriovEnableThread *)malloc(sriovThreadLen);
>     if (!sriovThread) {
>         printf("Can not alloc sriovThread\n");
>         closedir(dirp);
>         return 1;
>     }
>     memset(sriovThread, 0, sriovThreadLen);
> 
>     while ((dp = readdir(dirp)) != NULL) {
>         if (STREQ(dp->d_name, ".") || STREQ(dp->d_name, "..")) {
>             continue;
>         }
> 
>         if (DeviceCheckSriovEnable(dp->d_name) <= 0) {
>             continue;
>         }
> 
>         if (DeviceReadValue(dp->d_name, "sriov_totalvfs", buffer, sizeof(buffer)) < 0) {
>             continue;
>         }
> 
>         if (i >= MAX_PF_NUM) {
>             printf("pf num is exceed max %d\n", MAX_PF_NUM);
>             break;
>         }
> 
>         strcpy(sriovThread[i].bdf, dp->d_name);
>         strcpy(sriovThread[i].totalVfs, buffer);
>         ret = pthread_create(&sriovThread[i].thread, NULL, DeviceSriovEnable,
>                              (void *)&sriovThread[i]);
>         if (ret) {
>             printf("create sriov thread %d for %s failed, ret : %d\n", i, sriovThread[i].bdf, ret);
>             break;
>         }
>         i++;
>     }
> 
>     closedir(dirp);
>     /* wait until all sriov enable thread finish */
>     for (int j = 0; j < i; j++) {
>         pthread_join(sriovThread[j].thread, NULL);
>     }
> 
>     printf("sriov enable finish\n");
>     free(sriovThread);
>     return 0;
> }



* Re: [Question] Any plan to support enable PCI SRIOV concurrently in kernel side?
@ 2022-08-21 11:35 zhuangshengen
  0 siblings, 0 replies; 6+ messages in thread
From: zhuangshengen @ 2022-08-21 11:35 UTC (permalink / raw)
  To: bhelgaas, linux-pci, linux-kernel
  Cc: Gonglei (Arei), Zhoujian (jay), Wangjing(Hogan)

[-- Attachment #1: Type: text/plain, Size: 9327 bytes --]

> On Wed, Aug 17, 2022 at 07:43:34AM +0000, Zhoujian (jay) wrote:
> > Hi,
> > 
> > Enabling SR-IOV concurrently for many different PFs from userspace seems workable.
> > I'm trying to do it with 8 PFs (each with 240+ VFs), but I get some
> > warnings; here are the backtraces:
> 
> This definitely seems like a problem that should be fixed.  If you have a script that can reproduce it, that might help people work on it.  If you can reproduce it in qemu, that would be even better.
>

  I am enclosing a demo that concurrently echoes sriov_totalvfs into /sys/bus/pci/devices/$PF_BDF/sriov_numvfs for every SR-IOV-capable PF, which can help reproduce the problem; please find it attached.

> Some comments on the patch below.
> 
> > Warning 1:
> > ---
> > sysfs: cannot create duplicate filename '/devices/pci0000:30/0000:30:02.0/pci_bus/0000:32'
> > Call Trace:
> >  dump_stack+0x6f/0xab
> >  sysfs_warn_dup+0x56/0x70
> >  sysfs_create_dir_ns+0x80/0x90
> >  kobject_add_internal+0xa0/0x2b0
> >  kobject_add+0x71/0xd0
> >  device_add+0x126/0x630
> >  pci_add_new_bus+0x17c/0x4b0
> >  pci_iov_add_virtfn+0x336/0x390
> >  sriov_enable+0x26e/0x450
> >  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > ---
> > The reason is that different VFs may compute the same PCI bus number
> > and try to add the new bus concurrently in virtfn_add_bus().
> > 
> > Warning 2:
> > ---
> > proc_dir_entry 'pci/33' already registered
> > WARNING: CPU: 71 PID: 893 at fs/proc/generic.c:360 proc_register+0xf8/0x130
> > Call Trace:
> >  proc_mkdir_data+0x5d/0x80
> >  pci_proc_attach_device+0xe9/0x120
> >  pci_bus_add_device+0x33/0x90
> >  pci_iov_add_virtfn+0x375/0x390
> >  sriov_enable+0x26e/0x450
> >  virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > ---
> > The reason is that different VFs may concurrently create the
> > '/proc/bus/pci/<bus_number>' directory for the same bus number in
> > pci_proc_attach_device().
> > 
> > A mutex lock can avoid the potential conflict. With the patch below, the
> > warnings above no longer appear.
> > 
> > My question is: is there any plan to support enabling PCI SR-IOV concurrently on the kernel side?
> > 
> > Thanks
> > 
> > ---
> > drivers/pci/iov.c | 29 +++++++++++++++++++++++++++--
> >  1 file changed, 27 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 952217572113..6a8a849298c4 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -16,6 +16,12 @@
> >  
> >  #define VIRTFN_ID_LEN    16
> >  
> > +static struct mutex add_bus_mutex;
> > +static int add_bus_mutex_initialized;
> > +
> > +static struct mutex add_device_mutex;
> > +static int add_device_mutex_initialized;
> > +
> >  int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
> >  {
> >      if (!dev->is_physfn)
> > @@ -127,13 +133,24 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
> >      if (bus->number == busnr)
> >          return bus;
> >  
> > +    if (!add_bus_mutex_initialized) {
> > +        mutex_init(&add_bus_mutex);
> > +        add_bus_mutex_initialized = 1;
> > +    }
> 
> I assume this patch works around the warning.  I see the intent here, but I think it would need some rework before merging it.  These locks protect pci_add_new_bus() and pci_bus_add_device(), but only for the callers in iov.c.  These interfaces are both called from places other than iov.c, and any mutual exclusion should cover all of them.
> 
> I'm actually not sure how the other callers are protected.  I assume we're holding a device_lock for some device farther up the chain.  Or, I see that acpi_pci_root_add() and rescan_store() hold pci_rescan_remove_lock while calling these.  We don't seem to hold that uniformly though, which bothers me, because I think there are many other paths, e.g., pci_host_probe() and its callers.
> 
> > +    mutex_lock(&add_bus_mutex);
> > +
> >      child = pci_find_bus(pci_domain_nr(bus), busnr);
> > -    if (child)
> > +    if (child) {
> > +        mutex_unlock(&add_bus_mutex);
> >          return child;
> > +    }
> >  
> >      child = pci_add_new_bus(bus, NULL, busnr);
> > -    if (!child)
> > +    if (!child) {
> > +        mutex_unlock(&add_bus_mutex);
> >          return NULL;
> > +    }
> > +    mutex_unlock(&add_bus_mutex);
> >  
> >      pci_bus_insert_busn_res(child, busnr, busnr);
> >  
> > @@ -339,8 +356,16 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> >      if (rc)
> >          goto failed1;
> >  
> > +    if (!add_device_mutex_initialized) {
> > +        mutex_init(&add_device_mutex);
> > +        add_device_mutex_initialized = 1;
> > +    }
> > +    mutex_lock(&add_device_mutex);
> > +
> >      pci_bus_add_device(virtfn);
> >  
> > +    mutex_unlock(&add_device_mutex);
> > +
> >      return 0;
> >  
> >  failed1:
> > ---

I wrote a new patch that can fix the problems above. It adds a new function, pci_add_new_bus_locked(), which does the find-bus and add-new-bus steps with a mutex held. In most cases in virtfn_add_bus() the VF will find an existing bus and return; this is the fast path and needs no mutex protection. If the bus does not exist yet and different VFs race to add the same bus, they call pci_add_new_bus_locked(); this is the slower path, but it is safe.

I also added a device_lock() in pci_proc_attach_device() around the creation of bus->procdir; this fixes the conflict when different VFs concurrently create the '/proc/bus/pci/<bus_number>' directory for the same bus number.

---
 drivers/pci/iov.c   |  7 ++++++-
 drivers/pci/probe.c | 26 ++++++++++++++++++++++++++
 drivers/pci/proc.c  |  6 +++++-
 include/linux/pci.h |  2 ++
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 952217572113..cde0155749c5 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -127,11 +127,16 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 	if (bus->number == busnr)
 		return bus;
 
+	/*
+	 * In most cases the VF finds an existing bus here. If not, go
+	 * through the locked slow path to create the new bus, so SR-IOV
+	 * can be enabled concurrently for many different PFs from userspace.
+	 */
 	child = pci_find_bus(pci_domain_nr(bus), busnr);
 	if (child)
 		return child;
 
-	child = pci_add_new_bus(bus, NULL, busnr);
+	child = pci_add_new_bus_locked(bus, NULL, busnr);
 	if (!child)
 		return NULL;
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c5286b027f00..5dc2a6774fa9 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -25,6 +25,8 @@
 #define CARDBUS_LATENCY_TIMER	176	/* secondary latency timer */
 #define CARDBUS_RESERVE_BUSNR	3
 
+DEFINE_MUTEX(add_bus_mutex);
+
 static struct resource busn_resource = {
 	.name	= "PCI busn",
 	.start	= 0,
@@ -1170,6 +1172,30 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
 }
 EXPORT_SYMBOL(pci_add_new_bus);
 
+struct pci_bus *pci_add_new_bus_locked(struct pci_bus *parent, struct pci_dev *dev,
+                                int busnr)
+{
+        struct pci_bus *child;
+
+        mutex_lock(&add_bus_mutex);
+        child = pci_find_bus(pci_domain_nr(parent), busnr);
+        if (child) {
+                mutex_unlock(&add_bus_mutex);
+                return child;
+        }
+
+        child = pci_alloc_child_bus(parent, dev, busnr);
+        if (child) {
+                down_write(&pci_bus_sem);
+                list_add_tail(&child->node, &parent->children);
+                up_write(&pci_bus_sem);
+        }
+        mutex_unlock(&add_bus_mutex);
+
+        return child;
+}
+EXPORT_SYMBOL(pci_add_new_bus_locked);
+
 static void pci_enable_crs(struct pci_dev *pdev)
 {
 	u16 root_cap = 0;
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index f967709082d6..f927263c2fe0 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -421,6 +421,7 @@ int pci_proc_attach_device(struct pci_dev *dev)
 	if (!proc_initialized)
 		return -EACCES;
 
+	device_lock(&bus->dev);
 	if (!bus->procdir) {
 		if (pci_proc_domain(bus)) {
 			sprintf(name, "%04x:%02x", pci_domain_nr(bus),
@@ -429,9 +430,12 @@ int pci_proc_attach_device(struct pci_dev *dev)
 			sprintf(name, "%02x", bus->number);
 		}
 		bus->procdir = proc_mkdir(name, proc_bus_pci_dir);
-		if (!bus->procdir)
+		if (!bus->procdir) {
+			device_unlock(&bus->dev);
 			return -ENOMEM;
+		}
 	}
+	device_unlock(&bus->dev);
 
 	sprintf(name, "%02x.%x", PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
 	e = proc_create_data(name, S_IFREG | S_IRUGO | S_IWUSR, bus->procdir,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 060af91bafcd..ec5b68b4c63d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1112,6 +1112,8 @@ struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
 int pci_scan_root_bus_bridge(struct pci_host_bridge *bridge);
 struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
 				int busnr);
+struct pci_bus *pci_add_new_bus_locked(struct pci_bus *parent, struct pci_dev *dev,
+				int busnr);
 struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 				 const char *name,
 				 struct hotplug_slot *hotplug);
--

[-- Attachment #2: enable_sriov_demo.c --]
[-- Type: text/plain, Size: 4386 bytes --]

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <dirent.h>
#include <pthread.h>
#include <sys/stat.h>

#include <fcntl.h>
#include <string.h>

#define MAX_PATH_SIZE  1024
#define MAX_PF_NUM     20
#define VF_PATH   64
#define VF_LEN    10
#define DEVICE_DIR     "/sys/bus/pci/devices"
#define STREQ(a, b) (strcmp(a, b) == 0)

typedef struct SriovEnableThread {
    pthread_t thread;
    char bdf[VF_PATH];
    char totalVfs[VF_LEN];
} SriovEnableThread;

static int FileWriteString(const char *path, const char *str)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1) {
        return -1;
    }

    if (write(fd, str, strlen(str)) < 0) {
        close(fd);
        return -1;
    }

    if (close(fd) != 0) {
        return -1;
    }

    return 0;
}

static int FileReadString(const char *file, char *buf, int len)
{
    int fd;
    ssize_t sz;

    fd = open(file, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    sz = read(fd, buf, len);
    close(fd);
    if (sz < 0) {
        return -1;
    }

    if (sz == len) {
        printf("String len '%d' is too short to save %s\n", len, file);
        buf[len - 1] = '\0';
        return -1;
    }

    buf[sz] = '\0';
    return sz;
}

static void *DeviceSriovEnable(void *arg)
{
    char numVfsPath[VF_PATH] = { 0 };
    SriovEnableThread *sriovThread = (SriovEnableThread *)arg;
    const char *bdf = sriovThread->bdf;
    const char *totalVfs = sriovThread->totalVfs;

    printf("echo %s > /sys/bus/pci/devices/%s/sriov_numvfs\n", totalVfs, bdf);

    if (snprintf(numVfsPath, sizeof(numVfsPath) - 1,
                   "%s/%s/sriov_numvfs", DEVICE_DIR, bdf) < 0) {
        printf("Get device sriov_numvfs fail\n");
        pthread_exit(NULL);
    }

    if (FileWriteString(numVfsPath, totalVfs)) {
        printf("enable '%s' sriov fail\n", bdf);
        pthread_exit(NULL);
    }
    pthread_exit(NULL);
}

static int DeviceCheckSriovEnable(const char *bdf)
{
    char path[VF_PATH] = { 0 };
    int ret;

    ret = snprintf(path, sizeof(path) - 1,
                     "%s/%s/sriov_totalvfs", DEVICE_DIR, bdf);
    if (ret < 0) {
        return ret;
    }

    if (access(path, R_OK) != 0) {
        return 0;
    }

    return 1;
}

static int DeviceReadValue(const char *bdf, const char *pattern, char *buffer, size_t len)
{
    char path[VF_PATH] = {0};
    int ret;

    ret = snprintf(path, sizeof(path) - 1, "%s/%s/%s", DEVICE_DIR,
                     bdf, pattern);
    if (ret < 0) {
        printf("Set devicePath fail\n");
        return ret;
    }

    if (FileReadString(path, buffer, len) < 0) {
        printf("Read data from devicePath %s fail\n", path);
        return -1;
    }

    return 0;
}

int main(void)
{
    int ret;
    DIR* dirp;
    int i = 0;
    struct dirent *dp = NULL;
    char buffer[VF_LEN] = { 0 };
    int sriovThreadLen = sizeof(SriovEnableThread) * MAX_PF_NUM;

    dirp = opendir(DEVICE_DIR);
    if (!dirp) {
        printf("Can not find %s\n", DEVICE_DIR);
        return 1;
    }

    SriovEnableThread *sriovThread = (SriovEnableThread *)malloc(sriovThreadLen);
    if (!sriovThread) {
        printf("Can not alloc sriovThread\n");
        closedir(dirp);
        return 1;
    }
    memset(sriovThread, 0, sriovThreadLen);

    while ((dp = readdir(dirp)) != NULL) {
        if (STREQ(dp->d_name, ".") || STREQ(dp->d_name, "..")) {
            continue;
        }

        if (DeviceCheckSriovEnable(dp->d_name) <= 0) {
            continue;
        }

        if (DeviceReadValue(dp->d_name, "sriov_totalvfs", buffer, sizeof(buffer)) < 0) {
            continue;
        }

        if (i >= MAX_PF_NUM) {
            printf("pf num is exceed max %d\n", MAX_PF_NUM);
            break;
        }

        strcpy(sriovThread[i].bdf, dp->d_name);
        strcpy(sriovThread[i].totalVfs, buffer);
        ret = pthread_create(&sriovThread[i].thread, NULL, DeviceSriovEnable,
                             (void *)&sriovThread[i]);
        if (ret) {
            printf("create sriov thread %d for %s failed, ret : %d\n", i, sriovThread[i].bdf, ret);
            break;
        }
        i++;
    }

    closedir(dirp);
    /* wait until all sriov enable thread finish */
    for (int j = 0; j < i; j++) {
        pthread_join(sriovThread[j].thread, NULL);
    }

    printf("sriov enable finish\n");
    free(sriovThread);
    return 0;
}


end of thread, other threads:[~2022-09-23 21:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-17  7:43 [Question] Any plan to support enable PCI SRIOV concurrently in kernel side? Zhoujian (jay)
2022-08-17 19:49 ` Bjorn Helgaas
2022-08-21 11:25   ` RE: " zhuangshengen
2022-09-23 21:26     ` Bjorn Helgaas
2022-08-27  8:01   ` Zhoujian (jay)
2022-08-21 11:35 zhuangshengen
