Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] PCI/IOV: update num_VFs earlier
@ 2019-03-29  8:00 Pierre Crégut
  2019-04-05 22:33 ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Pierre Crégut @ 2019-03-29  8:00 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, Pierre Crégut

Ensure that iov->num_VFs is set before a netlink message is sent
when the number of VFs is changed. Only the path for num_VFs > 0
is affected. The path for num_VFs = 0 is already correct.

Monitoring programs can relie on netlink messages to track interface
change and query their state in /sys. But when sriov_numvfs is set to a
positive value, the netlink message is sent before the value is available
in sysfs. The value read after the message is received is always zero.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202991
Signed-off-by: Pierre Crégut <pierre.cregut@orange.com>
---
note: the behaviour can be tested with the following shell script also
available on the bugzilla (d being the phy device name):

ip monitor dev $d | grep --line-buffered "^[0-9]*:" | \
while read line; do cat /sys/class/net/$d/device/sriov_numvfs; done

 drivers/pci/iov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 3aa115ed3a65..a9655c10e87f 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -351,6 +351,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 		goto err_pcibios;
 	}
 
+	iov->num_VFs = nr_virtfn;
 	pci_iov_set_numvfs(dev, nr_virtfn);
 	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
 	pci_cfg_access_lock(dev);
@@ -363,7 +364,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 		goto err_pcibios;
 
 	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
-	iov->num_VFs = nr_virtfn;
 
 	return 0;
 
@@ -379,6 +379,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	if (iov->link != dev->devfn)
 		sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
+	iov->num_VFs = 0;
 	pci_iov_set_numvfs(dev, 0);
 	return rc;
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] PCI/IOV: update num_VFs earlier
  2019-03-29  8:00 [PATCH] PCI/IOV: update num_VFs earlier Pierre Crégut
@ 2019-04-05 22:33 ` Bjorn Helgaas
  2019-04-26  8:11   ` CREGUT Pierre IMT/OLN
  0 siblings, 1 reply; 5+ messages in thread
From: Bjorn Helgaas @ 2019-04-05 22:33 UTC (permalink / raw)
  To: Pierre Crégut; +Cc: linux-pci, linux-kernel

On Fri, Mar 29, 2019 at 09:00:58AM +0100, Pierre Crégut wrote:
> Ensure that iov->num_VFs is set before a netlink message is sent
> when the number of VFs is changed. Only the path for num_VFs > 0
> is affected. The path for num_VFs = 0 is already correct.
> 
> Monitoring programs can relie on netlink messages to track interface
> change and query their state in /sys. But when sriov_numvfs is set to a
> positive value, the netlink message is sent before the value is available
> in sysfs. The value read after the message is received is always zero.

Thanks, Pierre!  Can you clue me in on where exactly the connection
from sriov_enable() to netlink is?

I see one side of the race is with sriov_numvfs_show(), but I don't
know where the netlink message is sent.  Is that connected with the
kobject_uevent(KOBJ_CHANGE)?

One thing this would help with is figuring out exactly how *much*
earlier we need to set iov->num_VFs.  It looks like the current patch
sets it before we actually enable the VFs, so a user could read
/sys/.../sriov_numvfs and get the wrong value.  Of course, that's
unavoidable; the question is whether it's OK to get the new value
*before* it actually takes effect, or whether we want to return a
stale value until after it takes effect.

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202991
> Signed-off-by: Pierre Crégut <pierre.cregut@orange.com>
> ---
> note: the behaviour can be tested with the following shell script also
> available on the bugzilla (d being the phy device name):
> 
> ip monitor dev $d | grep --line-buffered "^[0-9]*:" | \
> while read line; do cat /sys/class/net/$d/device/sriov_numvfs; done
> 
>  drivers/pci/iov.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 3aa115ed3a65..a9655c10e87f 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -351,6 +351,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  		goto err_pcibios;
>  	}
>  
> +	iov->num_VFs = nr_virtfn;
>  	pci_iov_set_numvfs(dev, nr_virtfn);
>  	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
>  	pci_cfg_access_lock(dev);
> @@ -363,7 +364,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  		goto err_pcibios;
>  
>  	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
> -	iov->num_VFs = nr_virtfn;
>  
>  	return 0;
>  
> @@ -379,6 +379,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  	if (iov->link != dev->devfn)
>  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
>  
> +	iov->num_VFs = 0;
>  	pci_iov_set_numvfs(dev, 0);
>  	return rc;
>  }
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] PCI/IOV: update num_VFs earlier
  2019-04-05 22:33 ` Bjorn Helgaas
@ 2019-04-26  8:11   ` CREGUT Pierre IMT/OLN
  2019-06-13 23:51     ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: CREGUT Pierre IMT/OLN @ 2019-04-26  8:11 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel

I also initially thought that kobject_uevent generated the netlink event
but this is not the case. This is generated by the specific driver in use.
For the Intel i40e driver, this is the call to i40e_do_reset_safe in
i40e_pci_sriov_configure that sends the event.
It is followed by i40e_pci_sriov_enable that calls i40e_alloc_vfs that
finally calls the generic pci_enable_sriov function.

So the proposed patch works well for the i40e driver (x710 cards) because
the update to num_VFs is fast enough to be committed before the event is
received. It may not work with other cards. The same is true for the zero
value and there is no guarantee for other cards.

The clean solution would be to lock the device in sriov_numvfs_show.
I guess that there are good reasons why locks have been avoided
in sysfs getter functions so let us explore other approaches.

We can either return a "not settled" value (-1) or (probably better)
do not return a value but an error (-EAGAIN returned by the show
function).

To distinguish this "not settled" situation we can either:
* overload the meaning of num_VFs (eg make it negative)
   but it is an unsigned short.
* add a bool to pci_sriov struct (rather simple but modifies a well
   established structure).
* use the fact that not_settled => device is locked and use
   mutex_is_locked as an over approximation.

The later is not perfect but requires minimal changes to
sriov_numvfs_show:

      if (mutex_is_locked(&dev->mutex))
          return -EAGAIN;

In all cases, the device could be locked or the boolean set just
after the test. But I don't think there is a case where causality
would be violated.Thank you in advance for your recommendations.  
I will update the patch according to your instructions.

Le 06/04/2019 à 00:33, Bjorn Helgaas a écrit :
> On Fri, Mar 29, 2019 at 09:00:58AM +0100, Pierre Crégut wrote:
>> Ensure that iov->num_VFs is set before a netlink message is sent
>> when the number of VFs is changed. Only the path for num_VFs > 0
>> is affected. The path for num_VFs = 0 is already correct.
>>
>> Monitoring programs can relie on netlink messages to track interface
>> change and query their state in /sys. But when sriov_numvfs is set to a
>> positive value, the netlink message is sent before the value is available
>> in sysfs. The value read after the message is received is always zero.
> Thanks, Pierre!  Can you clue me in on where exactly the connection
> from sriov_enable() to netlink is?
>
> I see one side of the race is with sriov_numvfs_show(), but I don't
> know where the netlink message is sent.  Is that connected with the
> kobject_uevent(KOBJ_CHANGE)?
>
> One thing this would help with is figuring out exactly how *much*
> earlier we need to set iov->num_VFs.  It looks like the current patch
> sets it before we actually enable the VFs, so a user could read
> /sys/.../sriov_numvfs and get the wrong value.  Of course, that's
> unavoidable; the question is whether it's OK to get the new value
> *before* it actually takes effect, or whether we want to return a
> stale value until after it takes effect.
>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202991
>> Signed-off-by: Pierre Crégut <pierre.cregut@orange.com>
>> ---
>> note: the behaviour can be tested with the following shell script also
>> available on the bugzilla (d being the phy device name):
>>
>> ip monitor dev $d | grep --line-buffered "^[0-9]*:" | \
>> while read line; do cat /sys/class/net/$d/device/sriov_numvfs; done
>>
>>   drivers/pci/iov.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> index 3aa115ed3a65..a9655c10e87f 100644
>> --- a/drivers/pci/iov.c
>> +++ b/drivers/pci/iov.c
>> @@ -351,6 +351,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>>   		goto err_pcibios;
>>   	}
>>   
>> +	iov->num_VFs = nr_virtfn;
>>   	pci_iov_set_numvfs(dev, nr_virtfn);
>>   	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
>>   	pci_cfg_access_lock(dev);
>> @@ -363,7 +364,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>>   		goto err_pcibios;
>>   
>>   	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
>> -	iov->num_VFs = nr_virtfn;
>>   
>>   	return 0;
>>   
>> @@ -379,6 +379,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>>   	if (iov->link != dev->devfn)
>>   		sysfs_remove_link(&dev->dev.kobj, "dep_link");
>>   
>> +	iov->num_VFs = 0;
>>   	pci_iov_set_numvfs(dev, 0);
>>   	return rc;
>>   }
>> -- 
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] PCI/IOV: update num_VFs earlier
  2019-04-26  8:11   ` CREGUT Pierre IMT/OLN
@ 2019-06-13 23:51     ` Bjorn Helgaas
  0 siblings, 0 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2019-06-13 23:51 UTC (permalink / raw)
  To: CREGUT Pierre IMT/OLN; +Cc: linux-pci, linux-kernel

Hi Pierre,

I'm really sorry, I totally missed your response.

On Fri, Apr 26, 2019 at 10:11:54AM +0200, CREGUT Pierre IMT/OLN wrote:
> I also initially thought that kobject_uevent generated the netlink event
> but this is not the case. This is generated by the specific driver in use.
> For the Intel i40e driver, this is the call to i40e_do_reset_safe in
> i40e_pci_sriov_configure that sends the event.
> It is followed by i40e_pci_sriov_enable that calls i40e_alloc_vfs that
> finally calls the generic pci_enable_sriov function.
> 
> So the proposed patch works well for the i40e driver (x710 cards) because
> the update to num_VFs is fast enough to be committed before the event is
> received. It may not work with other cards. The same is true for the zero
> value and there is no guarantee for other cards.
> 
> The clean solution would be to lock the device in sriov_numvfs_show.
> I guess that there are good reasons why locks have been avoided
> in sysfs getter functions so let us explore other approaches.
> 
> We can either return a "not settled" value (-1) or (probably better)
> do not return a value but an error (-EAGAIN returned by the show
> function).
> 
> To distinguish this "not settled" situation we can either:
> * overload the meaning of num_VFs (eg make it negative)
>   but it is an unsigned short.
> * add a bool to pci_sriov struct (rather simple but modifies a well
>   established structure).
> * use the fact that not_settled => device is locked and use
>   mutex_is_locked as an over approximation.
> 
> The later is not perfect but requires minimal changes to
> sriov_numvfs_show:
> 
>      if (mutex_is_locked(&dev->mutex))
>          return -EAGAIN;

I think this looks like a good resolution.

> In all cases, the device could be locked or the boolean set just
> after the test. But I don't think there is a case where causality
> would be violated.Thank you in advance for your recommendations.  I will
> update the patch according to your instructions.
> 
> Le 06/04/2019 à 00:33, Bjorn Helgaas a écrit :
> > On Fri, Mar 29, 2019 at 09:00:58AM +0100, Pierre Crégut wrote:
> > > Ensure that iov->num_VFs is set before a netlink message is sent
> > > when the number of VFs is changed. Only the path for num_VFs > 0
> > > is affected. The path for num_VFs = 0 is already correct.
> > > 
> > > Monitoring programs can relie on netlink messages to track interface
> > > change and query their state in /sys. But when sriov_numvfs is set to a
> > > positive value, the netlink message is sent before the value is available
> > > in sysfs. The value read after the message is received is always zero.
> > Thanks, Pierre!  Can you clue me in on where exactly the connection
> > from sriov_enable() to netlink is?
> > 
> > I see one side of the race is with sriov_numvfs_show(), but I don't
> > know where the netlink message is sent.  Is that connected with the
> > kobject_uevent(KOBJ_CHANGE)?
> > 
> > One thing this would help with is figuring out exactly how *much*
> > earlier we need to set iov->num_VFs.  It looks like the current patch
> > sets it before we actually enable the VFs, so a user could read
> > /sys/.../sriov_numvfs and get the wrong value.  Of course, that's
> > unavoidable; the question is whether it's OK to get the new value
> > *before* it actually takes effect, or whether we want to return a
> > stale value until after it takes effect.
> > 
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=202991
> > > Signed-off-by: Pierre Crégut <pierre.cregut@orange.com>
> > > ---
> > > note: the behaviour can be tested with the following shell script also
> > > available on the bugzilla (d being the phy device name):
> > > 
> > > ip monitor dev $d | grep --line-buffered "^[0-9]*:" | \
> > > while read line; do cat /sys/class/net/$d/device/sriov_numvfs; done
> > > 
> > >   drivers/pci/iov.c | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > > index 3aa115ed3a65..a9655c10e87f 100644
> > > --- a/drivers/pci/iov.c
> > > +++ b/drivers/pci/iov.c
> > > @@ -351,6 +351,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> > >   		goto err_pcibios;
> > >   	}
> > > +	iov->num_VFs = nr_virtfn;
> > >   	pci_iov_set_numvfs(dev, nr_virtfn);
> > >   	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
> > >   	pci_cfg_access_lock(dev);
> > > @@ -363,7 +364,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> > >   		goto err_pcibios;
> > >   	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
> > > -	iov->num_VFs = nr_virtfn;
> > >   	return 0;
> > > @@ -379,6 +379,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> > >   	if (iov->link != dev->devfn)
> > >   		sysfs_remove_link(&dev->dev.kobj, "dep_link");
> > > +	iov->num_VFs = 0;
> > >   	pci_iov_set_numvfs(dev, 0);
> > >   	return rc;
> > >   }
> > > -- 
> > > 2.17.1
> > > 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] PCI/IOV: update num_VFs earlier
@ 2019-03-25  8:18 Pierre Crégut
  0 siblings, 0 replies; 5+ messages in thread
From: Pierre Crégut @ 2019-03-25  8:18 UTC (permalink / raw)
  To: linux-pci; +Cc: Pierre Crégut

Ensure that iov->num_VFs is set before a netlink message is sent when
the number of VFs is changed. Otherwise /sys/.../sriov_numvfs is not
updated when read immediately after receiving the netlink message.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202991
Signed-off-by: Pierre Crégut <pierre.cregut@orange.com>
---
 drivers/pci/iov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 3aa115ed3a65..a9655c10e87f 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -351,6 +351,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 		goto err_pcibios;
 	}
 
+	iov->num_VFs = nr_virtfn;
 	pci_iov_set_numvfs(dev, nr_virtfn);
 	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
 	pci_cfg_access_lock(dev);
@@ -363,7 +364,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 		goto err_pcibios;
 
 	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
-	iov->num_VFs = nr_virtfn;
 
 	return 0;
 
@@ -379,6 +379,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	if (iov->link != dev->devfn)
 		sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
+	iov->num_VFs = 0;
 	pci_iov_set_numvfs(dev, 0);
 	return rc;
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-29  8:00 [PATCH] PCI/IOV: update num_VFs earlier Pierre Crégut
2019-04-05 22:33 ` Bjorn Helgaas
2019-04-26  8:11   ` CREGUT Pierre IMT/OLN
2019-06-13 23:51     ` Bjorn Helgaas
  -- strict thread matches above, loose matches on Subject: below --
2019-03-25  8:18 Pierre Crégut

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org linux-pci@archiver.kernel.org
	public-inbox-index linux-pci


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/ public-inbox