* Re: pci_sriov_set_totalvfs again [not found] <53D9288B.5030302@solarflare.com> @ 2014-07-30 18:05 ` Don Dutile 2014-07-30 18:24 ` Edward Cree 0 siblings, 1 reply; 27+ messages in thread From: Don Dutile @ 2014-07-30 18:05 UTC (permalink / raw) To: Edward Cree, linux-pci On 07/30/2014 01:16 PM, Edward Cree wrote: > Calling pci_sriov_set_totalvfs(dev, 0) has no effect, because pci_sriov_get_totalvfs ignores dev->sriov->driver_max_VFs if it's 0, as that is used as the 'not set' value. > So, three questions: > a) is this a bug? > b) if not, should the comment on pci_sriov_set_totalvfs mention that passing numvfs=0 will be interpreted as numvfs=dev->sriov->total_VFs? > c) is there a better way of indicating "current configuration doesn't support VFs" rather than calling set_totalvfs(0)? > > Thanks, > -Edward > ||The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. The file shouldn't exist if the device doesn't provide an SRIOV capability. If it does, and it's not supported, then add a patch in quirks.c. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: pci_sriov_set_totalvfs again 2014-07-30 18:05 ` pci_sriov_set_totalvfs again Don Dutile @ 2014-07-30 18:24 ` Edward Cree 2014-07-30 21:14 ` Alexander Duyck 0 siblings, 1 reply; 27+ messages in thread From: Edward Cree @ 2014-07-30 18:24 UTC (permalink / raw) To: Don Dutile; +Cc: linux-pci On 30/07/14 19:05, Don Dutile wrote: > On 07/30/2014 01:16 PM, Edward Cree wrote: >> Calling pci_sriov_set_totalvfs(dev, 0) has no effect, because >> pci_sriov_get_totalvfs ignores dev->sriov->driver_max_VFs if it's 0, >> as that is used as the 'not set' value. >> So, three questions: >> a) is this a bug? >> b) if not, should the comment on pci_sriov_set_totalvfs mention that >> passing numvfs=0 will be interpreted as numvfs=dev->sriov->total_VFs? >> c) is there a better way of indicating "current configuration doesn't >> support VFs" rather than calling set_totalvfs(0)? >> >> Thanks, >> -Edward > > The file shouldn't exist if the device doesn't provide an SRIOV > capability. > If it does, and it's not supported, then add a patch in quirks.c. > > I don't know much about quirks, but I'm not sure they're the answer here, as it's not quite as simple as "driver doesn't support it". It's a firmware / configuration issue, that if the device (it's a NIC) is configured a certain way [1], the VFs - while appearing fine from a PCI perspective - don't actually work (they can't pass traffic). We can't detect this misconfiguration until PF probe time, and we need a way to report that the VFs aren't usable. Can quirks handle this? -Edward [1] SFC9120-based NICs support multiple PFs per port and these can be used as a kind of "poor-man's SR-IOV" (we're calling it 'PF-IOV') by placing the firmware v-switch below the PFs. However, this then precludes adding a v-switch above the PF to direct VF traffic, meaning that VFs are useless in this configuration. Consequently, our configuration tools won't allow VFs and PF-IOV to be enabled simultaneously, but bugs or corruption could cause this to happen. The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: pci_sriov_set_totalvfs again 2014-07-30 18:24 ` Edward Cree @ 2014-07-30 21:14 ` Alexander Duyck 2014-07-31 12:07 ` Edward Cree 0 siblings, 1 reply; 27+ messages in thread From: Alexander Duyck @ 2014-07-30 21:14 UTC (permalink / raw) To: Edward Cree, Don Dutile; +Cc: linux-pci On 07/30/2014 11:24 AM, Edward Cree wrote: > On 30/07/14 19:05, Don Dutile wrote: >> On 07/30/2014 01:16 PM, Edward Cree wrote: >>> Calling pci_sriov_set_totalvfs(dev, 0) has no effect, because >>> pci_sriov_get_totalvfs ignores dev->sriov->driver_max_VFs if it's 0, >>> as that is used as the 'not set' value. >>> So, three questions: >>> a) is this a bug? >>> b) if not, should the comment on pci_sriov_set_totalvfs mention that >>> passing numvfs=0 will be interpreted as numvfs=dev->sriov->total_VFs? >>> c) is there a better way of indicating "current configuration doesn't >>> support VFs" rather than calling set_totalvfs(0)? >>> >>> Thanks, >>> -Edward >> >> The file shouldn't exist if the device doesn't provide an SRIOV >> capability. >> If it does, and it's not supported, then add a patch in quirks.c. >> >> > I don't know much about quirks, but I'm not sure they're the answer > here, as it's not quite as simple as "driver doesn't support it". > It's a firmware / configuration issue, that if the device (it's a NIC) > is configured a certain way [1], the VFs - while appearing fine from a > PCI perspective - don't actually work (they can't pass traffic). > We can't detect this misconfiguration until PF probe time, and we need a > way to report that the VFs aren't usable. > > Can quirks handle this? > > -Edward > > [1] SFC9120-based NICs support multiple PFs per port and these can be > used as a kind of "poor-man's SR-IOV" (we're calling it 'PF-IOV') by > placing the firmware v-switch below the PFs. However, this then > precludes adding a v-switch above the PF to direct VF traffic, meaning > that VFs are useless in this configuration. Consequently, our > configuration tools won't allow VFs and PF-IOV to be enabled > simultaneously, but bugs or corruption could cause this to happen. My $.02 on the issue would be to simply have sriov_configure return an error indicating the resources are not available if you have the PF-IOV mode enabled it is consuming the VF v-switch resources. Thanks, Alex ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: pci_sriov_set_totalvfs again 2014-07-30 21:14 ` Alexander Duyck @ 2014-07-31 12:07 ` Edward Cree 2014-07-31 14:24 ` [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) Edward Cree 0 siblings, 1 reply; 27+ messages in thread From: Edward Cree @ 2014-07-31 12:07 UTC (permalink / raw) To: Alexander Duyck; +Cc: Don Dutile, linux-pci On 30/07/14 22:14, Alexander Duyck wrote: > My $.02 on the issue would be to simply have sriov_configure return an > error indicating the resources are not available if you have the > PF-IOV mode enabled it is consuming the VF v-switch resources. Thanks, > Alex That would work, but I don't like the idea of saying (in sysfs device/sriov_totalvfs) "I have some VFs" only to say "not really, ha" when someone tries to use them. At least, not when we knew all along. Let's look at this from a different angle: If I posted a patch to make pci_sriov_set_totalvfs(dev, 0) do what I want, would there be any objections to taking it? It seems like a sane API, the only existing (in-tree) users never pass 0, and it shouldn't require too ugly an implementation. -Edward The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 12:07 ` Edward Cree @ 2014-07-31 14:24 ` Edward Cree 2014-07-31 15:21 ` Alexander Duyck 2014-08-01 3:51 ` Ethan Zhao 0 siblings, 2 replies; 27+ messages in thread From: Edward Cree @ 2014-07-31 14:24 UTC (permalink / raw) To: linux-pci; +Cc: Alexander Duyck, Don Dutile, Bjorn Helgaas Make a driver_max_VFs of 0 mean 0, instead of total_VFs as previously. Initialise driver_max_VFs to total_VFs instead of 0. Signed-off-by: Edward Cree <ecree@solarflare.com> --- Only existing callers are ixgbe and igb, and neither passes 0, so this doesn't break existing users. --- drivers/pci/iov.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index cb6f247..fb51467 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -381,7 +381,7 @@ found: iov->pos = pos; iov->nres = nres; iov->ctrl = ctrl; - iov->total_VFs = total; + iov->driver_max_VFs = iov->total_VFs = total; iov->offset = offset; iov->stride = stride; iov->pgsz = pgsz; @@ -686,9 +686,6 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev) if (!dev->is_physfn) return 0; - if (dev->sriov->driver_max_VFs) - return dev->sriov->driver_max_VFs; - - return dev->sriov->total_VFs; + return dev->sriov->driver_max_VFs; } EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs); -- 1.7.11.7 The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 14:24 ` [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) Edward Cree @ 2014-07-31 15:21 ` Alexander Duyck 2014-07-31 15:56 ` Edward Cree 2014-08-01 3:51 ` Ethan Zhao 1 sibling, 1 reply; 27+ messages in thread From: Alexander Duyck @ 2014-07-31 15:21 UTC (permalink / raw) To: Edward Cree, linux-pci; +Cc: Don Dutile, Bjorn Helgaas On 07/31/2014 07:24 AM, Edward Cree wrote: > Make a driver_max_VFs of 0 mean 0, instead of total_VFs as previously. > Initialise driver_max_VFs to total_VFs instead of 0. > > Signed-off-by: Edward Cree <ecree@solarflare.com> > --- > Only existing callers are ixgbe and igb, and neither passes 0, so this > doesn't break existing users. > --- > drivers/pci/iov.c | 7 ++----- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > index cb6f247..fb51467 100644 > --- a/drivers/pci/iov.c > +++ b/drivers/pci/iov.c > @@ -381,7 +381,7 @@ found: > iov->pos = pos; > iov->nres = nres; > iov->ctrl = ctrl; > - iov->total_VFs = total; > + iov->driver_max_VFs = iov->total_VFs = total; > iov->offset = offset; > iov->stride = stride; > iov->pgsz = pgsz; > @@ -686,9 +686,6 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev) > if (!dev->is_physfn) > return 0; > > - if (dev->sriov->driver_max_VFs) > - return dev->sriov->driver_max_VFs; > - > - return dev->sriov->total_VFs; > + return dev->sriov->driver_max_VFs; > } > EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs); This change is mostly harmless so I am not concerned with any effects on igb or ixgbe. In the grand scheme of things it should also drop a few lines of code which is always good. However if you are wanting to use this as a way to disable SR-IOV for a device I wouldn't recommend it. Based on what you have described I would suggest dropping the fix in drivers/pci/quirks that would clear dev->is_physfn if you are in the PF-IOV mode. That way the SR-IOV functionality should be fully disabled. In addition this solution would also resolve the fact that the driver wouldn't actually have to be loaded for it to work so if someone were to load a driver that didn't contain the fix they would be blocked from enabling SR-IOV as well. Thanks, Alex ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 15:21 ` Alexander Duyck @ 2014-07-31 15:56 ` Edward Cree 2014-07-31 16:40 ` Alexander Duyck 2014-08-01 3:18 ` Ethan Zhao 0 siblings, 2 replies; 27+ messages in thread From: Edward Cree @ 2014-07-31 15:56 UTC (permalink / raw) To: Alexander Duyck; +Cc: linux-pci, Don Dutile, Bjorn Helgaas On 31/07/14 16:21, Alexander Duyck wrote: > However if you are wanting to use this as a way to disable SR-IOV for a > device I wouldn't recommend it. Based on what you have described I > would suggest dropping the fix in drivers/pci/quirks that would clear > dev->is_physfn if you are in the PF-IOV mode. That way the SR-IOV > functionality should be fully disabled. I still don't see how this can go in drivers/pci/quirks, because PF-IOV mode can only be detected at driver probe time. The way we find out that the NIC is in PF-IOV mode is that a request to the MC (the Management CPU on the card) to create a vswitch fails. But maybe I'm wrong about what quirks can do - is there any relevant documentation? Clearing dev->is_physfn in the driver probe routine isn't safe: dev->sriov memory can get leaked, for example, as sriov_init will get called at device add, but sriov_release won't be called at removal. I'm also unconvinced that having !(dev->is_physfn || dev->is_virtfn) is a valid state; but maybe I'm misunderstanding and "is_physfn" doesn't actually mean "is a physical function", but rather "has SR-IOV capability". > In addition this solution would > also resolve the fact that the driver wouldn't actually have to be > loaded for it to work so if someone were to load a driver that didn't > contain the fix they would be blocked from enabling SR-IOV as well. The current driver fails to probe the PF because it assumes the vswitch creation failure is fatal. There should never be a driver that knows it can live without the vswitch but doesn't know that that breaks SR-IOV. -Edward The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 15:56 ` Edward Cree @ 2014-07-31 16:40 ` Alexander Duyck 2014-07-31 16:57 ` Edward Cree 2014-08-01 3:18 ` Ethan Zhao 1 sibling, 1 reply; 27+ messages in thread From: Alexander Duyck @ 2014-07-31 16:40 UTC (permalink / raw) To: Edward Cree; +Cc: linux-pci, Don Dutile, Bjorn Helgaas On 07/31/2014 08:56 AM, Edward Cree wrote: > On 31/07/14 16:21, Alexander Duyck wrote: >> However if you are wanting to use this as a way to disable SR-IOV for a >> device I wouldn't recommend it. Based on what you have described I >> would suggest dropping the fix in drivers/pci/quirks that would clear >> dev->is_physfn if you are in the PF-IOV mode. That way the SR-IOV >> functionality should be fully disabled. > I still don't see how this can go in drivers/pci/quirks, because PF-IOV > mode can only be detected at driver probe time. The way we find out > that the NIC is in PF-IOV mode is that a request to the MC (the > Management CPU on the card) to create a vswitch fails. But maybe I'm > wrong about what quirks can do - is there any relevant documentation? If this PF-IOV mode is enabled what would be the layout of the PF devices? It seems like you should be able to scan for multiple PFs all showing up on the same bus with a certain stride if you wanted to detect it. > Clearing dev->is_physfn in the driver probe routine isn't safe: > dev->sriov memory can get leaked, for example, as sriov_init will get > called at device add, but sriov_release won't be called at removal. Thanks for pointing that out. It seems like that is a bug. We should probably be checking for dev->sriov, not dev->is_physfn before calling sriov_release. I might see about submitting a patch to address that. > I'm also unconvinced that having !(dev->is_physfn || dev->is_virtfn) is > a valid state; but maybe I'm misunderstanding and "is_physfn" doesn't > actually mean "is a physical function", but rather "has SR-IOV capability". is_physfn is typically used to indicate the device is capable of acting as a physical function. So a non-IOV device will not set either is_physfn or is_virtfn. >> In addition this solution would >> also resolve the fact that the driver wouldn't actually have to be >> loaded for it to work so if someone were to load a driver that didn't >> contain the fix they would be blocked from enabling SR-IOV as well. > The current driver fails to probe the PF because it assumes the vswitch > creation failure is fatal. There should never be a driver that knows it > can live without the vswitch but doesn't know that that breaks SR-IOV. > > -Edward Is the vswitch a hard requirement for something other than SR-IOV? If not then maybe you should consider modifying the driver so it simply disabled SR-IOV if you cannot allocate the vswitch instead of blocking probe. Thanks, Alex P.S. You should really consider changing your email signature. The bit about "you may not use, copy or disclose to anyone this message" kind of goes against the whole point of submitting patches to an open source project. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 16:40 ` Alexander Duyck @ 2014-07-31 16:57 ` Edward Cree 2014-07-31 17:53 ` Don Dutile 2014-07-31 17:55 ` Alexander Duyck 0 siblings, 2 replies; 27+ messages in thread From: Edward Cree @ 2014-07-31 16:57 UTC (permalink / raw) To: Alexander Duyck; +Cc: linux-pci, Don Dutile, Bjorn Helgaas On 31/07/14 17:40, Alexander Duyck wrote: > If this PF-IOV mode is enabled what would be the layout of the PF > devices? It seems like you should be able to scan for multiple PFs all > showing up on the same bus with a certain stride if you wanted to detect it. Alas, it's (once again) more complicated than that. There's another mode, NIC partitioning, which has multiple PFs per port but connected with a VLAN aggregator rather than a v-switch, which means that each PF then can support VFs (the firmware limitation is that v-switches can't be stacked). So having multiple PFs on the port doesn't necessarily mean we can't do SR-IOV. > Thanks for pointing that out. It seems like that is a bug. We should > probably be checking for dev->sriov, not dev->is_physfn before calling > sriov_release. > > I might see about submitting a patch to address that. Unfortunately, that won't work because dev->sriov is in a union with dev->physfn (presumably on the grounds that if you have an associated PF, you're a VF, so you can't have VFs of your own). So I think the test in pci_iov_release is correct, and anyone changing dev->is_physfn after the PCI device has been set up is in the wrong. >> I'm also unconvinced that having !(dev->is_physfn || dev->is_virtfn) is >> a valid state; but maybe I'm misunderstanding and "is_physfn" doesn't >> actually mean "is a physical function", but rather "has SR-IOV capability". > is_physfn is typically used to indicate the device is capable of acting > as a physical function. So a non-IOV device will not set either > is_physfn or is_virtfn. Ok, well I think a comment to that effect in struct pci_dev might be a good idea. >>> In addition this solution would >>> also resolve the fact that the driver wouldn't actually have to be >>> loaded for it to work so if someone were to load a driver that didn't >>> contain the fix they would be blocked from enabling SR-IOV as well. >> The current driver fails to probe the PF because it assumes the vswitch >> creation failure is fatal. There should never be a driver that knows it >> can live without the vswitch but doesn't know that that breaks SR-IOV. >> >> -Edward > Is the vswitch a hard requirement for something other than SR-IOV? If > not then maybe you should consider modifying the driver so it simply > disabled SR-IOV if you cannot allocate the vswitch instead of blocking > probe. > That's exactly what I'm trying to do with pci_sriov_set_totalvfs(dev, 0); that's the whole point. > P.S. You should really consider changing your email signature. The bit > about "you may not use, copy or disclose to anyone this message" kind of > goes against the whole point of submitting patches to an open source > project. > I know; it's automatically appended by our mail server to all outgoing mail. I've already raised this with our IT department but they haven't answered yet :( -Edward The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 16:57 ` Edward Cree @ 2014-07-31 17:53 ` Don Dutile 2014-07-31 18:13 ` Edward Cree 2014-07-31 17:55 ` Alexander Duyck 1 sibling, 1 reply; 27+ messages in thread From: Don Dutile @ 2014-07-31 17:53 UTC (permalink / raw) To: Edward Cree, Alexander Duyck; +Cc: linux-pci, Bjorn Helgaas On 07/31/2014 12:57 PM, Edward Cree wrote: > On 31/07/14 17:40, Alexander Duyck wrote: >> If this PF-IOV mode is enabled what would be the layout of the PF >> devices? It seems like you should be able to scan for multiple PFs all >> showing up on the same bus with a certain stride if you wanted to detect it. > Alas, it's (once again) more complicated than that. There's another > mode, NIC partitioning, which has multiple PFs per port but connected > with a VLAN aggregator rather than a v-switch, which means that each PF > then can support VFs (the firmware limitation is that v-switches can't > be stacked). > So having multiple PFs on the port doesn't necessarily mean we can't do > SR-IOV. > Ed, It is fairly obvious that your PCIe device operates in very unexpected, non-std modes. Ignore twiddling total-vfs, and just have the driver fail sriov configuration when they can operate, print a warning why, and let's not create complicated/convoluted hacks dependent on the use of various assumptions/uses of pdev flags. Ideally, when the device is configured in different modes, the SRIOV cap structure should be modified so the pci sriov code doesn't try to act or reflect the non-reality the device is in. >> Thanks for pointing that out. It seems like that is a bug. We should >> probably be checking for dev->sriov, not dev->is_physfn before calling >> sriov_release. >> >> I might see about submitting a patch to address that. > Unfortunately, that won't work because dev->sriov is in a union with > dev->physfn (presumably on the grounds that if you have an associated > PF, you're a VF, so you can't have VFs of your own). > So I think the test in pci_iov_release is correct, and anyone changing > dev->is_physfn after the PCI device has been set up is in the wrong. > >>> I'm also unconvinced that having !(dev->is_physfn || dev->is_virtfn) is >>> a valid state; but maybe I'm misunderstanding and "is_physfn" doesn't >>> actually mean "is a physical function", but rather "has SR-IOV capability". >> is_physfn is typically used to indicate the device is capable of acting >> as a physical function. So a non-IOV device will not set either >> is_physfn or is_virtfn. > Ok, well I think a comment to that effect in struct pci_dev might be a > good idea. >>>> In addition this solution would >>>> also resolve the fact that the driver wouldn't actually have to be >>>> loaded for it to work so if someone were to load a driver that didn't >>>> contain the fix they would be blocked from enabling SR-IOV as well. >>> The current driver fails to probe the PF because it assumes the vswitch >>> creation failure is fatal. There should never be a driver that knows it >>> can live without the vswitch but doesn't know that that breaks SR-IOV. >>> >>> -Edward >> Is the vswitch a hard requirement for something other than SR-IOV? If >> not then maybe you should consider modifying the driver so it simply >> disabled SR-IOV if you cannot allocate the vswitch instead of blocking >> probe. >> > That's exactly what I'm trying to do with pci_sriov_set_totalvfs(dev, > 0); that's the whole point. >> P.S. You should really consider changing your email signature. The bit >> about "you may not use, copy or disclose to anyone this message" kind of >> goes against the whole point of submitting patches to an open source >> project. >> > I know; it's automatically appended by our mail server to all outgoing > mail. I've already raised this with our IT department but they haven't > answered yet :( > > -Edward > The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 17:53 ` Don Dutile @ 2014-07-31 18:13 ` Edward Cree 2014-08-04 14:03 ` Edward Cree 0 siblings, 1 reply; 27+ messages in thread From: Edward Cree @ 2014-07-31 18:13 UTC (permalink / raw) To: Don Dutile; +Cc: Alexander Duyck, linux-pci, Bjorn Helgaas On 31/07/14 18:53, Don Dutile wrote: > On 07/31/2014 12:57 PM, Edward Cree wrote: >> Alas, it's (once again) more complicated than that. There's another >> mode, NIC partitioning, which has multiple PFs per port but connected >> with a VLAN aggregator rather than a v-switch, which means that each PF >> then can support VFs (the firmware limitation is that v-switches can't >> be stacked). >> So having multiple PFs on the port doesn't necessarily mean we can't do >> SR-IOV. >> > Ed, > It is fairly obvious that your PCIe device operates in very unexpected, > non-std modes. I really don't think it does. As far as PCIe is concerned, the device is behaving perfectly sanely: it claims to support VFs, and it does support VFs. You can add the VFs as PCIe functions just fine. It's just that the higher-layer entities those VFs give access to don't work properly. > Ignore twiddling total-vfs, and just have the driver fail sriov > configuration when they can operate, print a warning why, and let's > not create > complicated/convoluted hacks dependent on the use of various > assumptions/uses > of pdev flags. I don't like the idea of failing sriov configuration (I assume by this you mean the pci_driver.sriov_configure function), simply because we know at PF probe time that VFs won't work - so why advertise them? We have a pci_sriov_set_totalvfs function, why shouldn't it be used here? > Ideally, when the device is configured in different modes, the SRIOV > cap structure > should be modified so the pci sriov code doesn't try to act or reflect > the > non-reality the device is in. That would be much nicer of course, and I'll ask our firmware team if that's possible. But I don't think it will be. -Edward The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 18:13 ` Edward Cree @ 2014-08-04 14:03 ` Edward Cree 2014-08-04 14:37 ` Alexander Duyck 0 siblings, 1 reply; 27+ messages in thread From: Edward Cree @ 2014-08-04 14:03 UTC (permalink / raw) To: Don Dutile; +Cc: Alexander Duyck, linux-pci, Bjorn Helgaas On 31/07/14 19:13, Edward Cree wrote: > On 31/07/14 18:53, Don Dutile wrote: >> Ideally, when the device is configured in different modes, the SRIOV >> cap structure >> should be modified so the pci sriov code doesn't try to act or reflect >> the >> non-reality the device is in. > That would be much nicer of course, and I'll ask our firmware team if > that's possible. But I don't think it will be. As I feared, it's not practical for the device to do this. (Unfortunately, the IMEM (instruction memory) on the device is rather small and the firmware team are struggling to fit all the required features into it.) However, I also found out that in principle, the device _is_ capable of supporting VFs in PF-IOV mode, by attaching them directly to the underlying v-switch. The firmware may or may not already do this; the person who's likely to know this isn't in today. The driver at present doesn't support it because it assumes it always needs to attach VFs to its own v-switch, but it could (again, in principle) be taught otherwise. So at this point it becomes a policy question of whether we want to support this, and our opinion is that this isn't a valid use-case (as the only time you'd want PF-IOV is if your system doesn't support VFs) and thus shouldn't be supported (as it creates extra testing and maintenance work). The driver should probe the device, detect PF-IOV mode, recognise that it (the driver) doesn't have support for VFs in that mode, and set totalvfs to 0. Others may demur, of course, but that's likely to be decided when sending the driver patches to davem. It's my belief that, uncertainty about my use case notwithstanding, pci_sriov_set_totalvfs(dev, 0) should be a valid operation with the semantics I've implemented in my patch. -Edward The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-04 14:03 ` Edward Cree @ 2014-08-04 14:37 ` Alexander Duyck 2014-08-04 15:22 ` Edward Cree 2014-08-06 9:38 ` Don Dutile 0 siblings, 2 replies; 27+ messages in thread From: Alexander Duyck @ 2014-08-04 14:37 UTC (permalink / raw) To: Edward Cree, Don Dutile; +Cc: linux-pci, Bjorn Helgaas On 08/04/2014 07:03 AM, Edward Cree wrote: > On 31/07/14 19:13, Edward Cree wrote: >> On 31/07/14 18:53, Don Dutile wrote: >>> Ideally, when the device is configured in different modes, the SRIOV >>> cap structure >>> should be modified so the pci sriov code doesn't try to act or reflect >>> the >>> non-reality the device is in. >> That would be much nicer of course, and I'll ask our firmware team if >> that's possible. But I don't think it will be. > As I feared, it's not practical for the device to do this. > (Unfortunately, the IMEM (instruction memory) on the device is rather > small and the firmware team are struggling to fit all the required > features into it.) > However, I also found out that in principle, the device _is_ capable of > supporting VFs in PF-IOV mode, by attaching them directly to the > underlying v-switch. The firmware may or may not already do this; the > person who's likely to know this isn't in today. The driver at present > doesn't support it because it assumes it always needs to attach VFs to > its own v-switch, but it could (again, in principle) be taught otherwise. > > So at this point it becomes a policy question of whether we want to > support this, and our opinion is that this isn't a valid use-case (as > the only time you'd want PF-IOV is if your system doesn't support VFs) > and thus shouldn't be supported (as it creates extra testing and > maintenance work). The driver should probe the device, detect PF-IOV > mode, recognise that it (the driver) doesn't have support for VFs in > that mode, and set totalvfs to 0. > Others may demur, of course, but that's likely to be decided when > sending the driver patches to davem. It's my belief that, uncertainty > about my use case notwithstanding, pci_sriov_set_totalvfs(dev, 0) should > be a valid operation with the semantics I've implemented in my patch. > > -Edward How is it you get yourself into the PF-IOV mode? Is this something that is stored in the NVM of the adapter, or is it something where the driver has to issue a request for it? Also does each PF advertise SR-IOV support in PF-IOV mode or only function 0? Modifying the total VFs and/or the initial VFs in the configuration space is not practical. You would likely run into issues as one of these fields is read at probe time and the other is read at SR-IOV enablement and if the two differ you cannot enable SR-IOV. I'm not sure what davem has to do with any of this. We are discussing PCI at this point, not the network drivers. As such Dave won't be the one accepting the pci_sriov_set_totalvfs changes you proposed. If you are just wanting to get a patch through Dave for this you would be much better off just returning an error on sriov_configure if the vswitch isn't available rather than trying to change the SR-IOV API to match your use case. Thanks, Alex ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-04 14:37 ` Alexander Duyck @ 2014-08-04 15:22 ` Edward Cree 2014-08-06 9:38 ` Don Dutile 1 sibling, 0 replies; 27+ messages in thread From: Edward Cree @ 2014-08-04 15:22 UTC (permalink / raw) To: Alexander Duyck; +Cc: Don Dutile, linux-pci, Bjorn Helgaas I must be really bad at explaining things, I keep confusing everyone... On 04/08/14 15:37, Alexander Duyck wrote: > How is it you get yourself into the PF-IOV mode? Is this something that > is stored in the NVM of the adapter, or is it something where the driver > has to issue a request for it? Also does each PF advertise SR-IOV > support in PF-IOV mode or only function 0? It's stored in the card's NVM, yes. Each PF will advertise SR-IOV support (if it has VFs configured - another NVM setting). > Modifying the total VFs and/or the initial VFs in the configuration > space is not practical. You would likely run into issues as one of > these fields is read at probe time and the other is read at SR-IOV > enablement and if the two differ you cannot enable SR-IOV. I wasn't suggesting modifying configuration space at probe time. The suggestion was for the firmware to do it at power-on. But it was decided impractical, hence why I want to use pci_sriov_set_totalvfs instead. > I'm not sure what davem has to do with any of this. We are discussing > PCI at this point, not the network drivers. As such Dave won't be the > one accepting the pci_sriov_set_totalvfs changes you proposed. I just meant that if and when this PCI patch gets accepted, it'll be followed by a driver patch (against davem's tree) to use this function, and at that point there's likely to be another discussion of whether it's the right thing for the driver to do. > rather than trying to change the SR-IOV API to match > your use case. Here's the thing: I don't feel like I'm changing the API, I feel like I'm fixing a bug. Now what is a bug and what is an undocumented, unused feature is of course subjective. But I haven't seen any arguments so far why pci_sriov_set_totalvfs(dev, 0) _should_ have the old behaviour, while it's easy to argue that the old behaviour is an inconsistent interface with an undocumented and surprising pitfall. So can someone please explain to me why this isn't a bug? -Edward ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-04 14:37 ` Alexander Duyck 2014-08-04 15:22 ` Edward Cree @ 2014-08-06 9:38 ` Don Dutile 1 sibling, 0 replies; 27+ messages in thread From: Don Dutile @ 2014-08-06 9:38 UTC (permalink / raw) To: Alexander Duyck; +Cc: Edward Cree, linux-pci, Bjorn Helgaas On 08/04/2014 10:37 AM, Alexander Duyck wrote: > On 08/04/2014 07:03 AM, Edward Cree wrote: >> On 31/07/14 19:13, Edward Cree wrote: >>> On 31/07/14 18:53, Don Dutile wrote: >>>> Ideally, when the device is configured in different modes, the SRIOV >>>> cap structure >>>> should be modified so the pci sriov code doesn't try to act or reflect >>>> the >>>> non-reality the device is in. >>> That would be much nicer of course, and I'll ask our firmware team if >>> that's possible. But I don't think it will be. >> As I feared, it's not practical for the device to do this. >> (Unfortunately, the IMEM (instruction memory) on the device is rather >> small and the firmware team are struggling to fit all the required >> features into it.) >> However, I also found out that in principle, the device _is_ capable of >> supporting VFs in PF-IOV mode, by attaching them directly to the >> underlying v-switch. The firmware may or may not already do this; the >> person who's likely to know this isn't in today. The driver at present >> doesn't support it because it assumes it always needs to attach VFs to >> its own v-switch, but it could (again, in principle) be taught otherwise. >> >> So at this point it becomes a policy question of whether we want to >> support this, and our opinion is that this isn't a valid use-case (as >> the only time you'd want PF-IOV is if your system doesn't support VFs) >> and thus shouldn't be supported (as it creates extra testing and >> maintenance work). The driver should probe the device, detect PF-IOV >> mode, recognise that it (the driver) doesn't have support for VFs in >> that mode, and set totalvfs to 0. >> Others may demur, of course, but that's likely to be decided when >> sending the driver patches to davem. It's my belief that, uncertainty >> about my use case notwithstanding, pci_sriov_set_totalvfs(dev, 0) should >> be a valid operation with the semantics I've implemented in my patch. >> >> -Edward > > How is it you get yourself into the PF-IOV mode? Is this something that > is stored in the NVM of the adapter, or is it something where the driver > has to issue a request for it? Also does each PF advertise SR-IOV > support in PF-IOV mode or only function 0? > > Modifying the total VFs and/or the initial VFs in the configuration > space is not practical. You would likely run into issues as one of > these fields is read at probe time and the other is read at SR-IOV > enablement and if the two differ you cannot enable SR-IOV. > > I'm not sure what davem has to do with any of this. We are discussing > PCI at this point, not the network drivers. As such Dave won't be the > one accepting the pci_sriov_set_totalvfs changes you proposed. If you > are just wanting to get a patch through Dave for this you would be much > better off just returning an error on sriov_configure if the vswitch > isn't available rather than trying to change the SR-IOV API to match > your use case. > > Thanks, > > Alex > +1 to the points Alex makes. This is a device-specific issue that can be handled with device-specific changes, leaving the existing PCI SRIOV infrastructure working as-is. pci_sriov_set_totalvfs() was not designed to disable SRIOV for a device, merely adjust it's max number reported in the config structure, which is common in a number of PCIe-SRIOV devices. returning failure on sriov_configure() is the proper solution for this device in this state... no different then if some other failure occurred in the device to fail sriov configuration. Nack to this patch request. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 16:57 ` Edward Cree 2014-07-31 17:53 ` Don Dutile @ 2014-07-31 17:55 ` Alexander Duyck 2014-07-31 18:24 ` Edward Cree 1 sibling, 1 reply; 27+ messages in thread From: Alexander Duyck @ 2014-07-31 17:55 UTC (permalink / raw) To: Edward Cree; +Cc: linux-pci, Don Dutile, Bjorn Helgaas On 07/31/2014 09:57 AM, Edward Cree wrote: > On 31/07/14 17:40, Alexander Duyck wrote: >> If this PF-IOV mode is enabled what would be the layout of the PF >> devices? It seems like you should be able to scan for multiple PFs all >> showing up on the same bus with a certain stride if you wanted to detect it. > Alas, it's (once again) more complicated than that. There's another > mode, NIC partitioning, which has multiple PFs per port but connected > with a VLAN aggregator rather than a v-switch, which means that each PF > then can support VFs (the firmware limitation is that v-switches can't > be stacked). > So having multiple PFs on the port doesn't necessarily mean we can't do > SR-IOV. Okay, so this is purely a vswitch issue then. If it is allocated for the PF-IOV it isn't available for SR-IOV. I think I understand now. >>>> In addition this solution would >>>> also resolve the fact that the driver wouldn't actually have to be >>>> loaded for it to work so if someone were to load a driver that didn't >>>> contain the fix they would be blocked from enabling SR-IOV as well. >>> The current driver fails to probe the PF because it assumes the vswitch >>> creation failure is fatal. There should never be a driver that knows it >>> can live without the vswitch but doesn't know that that breaks SR-IOV. >>> >>> -Edward >> Is the vswitch a hard requirement for something other than SR-IOV? If >> not then maybe you should consider modifying the driver so it simply >> disabled SR-IOV if you cannot allocate the vswitch instead of blocking >> probe. >> > That's exactly what I'm trying to do with pci_sriov_set_totalvfs(dev, > 0); that's the whole point. It sounds reasonable. My only real concern is the fact that the error reported when attempting to enable the interface will be a ERANGE when really a more appropriate error return might be something like a EBUSY since it is the vswitch that is occupied. You might want to take a look at modifying sriov_numvfs_store so that if the only option is 0 you return EBUSY for all non-zero values requested. This way it will be more in sync with how we handle the case of attempting to update the number of VFs when SR-IOV is already enabled. Thanks, Alex ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 17:55 ` Alexander Duyck @ 2014-07-31 18:24 ` Edward Cree 0 siblings, 0 replies; 27+ messages in thread From: Edward Cree @ 2014-07-31 18:24 UTC (permalink / raw) To: Alexander Duyck; +Cc: linux-pci, Don Dutile, Bjorn Helgaas On 31/07/14 18:55, Alexander Duyck wrote: > It sounds reasonable. My only real concern is the fact that the error > reported when attempting to enable the interface will be a ERANGE when > really a more appropriate error return might be something like a EBUSY > since it is the vswitch that is occupied. > > You might want to take a look at modifying sriov_numvfs_store so that if > the only option is 0 you return EBUSY for all non-zero values requested. > This way it will be more in sync with how we handle the case of > attempting to update the number of VFs when SR-IOV is already enabled. Seems mostly reasonable for sfc, but I wonder if EBUSY would necessarily be appropriate for all devices. Other devices might have other reasons why their driver_max_VFs is zero. So I think it's best to leave it as ERANGE - the proximate reason why numvfs can't be set is because totalvfs is 0; for the reason why totalvfs is 0, user can look at the logs. But I'm open to being persuaded otherwise. -Edward The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 15:56 ` Edward Cree 2014-07-31 16:40 ` Alexander Duyck @ 2014-08-01 3:18 ` Ethan Zhao 2014-08-01 11:51 ` Edward Cree 1 sibling, 1 reply; 27+ messages in thread From: Ethan Zhao @ 2014-08-01 3:18 UTC (permalink / raw) To: Edward Cree; +Cc: Alexander Duyck, linux-pci, Don Dutile, Bjorn Helgaas On Thu, Jul 31, 2014 at 11:56 PM, Edward Cree <ecree@solarflare.com> wrote: > On 31/07/14 16:21, Alexander Duyck wrote: >> However if you are wanting to use this as a way to disable SR-IOV for a >> device I wouldn't recommend it. Based on what you have described I >> would suggest dropping the fix in drivers/pci/quirks that would clear >> dev->is_physfn if you are in the PF-IOV mode. That way the SR-IOV >> functionality should be fully disabled. > I still don't see how this can go in drivers/pci/quirks, because PF-IOV > mode can only be detected at driver probe time. The way we find out > that the NIC is in PF-IOV mode is that a request to the MC (the > Management CPU on the card) to create a vswitch fails. But maybe I'm > wrong about what quirks can do - is there any relevant documentation? > > Clearing dev->is_physfn in the driver probe routine isn't safe: > dev->sriov memory can get leaked, for example, as sriov_init will get > called at device add, but sriov_release won't be called at removal. > I think this memory leak bug doesn't exist. The dev->sriov memory was allocated when pci_device_add() called. whenever devices were hot-pluged into system or built-in devices were scanned while booting. The memory of dev->sriov is to be freed by kobject->kobject_release() when the device reference counter reaches zero when put_device() was called. for hot-plug, that is pciehp_unconfigure_device()....etc. but not in driver->remove(). put_device() kobject_put() kobject_release() kobject_cleanup() device_release() dev->release(dev) pci_release_dev() pci_release_capabilities() pci_iov_release() sriov_release(dev); kfree(dev->sriov); Thanks, Ethan > I'm also unconvinced that having !(dev->is_physfn || dev->is_virtfn) is > a valid state; but maybe I'm misunderstanding and "is_physfn" doesn't > actually mean "is a physical function", but rather "has SR-IOV capability". >> In addition this solution would >> also resolve the fact that the driver wouldn't actually have to be >> loaded for it to work so if someone were to load a driver that didn't >> contain the fix they would be blocked from enabling SR-IOV as well. > The current driver fails to probe the PF because it assumes the vswitch > creation failure is fatal. There should never be a driver that knows it > can live without the vswitch but doesn't know that that breaks SR-IOV. > > -Edward > The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-01 3:18 ` Ethan Zhao @ 2014-08-01 11:51 ` Edward Cree 2014-08-02 0:34 ` Ethan Zhao 0 siblings, 1 reply; 27+ messages in thread From: Edward Cree @ 2014-08-01 11:51 UTC (permalink / raw) To: Ethan Zhao; +Cc: Alexander Duyck, linux-pci, Don Dutile, Bjorn Helgaas On 01/08/14 04:18, Ethan Zhao wrote: > On Thu, Jul 31, 2014 at 11:56 PM, Edward Cree <ecree@solarflare.com> wrote: >> Clearing dev->is_physfn in the driver probe routine isn't safe: >> dev->sriov memory can get leaked, for example, as sriov_init will get >> called at device add, but sriov_release won't be called at removal. >> > I think this memory leak bug doesn't exist. > > The dev->sriov memory was allocated when pci_device_add() called. > whenever devices > were hot-pluged into system or built-in devices were scanned while booting. > > The memory of dev->sriov is to be freed by kobject->kobject_release() when > the device reference counter reaches zero when put_device() was called. > for hot-plug, that is pciehp_unconfigure_device()....etc. but not in > driver->remove(). > Yes, but the problem is in pci_iov_release(), not how it gets called; its code is if (dev->is_physfn) sriov_release(dev); So if the device driver clears dev->is_physfn, and later on the device is hot-unplugged, pci_iov_release() won't call sriov_release(). Thus any device driver that changes dev->is_physfn will, if the device is hot-unplugged, leak dev->sriov. Fortunately a quick grep shows no drivers doing this, so there's no actual bug in existing code. It's just that we can't use this approach to disable SR-IOV from the driver's probe routine. -Edward > put_device() > kobject_put() > kobject_release() > kobject_cleanup() > device_release() > dev->release(dev) > pci_release_dev() > pci_release_capabilities() > pci_iov_release() > sriov_release(dev); > kfree(dev->sriov); The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-01 11:51 ` Edward Cree @ 2014-08-02 0:34 ` Ethan Zhao 0 siblings, 0 replies; 27+ messages in thread From: Ethan Zhao @ 2014-08-02 0:34 UTC (permalink / raw) To: Edward Cree; +Cc: Alexander Duyck, linux-pci, Don Dutile, Bjorn Helgaas > 在 2014年8月1日,下午7:51,Edward Cree <ecree@solarflare.com> 写道: > >> On 01/08/14 04:18, Ethan Zhao wrote: >>> On Thu, Jul 31, 2014 at 11:56 PM, Edward Cree <ecree@solarflare.com> wrote: >>> Clearing dev->is_physfn in the driver probe routine isn't safe: >>> dev->sriov memory can get leaked, for example, as sriov_init will get >>> called at device add, but sriov_release won't be called at removal. >> I think this memory leak bug doesn't exist. >> >> The dev->sriov memory was allocated when pci_device_add() called. >> whenever devices >> were hot-pluged into system or built-in devices were scanned while booting. >> >> The memory of dev->sriov is to be freed by kobject->kobject_release() when >> the device reference counter reaches zero when put_device() was called. >> for hot-plug, that is pciehp_unconfigure_device()....etc. but not in >> driver->remove(). > Yes, but the problem is in pci_iov_release(), not how it gets called; > its code is > if (dev->is_physfn) > sriov_release(dev); > So if the device driver clears dev->is_physfn, and later on the device > is hot-unplugged, pci_iov_release() won't call sriov_release(). > Thus any device driver that changes dev->is_physfn will, if the device > is hot-unplugged, leak dev->sriov. Currently , we set dev->is_physfn when PF' SRIOV cap is detected, no reason to Change it later in driver. Thanks Ethan > Fortunately a quick grep shows no drivers doing this, so there's no > actual bug in existing code. It's just that we can't use this approach > to disable SR-IOV from the driver's probe routine. > -Edward >> put_device() >> kobject_put() >> kobject_release() >> kobject_cleanup() >> device_release() >> dev->release(dev) >> pci_release_dev() >> pci_release_capabilities() >> pci_iov_release() >> sriov_release(dev); >> kfree(dev->sriov); > The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-07-31 14:24 ` [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) Edward Cree 2014-07-31 15:21 ` Alexander Duyck @ 2014-08-01 3:51 ` Ethan Zhao 2014-08-01 12:15 ` Edward Cree 1 sibling, 1 reply; 27+ messages in thread From: Ethan Zhao @ 2014-08-01 3:51 UTC (permalink / raw) To: Edward Cree; +Cc: linux-pci, Alexander Duyck, Don Dutile, Bjorn Helgaas Though I like this patch, it shows direct style, but it will change the meaning of a function not mentioned here. int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs) So far seems no driver call pci_sriov_set_totalvfs() with numvfs = 0 and the patch will change the behavior of the drivers, but it will definitely change the test cases and documents related to IOV. Thanks, Ethan On Thu, Jul 31, 2014 at 10:24 PM, Edward Cree <ecree@solarflare.com> wrote: > Make a driver_max_VFs of 0 mean 0, instead of total_VFs as previously. > Initialise driver_max_VFs to total_VFs instead of 0. > > Signed-off-by: Edward Cree <ecree@solarflare.com> > --- > Only existing callers are ixgbe and igb, and neither passes 0, so this > doesn't break existing users. > --- > drivers/pci/iov.c | 7 ++----- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > index cb6f247..fb51467 100644 > --- a/drivers/pci/iov.c > +++ b/drivers/pci/iov.c > @@ -381,7 +381,7 @@ found: > iov->pos = pos; > iov->nres = nres; > iov->ctrl = ctrl; > - iov->total_VFs = total; > + iov->driver_max_VFs = iov->total_VFs = total; > iov->offset = offset; > iov->stride = stride; > iov->pgsz = pgsz; > @@ -686,9 +686,6 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev) > if (!dev->is_physfn) > return 0; > > - if (dev->sriov->driver_max_VFs) > - return dev->sriov->driver_max_VFs; > - > - return dev->sriov->total_VFs; > + return dev->sriov->driver_max_VFs; > } > EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs); > -- > 1.7.11.7 > The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-01 3:51 ` Ethan Zhao @ 2014-08-01 12:15 ` Edward Cree 2014-08-02 0:25 ` Ethan Zhao 2014-08-04 6:53 ` Sathya Perla 0 siblings, 2 replies; 27+ messages in thread From: Edward Cree @ 2014-08-01 12:15 UTC (permalink / raw) To: Ethan Zhao Cc: linux-pci, Alexander Duyck, Don Dutile, Bjorn Helgaas, Vasundhara Volam On 01/08/14 04:51, Ethan Zhao wrote: > Though I like this patch, it shows direct style, but it will change > the meaning of a function not mentioned here. > > int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs) Am I missing something here, or is that exactly the function mentioned in the patch title / subject line? > So far seems no driver call pci_sriov_set_totalvfs() with numvfs = 0 > and the patch will change the behavior of the drivers, but it will > definitely change the test cases and documents related to IOV. As for test cases and documents, I can't find any in the kernel tree - if I grep for pci_sriov_set_totalvfs, the only mentions I find are: * The declaration in include/linux/pci.h. Doesn't have any comments. * The definition in drivers/pci/iov.c. The comment doesn't document the behaviour in the numvfs = 0 case. * Documentation/ABI/testing/sysfs-bus-pci. Describes the function's purpose "reduce the value read from [the sriov_totalvfs sysfs file]" but doesn't say anything about the numvfs = 0 case. * the Intel igb and ixgbe ethernet drivers. Both pass nonzero constants for numvfs. * the Emulex benet driver. This went into Dave Miller's net-next branch on 30/06/2014 (so hasn't hit Linus' or other trees yet), but it looks to me like it might call with numvfs = 0. CCing the author of that patch (Vasundhara Volam) for comments. -Edward The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-01 12:15 ` Edward Cree @ 2014-08-02 0:25 ` Ethan Zhao 2014-08-04 15:45 ` Edward Cree 2014-08-04 6:53 ` Sathya Perla 1 sibling, 1 reply; 27+ messages in thread From: Ethan Zhao @ 2014-08-02 0:25 UTC (permalink / raw) To: Edward Cree Cc: linux-pci, Alexander Duyck, Don Dutile, Bjorn Helgaas, Vasundhara Volam > 在 2014年8月1日,下午8:15,Edward Cree <ecree@solarflare.com> 写道: > >> On 01/08/14 04:51, Ethan Zhao wrote: >> Though I like this patch, it shows direct style, but it will change >> the meaning of a function not mentioned here. >> >> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs) > Am I missing something here, or is that exactly the function mentioned > in the patch title / subject line? > >> So far seems no driver call pci_sriov_set_totalvfs() with numvfs = 0 >> and the patch will change the behavior of the drivers, but it will >> definitely change the test cases and documents related to IOV. > As for test cases and documents, I can't find any in the kernel tree - > if I grep for pci_sriov_set_totalvfs, the only mentions I find are: > * The declaration in include/linux/pci.h. Doesn't have any comments. > * The definition in drivers/pci/iov.c. The comment doesn't document the > behaviour in the numvfs = 0 case. > * Documentation/ABI/testing/sysfs-bus-pci. Describes the function's > purpose "reduce the value read from [the sriov_totalvfs sysfs file]" but > doesn't say anything about the numvfs = 0 case. > * the Intel igb and ixgbe ethernet drivers. Both pass nonzero constants > for numvfs. > * the Emulex benet driver. This went into Dave Miller's net-next branch > on 30/06/2014 (so hasn't hit Linus' or other trees yet), but it looks to > me like it might call with numvfs = 0. CCing the author of that patch > (Vasundhara Volam) for comments. I mean some test cases or doc that are out of kernel tree ... such as some Distro, not only Hardware vendors. Thanks. Ethan > > -Edward > The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-02 0:25 ` Ethan Zhao @ 2014-08-04 15:45 ` Edward Cree 2014-08-04 16:40 ` Alexander Duyck 0 siblings, 1 reply; 27+ messages in thread From: Edward Cree @ 2014-08-04 15:45 UTC (permalink / raw) To: Ethan Zhao Cc: linux-pci, Alexander Duyck, Don Dutile, Bjorn Helgaas, Vasundhara Volam On 02/08/14 01:25, Ethan Zhao wrote: > 在 2014年8月1日,下午8:15,Edward Cree <ecree@solarflare.com> 写道: >> On 01/08/14 04:51, Ethan Zhao wrote: >>> So far seems no driver call pci_sriov_set_totalvfs() with numvfs = 0 >>> and the patch will change the behavior of the drivers, but it will >>> definitely change the test cases and documents related to IOV. >> As for test cases and documents, I can't find any in the kernel tree > I mean some test cases or doc that are out of kernel tree ... such as some Distro, > not only Hardware vendors. I was under the impression that we don't care about breaking out-of-tree code. Distros should be (and it looks like most are) using the kernel-doc to generate their documentation, and the kernel-doc doesn't say anything about what happens when passing 0. -Edward ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-04 15:45 ` Edward Cree @ 2014-08-04 16:40 ` Alexander Duyck 2014-08-04 17:08 ` Edward Cree 0 siblings, 1 reply; 27+ messages in thread From: Alexander Duyck @ 2014-08-04 16:40 UTC (permalink / raw) To: Edward Cree, Ethan Zhao Cc: linux-pci, Don Dutile, Bjorn Helgaas, Vasundhara Volam On 08/04/2014 08:45 AM, Edward Cree wrote: > On 02/08/14 01:25, Ethan Zhao wrote: >> 在 2014年8月1日,下午8:15,Edward Cree <ecree@solarflare.com> 写道: >>> On 01/08/14 04:51, Ethan Zhao wrote: >>>> So far seems no driver call pci_sriov_set_totalvfs() with numvfs = 0 >>>> and the patch will change the behavior of the drivers, but it will >>>> definitely change the test cases and documents related to IOV. >>> As for test cases and documents, I can't find any in the kernel tree >> I mean some test cases or doc that are out of kernel tree ... such as some Distro, >> not only Hardware vendors. > I was under the impression that we don't care about breaking out-of-tree > code. Distros should be (and it looks like most are) using the > kernel-doc to generate their documentation, and the kernel-doc doesn't > say anything about what happens when passing 0. > > -Edward Edward, The concern isn't what is out there, but what will come of it as a result. The question is should we allow setting the value to 0 to change the behavior from disabling the override to essentially disabling SR-IOV. If we consider it acceptable to use 0 as a disable value, and there are several people on this thread that don't, the secondary issue of this is the error return for sriov_numvfs_store which hasn't been addressed. Should we still be returning ERANGE if SR-IOV has essentially been disabled, or should we be returning some other error such as EBUSY, ENOMEM, ENODEV, or ENOSYS to indicate that there are no resources available for SR-IOV. I think you would be much better off just implementing sriov_configure for your PCI driver and placing your own error return there if you cannot allocate a vswitch. Thanks, Alex ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-04 16:40 ` Alexander Duyck @ 2014-08-04 17:08 ` Edward Cree 0 siblings, 0 replies; 27+ messages in thread From: Edward Cree @ 2014-08-04 17:08 UTC (permalink / raw) To: Alexander Duyck Cc: Ethan Zhao, linux-pci, Don Dutile, Bjorn Helgaas, Vasundhara Volam On 04/08/14 17:40, Alexander Duyck wrote: > Edward, > > The concern isn't what is out there, but what will come of it as a > result. The question is should we allow setting the value to 0 to > change the behavior from disabling the override to essentially disabling > SR-IOV. I still don't see why any driver would set an override and then want to disable it; moreover, it seems like unnatural semantics to use 0 for this (something like -1 would seem much more sensible) since most people will expect 0 to mean, well, 0. > If we consider it acceptable to use 0 as a disable value, and there are > several people on this thread that don't, the secondary issue of this is > the error return for sriov_numvfs_store which hasn't been addressed. > Should we still be returning ERANGE if SR-IOV has essentially been > disabled, or should we be returning some other error such as EBUSY, > ENOMEM, ENODEV, or ENOSYS to indicate that there are no resources > available for SR-IOV. In my opinion we should still return ERANGE, as that's the immediate error - user requested more VFs than were advertised. I.e. it's exactly the same as if someone requested 8 VFs on igb after igb reduced totalvfs to 7; we don't return a different error code to say _why_ igb can only support 7 VFs. I don't see why 0 is special in this regard. > I think you would be much better off just implementing sriov_configure > for your PCI driver and placing your own error return there if you > cannot allocate a vswitch. It just seems silly to advertise VFs when we already know at PF probe time that we've failed to allocate the vswitch. -Edward ^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) 2014-08-01 12:15 ` Edward Cree 2014-08-02 0:25 ` Ethan Zhao @ 2014-08-04 6:53 ` Sathya Perla 1 sibling, 0 replies; 27+ messages in thread From: Sathya Perla @ 2014-08-04 6:53 UTC (permalink / raw) To: Edward Cree, Ethan Zhao Cc: linux-pci, Alexander Duyck, Don Dutile, Bjorn Helgaas, Vasundhara Volam PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1wY2ktb3duZXJAdmdl ci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtcGNpLQ0KPiANCj4gT24gMDEvMDgvMTQgMDQ6NTEs IEV0aGFuIFpoYW8gd3JvdGU6DQo+ID4gVGhvdWdoIEkgbGlrZSB0aGlzIHBhdGNoLCBpdCBzaG93 cyBkaXJlY3Qgc3R5bGUsIGJ1dCBpdCB3aWxsIGNoYW5nZQ0KPiA+IHRoZSBtZWFuaW5nIG9mIGEg ZnVuY3Rpb24gbm90IG1lbnRpb25lZCBoZXJlLg0KPiA+DQo+ID4gICBpbnQgcGNpX3NyaW92X3Nl dF90b3RhbHZmcyhzdHJ1Y3QgcGNpX2RldiAqZGV2LCB1MTYgbnVtdmZzKQ0KPiBBbSBJIG1pc3Np bmcgc29tZXRoaW5nIGhlcmUsIG9yIGlzIHRoYXQgZXhhY3RseSB0aGUgZnVuY3Rpb24gbWVudGlv bmVkDQo+IGluIHRoZSBwYXRjaCB0aXRsZSAvIHN1YmplY3QgbGluZT8NCj4gDQo+ID4gU28gZmFy IHNlZW1zIG5vIGRyaXZlciBjYWxsIHBjaV9zcmlvdl9zZXRfdG90YWx2ZnMoKSB3aXRoIG51bXZm cyA9IDANCj4gPiBhbmQgdGhlIHBhdGNoIHdpbGwgY2hhbmdlIHRoZSBiZWhhdmlvciBvZiB0aGUg ZHJpdmVycywgYnV0IGl0IHdpbGwNCj4gPiBkZWZpbml0ZWx5IGNoYW5nZSB0aGUgdGVzdCBjYXNl cyBhbmQgZG9jdW1lbnRzIHJlbGF0ZWQgdG8gSU9WLg0KPiBBcyBmb3IgdGVzdCBjYXNlcyBhbmQg ZG9jdW1lbnRzLCBJIGNhbid0IGZpbmQgYW55IGluIHRoZSBrZXJuZWwgdHJlZSAtDQo+IGlmIEkg Z3JlcCBmb3IgcGNpX3NyaW92X3NldF90b3RhbHZmcywgdGhlIG9ubHkgbWVudGlvbnMgSSBmaW5k IGFyZToNCj4gKiBUaGUgZGVjbGFyYXRpb24gaW4gaW5jbHVkZS9saW51eC9wY2kuaC4gIERvZXNu J3QgaGF2ZSBhbnkgY29tbWVudHMuDQo+ICogVGhlIGRlZmluaXRpb24gaW4gZHJpdmVycy9wY2kv aW92LmMuICBUaGUgY29tbWVudCBkb2Vzbid0IGRvY3VtZW50IHRoZQ0KPiBiZWhhdmlvdXIgaW4g dGhlIG51bXZmcyA9IDAgY2FzZS4NCj4gKiBEb2N1bWVudGF0aW9uL0FCSS90ZXN0aW5nL3N5c2Zz LWJ1cy1wY2kuICBEZXNjcmliZXMgdGhlIGZ1bmN0aW9uJ3MNCj4gcHVycG9zZSAicmVkdWNlIHRo ZSB2YWx1ZSByZWFkIGZyb20gW3RoZSBzcmlvdl90b3RhbHZmcyBzeXNmcyBmaWxlXSIgYnV0DQo+ IGRvZXNuJ3Qgc2F5IGFueXRoaW5nIGFib3V0IHRoZSBudW12ZnMgPSAwIGNhc2UuDQo+ICogdGhl IEludGVsIGlnYiBhbmQgaXhnYmUgZXRoZXJuZXQgZHJpdmVycy4gIEJvdGggcGFzcyBub256ZXJv IGNvbnN0YW50cw0KPiBmb3IgbnVtdmZzLg0KPiAqIHRoZSBFbXVsZXggYmVuZXQgZHJpdmVyLiAg VGhpcyB3ZW50IGludG8gRGF2ZSBNaWxsZXIncyBuZXQtbmV4dCBicmFuY2gNCj4gb24gMzAvMDYv MjAxNCAoc28gaGFzbid0IGhpdCBMaW51cycgb3Igb3RoZXIgdHJlZXMgeWV0KSwgYnV0IGl0IGxv b2tzIHRvDQo+IG1lIGxpa2UgaXQgbWlnaHQgY2FsbCB3aXRoIG51bXZmcyA9IDAuICBDQ2luZyB0 aGUgYXV0aG9yIG9mIHRoYXQgcGF0Y2gNCj4gKFZhc3VuZGhhcmEgVm9sYW0pIGZvciBjb21tZW50 cy4NCj4gDQoNCkluY2lkZW50YWxseSwgYWZ0ZXIgbXkgbW9zdCByZWNlbnQgcGF0Y2ggLSBjb21t aXQgZDNkMTgzMTINCigiYmUybmV0OiBpZ25vcmUgZ2V0L3NldCBwcm9maWxlIEZXIGNtZCBmYWls dXJlcyIpLCBiZTJuZXQgbm8gbG9uZ2VyDQpjYWxscyBwY2lfc3Jpb3Zfc2V0X3RvdGFsdmZzKCkg d2l0aCBudW1fdmZzPTAuIA0KSG9wZSB0aGlzIGhlbHBzLg0KDQp0aGFua3MsDQotU2F0aHlhDQo= ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2014-08-06 13:38 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <53D9288B.5030302@solarflare.com> 2014-07-30 18:05 ` pci_sriov_set_totalvfs again Don Dutile 2014-07-30 18:24 ` Edward Cree 2014-07-30 21:14 ` Alexander Duyck 2014-07-31 12:07 ` Edward Cree 2014-07-31 14:24 ` [PATCH] PCI: handle pci_sriov_set_totalvfs(dev, 0) Edward Cree 2014-07-31 15:21 ` Alexander Duyck 2014-07-31 15:56 ` Edward Cree 2014-07-31 16:40 ` Alexander Duyck 2014-07-31 16:57 ` Edward Cree 2014-07-31 17:53 ` Don Dutile 2014-07-31 18:13 ` Edward Cree 2014-08-04 14:03 ` Edward Cree 2014-08-04 14:37 ` Alexander Duyck 2014-08-04 15:22 ` Edward Cree 2014-08-06 9:38 ` Don Dutile 2014-07-31 17:55 ` Alexander Duyck 2014-07-31 18:24 ` Edward Cree 2014-08-01 3:18 ` Ethan Zhao 2014-08-01 11:51 ` Edward Cree 2014-08-02 0:34 ` Ethan Zhao 2014-08-01 3:51 ` Ethan Zhao 2014-08-01 12:15 ` Edward Cree 2014-08-02 0:25 ` Ethan Zhao 2014-08-04 15:45 ` Edward Cree 2014-08-04 16:40 ` Alexander Duyck 2014-08-04 17:08 ` Edward Cree 2014-08-04 6:53 ` Sathya Perla
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.