On Tue, Feb 16, 2021 at 02:44:44PM -0300, Daniel Henrique Barboza wrote: > > > On 2/16/21 2:16 PM, Greg Kurz wrote: > > On Tue, 16 Feb 2021 13:09:43 -0300 > > Daniel Henrique Barboza wrote: > > > > > > > > > > > On 2/16/21 12:50 PM, Greg Kurz wrote: > > > > On Thu, 11 Feb 2021 19:52:41 -0300 > > > > Daniel Henrique Barboza wrote: > > > > > > > > > When hotunplugging a PCI function we'll branch out the logic in two cases, > > > > > function zero and non-zero. If non-zero, we'll call spapr_drc_detach() and > > > > > nothing else. If it's function zero, we'll loop it once between all the > > > > > functions in the slot to call spapr_drc_detach() on them, and afterwards > > > > > we'll do another backwards loop where we'll signal the event to the guest. > > > > > > > > > > We can simplify this logic. We can ignore all the DRC handling for non-zero > > > > > functions, since we'll end up doing that regardless when unplugging function > > > > > zero. And for function zero, everything can be done in a single loop, since > > > > > tt doesn't matter if we end up marking the function DRCs as unplug pending in > > > > > backwards order or not, as long as we call spapr_drc_detach() before issuing > > > > > the hotunplug event to the guest. > > > > > > > > > > This will also avoid a possible scenario where the user starts to hotunplug > > > > > the slot, starting with a non-zero function, and then delays/forgets to > > > > > hotunplug function zero afterwards. This would keep the function DRC marked > > > > > as unplug requested indefinitely. > > > > > > > > > > > > > ... or until the guest is reset, which will no longer happen with this > > > > patch applied, i.e. breaks the long standing policy that machine reset > > > > causes pending hot-unplug requests to succeed. I don't see an obvious > > > > reason to special case non-zero PCI functions. > > > > > > It's not possible to hotunplug the non-zero functions during machine reset for > > > multifunction PCI devices. We need to unplug the entire slot, and that will only > > > happen when function zero is unplugged. In fact, I think bad things will happen > > > in this case you mentioned if we are forcing the removal of non-zero functions > > > without function zero (spoiler: didn't test it). > > > > > > > I've tested with the aggregation of two e1000e emulated devices: > > > > device_add e1000e,addr=10.1,id=netfn1 > > device_add e1000e,multifunction=on,addr=10.0,id=netfn0 > > > > And I don't quite see what "bad things" could happen. We're resetting the > > machine to a stable state and the new OS instance will just not see the > > removed function (just like only function netfn0 got added). > > > Interesting. Thanks for looking this up. > > Given that the intention of this patch was a simplification of the existing > design, without changing what we currently do regarding PCI functions and unplug, > and apparently it just did that, let's drop it. I think that's best. As Greg says, I think maintaining the behaviour that reset completes pending hotplugs should be retained, and the usual constraints on non-zero function hot-unplug don't apply at reset time. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson