On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote: > On Thu, 16 Jul 2020 14:45:40 +1000 > David Gibson wrote: > > > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > > > Some recent error handling cleanups unveiled issues with our support of > > > PCI bridges: > > > > > > 1) QEMU aborts when using non-standard PCI bridge types, > > > unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling" > > > > > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > > > Unexpected error in object_property_find() at qom/object.c:1240: > > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found > > > Aborted (core dumped) > > > > Oops, I thought we had a check that we actually had a "pci-bridge" > > device before continuing with the hotplug, but I guess not. > > Ah... are you suggesting we should explicitly check the actual type > of the bridge rather than looking for the "chassis_nr" property ? Uh.. I thought about it, but I don't think it matters much which way we do it. > > > This happens because we assume all PCI bridge types to have a "chassis_nr" > > > property. This property only exists with the standard PCI bridge type > > > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems > > > much simpler to check the presence of "chassis_nr" earlier. > > > > Hrm, right, 7ef1553dac was not really correct since add_drcs() really > > can fail. > > Yes. > > > > 2) QEMU abort if same "chassis_nr" value is used several times, > > > unveiled by commit d2623129a7de "qom: Drop parameter @errp of > > > object_property_add() & friends" > > > > > > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ > > > -device pci-bridge,chassis_nr=1 > > > Unexpected error in object_property_try_add() at qom/object.c:1167: > > > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container') > > > Aborted (core dumped) > > > > > > This happens because we assume that "chassis_nr" values are unique, but > > > nobody enforces that and we end up generating duplicate DRC ids. The PCI > > > code doesn't really care for duplicate "chassis_nr" properties since it > > > is only used to initialize the "Chassis Number Register" of the bridge, > > > with no functional impact on QEMU. So, even if passing the same value > > > several times might look weird, it never broke anything before, so > > > I guess we don't necessarily want to enforce strict checking in the PCI > > > code now. > > > > Yeah, I guess. I'm pretty sure that the chassis number of bridges is > > supposed to be system-unique (well, unique within the PCI domain at > > least, I guess) as part of the hardware spec. So specifying multiple > > chassis ids the same is a user error, but we need a better failure > > mode. > > According to section 3.2.6.4 of "PCI-to-PCI Bridge Architecture > Specification", the chassis number is exposed to the OS as a > non-volatile r/w register. Argh. Dammit. I could have sworn I checked that chassis numbers were supposed to be unique (and not guest alterable). That's the whole reason I chose it. > It seems to be expected that chassis > numbers might collide, in which case the system software can > overwrite the register with a new number. So I'm not sure that > specifying the same number multiple times is an actual user error. > > > > Workaround both issues in the PAPR code: check that the bridge has a > > > unique and non null "chassis_nr" when plugging it into its parent bus. > > > > > > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") > > > > Arguably, it's really fixing 7ef1553dac. > > True for issue 1 but not for issue 2, which is the result of > 05929a6c5dfe (switch to "chassis_nr" introduces a condition > to end up with duplicate DRC ids) Well, technically. But the method we had before was *way* more broken than chassis numbers. It was using bus number which is not stable across the VM's lifetime, which is a non-starter. Plus, the bus number too won't be unique, until the guest has enumerated the bus, which is too late for DRC creation. The only reason we got away with it, was that it was basically dead code - at that stage we didn't support hotplug under bridges, so we never actually created DRCs except for the root bus. > and d2623129a7de (assert > when trying to add a duplicated DRC). Well, again, only on a technicality. It might not have immediatley assert()ed, but adding a duplicated DRC was still completely broken before that. > I'm starting to think I should maybe split this in > two patches. One for each issue. At this stage, I'd kind of prefer to merge this fix now, with the intention of doing a pull request tomorrow. AFAICT none of the suggested improvements can't be done as followups. > > > Reported-by: Thomas Huth > > > Signed-off-by: Greg Kurz > > > > I had a few misgivings about the details of this, but I think I've > > convinced myself they're fine. There's a couple of things I'd like to > > polish, but I'll do that as a follow up. > > Some fixes for d2623129a7de just got merged. Let me have a look > again. In fact the main part of the polish I was thinking of didn't really work out. The only change I made was a tiny move to the check (it's not really relevant until *after* we've checked if hotplug is enabled at all). So I just folded that in and put it into the ppc-for-5.1 tree. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson