qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Crashes with qemu-system-ppc64
@ 2021-03-23 16:48 Thomas Huth
  2021-03-23 18:20 ` Philippe Mathieu-Daudé
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Thomas Huth @ 2021-03-23 16:48 UTC (permalink / raw)
  To: qemu-ppc; +Cc: QEMU Developers


In case anyone is interested in fixing those, there are two regressions with 
qemu-system-ppc64 in the current master branch:

$ ./qemu-system-ppc64 -M ppce500 -device macio-oldworld
qemu-system-ppc64: ../../devel/qemu/softmmu/memory.c:2443: 
memory_region_add_subregion_common: Assertion `!subregion->container' failed.

$ ./qemu-system-ppc64 -device power8_v2.0-spapr-cpu-core,help
/home/thuth/devel/qemu/include/hw/boards.h:24:MACHINE: Object 0x5635bd53af10 
is not an instance of type machine
Aborted (core dumped)

  Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 16:48 Crashes with qemu-system-ppc64 Thomas Huth
@ 2021-03-23 18:20 ` Philippe Mathieu-Daudé
  2021-03-23 21:19 ` Mark Cave-Ayland
  2021-03-23 23:00 ` Greg Kurz
  2 siblings, 0 replies; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-23 18:20 UTC (permalink / raw)
  To: Thomas Huth, qemu-ppc; +Cc: Mark Cave-Ayland, QEMU Developers

+Mark

On 3/23/21 5:48 PM, Thomas Huth wrote:
> 
> In case anyone is interested in fixing those, there are two regressions
> with qemu-system-ppc64 in the current master branch:
> 
> $ ./qemu-system-ppc64 -M ppce500 -device macio-oldworld
> qemu-system-ppc64: ../../devel/qemu/softmmu/memory.c:2443:
> memory_region_add_subregion_common: Assertion `!subregion->container'
> failed.

I wonder if this is a side effect of using
object_resolve_path_component().

> 
> $ ./qemu-system-ppc64 -device power8_v2.0-spapr-cpu-core,help
> /home/thuth/devel/qemu/include/hw/boards.h:24:MACHINE: Object
> 0x5635bd53af10 is not an instance of type machine
> Aborted (core dumped)
> 
>  Thomas
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 16:48 Crashes with qemu-system-ppc64 Thomas Huth
  2021-03-23 18:20 ` Philippe Mathieu-Daudé
@ 2021-03-23 21:19 ` Mark Cave-Ayland
  2021-03-23 22:57   ` Peter Maydell
  2021-03-23 23:00 ` Greg Kurz
  2 siblings, 1 reply; 13+ messages in thread
From: Mark Cave-Ayland @ 2021-03-23 21:19 UTC (permalink / raw)
  To: Thomas Huth, qemu-ppc; +Cc: QEMU Developers, Markus Armbruster

On 23/03/2021 16:48, Thomas Huth wrote:

(adding Markus on CC for comment)

> In case anyone is interested in fixing those, there are two regressions with 
> qemu-system-ppc64 in the current master branch:
> 
> $ ./qemu-system-ppc64 -M ppce500 -device macio-oldworld
> qemu-system-ppc64: ../../devel/qemu/softmmu/memory.c:2443: 
> memory_region_add_subregion_common: Assertion `!subregion->container' failed.

Well this is an odd one. The basic backtrace looks like this:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff5bf9535 in __GI_abort () at abort.c:79
#2  0x00007ffff5bf940f in __assert_fail_base (fmt=0x7ffff5d5bee0 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n", assertion=0x55555617aaf7 "!subregion->container", 
file=0x55555617a7bb "../softmmu/memory.c", line=2443,
     function=<optimized out>) at assert.c:92
#3  0x00007ffff5c07102 in __GI___assert_fail (assertion=0x55555617aaf7 
"!subregion->container", file=0x55555617a7bb "../softmmu/memory.c", line=2443,
     function=0x55555617b0e0 <__PRETTY_FUNCTION__.33121> 
"memory_region_add_subregion_common") at assert.c:101
#4  0x0000555555cd69f1 in memory_region_add_subregion_common (mr=0x55555728a910, 
offset=32768, subregion=0x55555728c2d0) at ../softmmu/memory.c:2443
#5  0x0000555555cd6a51 in memory_region_add_subregion (mr=0x55555728a910, 
offset=32768, subregion=0x55555728c2d0) at ../softmmu/memory.c:2454
#6  0x0000555555a8f83c in macio_common_realize (d=0x555557289f90, 
errp=0x7fffffffe3c0) at ../hw/misc/macio/macio.c:106
#7  0x0000555555a8fadb in macio_oldworld_realize (d=0x555557289f90, 
errp=0x7fffffffe428) at ../hw/misc/macio/macio.c:147
#8  0x0000555555b605bb in pci_qdev_realize (qdev=0x555557289f90, errp=0x7fffffffe4a0) 
at ../hw/pci/pci.c:2114
#9  0x0000555555f1b95f in device_set_realized (obj=0x555557289f90, value=true, 
errp=0x7fffffffe5a8) at ../hw/core/qdev.c:761
#10 0x0000555555e14457 in property_set_bool (obj=0x555557289f90, v=0x5555572f1890, 
name=0x5555561f6af9 "realized", opaque=0x555556a1fce0, errp=0x7fffffffe5a8) at 
../qom/object.c:2257
#11 0x0000555555e12562 in object_property_set (obj=0x555557289f90, 
name=0x5555561f6af9 "realized", v=0x5555572f1890, errp=0x555556920e78 <error_fatal>) 
at ../qom/object.c:1402
#12 0x0000555555e0f0dc in object_property_set_qobject (obj=0x555557289f90, 
name=0x5555561f6af9 "realized", value=0x5555572f16e0, errp=0x555556920e78 
<error_fatal>) at ../qom/qom-qobject.c:28
#13 0x0000555555e128c6 in object_property_set_bool (obj=0x555557289f90, 
name=0x5555561f6af9 "realized", value=true, errp=0x555556920e78 <error_fatal>) at 
../qom/object.c:1472
#14 0x0000555555f1a9e4 in qdev_realize (dev=0x555557289f90, bus=0x5555570046a0, 
errp=0x555556920e78 <error_fatal>) at ../hw/core/qdev.c:389
#15 0x0000555555b6717a in qdev_device_add (opts=0x555556995650, errp=0x555556920e78 
<error_fatal>) at ../softmmu/qdev-monitor.c:665
#16 0x0000555555d93b21 in device_init_func (opaque=0x0, opts=0x555556995650, 
errp=0x555556920e78 <error_fatal>) at ../softmmu/vl.c:1211
#17 0x000055555602d49d in qemu_opts_foreach (list=0x555556826c60 <qemu_device_opts>, 
func=0x555555d93afa <device_init_func>, opaque=0x0, errp=0x555556920e78 
<error_fatal>) at ../util/qemu-option.c:1167
#18 0x0000555555d96cac in qemu_create_cli_devices () at ../softmmu/vl.c:2541
#19 0x0000555555d96dd0 in qmp_x_exit_preconfig (errp=0x555556920e78 <error_fatal>) at 
../softmmu/vl.c:2589
#20 0x0000555555d9945e in qemu_init (argc=5, argv=0x7fffffffea28, 
envp=0x7fffffffea58) at ../softmmu/vl.c:3611
#21 0x000055555586b34d in main (argc=5, argv=0x7fffffffea28, envp=0x7fffffffea58) at 
../softmmu/main.c:49
(gdb)

But how can the assert() fire given that the DBDMA memory region hasn't been attached 
to anything yet? The answer appears to be because of some magic in the platform bus 
used by the e500 machine. Poking around you can see what is happening by placing a 
breakpoint on e500plat_machine_device_plug_cb():

(gdb) bt
#0  e500plat_machine_device_plug_cb (hotplug_dev=0x555556a1ea00, dev=0x55555728bcf0, 
errp=0x7fffffffe170) at ../hw/ppc/e500plat.c:48
#1  0x0000555555f171ac in hotplug_handler_plug (plug_handler=0x555556a1ea00, 
plugged_dev=0x55555728bcf0, errp=0x7fffffffe170) at ../hw/core/hotplug.c:34
#2  0x0000555555f1bb5f in device_set_realized (obj=0x55555728bcf0, value=true, 
errp=0x7fffffffe278) at ../hw/core/qdev.c:818
#3  0x0000555555e14457 in property_set_bool (obj=0x55555728bcf0, v=0x5555572f2600, 
name=0x5555561f6af9 "realized", opaque=0x555556a1fce0, errp=0x7fffffffe278) at 
../qom/object.c:2257
#4  0x0000555555e12562 in object_property_set (obj=0x55555728bcf0, 
name=0x5555561f6af9 "realized", v=0x5555572f2600, errp=0x7fffffffe3c0) at 
../qom/object.c:1402
#5  0x0000555555e0f0dc in object_property_set_qobject (obj=0x55555728bcf0, 
name=0x5555561f6af9 "realized", value=0x5555572f1f10, errp=0x7fffffffe3c0) at 
../qom/qom-qobject.c:28
#6  0x0000555555e128c6 in object_property_set_bool (obj=0x55555728bcf0, 
name=0x5555561f6af9 "realized", value=true, errp=0x7fffffffe3c0) at ../qom/object.c:1472
#7  0x0000555555f1a9e4 in qdev_realize (dev=0x55555728bcf0, bus=0x55555728a5d0, 
errp=0x7fffffffe3c0) at ../hw/core/qdev.c:389
#8  0x0000555555a8f7f0 in macio_common_realize (d=0x555557289cd0, 
errp=0x7fffffffe3c0) at ../hw/misc/macio/macio.c:102
#9  0x0000555555a8fadb in macio_oldworld_realize (d=0x555557289cd0, 
errp=0x7fffffffe428) at ../hw/misc/macio/macio.c:147
#10 0x0000555555b605bb in pci_qdev_realize (qdev=0x555557289cd0, errp=0x7fffffffe4a0) 
at ../hw/pci/pci.c:2114
#11 0x0000555555f1b95f in device_set_realized (obj=0x555557289cd0, value=true, 
errp=0x7fffffffe5a8) at ../hw/core/qdev.c:761
#12 0x0000555555e14457 in property_set_bool (obj=0x555557289cd0, v=0x5555572f1260, 
name=0x5555561f6af9 "realized", opaque=0x555556a1fce0, errp=0x7fffffffe5a8) at 
../qom/object.c:2257
#13 0x0000555555e12562 in object_property_set (obj=0x555557289cd0, 
name=0x5555561f6af9 "realized", v=0x5555572f1260, errp=0x555556920e78 <error_fatal>) 
at ../qom/object.c:1402
#14 0x0000555555e0f0dc in object_property_set_qobject (obj=0x555557289cd0, 
name=0x5555561f6af9 "realized", value=0x5555572f10b0, errp=0x555556920e78 
<error_fatal>) at ../qom/qom-qobject.c:28
#15 0x0000555555e128c6 in object_property_set_bool (obj=0x555557289cd0, 
name=0x5555561f6af9 "realized", value=true, errp=0x555556920e78 <error_fatal>) at 
../qom/object.c:1472
#16 0x0000555555f1a9e4 in qdev_realize (dev=0x555557289cd0, bus=0x555557004300, 
errp=0x555556920e78 <error_fatal>) at ../hw/core/qdev.c:389
#17 0x0000555555b6717a in qdev_device_add (opts=0x555556995650, errp=0x555556920e78 
<error_fatal>) at ../softmmu/qdev-monitor.c:665
#18 0x0000555555d93b21 in device_init_func (opaque=0x0, opts=0x555556995650, 
errp=0x555556920e78 <error_fatal>) at ../softmmu/vl.c:1211
#19 0x000055555602d49d in qemu_opts_foreach (list=0x555556826c60 <qemu_device_opts>, 
func=0x555555d93afa <device_init_func>, opaque=0x0, errp=0x555556920e78 
<error_fatal>) at ../util/qemu-option.c:1167
#20 0x0000555555d96cac in qemu_create_cli_devices () at ../softmmu/vl.c:2541
#21 0x0000555555d96dd0 in qmp_x_exit_preconfig (errp=0x555556920e78 <error_fatal>) at 
../softmmu/vl.c:2589
#22 0x0000555555d9945e in qemu_init (argc=5, argv=0x7fffffffea28, 
envp=0x7fffffffea58) at ../softmmu/vl.c:3611
#23 0x000055555586b34d in main (argc=5, argv=0x7fffffffea28, envp=0x7fffffffea58) at 
../softmmu/main.c:49

It seems that realizing the DBDMA device calls e500plat_machine_device_plug_cb() and 
since this is a DBDMA device it proceeds to call platform_bus_link_device() in 
hw/core/platform-bus.c like this:

         if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
             platform_bus_link_device(pms->pbus_dev, SYS_BUS_DEVICE(dev));
         }

which does:

/*
  * Look for unassigned IRQ lines as well as unassociated MMIO regions.
  * Connect them to the platform bus if available.
  */
void platform_bus_link_device(PlatformBusDevice *pbus, SysBusDevice *sbdev)
{
     int i;

     for (i = 0; sysbus_has_irq(sbdev, i); i++) {
         platform_bus_map_irq(pbus, sbdev, i);
     }

     for (i = 0; sysbus_has_mmio(sbdev, i); i++) {
         platform_bus_map_mmio(pbus, sbdev, i);
     }
}

Hmmm. So basically any sysbus device, even if it is part of another device has its 
MMIO memory regions automatically mapped to a free memory area within the platform 
bus on realize. And of course DBDMA has a parent class of TYPE_SYS_BUS_DEVICE...

I'm not sure what the right solution is here. In my mind there hasn't really been any 
difference between TYPE_DEVICE and TYPE_SYS_BUS_DEVICE other than the APIs that 
expose the memory regions and IRQs are different, but clearly platform bus 
expects/defines a different behaviour here.

Probably the quickest solution for now would be to change the DBDMA device so that it 
is derived from TYPE_DEVICE rather than TYPE_SYS_BUS_DEVICE and make the relevant 
changes if everyone agrees?


ATB,

Mark.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 21:19 ` Mark Cave-Ayland
@ 2021-03-23 22:57   ` Peter Maydell
  2021-03-25 12:57     ` Mark Cave-Ayland
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2021-03-23 22:57 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: Igor Mammedov, Thomas Huth, qemu-ppc, QEMU Developers, Markus Armbruster

On Tue, 23 Mar 2021 at 21:21, Mark Cave-Ayland
<mark.cave-ayland@ilande.co.uk> wrote:
> I'm not sure what the right solution is here. In my mind there hasn't really been any
> difference between TYPE_DEVICE and TYPE_SYS_BUS_DEVICE other than the APIs that
> expose the memory regions and IRQs are different, but clearly platform bus
> expects/defines a different behaviour here.
>
> Probably the quickest solution for now would be to change the DBDMA device so that it
> is derived from TYPE_DEVICE rather than TYPE_SYS_BUS_DEVICE and make the relevant
> changes if everyone agrees?

You want to be at least cautious about doing that. TYPE_DEVICE objects
don't get reset usually, because they are not part of the qbus hierarchy
(all TYPE_SYS_BUS_DEVICE devices live on the sysbus and get reset when
the sysbus is reset). So you would need the (currently nonexistent)
reset function of the containing macio device to manually reset any
TYPE_DEVICE children it has. (This is one of the areas where reset is
a mess, incidentally: logically speaking if you have a PCI device then
you want its children to all get reset when the PCI device itself is
reset; as it stands that doesn't happen, because its sysbus children
are on the sysbus, and a pci-bus-reset won't touch them.)

I forget how the platform bus stuff is supposed to work, but something
should be arranging that it only happens for a pretty restrictive subset
of devices -- in general it should certainly not be firing for random
sysbus devices that are part of something else.

It's a pretty old commit (from 2018, one of Igor's), but I wonder if
this was broken by commit a3fc8396352e945f9. The original design of
the platform bus was that the "grab unassigned IRQs and MMIO regions"
hook got run as a machine-init-done hook. That meant that by definition
the board code had finished setting up all its sysbus devices, and
anything still unconnected must be (assuming not a bug) something the
user requested via -device to be put on the platform bus. But in
commit a3fc8396352e945f9 we changed this to use the hotplug-handler
callbacks, which happen when the device is realized. So it stopped
being true that we would only find loose MMIOs and IRQs on the user's
sysbus devices and now we're also incorrectly grabbing parts of
devices that are supposed to be being wired up by QEMU code before
that code has a chance to do that wiring.

There must still be something causing this not to happen literally for
every sysbus device, or we'd have noticed a lot earlier. I'm not sure
what's specifically different here, but I think it is that:
 (1) we only create the platform bus itself as pretty much the
     last thing we do in machine init. This (accidentally?)
     means it doesn't get to see most of the sysbus devices in
     the board itself
 (2) macio-oldworld is a pluggable PCI device which happens to
     use a sysbus device as part of its implementation, which
     is probably not very common

I think the long term fix is that we either need to undo
a3fc8396352e945f9 so that we only run after all other device
creation code has run and the unassigned IRQs and MMIOs really
are just the loose ends, or alternatively we need to make the
hooks much more restrictive about identifying what devices are
supposed to go into the platform bus.

Second note: does it actually make sense for a user to create
a macio-oldworld device and plug it into anything? It's a PCI
device, but is it really a generally usable device rather than
a specific built-into-the-board part of the g3beige machine ?
If it isn't actually useful for a user to create it on the command
line with -device, we could sidestep the whole thing for 6.0 by
marking it dc->user_creatable = false ...

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 16:48 Crashes with qemu-system-ppc64 Thomas Huth
  2021-03-23 18:20 ` Philippe Mathieu-Daudé
  2021-03-23 21:19 ` Mark Cave-Ayland
@ 2021-03-23 23:00 ` Greg Kurz
  2021-03-23 23:35   ` Philippe Mathieu-Daudé
  2 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2021-03-23 23:00 UTC (permalink / raw)
  To: Thomas Huth; +Cc: qemu-ppc, QEMU Developers, David Gibson

Cc'ing David

On Tue, 23 Mar 2021 17:48:36 +0100
Thomas Huth <thuth@redhat.com> wrote:

> 
> In case anyone is interested in fixing those, there are two regressions with 
> qemu-system-ppc64 in the current master branch:
> 
> $ ./qemu-system-ppc64 -M ppce500 -device macio-oldworld
> qemu-system-ppc64: ../../devel/qemu/softmmu/memory.c:2443: 
> memory_region_add_subregion_common: Assertion `!subregion->container' failed.
> 
> $ ./qemu-system-ppc64 -device power8_v2.0-spapr-cpu-core,help
> /home/thuth/devel/qemu/include/hw/boards.h:24:MACHINE: Object 0x5635bd53af10 
> is not an instance of type machine
> Aborted (core dumped)
> 

I've bisected this one to:

3df261b6676b5850e93d6fab3f7a98f8ee8f19c5 is the first bad commit
commit 3df261b6676b5850e93d6fab3f7a98f8ee8f19c5
Author: Peter Maydell <peter.maydell@linaro.org>
Date:   Fri Mar 13 17:24:47 2020 +0000

    softmmu/vl.c: Handle '-cpu help' and '-device help' before 'no default machine'
    
    Currently if you try to ask for the list of CPUs for a target
    architecture which does not specify a default machine type
    you just get an error:
    
      $ qemu-system-arm -cpu help
      qemu-system-arm: No machine specified, and there is no default
      Use -machine help to list supported machines
    
    Since the list of CPUs doesn't depend on the machine, this is
    unnecessarily unhelpful. "-device help" has a similar problem.
    
    Move the checks for "did the user ask for -cpu help or -device help"
    up so they precede the select_machine() call which checks that the
    user specified a valid machine type.
    
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

 softmmu/vl.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
bisect run success

This change is fine but it unveils a bad assumption.

0  0x00007ffff64a3708 in raise () at /lib64/power9/libc.so.6
#1  0x00007ffff6483bcc in abort () at /lib64/power9/libc.so.6
#2  0x00000001008db940 in object_dynamic_cast_assert
    (obj=0x10126f670, typename=0x100c20380 "machine", file=0x100b34878 "/home/greg/Work/qemu/qemu-ppc/include/hw/boards.h", line=<optimized out>, func=0x100bcd320 <__func__.30338> "MACHINE") at ../../qom/object.c:883
#3  0x0000000100456e00 in MACHINE (obj=<optimized out>) at /home/greg/Work/qemu/qemu-ppc/include/hw/boards.h:24
#4  0x0000000100456e00 in cpu_core_instance_init (obj=0x10118e2c0) at ../../hw/cpu/core.c:69
#5  0x00000001008d9f44 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=0x1011fd470) at ../../qom/object.c:375
#6  0x00000001008d9f24 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=0x101211ad0) at ../../qom/object.c:371
#7  0x00000001008d9f24 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=ti@entry=0x101212760) at ../../qom/object.c:371
#8  0x00000001008dc474 in object_initialize_with_type (obj=obj@entry=0x10118e2c0, size=size@entry=160, type=type@entry=0x101212760) at ../../qom/object.c:517
#9  0x00000001008dc678 in object_new_with_type (type=0x101212760) at ../../qom/object.c:732
#10 0x00000001009fbad8 in qmp_device_list_properties (typename=<optimized out>, errp=<optimized out>) at ../../qom/qom-qmp-cmds.c:146
#11 0x00000001005a4bf0 in qdev_device_help (opts=0x10126c200) at ../../softmmu/qdev-monitor.c:285
#12 0x0000000100760afc in device_help_func (opaque=<optimized out>, opts=<optimized out>, errp=<optimized out>) at ../../softmmu/vl.c:1204
#13 0x0000000100ad1050 in qemu_opts_foreach (list=<optimized out>, func=0x100760ae0 <device_help_func>, opaque=0x0, errp=0x0) at ../../util/qemu-option.c:1167
#14 0x00000001007653cc in qemu_process_help_options () at ../../softmmu/vl.c:2451
#15 0x00000001007653cc in qemu_init (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/vl.c:3521
#16 0x00000001002f4f88 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/main.c:49

Basically, "-device power8_v2.0-spapr-cpu-core,help" ends up
instantiating an object of the "power8_v2.0-spapr-cpu-core" type,
which derives from "cpu-core". The "cpu-core" type has an instance
init function that assumes that qdev_get_machine() returns an object
of type "machine"...

static void cpu_core_instance_init(Object *obj)
{
    MachineState *ms = MACHINE(qdev_get_machine());
                         ^^
                     ...here.

qdev_get_machine() cannot return a valid machine type since
select_machine() hasn't been called yet... an instance init
function is probably not the best place to use qdev_get_machine()
if any.

    CPUCore *core = CPU_CORE(obj);

    core->nr_threads = ms->smp.threads;
}

It seems that this should rather sit in a device realize function,
when the machine type is known.

>   Thomas
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 23:00 ` Greg Kurz
@ 2021-03-23 23:35   ` Philippe Mathieu-Daudé
  2021-03-24  8:10     ` Greg Kurz
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-23 23:35 UTC (permalink / raw)
  To: Greg Kurz, Thomas Huth
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, QEMU Developers, David Gibson

On 3/24/21 12:00 AM, Greg Kurz wrote:
> Cc'ing David
> 
> On Tue, 23 Mar 2021 17:48:36 +0100
> Thomas Huth <thuth@redhat.com> wrote:
> 
>>
>> In case anyone is interested in fixing those, there are two regressions with 
>> qemu-system-ppc64 in the current master branch:
>>
>> $ ./qemu-system-ppc64 -M ppce500 -device macio-oldworld
>> qemu-system-ppc64: ../../devel/qemu/softmmu/memory.c:2443: 
>> memory_region_add_subregion_common: Assertion `!subregion->container' failed.
>>
>> $ ./qemu-system-ppc64 -device power8_v2.0-spapr-cpu-core,help
>> /home/thuth/devel/qemu/include/hw/boards.h:24:MACHINE: Object 0x5635bd53af10 
>> is not an instance of type machine
>> Aborted (core dumped)
>>
> 
> I've bisected this one to:
> 
> 3df261b6676b5850e93d6fab3f7a98f8ee8f19c5 is the first bad commit
> commit 3df261b6676b5850e93d6fab3f7a98f8ee8f19c5
> Author: Peter Maydell <peter.maydell@linaro.org>
> Date:   Fri Mar 13 17:24:47 2020 +0000
> 
>     softmmu/vl.c: Handle '-cpu help' and '-device help' before 'no default machine'
>     
>     Currently if you try to ask for the list of CPUs for a target
>     architecture which does not specify a default machine type
>     you just get an error:
>     
>       $ qemu-system-arm -cpu help
>       qemu-system-arm: No machine specified, and there is no default
>       Use -machine help to list supported machines
>     
>     Since the list of CPUs doesn't depend on the machine, this is
>     unnecessarily unhelpful. "-device help" has a similar problem.
>     
>     Move the checks for "did the user ask for -cpu help or -device help"
>     up so they precede the select_machine() call which checks that the
>     user specified a valid machine type.
>     
>     Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
>  softmmu/vl.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> bisect run success
> 
> This change is fine but it unveils a bad assumption.
> 
> 0  0x00007ffff64a3708 in raise () at /lib64/power9/libc.so.6
> #1  0x00007ffff6483bcc in abort () at /lib64/power9/libc.so.6
> #2  0x00000001008db940 in object_dynamic_cast_assert
>     (obj=0x10126f670, typename=0x100c20380 "machine", file=0x100b34878 "/home/greg/Work/qemu/qemu-ppc/include/hw/boards.h", line=<optimized out>, func=0x100bcd320 <__func__.30338> "MACHINE") at ../../qom/object.c:883
> #3  0x0000000100456e00 in MACHINE (obj=<optimized out>) at /home/greg/Work/qemu/qemu-ppc/include/hw/boards.h:24
> #4  0x0000000100456e00 in cpu_core_instance_init (obj=0x10118e2c0) at ../../hw/cpu/core.c:69
> #5  0x00000001008d9f44 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=0x1011fd470) at ../../qom/object.c:375
> #6  0x00000001008d9f24 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=0x101211ad0) at ../../qom/object.c:371
> #7  0x00000001008d9f24 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=ti@entry=0x101212760) at ../../qom/object.c:371
> #8  0x00000001008dc474 in object_initialize_with_type (obj=obj@entry=0x10118e2c0, size=size@entry=160, type=type@entry=0x101212760) at ../../qom/object.c:517
> #9  0x00000001008dc678 in object_new_with_type (type=0x101212760) at ../../qom/object.c:732
> #10 0x00000001009fbad8 in qmp_device_list_properties (typename=<optimized out>, errp=<optimized out>) at ../../qom/qom-qmp-cmds.c:146
> #11 0x00000001005a4bf0 in qdev_device_help (opts=0x10126c200) at ../../softmmu/qdev-monitor.c:285
> #12 0x0000000100760afc in device_help_func (opaque=<optimized out>, opts=<optimized out>, errp=<optimized out>) at ../../softmmu/vl.c:1204
> #13 0x0000000100ad1050 in qemu_opts_foreach (list=<optimized out>, func=0x100760ae0 <device_help_func>, opaque=0x0, errp=0x0) at ../../util/qemu-option.c:1167
> #14 0x00000001007653cc in qemu_process_help_options () at ../../softmmu/vl.c:2451
> #15 0x00000001007653cc in qemu_init (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/vl.c:3521
> #16 0x00000001002f4f88 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/main.c:49
> 
> Basically, "-device power8_v2.0-spapr-cpu-core,help" ends up
> instantiating an object of the "power8_v2.0-spapr-cpu-core" type,
> which derives from "cpu-core". The "cpu-core" type has an instance
> init function that assumes that qdev_get_machine() returns an object
> of type "machine"...
> 
> static void cpu_core_instance_init(Object *obj)
> {
>     MachineState *ms = MACHINE(qdev_get_machine());
>                          ^^
>                      ...here.
> 
> qdev_get_machine() cannot return a valid machine type since
> select_machine() hasn't been called yet... an instance init
> function is probably not the best place to use qdev_get_machine()
> if any.

Hmmm does this assert() matches your comment?

-- >8 --
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cefc5eaa0a9..41cbee77d14 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -1130,6 +1130,8 @@ Object *qdev_get_machine(void)
 {
     static Object *dev;

+    assert(phase_check(PHASE_MACHINE_CREATED));
+
     if (dev == NULL) {
         dev = container_get(object_get_root(), "/machine");
     }
---

> 
>     CPUCore *core = CPU_CORE(obj);
> 
>     core->nr_threads = ms->smp.threads;
> }
> 
> It seems that this should rather sit in a device realize function,
> when the machine type is known.
> 
>>   Thomas
>>
>>
> 
> 



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 23:35   ` Philippe Mathieu-Daudé
@ 2021-03-24  8:10     ` Greg Kurz
  2021-03-24  9:17     ` Paolo Bonzini
  2021-03-24 10:10     ` Thomas Huth
  2 siblings, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2021-03-24  8:10 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Peter Maydell, Thomas Huth, QEMU Developers, qemu-ppc,
	Paolo Bonzini, David Gibson

On Wed, 24 Mar 2021 00:35:05 +0100
Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:

> On 3/24/21 12:00 AM, Greg Kurz wrote:
> > Cc'ing David
> > 
> > On Tue, 23 Mar 2021 17:48:36 +0100
> > Thomas Huth <thuth@redhat.com> wrote:
> > 
> >>
> >> In case anyone is interested in fixing those, there are two regressions with 
> >> qemu-system-ppc64 in the current master branch:
> >>
> >> $ ./qemu-system-ppc64 -M ppce500 -device macio-oldworld
> >> qemu-system-ppc64: ../../devel/qemu/softmmu/memory.c:2443: 
> >> memory_region_add_subregion_common: Assertion `!subregion->container' failed.
> >>
> >> $ ./qemu-system-ppc64 -device power8_v2.0-spapr-cpu-core,help
> >> /home/thuth/devel/qemu/include/hw/boards.h:24:MACHINE: Object 0x5635bd53af10 
> >> is not an instance of type machine
> >> Aborted (core dumped)
> >>
> > 
> > I've bisected this one to:
> > 
> > 3df261b6676b5850e93d6fab3f7a98f8ee8f19c5 is the first bad commit
> > commit 3df261b6676b5850e93d6fab3f7a98f8ee8f19c5
> > Author: Peter Maydell <peter.maydell@linaro.org>
> > Date:   Fri Mar 13 17:24:47 2020 +0000
> > 
> >     softmmu/vl.c: Handle '-cpu help' and '-device help' before 'no default machine'
> >     
> >     Currently if you try to ask for the list of CPUs for a target
> >     architecture which does not specify a default machine type
> >     you just get an error:
> >     
> >       $ qemu-system-arm -cpu help
> >       qemu-system-arm: No machine specified, and there is no default
> >       Use -machine help to list supported machines
> >     
> >     Since the list of CPUs doesn't depend on the machine, this is
> >     unnecessarily unhelpful. "-device help" has a similar problem.
> >     
> >     Move the checks for "did the user ask for -cpu help or -device help"
> >     up so they precede the select_machine() call which checks that the
> >     user specified a valid machine type.
> >     
> >     Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> >     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > 
> >  softmmu/vl.c | 26 ++++++++++++++++----------
> >  1 file changed, 16 insertions(+), 10 deletions(-)
> > bisect run success
> > 
> > This change is fine but it unveils a bad assumption.
> > 
> > 0  0x00007ffff64a3708 in raise () at /lib64/power9/libc.so.6
> > #1  0x00007ffff6483bcc in abort () at /lib64/power9/libc.so.6
> > #2  0x00000001008db940 in object_dynamic_cast_assert
> >     (obj=0x10126f670, typename=0x100c20380 "machine", file=0x100b34878 "/home/greg/Work/qemu/qemu-ppc/include/hw/boards.h", line=<optimized out>, func=0x100bcd320 <__func__.30338> "MACHINE") at ../../qom/object.c:883
> > #3  0x0000000100456e00 in MACHINE (obj=<optimized out>) at /home/greg/Work/qemu/qemu-ppc/include/hw/boards.h:24
> > #4  0x0000000100456e00 in cpu_core_instance_init (obj=0x10118e2c0) at ../../hw/cpu/core.c:69
> > #5  0x00000001008d9f44 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=0x1011fd470) at ../../qom/object.c:375
> > #6  0x00000001008d9f24 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=0x101211ad0) at ../../qom/object.c:371
> > #7  0x00000001008d9f24 in object_init_with_type (obj=obj@entry=0x10118e2c0, ti=ti@entry=0x101212760) at ../../qom/object.c:371
> > #8  0x00000001008dc474 in object_initialize_with_type (obj=obj@entry=0x10118e2c0, size=size@entry=160, type=type@entry=0x101212760) at ../../qom/object.c:517
> > #9  0x00000001008dc678 in object_new_with_type (type=0x101212760) at ../../qom/object.c:732
> > #10 0x00000001009fbad8 in qmp_device_list_properties (typename=<optimized out>, errp=<optimized out>) at ../../qom/qom-qmp-cmds.c:146
> > #11 0x00000001005a4bf0 in qdev_device_help (opts=0x10126c200) at ../../softmmu/qdev-monitor.c:285
> > #12 0x0000000100760afc in device_help_func (opaque=<optimized out>, opts=<optimized out>, errp=<optimized out>) at ../../softmmu/vl.c:1204
> > #13 0x0000000100ad1050 in qemu_opts_foreach (list=<optimized out>, func=0x100760ae0 <device_help_func>, opaque=0x0, errp=0x0) at ../../util/qemu-option.c:1167
> > #14 0x00000001007653cc in qemu_process_help_options () at ../../softmmu/vl.c:2451
> > #15 0x00000001007653cc in qemu_init (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/vl.c:3521
> > #16 0x00000001002f4f88 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/main.c:49
> > 
> > Basically, "-device power8_v2.0-spapr-cpu-core,help" ends up
> > instantiating an object of the "power8_v2.0-spapr-cpu-core" type,
> > which derives from "cpu-core". The "cpu-core" type has an instance
> > init function that assumes that qdev_get_machine() returns an object
> > of type "machine"...
> > 
> > static void cpu_core_instance_init(Object *obj)
> > {
> >     MachineState *ms = MACHINE(qdev_get_machine());
> >                          ^^
> >                      ...here.
> > 
> > qdev_get_machine() cannot return a valid machine type since
> > select_machine() hasn't been called yet... an instance init
> > function is probably not the best place to use qdev_get_machine()
> > if any.
> 
> Hmmm does this assert() matches your comment?
> 
> -- >8 --
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index cefc5eaa0a9..41cbee77d14 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -1130,6 +1130,8 @@ Object *qdev_get_machine(void)
>  {
>      static Object *dev;
> 
> +    assert(phase_check(PHASE_MACHINE_CREATED));
> +


Heh... I didn't know about phase_check() but it could make sense
to prevent early misuse of qdev_get_machine() indeed.

>      if (dev == NULL) {
>          dev = container_get(object_get_root(), "/machine");
>      }
> ---
> 
> > 
> >     CPUCore *core = CPU_CORE(obj);
> > 
> >     core->nr_threads = ms->smp.threads;
> > }
> > 
> > It seems that this should rather sit in a device realize function,
> > when the machine type is known.
> > 

Or maybe leave everything in the instance init function but rely
on current_machine instead of qdev_get_machine(), i.e. something
like:

static void cpu_core_instance_init(Object *obj)
{
    MachineState *ms = current_machine;
    CPUCore *core = CPU_CORE(obj);

    /*
     * This can be called with "-device some_cpu_core,help" before the
     * machine has been created.
     */
    if (!ms) {
        core->nr_threads = 1;
        return;
    }

    core->nr_threads = ms->smp.threads;
}

> >>   Thomas
> >>
> >>
> > 
> > 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 23:35   ` Philippe Mathieu-Daudé
  2021-03-24  8:10     ` Greg Kurz
@ 2021-03-24  9:17     ` Paolo Bonzini
  2021-03-25 10:27       ` Greg Kurz
  2021-03-24 10:10     ` Thomas Huth
  2 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2021-03-24  9:17 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Greg Kurz, Thomas Huth
  Cc: Peter Maydell, qemu-ppc, QEMU Developers, David Gibson

On 24/03/21 00:35, Philippe Mathieu-Daudé wrote:
> Hmmm does this assert() matches your comment?
> 
> -- >8 --
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index cefc5eaa0a9..41cbee77d14 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -1130,6 +1130,8 @@ Object *qdev_get_machine(void)
>   {
>       static Object *dev;
> 
> +    assert(phase_check(PHASE_MACHINE_CREATED));
> +

Very nice use of phase_check!  Kudos.

Paolo



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 23:35   ` Philippe Mathieu-Daudé
  2021-03-24  8:10     ` Greg Kurz
  2021-03-24  9:17     ` Paolo Bonzini
@ 2021-03-24 10:10     ` Thomas Huth
  2021-03-24 10:32       ` Thomas Huth
  2 siblings, 1 reply; 13+ messages in thread
From: Thomas Huth @ 2021-03-24 10:10 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Greg Kurz
  Cc: Peter Maydell, David Gibson, qemu-ppc, QEMU Developers, Paolo Bonzini

On 24/03/2021 00.35, Philippe Mathieu-Daudé wrote:
[...]
> Hmmm does this assert() matches your comment?
> 
> -- >8 --
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index cefc5eaa0a9..41cbee77d14 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -1130,6 +1130,8 @@ Object *qdev_get_machine(void)
>   {
>       static Object *dev;
> 
> +    assert(phase_check(PHASE_MACHINE_CREATED));
> +
>       if (dev == NULL) {
>           dev = container_get(object_get_root(), "/machine");
>       }

Sounds like a good idea, but I think it should be sufficient to put it into 
the if-statement instead.
Could you send it as a proper patch?

  Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-24 10:10     ` Thomas Huth
@ 2021-03-24 10:32       ` Thomas Huth
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Huth @ 2021-03-24 10:32 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Greg Kurz
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, QEMU Developers, David Gibson

On 24/03/2021 11.10, Thomas Huth wrote:
> On 24/03/2021 00.35, Philippe Mathieu-Daudé wrote:
> [...]
>> Hmmm does this assert() matches your comment?
>>
>> -- >8 --
>> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
>> index cefc5eaa0a9..41cbee77d14 100644
>> --- a/hw/core/qdev.c
>> +++ b/hw/core/qdev.c
>> @@ -1130,6 +1130,8 @@ Object *qdev_get_machine(void)
>>   {
>>       static Object *dev;
>>
>> +    assert(phase_check(PHASE_MACHINE_CREATED));
>> +
>>       if (dev == NULL) {
>>           dev = container_get(object_get_root(), "/machine");
>>       }
> 
> Sounds like a good idea, but I think it should be sufficient to put it into 
> the if-statement instead.

Scratch that. QEMU quickly asserts with that statement, which makes sense if 
you consider that container_get also creates the container objects along the 
way.

  Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-24  9:17     ` Paolo Bonzini
@ 2021-03-25 10:27       ` Greg Kurz
  0 siblings, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2021-03-25 10:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Thomas Huth, Philippe Mathieu-Daudé,
	QEMU Developers, qemu-ppc, David Gibson

On Wed, 24 Mar 2021 10:17:55 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 24/03/21 00:35, Philippe Mathieu-Daudé wrote:
> > Hmmm does this assert() matches your comment?
> > 
> > -- >8 --
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index cefc5eaa0a9..41cbee77d14 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -1130,6 +1130,8 @@ Object *qdev_get_machine(void)
> >   {
> >       static Object *dev;
> > 
> > +    assert(phase_check(PHASE_MACHINE_CREATED));
> > +
> 
> Very nice use of phase_check!  Kudos.
> 

It seems promising at first sight but qdev_get_machine() gets
called under qemu_create_machine() long before phase is advanced
to PHASE_MACHINE_CREATED.

qemu-system-ppc64: ../../hw/core/qdev.c:1133: qdev_get_machine: Assertion `phase_check(PHASE_MACHINE_CREATED)' failed.

(gdb) bt
#0  0x00007ffff64a3708 in raise () at /lib64/power9/libc.so.6
#1  0x00007ffff6483bcc in abort () at /lib64/power9/libc.so.6
#2  0x00007ffff6497210 in __assert_fail_base () at /lib64/power9/libc.so.6
#3  0x00007ffff64972b4 in __assert_fail () at /lib64/power9/libc.so.6
#4  0x00000001009e7820 in qdev_get_machine () at ../../hw/core/qdev.c:1133
#5  0x00000001009e7820 in qdev_get_machine () at ../../hw/core/qdev.c:1129
#6  0x0000000100747894 in memory_region_do_init (mr=0x101261200, owner=0x0, name=<optimized out>, size=<optimized out>) at ../../softmmu/memory.c:1177
#7  0x00000001007fccc4 in memory_map_init () at ../../softmmu/physmem.c:2630
#8  0x00000001007fccc4 in cpu_exec_init_all () at ../../softmmu/physmem.c:3034
#9  0x00000001007e9c9c in qemu_create_machine (machine_class=machine_class@entry=0x1014b96d0) at ../../softmmu/vl.c:2086
#10 0x00000001007eb8c0 in qemu_init (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/vl.c:1640
#11 0x00000001002f53c8 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/main.c:49


static void memory_region_do_init(MemoryRegion *mr,
                                  Object *owner,
                                  const char *name,
                                  uint64_t size)
{
[...]
        if (!owner) {
            owner = container_get(qdev_get_machine(), "/unattached");
        }

The true condition for qdev_get_machine() to be functional is
actually that the following happened already:

    object_property_add_child(object_get_root(), "machine",
                              OBJECT(current_machine));

This is the case with the call stack ^^ and I don't see any valid
reason to forbid use of qdev_get_machine() here.

So I'm wondering if we shouldn't rather check the existence of the
"/machine" path in the QOM tree instead of checking the phase.

> Paolo
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-23 22:57   ` Peter Maydell
@ 2021-03-25 12:57     ` Mark Cave-Ayland
  2021-03-25 13:25       ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Cave-Ayland @ 2021-03-25 12:57 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Igor Mammedov, Thomas Huth, qemu-ppc, QEMU Developers, Markus Armbruster

On 23/03/2021 22:57, Peter Maydell wrote:

> On Tue, 23 Mar 2021 at 21:21, Mark Cave-Ayland
> <mark.cave-ayland@ilande.co.uk> wrote:
>> I'm not sure what the right solution is here. In my mind there hasn't really been any
>> difference between TYPE_DEVICE and TYPE_SYS_BUS_DEVICE other than the APIs that
>> expose the memory regions and IRQs are different, but clearly platform bus
>> expects/defines a different behaviour here.
>>
>> Probably the quickest solution for now would be to change the DBDMA device so that it
>> is derived from TYPE_DEVICE rather than TYPE_SYS_BUS_DEVICE and make the relevant
>> changes if everyone agrees?
> 
> You want to be at least cautious about doing that. TYPE_DEVICE objects
> don't get reset usually, because they are not part of the qbus hierarchy
> (all TYPE_SYS_BUS_DEVICE devices live on the sysbus and get reset when
> the sysbus is reset). So you would need the (currently nonexistent)
> reset function of the containing macio device to manually reset any
> TYPE_DEVICE children it has. (This is one of the areas where reset is
> a mess, incidentally: logically speaking if you have a PCI device then
> you want its children to all get reset when the PCI device itself is
> reset; as it stands that doesn't happen, because its sysbus children
> are on the sysbus, and a pci-bus-reset won't touch them.)
> 
> I forget how the platform bus stuff is supposed to work, but something
> should be arranging that it only happens for a pretty restrictive subset
> of devices -- in general it should certainly not be firing for random
> sysbus devices that are part of something else.
> 
> It's a pretty old commit (from 2018, one of Igor's), but I wonder if
> this was broken by commit a3fc8396352e945f9. The original design of
> the platform bus was that the "grab unassigned IRQs and MMIO regions"
> hook got run as a machine-init-done hook. That meant that by definition
> the board code had finished setting up all its sysbus devices, and
> anything still unconnected must be (assuming not a bug) something the
> user requested via -device to be put on the platform bus. But in
> commit a3fc8396352e945f9 we changed this to use the hotplug-handler
> callbacks, which happen when the device is realized. So it stopped
> being true that we would only find loose MMIOs and IRQs on the user's
> sysbus devices and now we're also incorrectly grabbing parts of
> devices that are supposed to be being wired up by QEMU code before
> that code has a chance to do that wiring.
> 
> There must still be something causing this not to happen literally for
> every sysbus device, or we'd have noticed a lot earlier. I'm not sure
> what's specifically different here, but I think it is that:
>   (1) we only create the platform bus itself as pretty much the
>       last thing we do in machine init. This (accidentally?)
>       means it doesn't get to see most of the sysbus devices in
>       the board itself
>   (2) macio-oldworld is a pluggable PCI device which happens to
>       use a sysbus device as part of its implementation, which
>       is probably not very common
> 
> I think the long term fix is that we either need to undo
> a3fc8396352e945f9 so that we only run after all other device
> creation code has run and the unassigned IRQs and MMIOs really
> are just the loose ends, or alternatively we need to make the
> hooks much more restrictive about identifying what devices are
> supposed to go into the platform bus.

Thanks for the analysis: I can certainly see how the above commit would have changed 
the behaviour. Looking at hw/ppc/e590plat.c in e500plat_machine_class_init() I see 
that line 101 reads "machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ETSEC_COMMON);" 
which looks like it is intended to add a class restriction to this functionality.

In machine_initfn() a callback for machine_init_notify() is added to perform the 
check but the macio-oldworld device is realized first, because qmp_x_exit_preconfig() 
calls qemu_create_cli_devices() to realize the devices just before 
qemu_machine_creation_done() where the notifier is invoked.

> Second note: does it actually make sense for a user to create
> a macio-oldworld device and plug it into anything? It's a PCI
> device, but is it really a generally usable device rather than
> a specific built-into-the-board part of the g3beige machine ?
> If it isn't actually useful for a user to create it on the command
> line with -device, we could sidestep the whole thing for 6.0 by
> marking it dc->user_creatable = false ...

I'd prefer not to do that if possible since the macio device is a good example to 
have as something that passes device-introspect-test whilst containing several 
different child devices.

Given that the DBDMA device hasn't changed for some time and the above commit dates 
back to 2018 then I'd be inclined to leave it for now unless it is causing gitlab CI 
to fail.


ATB,

Mark.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crashes with qemu-system-ppc64
  2021-03-25 12:57     ` Mark Cave-Ayland
@ 2021-03-25 13:25       ` Peter Maydell
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2021-03-25 13:25 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: Igor Mammedov, Thomas Huth, qemu-ppc, QEMU Developers, Markus Armbruster

On Thu, 25 Mar 2021 at 12:57, Mark Cave-Ayland
<mark.cave-ayland@ilande.co.uk> wrote:
> Thanks for the analysis: I can certainly see how the above commit would have changed
> the behaviour. Looking at hw/ppc/e590plat.c in e500plat_machine_class_init() I see
> that line 101 reads "machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ETSEC_COMMON);"
> which looks like it is intended to add a class restriction to this functionality.

Aha. I thought there was a restriction somewhere but hadn't found that
function. I think the device plug callback functions should check that
list rather than saying "any TYPE_SYS_BUS_DEVICE is fine".

> In machine_initfn() a callback for machine_init_notify() is added to perform the
> check but the macio-oldworld device is realized first

Also, this check is only for "did the user dynamically create a sysbus
device that isn't an allowed one", which is a necessary check, but
not quite the same thing as "is this device we're looking at one we want to
assume belongs to the platform bus".

I'll put together a patch...

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-03-25 13:27 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-23 16:48 Crashes with qemu-system-ppc64 Thomas Huth
2021-03-23 18:20 ` Philippe Mathieu-Daudé
2021-03-23 21:19 ` Mark Cave-Ayland
2021-03-23 22:57   ` Peter Maydell
2021-03-25 12:57     ` Mark Cave-Ayland
2021-03-25 13:25       ` Peter Maydell
2021-03-23 23:00 ` Greg Kurz
2021-03-23 23:35   ` Philippe Mathieu-Daudé
2021-03-24  8:10     ` Greg Kurz
2021-03-24  9:17     ` Paolo Bonzini
2021-03-25 10:27       ` Greg Kurz
2021-03-24 10:10     ` Thomas Huth
2021-03-24 10:32       ` Thomas Huth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).