* Question on memory commit during MR finalize()
@ 2020-04-20 21:00 Peter Xu
  2020-04-20 21:44 ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2020-04-20 21:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Markus Armbruster, QEMU Devel Mailing List

Paolo & all,

After my recent rebase and changes to my local QEMU kvm-dirty-ring tree, I can
easily trigger a QEMU crash at boot with:

  kvm_set_phys_mem: error registering slot: File exists

With backtrace:

#0  __GI_raise (sig=sig@entry=6) 
#1  0x00007f29559088d9 in __GI_abort () 
#2  0x000055ce0b39cd84 in kvm_set_phys_mem (kml=0x55ce0e1efeb8, section=0x7f294ae035b0, add=true) 
#3  0x000055ce0b39d5aa in kvm_region_add (listener=0x55ce0e1efeb8, section=0x7f294ae035b0) 
#4  0x000055ce0b388ae6 in address_space_update_topology_pass (as=0x55ce0bf129e0 <address_space_memory>, old_view=0x55ce0ef8ae80, new_view=0x7f2944001c20, adding=true) 
#5  0x000055ce0b388dde in address_space_set_flatview (as=0x55ce0bf129e0 <address_space_memory>) 
#6  0x000055ce0b388f85 in memory_region_transaction_commit () 
#7  0x000055ce0b38a8e4 in memory_region_finalize (obj=0x55ce0e52f700) 
#8  0x000055ce0b821de2 in object_deinit (obj=0x55ce0e52f700, type=0x55ce0de89d80) 
#9  0x000055ce0b821e54 in object_finalize (data=0x55ce0e52f700) 
#10 0x000055ce0b822e0f in object_unref (obj=0x55ce0e52f700) 
#11 0x000055ce0b32461c in phys_section_destroy (mr=0x55ce0e52f700) 
#12 0x000055ce0b324676 in phys_sections_free (map=0x55ce0e1bf7a0) 
#13 0x000055ce0b327d89 in address_space_dispatch_free (d=0x55ce0e1bf790) 
#14 0x000055ce0b3863f8 in flatview_destroy (view=0x55ce0e193ab0) 
#15 0x000055ce0b9a39cb in call_rcu_thread (opaque=0x0) 
#16 0x000055ce0b9899d9 in qemu_thread_start (args=0x55ce0de39a40) 
#17 0x00007f2955ab54e2 in start_thread (arg=<optimized out>) 
#18 0x00007f29559e4693 in clone () 

It's KVM_SET_USER_MEMORY_REGION returning -EEXIST.

I'm still uncertain how the dirty ring branch can trigger this so easily, but
the backtrace looks really odd to me in that we're doing a memory commit and
even sending KVM ioctls during finalize(), especially in the RCU thread...
I never expected that.

I wanted to better understand commit 2e2b8eb70f ("memory: allow destroying a
non-empty MemoryRegion", 2015-10-09), but its context didn't help much [1].  Do
any of you still remember why we do that?  Do we really want to send KVM
ioctls, even in the RCU thread, during finalize()?

Any input would be greatly appreciated.

Thanks,

[1] https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg00110.html

-- Peter Xu




* Re: Question on memory commit during MR finalize()
  2020-04-20 21:00 Question on memory commit during MR finalize() Peter Xu
@ 2020-04-20 21:44 ` Paolo Bonzini
  2020-04-20 23:31   ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2020-04-20 21:44 UTC (permalink / raw)
  To: Peter Xu; +Cc: Markus Armbruster, QEMU Devel Mailing List

On 20/04/20 23:00, Peter Xu wrote:
> 
> I'm still uncertain how the dirty ring branch can trigger this so easily, but
> the backtrace looks really odd to me in that we're doing a memory commit and
> even sending KVM ioctls during finalize(), especially in the RCU thread...
> I never expected that.

Short answer: it is really hard not to trigger finalize() from an RCU
callback, and that's the reason why the RCU thread takes the big QEMU lock.
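
(For context, util/rcu.c's call_rcu_thread() does roughly this:

    qemu_mutex_lock_iothread();      /* take the BQL */
    while (n > 0) {
        node = try_dequeue();
        ...
        node->func(node);            /* e.g. flatview_destroy(), which can drop
                                      * the last reference to a MemoryRegion
                                      * and run memory_region_finalize() */
    }
    qemu_mutex_unlock_iothread();

so the RCU callbacks - including object_unref()/finalize chains like the one
in your backtrace - run with the BQL held.)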

However, instead of memory_region_transaction_commit,
memory_region_finalize probably should do something like

    --memory_region_transaction_depth;
    assert (memory_region_transaction_depth ||
	    (!memory_region_update_pending &&
             !ioeventfd_update_pending));
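
In context, the tail of memory_region_finalize() would then look something
like this (untested sketch, from memory; today it calls
memory_region_transaction_commit() at that point):

    static void memory_region_finalize(Object *obj)
    {
        MemoryRegion *mr = MEMORY_REGION(obj);

        ...
        memory_region_transaction_begin();
        /* ... detach subregions ... */

        /*
         * The region is no longer visible in any address space, so
         * destroying it must not have left an update pending: check
         * that instead of committing.
         */
        --memory_region_transaction_depth;
        assert(memory_region_transaction_depth ||
               (!memory_region_update_pending &&
                !ioeventfd_update_pending));
    }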

Paolo




* Re: Question on memory commit during MR finalize()
  2020-04-20 21:44 ` Paolo Bonzini
@ 2020-04-20 23:31   ` Peter Xu
  2020-04-21  9:43     ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2020-04-20 23:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Markus Armbruster, QEMU Devel Mailing List

On Mon, Apr 20, 2020 at 11:44:11PM +0200, Paolo Bonzini wrote:
> On 20/04/20 23:00, Peter Xu wrote:
> > 
> > I'm still uncertain how the dirty ring branch can trigger this so easily, but
> > the backtrace looks really odd to me in that we're doing a memory commit and
> > even sending KVM ioctls during finalize(), especially in the RCU thread...
> > I never expected that.
> 
> Short answer: it is really hard not to trigger finalize() from an RCU
> callback, and that's the reason why the RCU thread takes the big QEMU lock.
> 
> However, instead of memory_region_transaction_commit,
> memory_region_finalize probably should do something like
> 
>     --memory_region_transaction_depth;
>     assert (memory_region_transaction_depth ||
> 	    (!memory_region_update_pending &&
>              !ioeventfd_update_pending));

Ah I see; this makes sense.

And finally I found the problem, which is indeed a bug in my own tree - I
forgot to remove a previous change that flushes the dirty ring during memory
removal (basically a run_on_cpu() call during a memory commit, which will
wrongly release the BQL without anyone noticing).

Besides the above assert, I'm thinking maybe we can also assert on something like:

  !(memory_region_transaction_depth || memory_region_update_pending ||
    ioeventfd_update_pending)

when releasing the BQL (on unlock, or on qemu_cond_wait() upon the BQL, which
should cover run_on_cpu()), so that we can identify this kind of BQL misuse
more easily.
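
E.g. a helper along these lines (untested sketch - the name is made up, and
since all three variables are statics in memory.c the helper would have to
live there):

    /* To be called at every point where the BQL is about to be released. */
    void assert_no_pending_memory_update(void)
    {
        assert(!(memory_region_transaction_depth ||
                 memory_region_update_pending ||
                 ioeventfd_update_pending));
    }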

Let me know if you like these sanity checks. I can write up a small series if
you think that's a good idea.

Thanks,

-- 
Peter Xu




* Re: Question on memory commit during MR finalize()
  2020-04-20 23:31   ` Peter Xu
@ 2020-04-21  9:43     ` Paolo Bonzini
  2020-04-21 10:43       ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2020-04-21  9:43 UTC (permalink / raw)
  To: Peter Xu; +Cc: Markus Armbruster, QEMU Devel Mailing List

On 21/04/20 01:31, Peter Xu wrote:
>>
>> However, instead of memory_region_transaction_commit,
>> memory_region_finalize probably should do something like
>>
>>     --memory_region_transaction_depth;
>>     assert (memory_region_transaction_depth ||
>> 	    (!memory_region_update_pending &&
>>              !ioeventfd_update_pending));
> Ah I see; this makes sense.
> 
> And finally I found the problem, which is indeed a bug in my own tree - I
> forgot to remove a previous change that flushes the dirty ring during memory
> removal (basically a run_on_cpu() call during a memory commit, which will
> wrongly release the BQL without anyone noticing).
> 
> Besides the above assert, I'm thinking maybe we can also assert on something like:
> 
>   !(memory_region_transaction_depth || memory_region_update_pending ||
>     ioeventfd_update_pending)
> 
> when releasing the BQL (on unlock, or on qemu_cond_wait() upon the BQL, which
> should cover run_on_cpu()), so that we can identify this kind of BQL misuse
> more easily.

Asserting invariants around lock release is an interesting concept, but
I'm not sure where to insert the assertions exactly.  But it would be great if
you could introduce an assert_empty_memory_transaction() function with the
assertion I quoted above.

Thanks!

Paolo




* Re: Question on memory commit during MR finalize()
  2020-04-21  9:43     ` Paolo Bonzini
@ 2020-04-21 10:43       ` Peter Xu
  2021-07-15 14:27         ` Thanos Makatos
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2020-04-21 10:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Markus Armbruster, QEMU Devel Mailing List

On Tue, Apr 21, 2020 at 11:43:36AM +0200, Paolo Bonzini wrote:
> On 21/04/20 01:31, Peter Xu wrote:
> >>
> >> However, instead of memory_region_transaction_commit,
> >> memory_region_finalize probably should do something like
> >>
> >>     --memory_region_transaction_depth;
> >>     assert (memory_region_transaction_depth ||
> >> 	    (!memory_region_update_pending &&
> >>              !ioeventfd_update_pending));
> > Ah I see; this makes sense.
> > 
> > And finally I found the problem, which is indeed a bug in my own tree - I
> > forgot to remove a previous change that flushes the dirty ring during memory
> > removal (basically a run_on_cpu() call during a memory commit, which will
> > wrongly release the BQL without anyone noticing).
> > 
> > Besides the above assert, I'm thinking maybe we can also assert on something like:
> > 
> >   !(memory_region_transaction_depth || memory_region_update_pending ||
> >     ioeventfd_update_pending)
> > 
> > when releasing the BQL (on unlock, or on qemu_cond_wait() upon the BQL, which
> > should cover run_on_cpu()), so that we can identify this kind of BQL misuse
> > more easily.
> 
> Asserting invariants around lock release is an interesting concept, but
> I'm not sure where to insert the assertions exactly.  But it would be great if
> you could introduce an assert_empty_memory_transaction() function with the
> assertion I quoted above.

Let me give it a shot later today. :)

-- 
Peter Xu




* RE: Question on memory commit during MR finalize()
  2020-04-21 10:43       ` Peter Xu
@ 2021-07-15 14:27         ` Thanos Makatos
  2021-07-15 18:35           ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Thanos Makatos @ 2021-07-15 14:27 UTC (permalink / raw)
  To: Peter Xu, Paolo Bonzini
  Cc: John G Johnson, John Levon, Markus Armbruster, QEMU Devel Mailing List

Hi Peter,

> -----Original Message-----
> From: Qemu-devel <qemu-devel-
> bounces+thanos.makatos=nutanix.com@nongnu.org> On Behalf Of Peter
> Xu
> Sent: 21 April 2020 11:44
> To: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>; QEMU Devel Mailing List
> <qemu-devel@nongnu.org>
> Subject: Re: Question on memory commit during MR finalize()
> 
> On Tue, Apr 21, 2020 at 11:43:36AM +0200, Paolo Bonzini wrote:
> > On 21/04/20 01:31, Peter Xu wrote:
> > >>
> > >> However, instead of memory_region_transaction_commit,
> > >> memory_region_finalize probably should do something like
> > >>
> > >>     --memory_region_transaction_depth;
> > >>     assert (memory_region_transaction_depth ||
> > >> 	    (!memory_region_update_pending &&
> > >>              !ioeventfd_update_pending));
> > > Ah I see; this makes sense.
> > >
> > > And finally I found the problem, which is indeed a bug in my own
> > > tree - I forgot to remove a previous change that flushes the dirty
> > > ring during memory removal (basically a run_on_cpu() call during
> > > a memory commit, which will wrongly release the BQL without anyone
> > > noticing).
> > >
> > > Besides the above assert, I'm thinking maybe we can also assert on
> > > something like:
> > >
> > >   !(memory_region_transaction_depth || memory_region_update_pending ||
> > >     ioeventfd_update_pending)
> > >
> > > when releasing the BQL (on unlock, or on qemu_cond_wait() upon the BQL,
> > > which should cover run_on_cpu()), so that we can identify this kind of
> > > BQL misuse more easily.
> >
> > Asserting invariants around lock release is an interesting concept,
> > but I'm not sure where to insert the assertions exactly.  But it would
> > be great if you could introduce an assert_empty_memory_transaction()
> > function with the assertion I quoted above.
> 
> Let me give it a shot later today. :)

We're hitting this issue using a QEMU branch where JJ is using vfio-user as the transport for multiprocess-qemu (https://github.com/oracle/qemu/issues/9). We can reproduce it fairly reliably by migrating a virtual SPDK NVMe controller (the NVMf/vfio-user target with experimental migration support, https://review.spdk.io/gerrit/c/spdk/spdk/+/7617/14). I can provide detailed repro instructions but first I want to make sure we're not missing any patches.


* Re: Question on memory commit during MR finalize()
  2021-07-15 14:27         ` Thanos Makatos
@ 2021-07-15 18:35           ` Peter Xu
  2021-07-16 11:42             ` Thanos Makatos
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2021-07-15 18:35 UTC (permalink / raw)
  To: Thanos Makatos
  Cc: Paolo Bonzini, John Levon, John G Johnson, Markus Armbruster,
	QEMU Devel Mailing List

On Thu, Jul 15, 2021 at 02:27:48PM +0000, Thanos Makatos wrote:
> Hi Peter,

Hi, Thanos,

> We're hitting this issue using a QEMU branch where JJ is using vfio-user as the transport for multiprocess-qemu (https://github.com/oracle/qemu/issues/9). We can reproduce it fairly reliably by migrating a virtual SPDK NVMe controller (the NVMf/vfio-user target with experimental migration support, https://review.spdk.io/gerrit/c/spdk/spdk/+/7617/14). I can provide detailed repro instructions but first I want to make sure we're not missing any patches.

I don't think you missed any bug fix patches, as the issue I mentioned could
only be triggered with my own branch at that time, and that was fixed when my
patchset got merged.

However, if you encountered the same issue, it's possible that there's an
incorrect use of the QEMU memory/cpu API somewhere there too, so a similar
issue is triggered.  For example, in my case it was run_on_cpu() called
incorrectly within a memory layout change, so the BQL was released without
being noticed.
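
For reference, the problematic pattern is roughly the following (simplified
sketch; do_run_on_cpu() waits on a condition variable with the BQL as the
mutex):

    memory_region_transaction_begin();
    /* ... memory topology half updated ... */
    run_on_cpu(cpu, func, data);    /* qemu_cond_wait() drops the BQL here and
                                     * re-takes it later, so other threads can
                                     * run while the update is still pending */
    memory_region_transaction_commit();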

I've got a series that tries to expose these hard to debug issues:

https://lore.kernel.org/qemu-devel/20200421162108.594796-1-peterx@redhat.com/

Obviously the series didn't attract enough interest, so it didn't get merged.
However, maybe it's also useful for what you're debugging, so you can apply
those patches onto your branch and look at the stack when it reproduces again.
Logically, with these sanity patches it should fail earlier than what you've
hit right now (which I believe is within the RCU thread; btw, it would be
interesting to share your stack too when it's hit) and provide more useful
information.

I saw that the old series won't apply onto master any more, so I rebased it
and pushed it here (with one patch dropped since someone wrote a similar patch
that got merged, so there are only 7 patches in the new tree):

https://github.com/xzpeter/qemu/tree/memory-sanity

No guarantee it'll help, but IMHO worth trying.
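
E.g., to give it a try (plain git; adjust the remote name to taste):

    git remote add xzpeter https://github.com/xzpeter/qemu
    git fetch xzpeter
    git checkout -b memory-sanity xzpeter/memory-sanity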

Thanks,

-- 
Peter Xu




* RE: Question on memory commit during MR finalize()
  2021-07-15 18:35           ` Peter Xu
@ 2021-07-16 11:42             ` Thanos Makatos
  2021-07-16 14:18               ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Thanos Makatos @ 2021-07-16 11:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, John Levon, John G Johnson, Markus Armbruster,
	QEMU Devel Mailing List

> -----Original Message-----
> From: Peter Xu <peterx@redhat.com>
> Sent: 15 July 2021 19:35
> To: Thanos Makatos <thanos.makatos@nutanix.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Markus Armbruster
> <armbru@redhat.com>; QEMU Devel Mailing List <qemu-
> devel@nongnu.org>; John Levon <john.levon@nutanix.com>; John G
> Johnson <john.g.johnson@oracle.com>
> Subject: Re: Question on memory commit during MR finalize()
> 
> On Thu, Jul 15, 2021 at 02:27:48PM +0000, Thanos Makatos wrote:
> > Hi Peter,
> 
> Hi, Thanos,
> 
> > We're hitting this issue using a QEMU branch where JJ is using vfio-user as
> > the transport for multiprocess-qemu (https://github.com/oracle/qemu/issues/9).
> > We can reproduce it fairly reliably by migrating a virtual SPDK NVMe
> > controller (the NVMf/vfio-user target with experimental migration support,
> > https://review.spdk.io/gerrit/c/spdk/spdk/+/7617/14). I can provide detailed
> > repro instructions but first I want to make sure we're not missing any patches.
> 
> I don't think you missed any bug fix patches, as the issue I mentioned could
> only be triggered with my own branch at that time, and that was fixed when my
> patchset got merged.
> 
> However, if you encountered the same issue, it's possible that there's an
> incorrect use of the QEMU memory/cpu API somewhere there too, so a similar
> issue is triggered.  For example, in my case it was run_on_cpu() called
> incorrectly within a memory layout change, so the BQL was released without
> being noticed.
> 
> I've got a series that tries to expose these hard to debug issues:
> 
> https://lore.kernel.org/qemu-devel/20200421162108.594796-1-peterx@redhat.com/
> 
> Obviously the series didn't attract enough interest, so it didn't get merged.
> However, maybe it's also useful for what you're debugging, so you can apply
> those patches onto your branch and look at the stack when it reproduces again.
> Logically, with these sanity patches it should fail earlier than what you've
> hit right now (which I believe is within the RCU thread; btw, it would be
> interesting to share your stack too when it's hit) and provide more useful
> information.
> 
> I saw that the old series won't apply onto master any more, so I rebased it
> and pushed it here (with one patch dropped since someone wrote a similar
> patch that got merged, so there are only 7 patches in the new tree):
> 
> https://github.com/xzpeter/qemu/tree/memory-sanity
> 
> No guarantee it'll help, but IMHO worth trying.

The memory-sanity branch fails to build:

./configure --prefix=/opt/qemu-xzpeter --target-list=x86_64-linux-user  --enable-debug
make -j 8
...
[697/973] Linking target qemu-x86_64
FAILED: qemu-x86_64
c++  -o qemu-x86_64 libcommon.fa.p/cpus-common.c.o libcommon.fa.p/page-vary-common.c.o libcommon.fa.p/disas_i386.c.o libcommon.fa.p/disas_capstone.c.o libcommon.fa.p/hw_core_cpu-common.c.o libcommon.fa.p/ebpf_ebpf_rss-stub.c.o libcommon.fa.p/accel_accel-user.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_user_excp_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_user_seg_helper.c.o libqemu-x86_64-linux-user.fa.p/linux-user_x86_64_signal.c.o libqemu-x86_64-linux-user.fa.p/linux-user_x86_64_cpu_loop.c.o libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o libqemu-x86_64-linux-user.fa.p/target_i386_gdbstub.c.o libqemu-x86_64-linux-user.fa.p/target_i386_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_xsave_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_cpu-dump.c.o libqemu-x86_64-linux-user.fa.p/target_i386_sev-stub.c.o libqemu-x86_64-linux-user.fa.p/target_i386_kvm_kvm-stub.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_bpt_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_cc_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_excp_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_fpu_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_int_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_mem_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_misc_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_mpx_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_seg_helper.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_tcg-cpu.c.o libqemu-x86_64-linux-user.fa.p/target_i386_tcg_translate.c.o libqemu-x86_64-linux-user.fa.p/trace_control-target.c.o libqemu-x86_64-linux-user.fa.p/cpu.c.o libqemu-x86_64-linux-user.fa.p/disas.c.o libqemu-x86_64-linux-user.fa.p/gdbstub.c.o libqemu-x86_64-linux-user.fa.p/page-vary.c.o libqemu-x86_64-linux-user.fa.p/tcg_optimize.c.o libqemu-x86_64-linux-user.fa.p/tcg_region.c.o libqemu-x86_64-linux-user.fa.p/tcg_tcg.c.o libqemu-x86_64-linux-user.fa.p/tcg_tcg-common.c.o libqemu-x86_64-linux-user.fa.p/tcg_tcg-op.c.o libqemu-x86_64-linux-user.fa.p/tcg_tcg-op-gvec.c.o libqemu-x86_64-linux-user.fa.p/tcg_tcg-op-vec.c.o libqemu-x86_64-linux-user.fa.p/fpu_softfloat.c.o libqemu-x86_64-linux-user.fa.p/accel_accel-common.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-all.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_cpu-exec-common.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_cpu-exec.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-runtime-gvec.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-runtime.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_translate-all.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_translator.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_user-exec.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_user-exec-stub.c.o libqemu-x86_64-linux-user.fa.p/accel_tcg_plugin-gen.c.o libqemu-x86_64-linux-user.fa.p/accel_stubs_hax-stub.c.o libqemu-x86_64-linux-user.fa.p/accel_stubs_xen-stub.c.o libqemu-x86_64-linux-user.fa.p/accel_stubs_kvm-stub.c.o libqemu-x86_64-linux-user.fa.p/plugins_loader.c.o libqemu-x86_64-linux-user.fa.p/plugins_core.c.o libqemu-x86_64-linux-user.fa.p/plugins_api.c.o libqemu-x86_64-linux-user.fa.p/linux-user_elfload.c.o libqemu-x86_64-linux-user.fa.p/linux-user_exit.c.o libqemu-x86_64-linux-user.fa.p/linux-user_fd-trans.c.o libqemu-x86_64-linux-user.fa.p/linux-user_linuxload.c.o libqemu-x86_64-linux-user.fa.p/linux-user_main.c.o libqemu-x86_64-linux-user.fa.p/linux-user_mmap.c.o libqemu-x86_64-linux-user.fa.p/linux-user_safe-syscall.S.o 
libqemu-x86_64-linux-user.fa.p/linux-user_signal.c.o libqemu-x86_64-linux-user.fa.p/linux-user_strace.c.o libqemu-x86_64-linux-user.fa.p/linux-user_syscall.c.o libqemu-x86_64-linux-user.fa.p/linux-user_uaccess.c.o libqemu-x86_64-linux-user.fa.p/linux-user_uname.c.o libqemu-x86_64-linux-user.fa.p/thunk.c.o libqemu-x86_64-linux-user.fa.p/meson-generated_.._x86_64-linux-user-gdbstub-xml.c.o libqemu-x86_64-linux-user.fa.p/meson-generated_.._trace_generated-helpers.c.o -Wl,--as-needed -Wl,--no-undefined -pie -Wl,--whole-archive libhwcore.fa libqom.fa -Wl,--no-whole-archive -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -m64 -fstack-protector-strong -Wl,--start-group libcapstone.a libqemuutil.a libhwcore.fa libqom.fa -ldl -Wl,--dynamic-list=/root/src/qemu/build/qemu-plugins-ld.symbols -lrt -lutil -lm -pthread -Wl,--export-dynamic -lgmodule-2.0 -lglib-2.0 -lstdc++ -Wl,--end-group
/usr/bin/ld: libcommon.fa.p/cpus-common.c.o: in function `do_run_on_cpu':
/root/src/qemu/build/../cpus-common.c:153: undefined reference to `qemu_cond_wait_iothread'
collect2: error: ld returned 1 exit status
[698/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_ui64_r_minMag.c.o
[699/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_i32_r_minMag.c.o
[700/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_f16.c.o
[701/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_f64.c.o
[702/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_i64_r_minMag.c.o
[703/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_extF80M.c.o
[704/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_extF80.c.o
ninja: build stopped: subcommand failed.
make[1]: *** [Makefile:154: run-ninja] Error 1
make[1]: Leaving directory '/root/src/qemu/build'
make: *** [GNUmakefile:11: all] Error 2

Regarding the stack trace, I can very easily reproduce it on our branch, I know exactly where to set the breakpoint:

(gdb) r
Starting program: qemu-system-x86_64 -cpu host -enable-kvm -smp 4 -nographic -m 2G -object memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on,prealloc=yes -numa node,memdev=mem0 ...

Thread 8 "qemu-system-x86" received signal SIGUSR1, User defined signal 1.
#0  0x00007ffff758f7bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff757a535 in __GI_abort () at abort.c:79
#2  0x0000555555c9301e in kvm_set_phys_mem (kml=0x5555568ee830, section=0x7ffff58c05e0, add=true) at ../accel/kvm/kvm-all.c:1194
#3  0x0000555555c930cd in kvm_region_add (listener=0x5555568ee830, section=0x7ffff58c05e0) at ../accel/kvm/kvm-all.c:1211
#4  0x0000555555bd6c9e in address_space_update_topology_pass (as=0x555556648420 <address_space_memory>, old_view=0x555556f21730, new_view=0x7ffff0001cb0, adding=true) at ../softmmu/memory.c:971
#5  0x0000555555bd6f98 in address_space_set_flatview (as=0x555556648420 <address_space_memory>) at ../softmmu/memory.c:1047
#6  0x0000555555bd713f in memory_region_transaction_commit () at ../softmmu/memory.c:1099
#7  0x0000555555bd89a5 in memory_region_finalize (obj=0x555556e21800) at ../softmmu/memory.c:1751
#8  0x0000555555cca132 in object_deinit (obj=0x555556e21800, type=0x5555566a8f80) at ../qom/object.c:673
#9  0x0000555555cca1a4 in object_finalize (data=0x555556e21800) at ../qom/object.c:687
#10 0x0000555555ccb196 in object_unref (objptr=0x555556e21800) at ../qom/object.c:1186
#11 0x0000555555bb11f0 in phys_section_destroy (mr=0x555556e21800) at ../softmmu/physmem.c:1171
#12 0x0000555555bb124a in phys_sections_free (map=0x5555572cf9a0) at ../softmmu/physmem.c:1180
#13 0x0000555555bb4632 in address_space_dispatch_free (d=0x5555572cf990) at ../softmmu/physmem.c:2562
#14 0x0000555555bd4485 in flatview_destroy (view=0x5555572cf950) at ../softmmu/memory.c:291
#15 0x0000555555e367e8 in call_rcu_thread (opaque=0x0) at ../util/rcu.c:281
#16 0x0000555555e68e57 in qemu_thread_start (args=0x555556665e30) at ../util/qemu-thread-posix.c:521
#17 0x00007ffff7720fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#18 0x00007ffff76514cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(QEMU's stderr was interleaved with the backtrace above; the recoverable part
reads: "...slot=10, start=0xfebd1000, size=0x1000: File exists")


* Re: Question on memory commit during MR finalize()
  2021-07-16 11:42             ` Thanos Makatos
@ 2021-07-16 14:18               ` Peter Xu
  2021-07-19 14:38                 ` Thanos Makatos
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2021-07-16 14:18 UTC (permalink / raw)
  To: Thanos Makatos
  Cc: Paolo Bonzini, John Levon, John G Johnson, Markus Armbruster,
	QEMU Devel Mailing List

On Fri, Jul 16, 2021 at 11:42:02AM +0000, Thanos Makatos wrote:
> > -----Original Message-----
> > From: Peter Xu <peterx@redhat.com>
> > Sent: 15 July 2021 19:35
> > To: Thanos Makatos <thanos.makatos@nutanix.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>; Markus Armbruster
> > <armbru@redhat.com>; QEMU Devel Mailing List <qemu-
> > devel@nongnu.org>; John Levon <john.levon@nutanix.com>; John G
> > Johnson <john.g.johnson@oracle.com>
> > Subject: Re: Question on memory commit during MR finalize()
> > 
> > On Thu, Jul 15, 2021 at 02:27:48PM +0000, Thanos Makatos wrote:
> > > Hi Peter,
> > 
> > Hi, Thanos,
> > 
> > > We're hitting this issue using a QEMU branch where JJ is using vfio-user as
> > > the transport for multiprocess-qemu (https://github.com/oracle/qemu/issues/9).
> > > We can reproduce it fairly reliably by migrating a virtual SPDK NVMe
> > > controller (the NVMf/vfio-user target with experimental migration support,
> > > https://review.spdk.io/gerrit/c/spdk/spdk/+/7617/14). I can provide detailed
> > > repro instructions but first I want to make sure we're not missing any patches.
> > 
> > I don't think you missed any bug fix patches, as the issue I mentioned could
> > only be triggered with my own branch at that time, and that was fixed when my
> > patchset got merged.
> > 
> > However, if you encountered the same issue, it's possible that there's an
> > incorrect use of the QEMU memory/cpu API somewhere there too, so a similar
> > issue is triggered.  For example, in my case it was run_on_cpu() called
> > incorrectly within a memory layout change, so the BQL was released without
> > being noticed.
> > 
> > I've got a series that tries to expose these hard to debug issues:
> > 
> > https://lore.kernel.org/qemu-devel/20200421162108.594796-1-peterx@redhat.com/
> > 
> > Obviously the series didn't attract enough interest, so it didn't get merged.
> > However, maybe it's also useful for what you're debugging, so you can apply
> > those patches onto your branch and look at the stack when it reproduces again.
> > Logically, with these sanity patches it should fail earlier than what you've
> > hit right now (which I believe is within the RCU thread; btw, it would be
> > interesting to share your stack too when it's hit) and provide more useful
> > information.
> > 
> > I saw that the old series won't apply onto master any more, so I rebased it
> > and pushed it here (with one patch dropped since someone wrote a similar
> > patch that got merged, so there are only 7 patches in the new tree):
> > 
> > https://github.com/xzpeter/qemu/tree/memory-sanity
> > 
> > No guarantee it'll help, but IMHO worth trying.
> 
> The memory-sanity branch fails to build:
> 
> ./configure --prefix=/opt/qemu-xzpeter --target-list=x86_64-linux-user  --enable-debug
> make -j 8
> ...
> [697/973] Linking target qemu-x86_64
> FAILED: qemu-x86_64
> c++  -o qemu-x86_64 libcommon.fa.p/cpus-common.c.o [... full linker command line snipped, quoted in full above ...]
> /usr/bin/ld: libcommon.fa.p/cpus-common.c.o: in function `do_run_on_cpu':
> /root/src/qemu/build/../cpus-common.c:153: undefined reference to `qemu_cond_wait_iothread'
> collect2: error: ld returned 1 exit status
> [698/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_ui64_r_minMag.c.o
> [699/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_i32_r_minMag.c.o
> [700/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_f16.c.o
> [701/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_f64.c.o
> [702/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_i64_r_minMag.c.o
> [703/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_extF80M.c.o
> [704/973] Compiling C object tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_extF80.c.o
> ninja: build stopped: subcommand failed.
> make[1]: *** [Makefile:154: run-ninja] Error 1
> make[1]: Leaving directory '/root/src/qemu/build'
> make: *** [GNUmakefile:11: all] Error 2

So it fails for linux-user...  I can fix the compilation, but it should pass
for x86_64-softmmu.  More importantly - are you using linux-user binaries?  The
thing is, my branch will only be helpful for debugging BQL-related issues, so if
that's the case then please ignore it: linux-user shouldn't be using the BQL,
so my branch won't help.
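
I.e., build a system-mode binary instead, with something like:

    ./configure --target-list=x86_64-softmmu --enable-kvm --enable-debug
    make -j8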

> 
> Regarding the stack trace, I can very easily reproduce it on our branch, I know exactly where to set the breakpoint:
> 
> (gdb) r
> Starting program: qemu-system-x86_64 -cpu host -enable-kvm -smp 4 -nographic -m 2G -object memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on,prealloc=yes -numa node,memdev=mem0 ...
> 
> Thread 8 "qemu-system-x86" received signal SIGUSR1, User defined signal 1.
> #0  0x00007ffff758f7bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007ffff757a535 in __GI_abort () at abort.c:79
> #2  0x0000555555c9301e in kvm_set_phys_mem (kml=0x5555568ee830, section=0x7ffff58c05e0, add=true) at ../accel/kvm/kvm-all.c:1194
> #3  0x0000555555c930cd in kvm_region_add (listener=0x5555568ee830, section=0x7ffff58c05e0) at ../accel/kvm/kvm-all.c:1211
> #4  0x0000555555bd6c9e in address_space_update_topology_pass (as=0x555556648420 <address_space_memory>, old_view=0x555556f21730, new_view=0x7ffff0001cb0, adding=true) at ../softmmu/memory.c:971
> #5  0x0000555555bd6f98 in address_space_set_flatview (as=0x555556648420 <address_space_memory>) at ../softmmu/memory.c:1047
> #6  0x0000555555bd713f in memory_region_transaction_commit () at ../softmmu/memory.c:1099
> #7  0x0000555555bd89a5 in memory_region_finalize (obj=0x555556e21800) at ../softmmu/memory.c:1751
> #8  0x0000555555cca132 in object_deinit (obj=0x555556e21800, type=0x5555566a8f80) at ../qom/object.c:673
> #9  0x0000555555cca1a4 in object_finalize (data=0x555556e21800) at ../qom/object.c:687
> #10 0x0000555555ccb196 in object_unref (objptr=0x555556e21800) at ../qom/object.c:1186
> #11 0x0000555555bb11f0 in phys_section_destroy (mr=0x555556e21800) at ../softmmu/physmem.c:1171
> #12 0x0000555555bb124a in phys_sections_free (map=0x5555572cf9a0) at ../softmmu/physmem.c:1180
> #13 0x0000555555bb4632 in address_space_dispatch_free (d=0x5555572cf990) at ../softmmu/physmem.c:2562
> #14 0x0000555555bd4485 in flatview_destroy (view=0x5555572cf950) at ../softmmu/memory.c:291
> #15 0x0000555555e367e8 in call_rcu_thread (opaque=0x0) at ../util/rcu.c:281
> #16 0x0000555555e68e57 in qemu_thread_start (args=0x555556665e30) at ../util/qemu-thread-posix.c:521
> #17 0x00007ffff7720fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> #18 0x00007ffff76514cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Yes indeed it looks alike.

-- 
Peter Xu




* RE: Question on memory commit during MR finalize()
  2021-07-16 14:18               ` Peter Xu
@ 2021-07-19 14:38                 ` Thanos Makatos
  2021-07-19 15:56                   ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Thanos Makatos @ 2021-07-19 14:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, John Levon, John G Johnson, Markus Armbruster,
	QEMU Devel Mailing List

> -----Original Message-----
> From: Peter Xu <peterx@redhat.com>
> Sent: 16 July 2021 15:19
> To: Thanos Makatos <thanos.makatos@nutanix.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Markus Armbruster
> <armbru@redhat.com>; QEMU Devel Mailing List <qemu-
> devel@nongnu.org>; John Levon <john.levon@nutanix.com>; John G
> Johnson <john.g.johnson@oracle.com>
> Subject: Re: Question on memory commit during MR finalize()
> 
> On Fri, Jul 16, 2021 at 11:42:02AM +0000, Thanos Makatos wrote:
> > > -----Original Message-----
> > > From: Peter Xu <peterx@redhat.com>
> > > Sent: 15 July 2021 19:35
> > > To: Thanos Makatos <thanos.makatos@nutanix.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>; Markus Armbruster
> > > <armbru@redhat.com>; QEMU Devel Mailing List <qemu-
> > > devel@nongnu.org>; John Levon <john.levon@nutanix.com>; John G
> > > Johnson <john.g.johnson@oracle.com>
> > > Subject: Re: Question on memory commit during MR finalize()
> > >
> > > On Thu, Jul 15, 2021 at 02:27:48PM +0000, Thanos Makatos wrote:
> > > > Hi Peter,
> > >
> > > Hi, Thanos,
> > >
> > > > We're hitting this issue using a QEMU branch where JJ is using vfio-user
> > > > as the transport for multiprocess-qemu
> > > > (https://github.com/oracle/qemu/issues/9). We can reproduce it fairly
> > > > reliably by migrating a virtual SPDK NVMe controller (the NVMf/vfio-user
> > > > target with experimental migration support,
> > > > https://review.spdk.io/gerrit/c/spdk/spdk/+/7617/14). I can provide
> > > > detailed repro instructions but first I want to make sure we're not
> > > > missing any patches.
> > > I don't think you missed any bug fix patches, as the issue I mentioned
> > > could only be triggered with my own branch at that time, and that was
> > > fixed when my patchset got merged.
> > >
> > > However, if you encountered the same issue, it's possible that there's an
> > > incorrect use of the QEMU memory/cpu API somewhere there too, so a similar
> > > issue is triggered.  For example, in my case it was run_on_cpu() called
> > > incorrectly within a memory layout change, so the BQL was released without
> > > being noticed.
> > >
> > > I've got a series that tries to expose these hard to debug issues:
> > >
> > > https://lore.kernel.org/qemu-devel/20200421162108.594796-1-peterx@redhat.com/
> > >
> > > Obviously the series didn't attract enough interest, so it didn't get
> > > merged.  However, maybe it's also useful for what you're debugging, so
> > > you can apply those patches onto your branch and look at the stack when
> > > it reproduces again.  Logically, with these sanity patches it should fail
> > > earlier than what you've hit right now (which I believe is within the RCU
> > > thread; btw, it would be interesting to share your stack too when it's
> > > hit) and provide more useful information.
> > >
> > > I saw that the old series won't apply onto master any more, so I rebased
> > > it and pushed it here (with one patch dropped since someone wrote a
> > > similar patch that got merged, so there are only 7 patches in the new
> > > tree):
> > >
> > > https://github.com/xzpeter/qemu/tree/memory-sanity
> > >
> > > No guarantee it'll help, but IMHO worth trying.
> >
> > The memory-sanity branch fails to build:
> >
> > ./configure --prefix=/opt/qemu-xzpeter --target-list=x86_64-linux-user --enable-debug
> > make -j 8
> > ...
> > [697/973] Linking target qemu-x86_64
> > FAILED: qemu-x86_64
> > c++  -o qemu-x86_64 libcommon.fa.p/cpus-common.c.o [... full linker command line snipped, quoted in full above ...]
> > /usr/bin/ld: libcommon.fa.p/cpus-common.c.o: in function `do_run_on_cpu':
> > /root/src/qemu/build/../cpus-common.c:153: undefined reference to `qemu_cond_wait_iothread'
> > collect2: error: ld returned 1 exit status
> > [...]
> > ninja: build stopped: subcommand failed.
> > make[1]: *** [Makefile:154: run-ninja] Error 1
> > make[1]: Leaving directory '/root/src/qemu/build'
> > make: *** [GNUmakefile:11: all] Error 2
> 
> So it fails for linux-user...  I can fix the compilation, but it should pass
> for x86_64-softmmu.  More importantly - are you using linux-user binaries?
> The thing is, my branch will only be helpful for debugging BQL-related
> issues, so if that's the case then please ignore it: linux-user shouldn't be
> using the BQL, so my branch won't help.

Doesn't matter, I can use x86_64-softmmu.

> 
> >
> > Regarding the stack trace, I can very easily reproduce it on our branch, I
> > know exactly where to set the breakpoint:
> >
> > (gdb) r
> > Starting program: qemu-system-x86_64 -cpu host -enable-kvm -smp 4 -nographic -m 2G -object memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on,prealloc=yes -numa node,memdev=mem0 ...
> >
> > Thread 8 "qemu-system-x86" received signal SIGUSR1, User defined signal 1.
> > #0  0x00007ffff758f7bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #1  0x00007ffff757a535 in __GI_abort () at abort.c:79
> > #2  0x0000555555c9301e in kvm_set_phys_mem (kml=0x5555568ee830, section=0x7ffff58c05e0, add=true) at ../accel/kvm/kvm-all.c:1194
> > #3  0x0000555555c930cd in kvm_region_add (listener=0x5555568ee830, section=0x7ffff58c05e0) at ../accel/kvm/kvm-all.c:1211
> > #4  0x0000555555bd6c9e in address_space_update_topology_pass (as=0x555556648420 <address_space_memory>, old_view=0x555556f21730, new_view=0x7ffff0001cb0, adding=true) at ../softmmu/memory.c:971
> > #5  0x0000555555bd6f98 in address_space_set_flatview (as=0x555556648420 <address_space_memory>) at ../softmmu/memory.c:1047
> > #6  0x0000555555bd713f in memory_region_transaction_commit () at ../softmmu/memory.c:1099
> > #7  0x0000555555bd89a5 in memory_region_finalize (obj=0x555556e21800) at ../softmmu/memory.c:1751
> > #8  0x0000555555cca132 in object_deinit (obj=0x555556e21800, type=0x5555566a8f80) at ../qom/object.c:673
> > #9  0x0000555555cca1a4 in object_finalize (data=0x555556e21800) at ../qom/object.c:687
> > #10 0x0000555555ccb196 in object_unref (objptr=0x555556e21800) at ../qom/object.c:1186
> > #11 0x0000555555bb11f0 in phys_section_destroy (mr=0x555556e21800) at ../softmmu/physmem.c:1171
> > #12 0x0000555555bb124a in phys_sections_free (map=0x5555572cf9a0) at ../softmmu/physmem.c:1180
> > #13 0x0000555555bb4632 in address_space_dispatch_free (d=0x5555572cf990) at ../softmmu/physmem.c:2562
> > #14 0x0000555555bd4485 in flatview_destroy (view=0x5555572cf950) at ../softmmu/memory.c:291
> > #15 0x0000555555e367e8 in call_rcu_thread (opaque=0x0) at ../util/rcu.c:281
> > #16 0x0000555555e68e57 in qemu_thread_start (args=0x555556665e30) at ../util/qemu-thread-posix.c:521
> > #17 0x00007ffff7720fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> > #18 0x00007ffff76514cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> Yes indeed it looks alike.
> 
> --
> Peter Xu

I can trivially trigger an assertion with a build where I merged the recent vfio-user patches (https://patchew.org/QEMU/cover.1626675354.git.elena.ufimtseva@oracle.com/) into master and then merged the result into your xzpeter/memory-sanity branch; I've pushed the result here: https://github.com/tmakatos/qemu/tree/memory-sanity. I explain the repro steps below in case you want to take a look:

Build as follows:

./configure --prefix=/opt/qemu-xzpeter --target-list=x86_64-softmmu --enable-kvm  --enable-debug --enable-multiprocess && make -j `nproc` && make install

Then build and run the GPIO sample from libvfio-user (https://github.com/nutanix/libvfio-user):

libvfio-user/build/dbg/samples/gpio-pci-idio-16 -v /var/run/vfio-user.sock

And then run QEMU as follows:

gdb --args /opt/qemu-xzpeter/bin/qemu-system-x86_64 -cpu host -enable-kvm -smp 4 -m 2G -object memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on,prealloc=yes -numa node,memdev=mem0 -kernel bionic-server-cloudimg-amd64-vmlinuz-generic -initrd bionic-server-cloudimg-amd64-initrd-generic -append 'console=ttyS0 root=/dev/sda1 single' -hda bionic-server-cloudimg-amd64-0.raw -nic user,model=virtio-net-pci -machine pc-q35-3.1 -device vfio-user-pci,socket=/var/run/vfio-user.sock -nographic

I immediately get the following stack trace:

Thread 5 "qemu-system-x86" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7fffe6e82700 (LWP 151973)]
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
103     ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  0x00007ffff655d29c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007ffff6558642 in __pthread_mutex_cond_lock (mutex=mutex@entry=0x5555568bb280 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007ffff6559ef8 in __pthread_cond_wait_common (abstime=0x0, mutex=0x5555568bb280 <qemu_global_mutex>, cond=0x555556cecc30) at pthread_cond_wait.c:645
#3  0x00007ffff6559ef8 in __pthread_cond_wait (cond=0x555556cecc30, mutex=0x5555568bb280 <qemu_global_mutex>) at pthread_cond_wait.c:655
#4  0x000055555604f717 in qemu_cond_wait_impl (cond=0x555556cecc30, mutex=0x5555568bb280 <qemu_global_mutex>, file=0x5555561ca869 "../softmmu/cpus.c", line=514) at ../util/qemu-thread-posix.c:194
#5  0x0000555555d28a4a in qemu_cond_wait_iothread (cond=0x555556cecc30) at ../softmmu/cpus.c:514
#6  0x0000555555d28781 in qemu_wait_io_event (cpu=0x555556ce02c0) at ../softmmu/cpus.c:425
#7  0x0000555555e5da75 in kvm_vcpu_thread_fn (arg=0x555556ce02c0) at ../accel/kvm/kvm-accel-ops.c:54
#8  0x000055555604feed in qemu_thread_start (args=0x555556cecc70) at ../util/qemu-thread-posix.c:541
#9  0x00007ffff6553fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007ffff64824cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Question on memory commit during MR finalize()
  2021-07-19 14:38                 ` Thanos Makatos
@ 2021-07-19 15:56                   ` Peter Xu
  2021-07-19 18:02                     ` Thanos Makatos
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2021-07-19 15:56 UTC (permalink / raw)
  To: Thanos Makatos
  Cc: Paolo Bonzini, John Levon, John G Johnson, Markus Armbruster,
	QEMU Devel Mailing List

Hi, Thanos,

On Mon, Jul 19, 2021 at 02:38:52PM +0000, Thanos Makatos wrote:
> I can trivially trigger an assertion with a build where I merged the recent vfio-user patches (https://patchew.org/QEMU/cover.1626675354.git.elena.ufimtseva@oracle.com/) into master and then merged the result into your xzpeter/memory-sanity branch; I've pushed the branch here: https://github.com/tmakatos/qemu/tree/memory-sanity. I explain the repro steps below in case you want to take a look:
> 
> [...]
> 
> I immediately get the following stack trace:
> 
> Thread 5 "qemu-system-x86" received signal SIGUSR1, User defined signal 1.

This is SIGUSR1.  QEMU uses it for general vcpu ipis.

> [Switching to Thread 0x7fffe6e82700 (LWP 151973)]
> __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
> 103     ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
> (gdb) bt
> #0  0x00007ffff655d29c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
> #1  0x00007ffff6558642 in __pthread_mutex_cond_lock (mutex=mutex@entry=0x5555568bb280 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007ffff6559ef8 in __pthread_cond_wait_common (abstime=0x0, mutex=0x5555568bb280 <qemu_global_mutex>, cond=0x555556cecc30) at pthread_cond_wait.c:645
> #3  0x00007ffff6559ef8 in __pthread_cond_wait (cond=0x555556cecc30, mutex=0x5555568bb280 <qemu_global_mutex>) at pthread_cond_wait.c:655
> #4  0x000055555604f717 in qemu_cond_wait_impl (cond=0x555556cecc30, mutex=0x5555568bb280 <qemu_global_mutex>, file=0x5555561ca869 "../softmmu/cpus.c", line=514) at ../util/qemu-thread-posix.c:194
> #5  0x0000555555d28a4a in qemu_cond_wait_iothread (cond=0x555556cecc30) at ../softmmu/cpus.c:514
> #6  0x0000555555d28781 in qemu_wait_io_event (cpu=0x555556ce02c0) at ../softmmu/cpus.c:425
> #7  0x0000555555e5da75 in kvm_vcpu_thread_fn (arg=0x555556ce02c0) at ../accel/kvm/kvm-accel-ops.c:54
> #8  0x000055555604feed in qemu_thread_start (args=0x555556cecc70) at ../util/qemu-thread-posix.c:541
> #9  0x00007ffff6553fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> #10 0x00007ffff64824cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Would you please add the line below to your ~/.gdbinit script?

  handle SIGUSR1 nostop noprint

Or just run without gdb and wait for it to crash with SIGABRT.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Question on memory commit during MR finalize()
  2021-07-19 15:56                   ` Peter Xu
@ 2021-07-19 18:02                     ` Thanos Makatos
  2021-07-19 19:05                       ` Thanos Makatos
  0 siblings, 1 reply; 16+ messages in thread
From: Thanos Makatos @ 2021-07-19 18:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, John Levon, John G Johnson, Markus Armbruster,
	QEMU Devel Mailing List


Omg I don't know how I missed that, of course I'll ignore SIGUSR1 and retest!

________________________________
From: Peter Xu <peterx@redhat.com>
Sent: Monday, 19 July 2021, 16:58
To: Thanos Makatos
Cc: Paolo Bonzini; Markus Armbruster; QEMU Devel Mailing List; John Levon; John G Johnson
Subject: Re: Question on memory commit during MR finalize()

Hi, Thanos,

On Mon, Jul 19, 2021 at 02:38:52PM +0000, Thanos Makatos wrote:
> [...]

Would you please add the line below to your ~/.gdbinit script?

  handle SIGUSR1 nostop noprint

Or just run without gdb and wait for it to crash with SIGABRT.

Thanks,

--
Peter Xu




^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Question on memory commit during MR finalize()
  2021-07-19 18:02                     ` Thanos Makatos
@ 2021-07-19 19:05                       ` Thanos Makatos
  2021-07-19 19:59                         ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Thanos Makatos @ 2021-07-19 19:05 UTC (permalink / raw)
  To: Thanos Makatos, Peter Xu
  Cc: QEMU Devel Mailing List, Paolo Bonzini, John G Johnson,
	Markus Armbruster, John Levon

> -----Original Message-----
> From: Qemu-devel <qemu-devel-
> bounces+thanos.makatos=nutanix.com@nongnu.org> On Behalf Of Thanos
> Makatos
> Sent: 19 July 2021 19:02
> To: Peter Xu <peterx@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; John Levon
> <john.levon@nutanix.com>; John G Johnson <john.g.johnson@oracle.com>;
> Markus Armbruster <armbru@redhat.com>; QEMU Devel Mailing List
> <qemu-devel@nongnu.org>
> Subject: Re: Question on memory commit during MR finalize()
> 
> Omg I don't know how I missed that, of course I'll ignore SIGUSR1 and retest!
> 
> ________________________________________
> From: Peter Xu <mailto:peterx@redhat.com>
> Sent: Monday, 19 July 2021, 16:58
> To: Thanos Makatos
> Cc: Paolo Bonzini; Markus Armbruster; QEMU Devel Mailing List; John Levon;
> John G Johnson
> Subject: Re: Question on memory commit during MR finalize()
> 
> 
> Hi, Thanos,
> 
> On Mon, Jul 19, 2021 at 02:38:52PM +0000, Thanos Makatos wrote:
> > [...]
> 
> Would you please add the line below to your ~/.gdbinit script?
> 
>   handle SIGUSR1 nostop noprint
> 
> Or just run without gdb and wait for it to crash with SIGABRT.
> 
> Thanks,
> 
> --
> Peter Xu

Sorry for the bad brain day, here's your stack trace:

qemu-system-x86_64: ../softmmu/cpus.c:72: qemu_mutex_unlock_iothread_prepare: Assertion `!memory_region_has_pending_transaction()' failed.

Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff63c07bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff63ab535 in __GI_abort () at abort.c:79
#2  0x00007ffff63ab40f in __assert_fail_base
    (fmt=0x7ffff650dee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555561ca880 "!memory_region_has_pending_transaction()", file=0x5555561ca869 "../softmmu/cpus.c", line=72, function=<optimized out>) at assert.c:92
#3  0x00007ffff63b9102 in __GI___assert_fail
    (assertion=0x5555561ca880 "!memory_region_has_pending_transaction()", file=0x5555561ca869 "../softmmu/cpus.c", line=72, function=0x5555561caa60 <__PRETTY_FUNCTION__.37393> "qemu_mutex_unlock_iothread_prepare") at assert.c:101
#4  0x0000555555d27c20 in qemu_mutex_unlock_iothread_prepare () at ../softmmu/cpus.c:72
#5  0x0000555555d289f6 in qemu_mutex_unlock_iothread () at ../softmmu/cpus.c:507
#6  0x0000555555dcb3d6 in vfio_user_send_recv (proxy=0x555557ac5560, msg=0x555557933d50, fds=0x0, rsize=40) at ../hw/vfio/user.c:88
#7  0x0000555555dcd30a in vfio_user_dma_unmap (proxy=0x555557ac5560, unmap=0x7fffffffd8d0, bitmap=0x0) at ../hw/vfio/user.c:796
#8  0x0000555555dabf5f in vfio_dma_unmap (container=0x555557a06fb0, iova=786432, size=2146697216, iotlb=0x0) at ../hw/vfio/common.c:501
#9  0x0000555555dae12c in vfio_listener_region_del (listener=0x555557a06fc0, section=0x7fffffffd9f0) at ../hw/vfio/common.c:1249
#10 0x0000555555d3d06d in address_space_update_topology_pass (as=0x5555568bbc80 <address_space_memory>, old_view=0x555556d6cfb0, new_view=0x555556d6c8b0, adding=false) at ../softmmu/memory.c:960
#11 0x0000555555d3d62c in address_space_set_flatview (as=0x5555568bbc80 <address_space_memory>) at ../softmmu/memory.c:1062
#12 0x0000555555d3d800 in memory_region_transaction_commit () at ../softmmu/memory.c:1124
#13 0x0000555555b75a3e in mch_update_pam (mch=0x555556e80a40) at ../hw/pci-host/q35.c:344
#14 0x0000555555b76068 in mch_update (mch=0x555556e80a40) at ../hw/pci-host/q35.c:504
#15 0x0000555555b761d7 in mch_reset (qdev=0x555556e80a40) at ../hw/pci-host/q35.c:561
#16 0x0000555555e93a95 in device_transitional_reset (obj=0x555556e80a40) at ../hw/core/qdev.c:1028
#17 0x0000555555e956f8 in resettable_phase_hold (obj=0x555556e80a40, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:182
#18 0x0000555555e8e07c in bus_reset_child_foreach (obj=0x555556ebce80, cb=0x555555e955ca <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#19 0x0000555555e953ff in resettable_child_foreach (rc=0x555556a07ab0, obj=0x555556ebce80, cb=0x555555e955ca <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:96
#20 0x0000555555e9567e in resettable_phase_hold (obj=0x555556ebce80, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#21 0x0000555555e920e0 in device_reset_child_foreach (obj=0x555556e802d0, cb=0x555555e955ca <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:366
#22 0x0000555555e953ff in resettable_child_foreach (rc=0x555556ad2830, obj=0x555556e802d0, cb=0x555555e955ca <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:96
#23 0x0000555555e9567e in resettable_phase_hold (obj=0x555556e802d0, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#24 0x0000555555e8e07c in bus_reset_child_foreach (obj=0x555556beaac0, cb=0x555555e955ca <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#25 0x0000555555e953ff in resettable_child_foreach (rc=0x555556b1ca70, obj=0x555556beaac0, cb=0x555555e955ca <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:96
#26 0x0000555555e9567e in resettable_phase_hold (obj=0x555556beaac0, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#27 0x0000555555e952b4 in resettable_assert_reset (obj=0x555556beaac0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:60
#28 0x0000555555e951f8 in resettable_reset (obj=0x555556beaac0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:45
#29 0x0000555555e95a37 in resettable_cold_reset_fn (opaque=0x555556beaac0) at ../hw/core/resettable.c:269
#30 0x0000555555e93f40 in qemu_devices_reset () at ../hw/core/reset.c:69
#31 0x0000555555c9eb04 in pc_machine_reset (machine=0x555556a4d9e0) at ../hw/i386/pc.c:1654
#32 0x0000555555d381fb in qemu_system_reset (reason=SHUTDOWN_CAUSE_NONE) at ../softmmu/runstate.c:443
#33 0x0000555555a787f2 in qdev_machine_creation_done () at ../hw/core/machine.c:1330
#34 0x0000555555d4e09c in qemu_machine_creation_done () at ../softmmu/vl.c:2650
#35 0x0000555555d4e16b in qmp_x_exit_preconfig (errp=0x5555568db1a0 <error_fatal>) at ../softmmu/vl.c:2673
#36 0x0000555555d506be in qemu_init (argc=31, argv=0x7fffffffe268, envp=0x7fffffffe368) at ../softmmu/vl.c:3692
#37 0x0000555555945cad in main (argc=31, argv=0x7fffffffe268, envp=0x7fffffffe368) at ../softmmu/main.c:49

This is where the vfio-user client in QEMU tells the vfio-user server (the GPIO device) that this particular memory region is not available for DMA. There are 3 vfio_dma_map() operations before this happens and this seems to be the very first vfio_dma_unmap() operation.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Question on memory commit during MR finalize()
  2021-07-19 19:05                       ` Thanos Makatos
@ 2021-07-19 19:59                         ` Peter Xu
  2021-07-19 20:58                           ` John Johnson
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2021-07-19 19:59 UTC (permalink / raw)
  To: Thanos Makatos
  Cc: QEMU Devel Mailing List, Paolo Bonzini, John G Johnson,
	Markus Armbruster, John Levon

On Mon, Jul 19, 2021 at 07:05:29PM +0000, Thanos Makatos wrote:
> > [...]
> 
> Sorry for the bad brain day, here's your stack trace:
> 
> qemu-system-x86_64: ../softmmu/cpus.c:72: qemu_mutex_unlock_iothread_prepare: Assertion `!memory_region_has_pending_transaction()' failed.
> 
> Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted.
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> 50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x00007ffff63c07bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007ffff63ab535 in __GI_abort () at abort.c:79
> #2  0x00007ffff63ab40f in __assert_fail_base
>     (fmt=0x7ffff650dee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555561ca880 "!memory_region_has_pending_transaction()", file=0x5555561ca869 "../softmmu/cpus.c", line=72, function=<optimized out>) at assert.c:92
> #3  0x00007ffff63b9102 in __GI___assert_fail
>     (assertion=0x5555561ca880 "!memory_region_has_pending_transaction()", file=0x5555561ca869 "../softmmu/cpus.c", line=72, function=0x5555561caa60 <__PRETTY_FUNCTION__.37393> "qemu_mutex_unlock_iothread_prepare") at assert.c:101
> #4  0x0000555555d27c20 in qemu_mutex_unlock_iothread_prepare () at ../softmmu/cpus.c:72
> #5  0x0000555555d289f6 in qemu_mutex_unlock_iothread () at ../softmmu/cpus.c:507
> #6  0x0000555555dcb3d6 in vfio_user_send_recv (proxy=0x555557ac5560, msg=0x555557933d50, fds=0x0, rsize=40) at ../hw/vfio/user.c:88
> #7  0x0000555555dcd30a in vfio_user_dma_unmap (proxy=0x555557ac5560, unmap=0x7fffffffd8d0, bitmap=0x0) at ../hw/vfio/user.c:796
> #8  0x0000555555dabf5f in vfio_dma_unmap (container=0x555557a06fb0, iova=786432, size=2146697216, iotlb=0x0) at ../hw/vfio/common.c:501
> #9  0x0000555555dae12c in vfio_listener_region_del (listener=0x555557a06fc0, section=0x7fffffffd9f0) at ../hw/vfio/common.c:1249
> #10 0x0000555555d3d06d in address_space_update_topology_pass (as=0x5555568bbc80 <address_space_memory>, old_view=0x555556d6cfb0, new_view=0x555556d6c8b0, adding=false) at ../softmmu/memory.c:960
> #11 0x0000555555d3d62c in address_space_set_flatview (as=0x5555568bbc80 <address_space_memory>) at ../softmmu/memory.c:1062
> #12 0x0000555555d3d800 in memory_region_transaction_commit () at ../softmmu/memory.c:1124
> [...]
> 
> This is where the vfio-user client in QEMU tells the vfio-user server (the GPIO device) that this particular memory region is not available for DMA. There are 3 vfio_dma_map() operations before this happens and this seems to be the very first vfio_dma_unmap() operation.
> 

Yes my branch is just for catching things like the stack above.
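
For context, the check that fires is essentially the following (reconstructed
from the assertion message and frame #4 in the trace above; a sketch of what
the memory-sanity branch does, not its exact code):

  /* ../softmmu/cpus.c, sanity check run before the BQL is released */
  static void qemu_mutex_unlock_iothread_prepare(void)
  {
      /*
       * Dropping the BQL while a memory transaction is still pending is
       * exactly the kind of bug the branch wants to catch, so fail loudly.
       */
      assert(!memory_region_has_pending_transaction());
  }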

Here vfio_user_send_recv() looks tricky to me - it releases the bql within a
memory update procedure, and IMHO it needs some serious justification for why
it can do so.  For example, what if the memory layout changes while we're
waiting for the reply?  That can happen in parallel without the bql, afaict.
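
To make that concrete, the shape of the code in question is roughly the
following (reconstructed from frames #5/#6 of the trace; the helper and field
names are illustrative, not the actual hw/vfio/user.c definitions):

  static int vfio_user_send_recv(VFIOProxy *proxy, VFIOUserHdr *msg,
                                 VFIOUserFDs *fds, int rsize)
  {
      vfio_user_send(proxy, msg, fds);     /* hypothetical send helper */

      qemu_mutex_unlock_iothread();        /* BQL dropped mid-commit... */
      qemu_mutex_lock(&proxy->lock);
      while (!msg->complete) {             /* hypothetical reply flag */
          qemu_cond_wait(&proxy->reply_cond, &proxy->lock);
      }
      qemu_mutex_unlock(&proxy->lock);
      qemu_mutex_lock_iothread();          /* ...and re-taken here */

      return msg->error;                   /* hypothetical error field */
  }

Anything that runs while the BQL is dropped there can observe (and mutate) a
half-committed memory view.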

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Question on memory commit during MR finalize()
  2021-07-19 19:59                         ` Peter Xu
@ 2021-07-19 20:58                           ` John Johnson
  2021-07-20  1:22                             ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: John Johnson @ 2021-07-19 20:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Devel Mailing List, Thanos Makatos, John Levon,
	Markus Armbruster, Paolo Bonzini



> On Jul 19, 2021, at 12:59 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> 
> Here vfio_user_send_recv() looks tricky to me - it releases the bql within a
> memory update procedure, and IMHO it needs some serious justification for why
> it can do so.  For example, what if the memory layout changes while we're
> waiting for the reply?  That can happen in parallel without the bql, afaict.
> 


	The reason the bql is dropped is that the thread will usually sleep
waiting for a reply from the server, and I didn't think it was a good idea
to block all threads in the meantime.  Most vfio-user requests result from a
guest action, so blocking just the single CPU thread for the reply is good.

	The vfio-user code doesn’t depend on the memory layout being stable;
it just sends the layout updates to the server.  Would it be better to
send memory updates asynchronously, and wait for all the replies in the
commit callback?
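
	Something along these lines, i.e. the listener callbacks only queue
messages and the waiting happens once per transaction (all names other than
the MemoryListener hooks are made up for illustration):

  static void vfio_user_region_del(MemoryListener *listener,
                                   MemoryRegionSection *section)
  {
      VFIOProxy *proxy = container_of(listener, VFIOProxy, listener);

      /* Queue the DMA unmap message; don't drop the BQL or sleep here. */
      vfio_user_send_async(proxy, vfio_user_make_unmap_msg(section));
  }

  static void vfio_user_commit(MemoryListener *listener)
  {
      VFIOProxy *proxy = container_of(listener, VFIOProxy, listener);

      /* The flatview update is done; now it's safe to wait for replies. */
      vfio_user_drain_replies(proxy);
  }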

								JJ



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Question on memory commit during MR finalize()
  2021-07-19 20:58                           ` John Johnson
@ 2021-07-20  1:22                             ` Peter Xu
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2021-07-20  1:22 UTC (permalink / raw)
  To: John Johnson
  Cc: Thanos Makatos, Paolo Bonzini, QEMU Devel Mailing List,
	Markus Armbruster, John Levon

On Mon, Jul 19, 2021 at 08:58:44PM +0000, John Johnson wrote:
> 
> 
> > On Jul 19, 2021, at 12:59 PM, Peter Xu <peterx@redhat.com> wrote:
> > 
> > 
> > Here vfio_user_send_recv() looks tricky to me - it releases the bql within a
> > memory update procedure, and IMHO it needs some serious justification for why
> > it can do so.  For example, what if the memory layout changes while we're
> > waiting for the reply?  That can happen in parallel without the bql, afaict.
> > 
> 
> 
> 	The reason the bql is dropped is that the thread will usually sleep
> waiting for a reply from the server, and I didn't think it was a good idea
> to block all threads in the meantime.  Most vfio-user requests result from a
> guest action, so blocking just the single CPU thread for the reply is good.

Sleeping with bql is actually ok imho, but indeed if it can take very long then
we should make it async.

> 
> 	The vfio-user code doesn’t depend on the memory layout being stable;
> it just sends the layout updates to the server.  Would it be better to
> send memory updates asynchronously, and wait for all the replies in the
> commit callback?

Yes, I believe a lot of similar things are done within qemu, e.g. a bottom
half can be scheduled so that everything is serialized under the bql.
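
Something like the below, for instance, where the reply handler is deferred
to the main loop instead of being run in place (process_reply_bh() and its
opaque are hypothetical; the aio calls are the stock ones):

  static void process_reply_bh(void *opaque)
  {
      /* Runs later in the main loop, with the BQL held as usual. */
  }

  /* from whichever context receives the reply: */
  aio_bh_schedule_oneshot(qemu_get_aio_context(), process_reply_bh, opaque);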

Btw, it's not only about whether vfio-user would survive a memory layout
change; I think the problem is that vfio-user releases the bql without the
QEMU memory core noticing it, while the memory core relies on the bql to
serialize.  That's why we can get a very strange rcu thread crash - it's
potentially an outcome of the race.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-07-20  1:26 UTC | newest]

Thread overview: 16+ messages
2020-04-20 21:00 Question on memory commit during MR finalize() Peter Xu
2020-04-20 21:44 ` Paolo Bonzini
2020-04-20 23:31   ` Peter Xu
2020-04-21  9:43     ` Paolo Bonzini
2020-04-21 10:43       ` Peter Xu
2021-07-15 14:27         ` Thanos Makatos
2021-07-15 18:35           ` Peter Xu
2021-07-16 11:42             ` Thanos Makatos
2021-07-16 14:18               ` Peter Xu
2021-07-19 14:38                 ` Thanos Makatos
2021-07-19 15:56                   ` Peter Xu
2021-07-19 18:02                     ` Thanos Makatos
2021-07-19 19:05                       ` Thanos Makatos
2021-07-19 19:59                         ` Peter Xu
2021-07-19 20:58                           ` John Johnson
2021-07-20  1:22                             ` Peter Xu
