On Thu, 17 Jun 2021 10:47:12 +0800 Lei Rao wrote: > From: "Rao, Lei" > > When a PVM completed its SVM failover steps and begins to run in > the simplex mode, QEMU would encounter a 'Segmentation fault' if > the guest poweroff with the following calltrace: > > Program received signal SIGSEGV, Segmentation fault. > > This is because primary_vm_do_failover() would call "qemu_file_shutdown > (s->rp_state.from_dst_file);" and later the migration_shutdown() would > do it again. So, we should set the s->rp_state.from_dst_file to NULL. Hello, Please provide a backtrace to such bugfixes. It helps the reviewers to better understand the bug and the fix and it helps yourself to check if it is actually correct. I suggest you to enable coredumps in your test (or even production) system, so for every crash you definitely have a backtrace. > Signed-off-by: Like Xu > Signed-off-by: Lei Rao > --- > migration/colo.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/migration/colo.c b/migration/colo.c > index 616dc00..c25e488 100644 > --- a/migration/colo.c > +++ b/migration/colo.c > @@ -156,14 +156,15 @@ static void primary_vm_do_failover(void) > > /* > * Wake up COLO thread which may blocked in recv() or send(), > - * The s->rp_state.from_dst_file and s->to_dst_file may use the > - * same fd, but we still shutdown the fd for twice, it is harmless. > + * The s->to_dst_file may use the same fd, but we still shutdown > + * the fd for twice, it is harmless. > */ This change to the comment is incorrect. Shutting down a file multiple times is safe and not a bug in it self. If it leads to a crash anyway, that points to a bug in the qemu_file_shutdown() function or similar. > if (s->to_dst_file) { > qemu_file_shutdown(s->to_dst_file); > } > if (s->rp_state.from_dst_file) { > qemu_file_shutdown(s->rp_state.from_dst_file); > + s->rp_state.from_dst_file = NULL; > } This is incorrect, as the file won't be closed after this and is leaked. And indeed, when applying the patch to master, qemu crashes when triggering failover on the primary, due to the leak: qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. > old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, --