From: Markus Armbruster
References: <20181225140449.15786-1-fli@suse.com> <20181225140449.15786-17-fli@suse.com> <874lakl5d2.fsf@dusky.pond.sub.org> <5007FDD8-8D3A-4E3F-B5A6-9B7B3189238F@126.com> <87pnt7awhg.fsf@dusky.pond.sub.org> <87o98pdfbf.fsf@dusky.pond.sub.org> <7A219BBE-8D9D-4B40-892E-AA48D22F5967@126.com>
Date: Thu, 10 Jan 2019 10:20:33 +0100
In-Reply-To: <7A219BBE-8D9D-4B40-892E-AA48D22F5967@126.com> (fei's message of "Wed, 9 Jan 2019 23:57:10 +0800")
Message-ID: <87tvig7tsu.fsf@dusky.pond.sub.org>
Subject: Re: [Qemu-devel] [PATCH for-4.0 v9 16/16] qemu_thread_join: fix segmentation fault
To: fei
Cc: Markus Armbruster, Stefan Weil, qemu-devel@nongnu.org, shirley17fei@gmail.com

fei writes:

>> On 9 Jan 2019, at 23:24, Markus Armbruster wrote:
>> 
>> Fei Li writes:
>> 
>>>> On 2019/1/9 at 1:29 AM, Markus Armbruster wrote:
>>>> fei writes:
>>>> 
>>>>>> On 8 Jan 2019, at 01:55, Markus Armbruster wrote:
>>>>>> 
>>>>>> Fei Li writes:
>>>>>> 
>>>>>>> To avoid the segmentation fault in qemu_thread_join(), just directly
>>>>>>> return when the QemuThread *thread failed to be created in either
>>>>>>> qemu-thread-posix.c or qemu-thread-win32.c.
>>>>>>> 
>>>>>>> Cc: Stefan Weil
>>>>>>> Signed-off-by: Fei Li
>>>>>>> Reviewed-by: Fam Zheng
>>>>>>> ---
>>>>>>>  util/qemu-thread-posix.c | 3 +++
>>>>>>>  util/qemu-thread-win32.c | 2 +-
>>>>>>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>>>>>> 
>>>>>>> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
>>>>>>> index 39834b0551..3548935dac 100644
>>>>>>> --- a/util/qemu-thread-posix.c
>>>>>>> +++ b/util/qemu-thread-posix.c
>>>>>>> @@ -571,6 +571,9 @@ void *qemu_thread_join(QemuThread *thread)
>>>>>>>      int err;
>>>>>>>      void *ret;
>>>>>>> 
>>>>>>> +    if (!thread->thread) {
>>>>>>> +        return NULL;
>>>>>>> +    }
>>>>>> How can this happen?
>>>>> I think I have answered this earlier, please check the following link to see whether it helps:
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg06554.html
>>>> Thanks for the pointer.  Unfortunately, I don't understand your
>>>> explanation.  You also wrote there "I will remove this patch in next
>>>> version"; looks like you've since changed your mind.
>>> Emm, issues left over from history..  The background is that I was in a
>>> hurry to get those five Reviewed-by patches merged, including this v9
>>> 16/16 patch, but not the real qemu_thread_create() modification.
>>> But actually this patch is there to fix the segmentation fault that
>>> appears after we modify the qemu_thread_create() related functions,
>>> although it got a Reviewed-by earlier. :)  Thus, to avoid trouble, I
>>> wrote the "remove..." sentence to separate it from those 5 Reviewed-by
>>> patches, and was planning to send only four patches.  But later I was
>>> told that these five patches were not urgent enough to catch qemu v3.1,
>>> so I folded the earlier 5 R-b patches into the later v8 & v9 for a
>>> better review.
>>> 
>>> Sorry for the trouble, I need to explain it without involving too much
>>> background..
>>> 
>>> Back at the farm: in our current qemu code, some cleanups use a loop to
>>> join() the total number of threads when the caller fails.  This is not
>>> a problem until the qemu_thread_create() modification is applied.
>>> E.g. when compress_threads_save_setup() fails while trying to create
>>> the last do_data_compress thread, a segmentation fault occurs when
>>> join() is called (sadly there is no condition we can use to filter out
>>> the unsuccessfully created thread), because that thread was never
>>> actually created.
>>> 
>>> Hope the above makes it clear. :)
>> 
>> Alright, let's have a look at compress_threads_save_setup():
>> 
>> static int compress_threads_save_setup(void)
>> {
>>     int i, thread_count;
>> 
>>     if (!migrate_use_compression()) {
>>         return 0;
>>     }
>>     thread_count = migrate_compress_threads();
>>     compress_threads = g_new0(QemuThread, thread_count);
>>     comp_param = g_new0(CompressParam, thread_count);
>>     qemu_cond_init(&comp_done_cond);
>>     qemu_mutex_init(&comp_done_lock);
>>     for (i = 0; i < thread_count; i++) {
>>         comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE);
>>         if (!comp_param[i].originbuf) {
>>             goto exit;
>>         }
>> 
>>         if (deflateInit(&comp_param[i].stream,
>>                         migrate_compress_level()) != Z_OK) {
>>             g_free(comp_param[i].originbuf);
>>             goto exit;
>>         }
>> 
>>         /* comp_param[i].file is just used as a dummy buffer to save data,
>>          * set its ops to empty.
>>          */
>>         comp_param[i].file = qemu_fopen_ops(NULL, &empty_ops);
>>         comp_param[i].done = true;
>>         comp_param[i].quit = false;
>>         qemu_mutex_init(&comp_param[i].mutex);
>>         qemu_cond_init(&comp_param[i].cond);
>>         qemu_thread_create(compress_threads + i, "compress",
>>                            do_data_compress, comp_param + i,
>>                            QEMU_THREAD_JOINABLE);
>>     }
>>     return 0;
>> 
>> exit:
>>     compress_threads_save_cleanup();
>>     return -1;
>> }
>> 
>> At label exit, we have @i threads, all fully initialized.  That's an
>> invariant.
>> 
>> compress_threads_save_cleanup() finds the threads to clean up by
>> checking comp_param[i].file:
>> 
>> static void compress_threads_save_cleanup(void)
>> {
>>     int i, thread_count;
>> 
>>     if (!migrate_use_compression() || !comp_param) {
>>         return;
>>     }
>> 
>>     thread_count = migrate_compress_threads();
>>     for (i = 0; i < thread_count; i++) {
>>         /*
>>          * we use it as a indicator which shows if the thread is
>>          * properly init'd or not
>>          */
>> --->    if (!comp_param[i].file) {
>> --->        break;
>> --->    }
>> 
>>         qemu_mutex_lock(&comp_param[i].mutex);
>>         comp_param[i].quit = true;
>>         qemu_cond_signal(&comp_param[i].cond);
>>         qemu_mutex_unlock(&comp_param[i].mutex);
>> 
>>         qemu_thread_join(compress_threads + i);
>>         qemu_mutex_destroy(&comp_param[i].mutex);
>>         qemu_cond_destroy(&comp_param[i].cond);
>>         deflateEnd(&comp_param[i].stream);
>>         g_free(comp_param[i].originbuf);
>>         qemu_fclose(comp_param[i].file);
>>         comp_param[i].file = NULL;
>>     }
>>     qemu_mutex_destroy(&comp_done_lock);
>>     qemu_cond_destroy(&comp_done_cond);
>>     g_free(compress_threads);
>>     g_free(comp_param);
>>     compress_threads = NULL;
>>     comp_param = NULL;
>> }
>> 
>> Due to the invariant, a comp_param[i] with a null .file doesn't need
>> *any* cleanup.
>> 
>> To maintain the invariant, compress_threads_save_setup() carefully
>> cleans up any partial initializations itself before a goto exit.  Since
>> the code is arranged smartly, the only such cleanup is the
>> g_free(comp_param[i].originbuf) before the second goto exit.
>> 
>> Your PATCH 13 adds a third goto exit, but neglects to clean up partial
>> initializations.  Breaks the invariant.
>> 
>> I see two sane solutions:
>> 
>> 1. compress_threads_save_setup() carefully cleans up partial
>>    initializations itself.  compress_threads_save_cleanup() copes only
>>    with fully initialized comp_param[i].  This is how things work before
>>    your series.
>> 
>> 2. compress_threads_save_cleanup() copes with partially initialized
>>    comp_param[i], i.e. does the right thing for each goto exit in
>>    compress_threads_save_setup().  compress_threads_save_setup() doesn't
>>    clean up partial initializations.
>> 
>> Your PATCH 13 together with the fixup PATCH 16 does
>> 
>> 3. A confusing mix of the two.
>> 
>> Don't.
> Thanks for the detailed analysis! :)
> Emm.. Actually I had thought about doing the cleanup in the setup()
> function for the third 'goto exit' [1], which is a partial
> initialization.  But since the [1] below is rather long and does not
> look neat (I notice that most per-thread cleanups live in the
> xxx_cleanup() function), I turned to modifying the join() function..
> Is the long [1] acceptable when the third 'goto exit' is taken, or is
> there a better way to do the cleanup?
> 
> [1]
>     qemu_mutex_lock(&comp_param[i].mutex);
>     comp_param[i].quit = true;
>     qemu_cond_signal(&comp_param[i].cond);
>     qemu_mutex_unlock(&comp_param[i].mutex);
> 
>     qemu_mutex_destroy(&comp_param[i].mutex);
>     qemu_cond_destroy(&comp_param[i].cond);
>     deflateEnd(&comp_param[i].stream);
>     g_free(comp_param[i].originbuf);
>     qemu_fclose(comp_param[i].file);
>     comp_param[i].file = NULL;

Have you considered creating the thread earlier, e.g. right after
initializing compression with deflateInit()?
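
As an aside, here is a minimal standalone sketch of the invariant Markus
describes, written with plain pthreads; Worker, workers_setup() and the
other names are invented for illustration and this is not QEMU code.  The
idea: each slot's "fully initialized" marker (playing the role of
comp_param[i].file) is assigned only after everything else in the slot,
including the thread, exists, and setup undoes its own partial
initialization before jumping to the shared exit path, so cleanup can
simply stop at the first slot whose marker is still unset.

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool quit;
    char *buf;       /* stands in for comp_param[i].originbuf */
    bool ready;      /* stands in for comp_param[i].file: set last */
    pthread_t thread;
} Worker;

static Worker *workers;
static int nworkers;

/* Worker just waits until told to quit, like an idle do_data_compress. */
static void *worker_fn(void *opaque)
{
    Worker *w = opaque;

    pthread_mutex_lock(&w->mutex);
    while (!w->quit) {
        pthread_cond_wait(&w->cond, &w->mutex);
    }
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

static void workers_cleanup(void)
{
    if (!workers) {
        return;
    }
    for (int i = 0; i < nworkers; i++) {
        if (!workers[i].ready) {
            break;   /* invariant: nothing from here on is initialized */
        }
        pthread_mutex_lock(&workers[i].mutex);
        workers[i].quit = true;
        pthread_cond_signal(&workers[i].cond);
        pthread_mutex_unlock(&workers[i].mutex);

        pthread_join(workers[i].thread, NULL);
        pthread_mutex_destroy(&workers[i].mutex);
        pthread_cond_destroy(&workers[i].cond);
        free(workers[i].buf);
    }
    free(workers);
    workers = NULL;
}

static int workers_setup(int n)
{
    nworkers = n;
    workers = calloc(n, sizeof(*workers));
    if (!workers) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        workers[i].buf = malloc(64);
        if (!workers[i].buf) {
            goto exit;                    /* nothing of slot i to undo */
        }
        pthread_mutex_init(&workers[i].mutex, NULL);
        pthread_cond_init(&workers[i].cond, NULL);
        if (pthread_create(&workers[i].thread, NULL, worker_fn, &workers[i])) {
            /* undo this slot's partial init here; .ready stays false */
            pthread_mutex_destroy(&workers[i].mutex);
            pthread_cond_destroy(&workers[i].cond);
            free(workers[i].buf);
            goto exit;
        }
        workers[i].ready = true;          /* marker set only when the slot is complete */
    }
    return 0;

exit:
    workers_cleanup();                    /* only tears down fully set-up slots */
    return -1;
}

int main(void)
{
    if (workers_setup(4) == 0) {
        workers_cleanup();
    }
    return 0;
}

Mapped back onto the quoted QEMU code, the analogue would be calling
qemu_thread_create() before comp_param[i].file is assigned, so that a
failed creation leaves the indicator unset.  The sketch keeps the mutex
and condition variable initialized before the thread starts because the
worker uses them immediately; whether the compress thread could be
created as early as right after deflateInit() is exactly the ordering
question Markus's closing suggestion raises.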