From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([209.51.188.92]:36929) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ghcr1-0005Lg-2O for qemu-devel@nongnu.org; Thu, 10 Jan 2019 11:06:49 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ghcqu-0003Le-9r for qemu-devel@nongnu.org; Thu, 10 Jan 2019 11:06:45 -0500 Received: from mx1.redhat.com ([209.132.183.28]:51900) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ghcqt-0003AC-Qj for qemu-devel@nongnu.org; Thu, 10 Jan 2019 11:06:40 -0500 From: Markus Armbruster References: <20181225140449.15786-1-fli@suse.com> <20181225140449.15786-17-fli@suse.com> <874lakl5d2.fsf@dusky.pond.sub.org> <5007FDD8-8D3A-4E3F-B5A6-9B7B3189238F@126.com> <87pnt7awhg.fsf@dusky.pond.sub.org> <87o98pdfbf.fsf@dusky.pond.sub.org> <7A219BBE-8D9D-4B40-892E-AA48D22F5967@126.com> <87tvig7tsu.fsf@dusky.pond.sub.org> <016ae7bc-6271-59a7-36a1-c2b8a210020f@126.com> Date: Thu, 10 Jan 2019 17:06:16 +0100 In-Reply-To: <016ae7bc-6271-59a7-36a1-c2b8a210020f@126.com> (Fei Li's message of "Thu, 10 Jan 2019 21:24:54 +0800") Message-ID: <87tvigfqfb.fsf@dusky.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH for-4.0 v9 16/16] qemu_thread_join: fix segmentation fault List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fei Li Cc: Stefan Weil , qemu-devel@nongnu.org, shirley17fei@gmail.com Fei Li writes: > =E5=9C=A8 2019/1/10 =E4=B8=8B=E5=8D=885:20, Markus Armbruster =E5=86=99= =E9=81=93: >> fei writes: >> >>>> =E5=9C=A8 2019=E5=B9=B41=E6=9C=889=E6=97=A5=EF=BC=8C23:24=EF=BC=8CMark= us Armbruster =E5=86=99=E9=81=93=EF=BC=9A >>>> >>>> Fei Li writes: >>>> >>>>>> =E5=9C=A8 2019/1/9 =E4=B8=8A=E5=8D=881:29, Markus Armbruster =E5=86= =99=E9=81=93: >>>>>> fei writes: >>>>>> >>>>>>>> =E5=9C=A8 2019=E5=B9=B41=E6=9C=888=E6=97=A5=EF=BC=8C01:55=EF=BC=8C= Markus Armbruster =E5=86=99=E9=81=93=EF=BC=9A >>>>>>>> >>>>>>>> Fei Li writes: >>>>>>>> >>>>>>>>> To avoid the segmentation fault in qemu_thread_join(), just direc= tly >>>>>>>>> return when the QemuThread *thread failed to be created in either >>>>>>>>> qemu-thread-posix.c or qemu-thread-win32.c. >>>>>>>>> >>>>>>>>> Cc: Stefan Weil >>>>>>>>> Signed-off-by: Fei Li >>>>>>>>> Reviewed-by: Fam Zheng >>>>>>>>> --- >>>>>>>>> util/qemu-thread-posix.c | 3 +++ >>>>>>>>> util/qemu-thread-win32.c | 2 +- >>>>>>>>> 2 files changed, 4 insertions(+), 1 deletion(-) >>>>>>>>> >>>>>>>>> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c >>>>>>>>> index 39834b0551..3548935dac 100644 >>>>>>>>> --- a/util/qemu-thread-posix.c >>>>>>>>> +++ b/util/qemu-thread-posix.c >>>>>>>>> @@ -571,6 +571,9 @@ void *qemu_thread_join(QemuThread *thread) >>>>>>>>> int err; >>>>>>>>> void *ret; >>>>>>>>> >>>>>>>>> + if (!thread->thread) { >>>>>>>>> + return NULL; >>>>>>>>> + } >>>>>>>> How can this happen? >>>>>>> I think I have answered this earlier, please check the following li= nk to see whether it helps: >>>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg06554.ht= ml >>>>>> Thanks for the pointer. Unfortunately, I don't understand your >>>>>> explanation. You also wrote there "I will remove this patch in next >>>>>> version"; looks like you've since changed your mind. >>>>> Emm, issues left over from history.. The background is I was hurry to >>>>> make those five >>>>> Reviewed-by patches be merged, including this v9 16/16 patch but not >>>>> the real >>>>> qemu_thread_create() modification. But actually this patch is to fix >>>>> the segmentation >>>>> fault after we modified qemu_thread_create() related functions >>>>> although it has got a >>>>> Reviewed-by earlier. :) Thus to not make troube, I wrote the >>>>> "remove..." sentence >>>>> to separate it from those 5 Reviewed-by patches, and were plan to send >>>>> only four patches. >>>>> But later I got a message that these five patches are not that urgent >>>>> to catch qemu v3.1, >>>>> thus I joined the earlier 5 R-b patches into the later v8 & v9 to have >>>>> a better review. >>>>> >>>>> Sorry for the trouble, I need to explain it without involving too much >>>>> background.. >>>>> >>>>> Back at the farm: in our current qemu code, some cleanups use a loop >>>>> to join() >>>>> the total number of threads if caller fails. This is not a problem >>>>> until applying the >>>>> qemu_thread_create() modification. E.g. when compress_threads_save_se= tup() >>>>> fails while trying to create the last do_data_compress thread, >>>>> segmentation fault >>>>> will occur when join() is called (sadly there's not enough condition >>>>> to filter this >>>>> unsuccessful created thread) as this thread is actually not be create= d. >>>>> >>>>> Hope the above makes it clear. :) >>>> Alright, let's have a look at compress_threads_save_setup(): >>>> >>>> static int compress_threads_save_setup(void) >>>> { >>>> int i, thread_count; >>>> >>>> if (!migrate_use_compression()) { >>>> return 0; >>>> } >>>> thread_count =3D migrate_compress_threads(); >>>> compress_threads =3D g_new0(QemuThread, thread_count); >>>> comp_param =3D g_new0(CompressParam, thread_count); >>>> qemu_cond_init(&comp_done_cond); >>>> qemu_mutex_init(&comp_done_lock); >>>> for (i =3D 0; i < thread_count; i++) { >>>> comp_param[i].originbuf =3D g_try_malloc(TARGET_PAGE_SIZE); >>>> if (!comp_param[i].originbuf) { >>>> goto exit; >>>> } >>>> >>>> if (deflateInit(&comp_param[i].stream, >>>> migrate_compress_level()) !=3D Z_OK) { >>>> g_free(comp_param[i].originbuf); >>>> goto exit; >>>> } >>>> >>>> /* comp_param[i].file is just used as a dummy buffer to sa= ve data, >>>> * set its ops to empty. >>>> */ >>>> comp_param[i].file =3D qemu_fopen_ops(NULL, &empty_ops); >>>> comp_param[i].done =3D true; >>>> comp_param[i].quit =3D false; >>>> qemu_mutex_init(&comp_param[i].mutex); >>>> qemu_cond_init(&comp_param[i].cond); >>>> qemu_thread_create(compress_threads + i, "compress", >>>> do_data_compress, comp_param + i, >>>> QEMU_THREAD_JOINABLE); >>>> } >>>> return 0; >>>> >>>> exit: >>>> compress_threads_save_cleanup(); >>>> return -1; >>>> } >>>> >>>> At label exit, we have @i threads, all fully initialized. That's an >>>> invariant. >>>> >>>> compress_threads_save_cleanup() finds the threads to clean up by >>>> checking comp_param[i].file: >>>> >>>> static void compress_threads_save_cleanup(void) >>>> { >>>> int i, thread_count; >>>> >>>> if (!migrate_use_compression() || !comp_param) { >>>> return; >>>> } >>>> >>>> thread_count =3D migrate_compress_threads(); >>>> for (i =3D 0; i < thread_count; i++) { >>>> /* >>>> * we use it as a indicator which shows if the thread is >>>> * properly init'd or not >>>> */ >>>> ---> if (!comp_param[i].file) { >>>> ---> break; >>>> ---> } >>>> >>>> qemu_mutex_lock(&comp_param[i].mutex); >>>> comp_param[i].quit =3D true; >>>> qemu_cond_signal(&comp_param[i].cond); >>>> qemu_mutex_unlock(&comp_param[i].mutex); >>>> >>>> qemu_thread_join(compress_threads + i); >>>> qemu_mutex_destroy(&comp_param[i].mutex); >>>> qemu_cond_destroy(&comp_param[i].cond); >>>> deflateEnd(&comp_param[i].stream); >>>> g_free(comp_param[i].originbuf); >>>> qemu_fclose(comp_param[i].file); >>>> comp_param[i].file =3D NULL; >>>> } >>>> qemu_mutex_destroy(&comp_done_lock); >>>> qemu_cond_destroy(&comp_done_cond); >>>> g_free(compress_threads); >>>> g_free(comp_param); >>>> compress_threads =3D NULL; >>>> comp_param =3D NULL; >>>> } >>>> >>>> Due to the invariant, a comp_param[i] with a null .file doesn't need >>>> *any* cleanup. >>>> >>>> To maintain the invariant, compress_threads_save_setup() carefully >>>> cleans up any partial initializations itself before a goto exit. Since >>>> the code is arranged smartly, the only such cleanup is the >>>> g_free(comp_param[i].originbuf) before the second goto exit. >>>> >>>> Your PATCH 13 adds a third goto exit, but neglects to clean up partial >>>> initializations. Breaks the invariant. >>>> >>>> I see two sane solutions: >>>> >>>> 1. compress_threads_save_setup() carefully cleans up partial >>>> initializations itself. compress_threads_save_cleanup() copes only >>>> with fully initialized comp_param[i]. This is how things work befo= re >>>> your series. >>>> >>>> 2. compress_threads_save_cleanup() copes with partially initialized >>>> comp_param[i], i.e. does the right thing for each goto exit in >>>> compress_threads_save_setup(). compress_threads_save_setup() doesn= 't >>>> clean up partial initializations. >>>> >>>> Your PATCH 13 together with the fixup PATCH 16 does >>>> >>>> 3. A confusing mix of the two. >>>> >>>> Don't. >>> Thanks for the detail analysis! :) >>> Emm.. Actually I have thought to do the cleanup in the setup() function= for the third =E2=80=98goto exit=E2=80=99 [1], which is a partial initial= ization. >>> But due to the below [1] is too long and seems not neat (I notice that = most cleanups for each thread are in the xxx_cleanup() function), I turned = to modify the join() function.. >>> Is the long [1] acceptable when the third =E2=80=98goto exit=E2=80=99 i= s called, or is there any other better way to do the cleanup? >>> >>> [1] >>> qemu_mutex_lock(&comp_param[i].mutex); >>> comp_param[i].quit =3D true; >>> qemu_cond_signal(&comp_param[i].cond); >>> qemu_mutex_unlock(&comp_param[i].mutex); >>> >>> qemu_mutex_destroy(&comp_param[i].mutex); >>> qemu_cond_destroy(&comp_param[i].cond); >>> deflateEnd(&comp_param[i].stream); >>> g_free(comp_param[i].originbuf); >>> qemu_fclose(comp_param[i].file); >>> comp_param[i].file =3D NULL; >> Have you considered creating the thread earlier, e.g. right after >> initializing compression with deflateInit()? > I am afraid we can not do this, as the members of comp_param[i], like > file/done/quit/mutex/cond > will be used later in the new created thread: do_data_[de]compress via > qemu_thread_create(). You're right. > Thus it seems we have to accept the above long [1] if we do want to > clean up partial initialization > in xxx_setup(). :( > > BTW, there is no other argument can be used except the > "(compress_threads+i)->thread" to > differentiate whether should we join() the thread, just in case we > want to change the > xxx_cleanup() function. We can try to make compress_threads_save_cleanup() cope with partially initialized comp_param[i]. Let's have a look at its members: bool done; // no cleanup bool quit; // see [2] bool zero_page; // no cleanup QEMUFile *file; // qemu_fclose() if non-null QemuMutex mutex; // see [1] QemuCond cond; // see [1] RAMBlock *block; // no cleanup (must be null) ram_addr_t offset; // no cleanup /* internally used fields */ z_stream stream; // see [3] uint8_t *originbuf; // unconditional g_free() [1]: we could do something like if (comp_param[i].mutex.initialized) { qemu_mutex_destroy(&comp_param[i].mutex); } if (comp_param[i].cond.initialized) { qemu_cond_destroy(&comp_param[i].cond); } but that would be unclean. Instead, I'd initialize these guys first, so we can clean them up unconditionally. [2] This is used to make the thread terminate. Must be done before we call qemu_thread_join(). I think it can safely be done always, as long as long as .mutex and .cond are initialized. Trivial if we initialize them first. [3]: I can't see a squeaky clean way to detect whether .stream has been initialized with deflateInit(). Here's a slightly unclean way: deflateInit() sets .stream.msg to null on success, and to non-null on failure. We can make it non-null until we're ready to call deflateInit(), then have compress_threads_save_cleanup() clean up .stream when it's null. If that's too unclean for your or your reviewers' taste, add a boolean @stream_initialized flag.