QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: kwolf@redhat.com,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, wencongyang2@huawei.com,
	xiechanglong.d@gmail.com, qemu-devel@nongnu.org,
	armbru@redhat.com, den@openvz.org, jsnow@redhat.com
Subject: Re: [PATCH v4 00/23] backup performance: block_status + async
Date: Wed, 20 Jan 2021 17:40:53 +0100
Message-ID: <420c2186-b37d-9a4f-5c23-c2f1021abd86@redhat.com> (raw)
In-Reply-To: <20210120160442.GK3015589@redhat.com>

On 20.01.21 17:04, Daniel P. Berrangé wrote:
> On Wed, Jan 20, 2021 at 04:53:26PM +0100, Max Reitz wrote:
>> On 20.01.21 15:44, Max Reitz wrote:
>>> On 20.01.21 15:34, Max Reitz wrote:
>>
>> [...]
>>
>>>>   From a glance, it looks to me like two coroutines are created
>>>> simultaneously in two threads, and so one thread sets up a special
>>>> SIGUSR2 action, then another reverts SIGUSR2 to the default, and
>>>> then the first one kills itself with SIGUSR2.
>>>>
>>>> Not sure what this has to do with backup, though it is interesting
>>>> that backup_loop() runs in two threads.  So perhaps some AioContext
>>>> problem.
>>>
>>> Oh, 256 runs two backups concurrently.  So it isn’t that interesting,
>>> but perhaps part of the problem still.  (I have no idea, still looking.)
>>
>> So this is what I found out:
>>
>> coroutine-sigaltstack, when creating a new coroutine, sets up a signal
>> handler for SIGUSR2, then kills itself with SIGUSR2, then uses the signal
>> handler context (with a sigaltstack) for the new coroutine, and then (the
>> signal handler returns after a sigsetjmp()) the old SIGUSR2 behavior is
>> restored.
>>
>> What I fail to understand is how this is thread-safe.  Setting up signal
>> handlers is a process-wide action.  When one thread changes what SIGUSR2
>> does, this will affect all threads immediately, so when two threads run
>> coroutine-sigaltstack’s qemu_coroutine_new() concurrently, and one thread
>> reverts to the default action before the other has SIGUSR2’ed itself, that
>> later SIGUSR2 will kill the whole process.
>>
>> (I suppose it gets even more interesting when one thread has set up the
>> sigaltstack, then the other sets up its own sigaltstack, and then both kill
>> themselves with SIGUSR2, so both coroutines get the same stack...)
>>
>> I have no idea why this has never been hit before, but it makes sense why
>> block-copy backup makes it apparent: It creates 64+x coroutines in a very
>> short time span, and 256 makes it do so in two threads concurrently (thanks
>> to launching two backups in two AioContexts in a transaction).
>>
>> So...  Looks to me like a bug in coroutine-sigaltstack.  Not sure what to do
>> now, though.  I don’t think we can use block-copy for backup before that
>> backend is fixed.  (And that is assuming that it’s indeed
>> coroutine-sigaltstack’s fault.)
>>
>> I’ll try to add some locking, see what it does, and send a mail concerning
>> coroutine-sigaltstack to qemu-devel.
> 
> I'm wondering if we should simply remove the sigaltstack impl and use
> ucontext on MacOS too.
> 
> MacOS has ucontext marked as deprecated by default, seemingly because
> this functionality was deprecated by POSIX. The functionality is still
> available without deprecation warnings if you set _XOPEN_SOURCE.

 From my outside point of view (on coroutine backends), everything you 
wrote below sounds like a very reasonable thing to do.  So perhaps we 
should (I’m not the right person to decide that, though).

However, for me, the immediate question is what I can do now.  I naively 
believe that dropping the sigaltstack implementation would require a 
deprecation cycle.  (If it doesn’t and we can get it out in, say, a 
week, great.)

I need something that helps right now, because I have Vladimir’s series 
in my block branch, the failure doesn’t seem to be its fault, but I 
can’t send a pull request as long as concurrent backups in two iothreads 
make qemu effectively crash when using a specific coroutine backend. 
(And I don’t see configure giving me a warning that choosing sigaltstack 
might be bad idea.)

I suppose I hope that the prospect of wanting to drop sigaltstack 
altogether may lessen the resistance to just wrapping most of its 
qemu_coroutine_new() implementation in a lock until then...  (i.e., what 
the RFC does that I’ve attached to
https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05164.html)

Max

> IOW, it is trivial to make the ucontext impl work on MacOS simply by
> adding
> 
>   #define _XOPEN_SOURCE 600
> 
> before including ucontext.h in coroutine-ucontext.c, and removing the
> restrictions in configure
> 
> 
> 
> diff --git a/configure b/configure
> index 881af4b6be..a58bdf70f3 100755
> --- a/configure
> +++ b/configure
> @@ -4822,8 +4822,9 @@ fi
>   # specific one.
>   
>   ucontext_works=no
> -if test "$darwin" != "yes"; then
> +
>     cat > $TMPC << EOF
> +#define _XOPEN_SOURCE 600
>   #include <ucontext.h>
>   #ifdef __stub_makecontext
>   #error Ignoring glibc stub makecontext which will always fail
> @@ -4833,7 +4834,6 @@ EOF
>     if compile_prog "" "" ; then
>       ucontext_works=yes
>     fi
> -fi
>   
>   if test "$coroutine" = ""; then
>     if test "$mingw32" = "yes"; then
> diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
> index 904b375192..9c0a2cf85c 100644
> --- a/util/coroutine-ucontext.c
> +++ b/util/coroutine-ucontext.c
> @@ -22,6 +22,7 @@
>   #ifdef _FORTIFY_SOURCE
>   #undef _FORTIFY_SOURCE
>   #endif
> +#define _XOPEN_SOURCE 600
>   #include "qemu/osdep.h"
>   #include <ucontext.h>
>   #include "qemu/coroutine_int.h"
> 
> 
> 
> Further more for iOS there was a proposal to add support for using
> libucontext, which provides a clean impl of ucontext APIs for x86
> and aarch64 hosts.
> 
> Regards,
> Daniel
> 



  reply index

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-16 21:46 Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 01/23] qapi: backup: add perf.use-copy-range parameter Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 02/23] block/block-copy: More explicit call_state Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 03/23] block/block-copy: implement block_copy_async Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 04/23] block/block-copy: add max_chunk and max_workers parameters Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 05/23] block/block-copy: add list of all call-states Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 06/23] block/block-copy: add ratelimit to block-copy Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 07/23] block/block-copy: add block_copy_cancel Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 08/23] blockjob: add set_speed to BlockJobDriver Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 09/23] job: call job_enter from job_pause Vladimir Sementsov-Ogievskiy
2021-01-18 13:45   ` Max Reitz
2021-01-18 14:20     ` Max Reitz
2021-04-07 11:19   ` Max Reitz
2021-04-07 11:38     ` Vladimir Sementsov-Ogievskiy
2021-04-21  8:31       ` Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 10/23] qapi: backup: add max-chunk and max-workers to x-perf struct Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 11/23] iotests: 56: prepare for backup over block-copy Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 12/23] iotests: 185: " Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 13/23] iotests: 219: " Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 14/23] iotests: 257: " Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 15/23] block/block-copy: make progress_bytes_callback optional Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 16/23] block/backup: drop extra gotos from backup_run() Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 17/23] backup: move to block-copy Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 18/23] qapi: backup: disable copy_range by default Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 19/23] block/block-copy: drop unused block_copy_set_progress_callback() Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 20/23] block/block-copy: drop unused argument of block_copy() Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 21/23] simplebench/bench_block_job: use correct shebang line with python3 Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 22/23] simplebench: bench_block_job: add cmd_options argument Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 23/23] simplebench: add bench-backup.py Vladimir Sementsov-Ogievskiy
2021-01-18 14:01   ` Max Reitz
2021-01-18 15:07 ` [PATCH v4 00/23] backup performance: block_status + async Max Reitz
2021-01-18 17:04   ` Vladimir Sementsov-Ogievskiy
2021-01-19 18:40 ` Max Reitz
2021-01-19 19:22   ` Vladimir Sementsov-Ogievskiy
2021-01-19 19:29     ` Vladimir Sementsov-Ogievskiy
2021-01-20 10:39     ` Max Reitz
2021-01-20 13:50       ` Max Reitz
2021-01-20 14:34         ` Max Reitz
2021-01-20 14:44           ` Max Reitz
2021-01-20 15:53             ` Max Reitz
2021-01-20 16:00               ` Max Reitz
2021-01-20 16:04               ` Daniel P. Berrangé
2021-01-20 16:40                 ` Max Reitz [this message]
2021-01-20 10:20 ` [PATCH v4 11.5/23] iotests/129: Limit backup's max-chunk/max-workers Max Reitz
2021-01-20 11:24   ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=420c2186-b37d-9a4f-5c23-c2f1021abd86@redhat.com \
    --to=mreitz@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=den@openvz.org \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    --cc=wencongyang2@huawei.com \
    --cc=xiechanglong.d@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git
	git clone --mirror https://lore.kernel.org/qemu-devel/2 qemu-devel/git/2.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git