QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: kwolf@redhat.com, wencongyang2@huawei.com,
	xiechanglong.d@gmail.com, qemu-devel@nongnu.org,
	armbru@redhat.com, den@openvz.org, jsnow@redhat.com
Subject: Re: [PATCH v4 00/23] backup performance: block_status + async
Date: Wed, 20 Jan 2021 16:53:26 +0100
Message-ID: <8522c1f5-9476-3596-abf0-7f2d83f8844c@redhat.com> (raw)
In-Reply-To: <3db87f48-b628-8000-a46a-6d07cdf1ccc3@redhat.com>

On 20.01.21 15:44, Max Reitz wrote:
> On 20.01.21 15:34, Max Reitz wrote:

[...]

>>  From a glance, it looks to me like two coroutines are created 
>> simultaneously in two threads, and so one thread sets up a special 
>> SIGUSR2 action, then another reverts SIGUSR2 to the default, and then 
>> the first one kills itself with SIGUSR2.
>>
>> Not sure what this has to do with backup, though it is interesting 
>> that backup_loop() runs in two threads.  So perhaps some AioContext 
>> problem.
> 
> Oh, 256 runs two backups concurrently.  So it isn’t that interesting, 
> but perhaps part of the problem still.  (I have no idea, still looking.)

So this is what I found out:

coroutine-sigaltstack, when creating a new coroutine, sets up a signal 
handler for SIGUSR2, then kills itself with SIGUSR2, then uses the 
signal handler context (with a sigaltstack) for the new coroutine, and 
then (the signal handler returns after a sigsetjmp()) the old SIGUSR2 
behavior is restored.

What I fail to understand is how this is thread-safe.  Setting up signal 
handlers is a process-wide action.  When one thread changes what SIGUSR2 
does, this will affect all threads immediately, so when two threads run 
coroutine-sigaltstack’s qemu_coroutine_new() concurrently, and one 
thread reverts to the default action before the other has SIGUSR2’ed 
itself, that later SIGUSR2 will kill the whole process.

(I suppose it gets even more interesting when one thread has set up the 
sigaltstack, then the other sets up its own sigaltstack, and then both 
kill themselves with SIGUSR2, so both coroutines get the same stack...)

I have no idea why this has never been hit before, but it makes sense 
why block-copy backup makes it apparent: It creates 64+x coroutines in a 
very short time span, and 256 makes it do so in two threads concurrently 
(thanks to launching two backups in two AioContexts in a transaction).

So...  Looks to me like a bug in coroutine-sigaltstack.  Not sure what 
to do now, though.  I don’t think we can use block-copy for backup 
before that backend is fixed.  (And that is assuming that it’s indeed 
coroutine-sigaltstack’s fault.)

I’ll try to add some locking, see what it does, and send a mail 
concerning coroutine-sigaltstack to qemu-devel.

Max



  reply index

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-16 21:46 Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 01/23] qapi: backup: add perf.use-copy-range parameter Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 02/23] block/block-copy: More explicit call_state Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 03/23] block/block-copy: implement block_copy_async Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 04/23] block/block-copy: add max_chunk and max_workers parameters Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 05/23] block/block-copy: add list of all call-states Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 06/23] block/block-copy: add ratelimit to block-copy Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 07/23] block/block-copy: add block_copy_cancel Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 08/23] blockjob: add set_speed to BlockJobDriver Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 09/23] job: call job_enter from job_pause Vladimir Sementsov-Ogievskiy
2021-01-18 13:45   ` Max Reitz
2021-01-18 14:20     ` Max Reitz
2021-04-07 11:19   ` Max Reitz
2021-04-07 11:38     ` Vladimir Sementsov-Ogievskiy
2021-04-21  8:31       ` Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 10/23] qapi: backup: add max-chunk and max-workers to x-perf struct Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 11/23] iotests: 56: prepare for backup over block-copy Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 12/23] iotests: 185: " Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 13/23] iotests: 219: " Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 14/23] iotests: 257: " Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 15/23] block/block-copy: make progress_bytes_callback optional Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 16/23] block/backup: drop extra gotos from backup_run() Vladimir Sementsov-Ogievskiy
2021-01-16 21:46 ` [PATCH v4 17/23] backup: move to block-copy Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 18/23] qapi: backup: disable copy_range by default Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 19/23] block/block-copy: drop unused block_copy_set_progress_callback() Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 20/23] block/block-copy: drop unused argument of block_copy() Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 21/23] simplebench/bench_block_job: use correct shebang line with python3 Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 22/23] simplebench: bench_block_job: add cmd_options argument Vladimir Sementsov-Ogievskiy
2021-01-16 21:47 ` [PATCH v4 23/23] simplebench: add bench-backup.py Vladimir Sementsov-Ogievskiy
2021-01-18 14:01   ` Max Reitz
2021-01-18 15:07 ` [PATCH v4 00/23] backup performance: block_status + async Max Reitz
2021-01-18 17:04   ` Vladimir Sementsov-Ogievskiy
2021-01-19 18:40 ` Max Reitz
2021-01-19 19:22   ` Vladimir Sementsov-Ogievskiy
2021-01-19 19:29     ` Vladimir Sementsov-Ogievskiy
2021-01-20 10:39     ` Max Reitz
2021-01-20 13:50       ` Max Reitz
2021-01-20 14:34         ` Max Reitz
2021-01-20 14:44           ` Max Reitz
2021-01-20 15:53             ` Max Reitz [this message]
2021-01-20 16:00               ` Max Reitz
2021-01-20 16:04               ` Daniel P. Berrangé
2021-01-20 16:40                 ` Max Reitz
2021-01-20 10:20 ` [PATCH v4 11.5/23] iotests/129: Limit backup's max-chunk/max-workers Max Reitz
2021-01-20 11:24   ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8522c1f5-9476-3596-abf0-7f2d83f8844c@redhat.com \
    --to=mreitz@redhat.com \
    --cc=armbru@redhat.com \
    --cc=den@openvz.org \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    --cc=wencongyang2@huawei.com \
    --cc=xiechanglong.d@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git
	git clone --mirror https://lore.kernel.org/qemu-devel/2 qemu-devel/git/2.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git