From: Stefan Reiter <s.reiter@proxmox.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, slp@redhat.com, mreitz@redhat.com,
stefanha@redhat.com, jsnow@redhat.com, dietmar@proxmox.com
Subject: Re: [PATCH] backup: don't acquire aio_context in backup_clean
Date: Thu, 26 Mar 2020 10:43:47 +0100
Message-ID: <1d1984b3-14f5-5a17-b477-d70561f75e8f@proxmox.com>
In-Reply-To: <2b288000-7c09-ba31-82a7-02c5ed55f4e7@virtuozzo.com>
On 26/03/2020 06:54, Vladimir Sementsov-Ogievskiy wrote:
> 25.03.2020 18:50, Stefan Reiter wrote:
>> backup_clean is only ever called as a handler via job_exit, which
>
> Hmm.. I'm afraid it's not quite correct.
>
> job_clean
>   job_finalize_single
>     job_completed_txn_abort (lock aio context)
>     job_do_finalize
>
> Hmm. job_do_finalize calls job_completed_txn_abort, which takes care to
> lock the AioContext.
> At the same time, it directly calls job_txn_apply(job->txn,
> job_finalize_single) without locking. Is that a bug?
>
I think, as you say, the idea is that job_do_finalize is always called
with the lock acquired. That's why job_completed_txn_abort takes care to
release the lock (at least of the "outer_ctx" as it calls it) before
reacquiring it.
> And even if job_do_finalize is always called with a locked context, where
> is the guarantee that the contexts of all jobs in the txn are locked?
>
I also don't see anything that guarantees that... I guess it could be
adapted to handle locks the way job_completed_txn_abort does?
I haven't looked into transactions much, but does it even make sense
to have jobs in different contexts in one transaction?
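One way to picture the adaptation suggested above is to run the finalize step for each job while holding only that job's context, mirroring the outer_ctx release and reacquire that job_completed_txn_abort performs. The following is a minimal, self-contained C sketch, not QEMU code: MockCtx, MockJob, ctx_acquire, ctx_release, finalize_single and txn_finalize_locked are hypothetical stand-ins, and a recursive AioContext lock is modelled only by its nesting depth. It assumes the caller holds the outer context exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not QEMU types: a recursive AioContext lock is
 * modelled only by its nesting depth, and a job by the context it runs in. */
typedef struct {
    int depth;
} MockCtx;

typedef struct {
    MockCtx *ctx;
    int finalized;
} MockJob;

static void ctx_acquire(MockCtx *c)
{
    c->depth++;
}

static void ctx_release(MockCtx *c)
{
    assert(c->depth > 0);
    c->depth--;
}

/* Stand-in for job_finalize_single(): its callbacks may poll, and polling
 * releases the lock only once, so this model requires exactly one level of
 * the job's own context to be held. */
static void finalize_single(MockJob *job)
{
    assert(job->ctx->depth == 1);
    job->finalized = 1;
}

/* Finalize every job in the transaction while holding only that job's
 * context: drop the caller's (outer) context first and take it back at the
 * end, like the outer_ctx handling in job_completed_txn_abort().
 * Assumption: the caller holds outer exactly once. */
static void txn_finalize_locked(MockJob *jobs, size_t n, MockCtx *outer)
{
    ctx_release(outer);
    for (size_t i = 0; i < n; i++) {
        ctx_acquire(jobs[i].ctx);
        finalize_single(&jobs[i]);
        ctx_release(jobs[i].ctx);
    }
    ctx_acquire(outer);
}
```

Because each job's context is taken separately, jobs in different contexts within one transaction are handled the same way as jobs sharing the caller's context. If the caller held the outer context twice, the assertion in finalize_single would fire for any job in that context, which is the same double-acquisition situation the quoted patch addresses.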
> Still, let's look through its callers.
>
> job_finalize
>   qmp_block_job_finalize (lock aio context)
>   qmp_job_finalize (lock aio context)
>   test_cancel_concluded (doesn't lock, but it's a test)
>
> job_completed_txn_success
>   job_completed
>     job_exit (lock aio context)
>     job_cancel
>       blockdev_mark_auto_del (lock aio context)
>       job_user_cancel
>         qmp_block_job_cancel (locks context)
>         qmp_job_cancel (locks context)
>       job_cancel_err
>         job_cancel_sync (return job_finish_sync(job, &job_cancel_err, NULL);
>                          job_finish_sync just calls the callback)
>           replication_close (it's .bdrv_close.. Hmm, I don't see context
>                              locking, where is it?)
Hm, I don't see it either. This might indeed be a way to get to job_clean
without the lock held.
I don't have any testing set up for replication at the moment, but if you
believe this would be correct, I can send a patch for that as well (just
acquire the lock in replication_close before job_cancel_sync?).
>
>           replication_stop (locks context)
>           drive_backup_abort (locks context)
>           blockdev_backup_abort (locks context)
>           job_cancel_sync_all (locks context)
>           cancel_common (locks context)
>           test_* (I don't care)
>
To clarify: aside from the commit message, the patch itself does not
appear to be wrong? All paths (except replication_close, mentioned
above) guarantee that the job's AioContext lock is held.
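The caller audit above amounts to a reverse call-graph walk: start at job_clean, follow callers upwards, stop at any function that takes the AioContext lock, and report the roots reached without one. Below is a minimal C sketch of that walk, assuming the edges and lock annotations exactly as listed in this thread (they are not re-derived from the QEMU source, and the test_* entries are left out). Edge, edges, lockers, takes_lock and find_unlocked_roots are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding of the caller tree from this thread: each edge
 * reads "callee is called by caller". Annotations are copied from the
 * thread, not re-derived from the QEMU source; test_* is omitted. */
typedef struct {
    const char *callee;
    const char *caller;
} Edge;

static const Edge edges[] = {
    {"job_clean", "job_finalize_single"},
    {"job_finalize_single", "job_completed_txn_abort"},
    {"job_finalize_single", "job_do_finalize"},
    {"job_do_finalize", "job_finalize"},
    {"job_do_finalize", "job_completed_txn_success"},
    {"job_finalize", "qmp_block_job_finalize"},
    {"job_finalize", "qmp_job_finalize"},
    {"job_finalize", "test_cancel_concluded"},
    {"job_completed_txn_success", "job_completed"},
    {"job_completed", "job_exit"},
    {"job_completed", "job_cancel"},
    {"job_cancel", "blockdev_mark_auto_del"},
    {"job_cancel", "job_user_cancel"},
    {"job_cancel", "job_cancel_err"},
    {"job_user_cancel", "qmp_block_job_cancel"},
    {"job_user_cancel", "qmp_job_cancel"},
    {"job_cancel_err", "job_cancel_sync"},
    {"job_cancel_sync", "replication_close"},
    {"job_cancel_sync", "replication_stop"},
    {"job_cancel_sync", "drive_backup_abort"},
    {"job_cancel_sync", "blockdev_backup_abort"},
    {"job_cancel_sync", "job_cancel_sync_all"},
    {"job_cancel_sync", "cancel_common"},
};

/* Functions the thread reports as acquiring the AioContext lock. */
static const char *const lockers[] = {
    "job_completed_txn_abort", "qmp_block_job_finalize", "qmp_job_finalize",
    "job_exit", "blockdev_mark_auto_del", "qmp_block_job_cancel",
    "qmp_job_cancel", "replication_stop", "drive_backup_abort",
    "blockdev_backup_abort", "job_cancel_sync_all", "cancel_common",
};

#define N_EDGES (sizeof(edges) / sizeof(edges[0]))
#define N_LOCKERS (sizeof(lockers) / sizeof(lockers[0]))

static bool takes_lock(const char *fn)
{
    for (size_t i = 0; i < N_LOCKERS; i++) {
        if (strcmp(lockers[i], fn) == 0) {
            return true;
        }
    }
    return false;
}

/* Walk the callers of fn upwards. A path is safe once a function on it
 * takes the lock; a root reached without one is appended to out[], which
 * must have room for every such root. The graph above is acyclic, so no
 * visited set is needed. Returns the new number of entries in out[]. */
static size_t find_unlocked_roots(const char *fn, const char **out, size_t n)
{
    bool has_caller = false;

    if (takes_lock(fn)) {
        return n;
    }
    for (size_t i = 0; i < N_EDGES; i++) {
        if (strcmp(edges[i].callee, fn) == 0) {
            has_caller = true;
            n = find_unlocked_roots(edges[i].caller, out, n);
        }
    }
    if (!has_caller) {
        out[n++] = fn;
    }
    return n;
}
```

Called as find_unlocked_roots("job_clean", roots, 0) with room for 16 entries, this walk reports test_cancel_concluded and replication_close, the two unlocked paths identified in the discussion above.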
>> already acquires the job's context. The job's context is guaranteed to
>> be the same as the one used by backup_top via backup_job_create.
>>
>> Since the previous logic effectively acquired the lock twice, this
>> broke cleanup of backups for disks using IO threads: the
>> BDRV_POLL_WHILE in bdrv_backup_top_drop -> bdrv_do_drained_begin
>> would only release the lock once, thus deadlocking with the IO thread.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>
> Just a note: this code was recently touched by 0abf2581717a19, so I'm
> adding Sergio (its author) to CC.
>
>> ---
>>
>> This is a fix for the issue discussed in this part of the thread:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07639.html
>> ...not the original problem (core dump) posted by Dietmar.
>>
>> I've still seen it occasionally hang during a backup abort. I'm trying
>> to figure out why that happens; the stack trace indicates a similar
>> problem, with the main thread hanging at bdrv_do_drained_begin, though
>> I have no clue why as of yet.
>>
>> block/backup.c | 4 ----
>> 1 file changed, 4 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 7430ca5883..a7a7dcaf4c 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -126,11 +126,7 @@ static void backup_abort(Job *job)
>> static void backup_clean(Job *job)
>> {
>> BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>> - AioContext *aio_context = bdrv_get_aio_context(s->backup_top);
>> -
>> - aio_context_acquire(aio_context);
>> bdrv_backup_top_drop(s->backup_top);
>> - aio_context_release(aio_context);
>> }
>> void backup_do_checkpoint(BlockJob *job, Error **errp)
>>
>
>
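The deadlock described in the quoted commit message reduces to lock-depth bookkeeping. The sketch below is a hypothetical model in plain C, not QEMU code: MockCtx tracks only the nesting depth of a recursive AioContext lock, and poll_while_draining stands in for the BDRV_POLL_WHILE step in bdrv_do_drained_begin, which releases the lock exactly once while waiting. It assumes the IO thread can make progress only when the main thread's depth drops to zero; clean_before_patch and clean_after_patch are illustrative names for the two versions of backup_clean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a recursive AioContext lock held by the main
 * thread: only the nesting depth is tracked. Not QEMU's real API. */
typedef struct {
    int depth;
} MockCtx;

static void ctx_acquire(MockCtx *c)
{
    c->depth++;
}

static void ctx_release(MockCtx *c)
{
    assert(c->depth > 0);
    c->depth--;
}

/* Stand-in for the polling step of bdrv_do_drained_begin(): the lock is
 * released exactly once while waiting. Returns whether the IO thread could
 * take the lock (depth 0), i.e. whether polling can make progress. */
static bool poll_while_draining(MockCtx *c)
{
    ctx_release(c);
    bool iothread_can_run = (c->depth == 0);
    ctx_acquire(c);
    return iothread_can_run;
}

/* backup_clean() before the patch: job_exit() already holds the context,
 * and the extra acquire around the drop raises the depth to 2. */
static bool clean_before_patch(MockCtx *job_ctx)
{
    ctx_acquire(job_ctx);
    bool ok = poll_while_draining(job_ctx);
    ctx_release(job_ctx);
    return ok;
}

/* backup_clean() after the patch: relies on the caller's single acquire. */
static bool clean_after_patch(MockCtx *job_ctx)
{
    return poll_while_draining(job_ctx);
}
```

With job_exit's acquire modelled as one ctx_acquire beforehand, clean_before_patch reports that the IO thread stays blocked (depth 1 while polling), while clean_after_patch lets it run; in both cases the caller's depth is back to 1 afterwards.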
Thread overview: 23+ messages
2020-03-24 11:13 backup transaction with io-thread core dumps Dietmar Maurer
2020-03-24 13:30 ` Dietmar Maurer
2020-03-24 13:33 ` Dietmar Maurer
2020-03-24 13:44 ` John Snow
2020-03-24 14:00 ` Dietmar Maurer
2020-03-24 13:47 ` Max Reitz
2020-03-24 14:02 ` Dietmar Maurer
2020-03-24 16:49 ` Dietmar Maurer
2020-03-25 11:40 ` Stefan Reiter
2020-03-25 12:23 ` Vladimir Sementsov-Ogievskiy
2020-03-25 15:50 ` [PATCH] backup: don't acquire aio_context in backup_clean Stefan Reiter
2020-03-26 5:54 ` Vladimir Sementsov-Ogievskiy
2020-03-26 9:43 ` Stefan Reiter [this message]
2020-03-26 12:46 ` Vladimir Sementsov-Ogievskiy
2020-03-26 11:53 ` Sergio Lopez
2020-03-25 8:13 ` backup transaction with io-thread core dumps Sergio Lopez
2020-03-25 11:46 ` Sergio Lopez
2020-03-25 12:29 ` Dietmar Maurer
2020-03-25 12:39 ` Sergio Lopez
2020-03-25 15:40 ` Dietmar Maurer
2020-03-26 7:50 ` Sergio Lopez
2020-03-26 8:14 ` Dietmar Maurer
2020-03-26 9:23 ` Dietmar Maurer