On 2018-05-15 14:17, Kevin Wolf wrote:
> Am 14.05.2018 um 17:52 hat Max Reitz geschrieben:
>> On 2018-05-09 18:26, Kevin Wolf wrote:
>>> Signed-off-by: Kevin Wolf
>>> ---
>>>  include/block/blockjob.h     |  5 ----
>>>  include/block/blockjob_int.h | 19 ---------------
>>>  include/qemu/job.h           | 20 ++++++++++++++++
>>>  block/backup.c               |  7 +++---
>>>  block/commit.c               | 11 +++++----
>>>  block/mirror.c               | 15 ++++++------
>>>  block/stream.c               | 14 +++++------
>>>  blockjob.c                   | 57 ++++----------------------------------------
>>>  job.c                        | 33 +++++++++++++++++++++++++
>>>  tests/test-bdrv-drain.c      |  7 +++---
>>>  tests/test-blockjob-txn.c    | 13 +++++-----
>>>  tests/test-blockjob.c        |  7 +++---
>>>  12 files changed, 98 insertions(+), 110 deletions(-)
>>
>> [...]
>>
>>> diff --git a/block/commit.c b/block/commit.c
>>> index 85baea8f92..d326766e4d 100644
>>> --- a/block/commit.c
>>> +++ b/block/commit.c
>>> @@ -72,9 +72,10 @@ typedef struct {
>>>      int ret;
>>>  } CommitCompleteData;
>>>
>>> -static void commit_complete(BlockJob *job, void *opaque)
>>> +static void commit_complete(Job *job, void *opaque)
>>>  {
>>> -    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
>>> +    CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>>
>> Now this is just abuse.
>>
>> ...but it's not the first time someone packs two container_of() into
>> one, it appears.  So, whatever, I guess.
>
> I don't think it's abuse. Why wouldn't I directly cast to the type that
> I really want instead of casting to each of the uninteresting parent
> classes, too?

Because the final parameter is called "member" and not "path".
:-)

>>> @@ -170,3 +171,35 @@ void job_unref(Job *job)
>>>          g_free(job);
>>>      }
>>>  }
>>> +
>>> +typedef struct {
>>> +    Job *job;
>>> +    JobDeferToMainLoopFn *fn;
>>> +    void *opaque;
>>> +} JobDeferToMainLoopData;
>>> +
>>> +static void job_defer_to_main_loop_bh(void *opaque)
>>> +{
>>> +    JobDeferToMainLoopData *data = opaque;
>>> +    Job *job = data->job;
>>> +    AioContext *aio_context = job->aio_context;
>>> +
>>> +    /* Prevent race with job_defer_to_main_loop() */
>>> +    aio_context_acquire(aio_context);
>>
>> I don't have a good feeling about this.  The original code had this
>> comment above an aio_context_acquire() for a context that might
>> decidedly not have anything to do with the BB's context;
>> block_job_defer_to_main_loop()'s description was that it would acquire
>> the latter, so why did it acquire the former at all?
>>
>> We wouldn't need this comment here at all, because acquiring this
>> AioContext is part of the interface.  That's why I don't have a good
>> feeling.
>
> To be honest, I don't fully understand what the old code was trying to
> do or what race it was talking about, because I don't see any potential
> race with job_defer_to_main_loop() (neither in the old nor in the new
> code).
>
> My suspicion is that Stefan solved the race that you reported [1] (and
> which was not with job_defer_to_main_loop(), but with random code that
> runs between that and the BH) just by adding some more code that made
> the scenario safe, but missed that this also made the existing code
> redundant. In other words, I think taking data->aio_context didn't
> serve a purpose even in the old code.

Possible, yes.  Also seems more likely than any of the explanations I
tried to come up with.

> The only thing it could possibly protect is the access of
> data->job->bs, but that's not something that changes between
> job_defer_to_main_loop() and the execution of the BH.
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg00260.html
>
>> The best explanation I can come up with is that the original code
>> acquired the AioContext both of the block device at the time of the BH
>> (because that needs to be done), and at the time of
>> block_job_defer_to_main_loop() -- because the latter is probably the
>> context the block_job_defer_to_main_loop() call came from, so it
>> should be (b)locked.
>>
>> But if that's the case, then the same should be done here.  The
>> context of the job may change between scheduling the BH and the BH
>> being executed, so we might lock a different context here than the one
>> job_defer_to_main_loop() ran in (i.e., job->aio_context at the time of
>> job_defer_to_main_loop() running).  And maybe we should lock that old
>> context, too -- just like block_job_defer_to_main_loop_bh() did.
>
> Why should we lock the old context? We don't access anything protected
> by it. Even the data->job->bs access has gone away because we now have
> job->aio_context.

Because the old code did so and I don't know why. O:-)

Your explanation does make sense to me, though, so:

Reviewed-by: Max Reitz