All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anton Nefedov <anton.nefedov@virtuozzo.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: "qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"eblake@redhat.com" <eblake@redhat.com>,
	Denis Lunev <den@virtuozzo.com>,
	"berto@igalia.com" <berto@igalia.com>
Subject: Re: [Qemu-devel] [PATCH v10 5/9] block: treat BDRV_REQ_ALLOCATE as serialising
Date: Wed, 5 Dec 2018 14:01:51 +0000	[thread overview]
Message-ID: <50f8f543-f242-649b-407c-fe7c73b91177@virtuozzo.com> (raw)
In-Reply-To: <c8832c65-4b52-b56f-cd0b-499597a82b11@virtuozzo.com>



On 5/12/2018 4:14 PM, Vladimir Sementsov-Ogievskiy wrote:
> 03.12.2018 13:14, Anton Nefedov wrote:
>> The idea is that ALLOCATE requests may overlap with other requests.
> 
> please, describe why
> 

It is not used in this series from some point, but the idea is that the
caller might use ALLOCATE requests on a larger extent.

Described in the series header:

   2. moreover, efficient write_zeroes() operation can be used to
preallocate
      space megabytes (*configurable number) ahead which gives noticeable
      improvement on some storage types (e.g. distributed storage)
      where the space allocation operation might be expensive)
      (Not included in this patchset since v6).

So, it's possible to drop from this series and add later but I'd like
to receive general remarks on whether this is an acceptable way.

>> Reuse the existing block layer infrastructure for serialising requests.
>> Use the following approach:
>>     - mark ALLOCATE also SERIALISING, so subsequent requests to the area wait
>>     - ALLOCATE request itself must never wait if another request is in flight
>>       already. Return EAGAIN, let the caller reconsider.
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> ---
>>    block/io.c | 31 ++++++++++++++++++++++++-------
>>    1 file changed, 24 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index d9d7644858..6ff946f63d 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -720,12 +720,13 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
>>        bdrv_wakeup(bs);
>>    }
>>    
>> -static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
>> +static bool coroutine_fn find_or_wait_serialising_requests(
>> +    BdrvTrackedRequest *self, bool wait)
>>    {
>>        BlockDriverState *bs = self->bs;
>>        BdrvTrackedRequest *req;
>>        bool retry;
>> -    bool waited = false;
>> +    bool found = false;
>>    
>>        if (!atomic_read(&bs->serialising_in_flight)) {
>>            return false;
>> @@ -751,11 +752,14 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
>>                     * will wait for us as soon as it wakes up, then just go on
>>                     * (instead of producing a deadlock in the former case). */
>>                    if (!req->waiting_for) {
>> +                    found = true;
>> +                    if (!wait) {
>> +                        break;
>> +                    }
>>                        self->waiting_for = req;
>>                        qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
>>                        self->waiting_for = NULL;
>>                        retry = true;
>> -                    waited = true;
>>                        break;
>>                    }
>>                }
>> @@ -763,7 +767,12 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
>>            qemu_co_mutex_unlock(&bs->reqs_lock);
>>        } while (retry);
>>    
>> -    return waited;
>> +    return found;
>> +}
>> +
>> +static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
>> +{
>> +    return find_or_wait_serialising_requests(self, true);
>>    }
>>    
>>    static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
>> @@ -1585,7 +1594,7 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
>>                              BdrvTrackedRequest *req, int flags)
>>    {
>>        BlockDriverState *bs = child->bs;
>> -    bool waited;
>> +    bool found;
>>        int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
>>    
>>        if (bs->read_only) {
>> @@ -1602,9 +1611,13 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
>>            mark_request_serialising(req, bdrv_get_cluster_size(bs));
>>        }
>>    
>> -    waited = wait_serialising_requests(req);
>> +    found = find_or_wait_serialising_requests(req,
>> +                                              !(flags & BDRV_REQ_ALLOCATE));
>> +    if (found && (flags & BDRV_REQ_ALLOCATE)) {
>> +        return -EAGAIN;
>> +    }
>>    
>> -    assert(!waited || !req->serialising ||
>> +    assert(!found || !req->serialising ||
>>               is_request_serialising_and_aligned(req));
>>        assert(req->overlap_offset <= offset);
>>        assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
>> @@ -1866,6 +1879,10 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>>        assert(!((flags & BDRV_REQ_ALLOCATE) && (flags & BDRV_REQ_MAY_UNMAP)));
>>        assert(!((flags & BDRV_REQ_ALLOCATE) && !(flags & BDRV_REQ_ZERO_WRITE)));
>>    
>> +    if (flags & BDRV_REQ_ALLOCATE) {
>> +        flags |= BDRV_REQ_SERIALISING;
>> +    }
>> +
>>        trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
>>    
>>        if (!bs->drv) {
>>
> 
> patch looks correct technically, I just don't know the reasoning..
> 
> the only thing, is that it would be good to add a comment in BDRV flags definition section, that _ALLOCATE implies _SERIALIZE.
> 
> 

I see your point, but it does not imply SERIALIZE in sense that the
caller must set both (as with ALLOCATE and WRITE_ZEROES) - it's set
implicitly.

maybe:

diff --git a/include/block/block.h b/include/block/block.h
index f571082415..a0bf3fed93 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -87,6 +87,9 @@ typedef enum {
       * efficiently allocate the space so it reads as zeroes, or return 
an error.
       * If this flag is set then BDRV_REQ_ZERO_WRITE must also be set.
       * This flag cannot be set together with BDRV_REQ_MAY_UNMAP.
+     * This flag implicitly behaves as BDRV_REQ_SERIALISING i.e. it is
+     * protected from conflicts with overlapping requests. If such 
conflict is
+     * detected, -EAGAIN is returned.
       */
      BDRV_REQ_ALLOCATE           = 0x100,



  reply	other threads:[~2018-12-05 14:02 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03 10:14 [Qemu-devel] [PATCH v10 0/9] qcow2: cluster space preallocation Anton Nefedov
2018-12-03 10:14 ` [Qemu-devel] [PATCH v10 1/9] mirror: inherit supported write/zero flags Anton Nefedov
2018-12-05 12:43   ` Vladimir Sementsov-Ogievskiy
2018-12-05 13:27     ` Anton Nefedov
2018-12-07 14:31   ` Alberto Garcia
2018-12-12 12:15   ` Vladimir Sementsov-Ogievskiy
2018-12-03 10:14 ` [Qemu-devel] [PATCH v10 2/9] blkverify: set " Anton Nefedov
2018-12-07 14:32   ` Alberto Garcia
2018-12-12 12:26   ` Vladimir Sementsov-Ogievskiy
2018-12-03 10:14 ` [Qemu-devel] [PATCH v10 3/9] quorum: set supported write flags Anton Nefedov
2018-12-07 14:33   ` Alberto Garcia
2018-12-07 14:46     ` Anton Nefedov
2018-12-07 14:54       ` Alberto Garcia
2018-12-12 12:33   ` Vladimir Sementsov-Ogievskiy
2018-12-03 10:14 ` [Qemu-devel] [PATCH v10 4/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
2018-12-05 12:59   ` Vladimir Sementsov-Ogievskiy
2018-12-05 13:38     ` Anton Nefedov
2018-12-03 10:14 ` [Qemu-devel] [PATCH v10 5/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
2018-12-05 13:14   ` Vladimir Sementsov-Ogievskiy
2018-12-05 14:01     ` Anton Nefedov [this message]
2018-12-12 12:48       ` Vladimir Sementsov-Ogievskiy
2018-12-13 11:57         ` Anton Nefedov
2018-12-03 10:14 ` [Qemu-devel] [PATCH v10 6/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
2018-12-05 13:25   ` Vladimir Sementsov-Ogievskiy
2018-12-05 14:11     ` Anton Nefedov
2018-12-12 17:19       ` Vladimir Sementsov-Ogievskiy
2018-12-13 12:01         ` Anton Nefedov
2018-12-07 15:09   ` Alberto Garcia
2018-12-03 10:14 ` [Qemu-devel] [PATCH v10 7/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
2018-12-05 13:28   ` Vladimir Sementsov-Ogievskiy
2018-12-07 15:00   ` Alberto Garcia
2018-12-03 10:14 ` [Qemu-devel] [PATCH v10 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
2018-12-03 13:59   ` Alberto Garcia
2018-12-03 14:04     ` Anton Nefedov
2018-12-05 14:01   ` Vladimir Sementsov-Ogievskiy
2018-12-05 16:59     ` Anton Nefedov
2018-12-05 17:42       ` Vladimir Sementsov-Ogievskiy
2018-12-13 12:02   ` Vladimir Sementsov-Ogievskiy
2018-12-13 13:57     ` Anton Nefedov
2018-12-14 16:20   ` Vladimir Sementsov-Ogievskiy
2018-12-17 10:17     ` Anton Nefedov
2018-12-03 10:15 ` [Qemu-devel] [PATCH v10 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50f8f543-f242-649b-407c-fe7c73b91177@virtuozzo.com \
    --to=anton.nefedov@virtuozzo.com \
    --cc=berto@igalia.com \
    --cc=den@virtuozzo.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.