All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
Date: Tue, 10 Nov 2015 11:12:59 +0100	[thread overview]
Message-ID: <20151110101259.GA4163@noname.redhat.com> (raw)
In-Reply-To: <5641B282.10706@redhat.com>

Am 10.11.2015 um 10:01 hat Paolo Bonzini geschrieben:
> 
> 
> On 10/11/2015 07:14, Fam Zheng wrote:
> > On Mon, 11/09 17:29, Kevin Wolf wrote:
> >> Am 09.11.2015 um 17:18 hat Paolo Bonzini geschrieben:
> >>>
> >>>
> >>> On 09/11/2015 17:04, Kevin Wolf wrote:
> >>>> Am 06.11.2015 um 11:22 hat Fam Zheng geschrieben:
> >>>>> The "pnum < nb_sectors" condition in deciding whether to actually copy
> >>>>> data is unnecessarily strict, and the qiov initialization is
> >>>>> unnecessarily too, for both bdrv_aio_write_zeroes and bdrv_aio_discard
> >>>>> branches.
> >>>>>
> >>>>> Reorganize mirror_iteration flow so that we:
> >>>>>
> >>>>>     1) Find the contiguous zero/discarded sectors with
> >>>>>     bdrv_get_block_status_above() before deciding what to do. We query
> >>>>>     s->buf_size sized blocks at a time.
> >>>>>
> >>>>>     2) If the sectors in question are zeroed/discarded and aligned to
> >>>>>     target cluster, issue zero write or discard accordingly. It's done
> >>>>>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> >>>>>
> >>>>>     3) Otherwise, do the same loop as before in mirror_do_read.
> >>>>>
> >>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
> >>>>
> >>>> I'm not sure where in the patch to comment on this, so I'll just do it
> >>>> here right in the beginning.
> >>>>
> >>>> I'm concerned that we need to be more careful about races in this patch,
> >>>> in particular regarding the bitmaps. I think the conditions for the two
> >>>> bitmaps are:
> >>>>
> >>>> * Dirty bitmap: We must clear the bit after finding the next piece of
> >>>>   data to be mirrored, but before we yield after getting information
> >>>>   that we use for the decision which kind of operation we need.
> >>>>
> >>>>   In other words, we need to clear the dirty bitmap bit before calling
> >>>>   bdrv_get_block_status_above(), because that's both the function that
> >>>>   retrieves information about the next chunk and also a function that
> >>>>   can yield.
> >>>>
> >>>>   If after this point the data is written to, we need to mirror it
> >>>>   again.
> >>>
> >>> With Fam's patch, that's not trivial for two reasons:
> >>>
> >>> 1) bdrv_get_block_status_above() can return a smaller amount than what
> >>> is asked.
> >>>
> >>> 2) the "read and write" case can handle s->granularity sectors per
> >>> iteration (many of them can be coalesced, but still that's how the
> >>> iteration works).
> >>>
> >>> The simplest solution is to perform the query with s->granularity size
> >>> rather than s->buf_size.
> >>
> >> Then we end up with many small operations, that's not what we want.
> >>
> >> Why can't we mark up to s->buf_size dirty clusters as clean first, then
> >> query the status, and mark all of those that we can't handle dirty
> >> again?
> > 
> > Then we may end up marking more clusters as dirty than it should be.
> 
> You're both right.
> 
> > Because all bdrv_set_dirty() and bdrv_set_dirty_bitmap() callers are coroutine,
> > we can introduce a CoMutex to let bitmap reader block bdrv_set_dirty and
> > bdrv_set_dirty_bitmap.
> 
> I think this is not necessary.
> 
> I think the following is safe:
> 
> 1) before calling bdrv_get_block_status_above(), find out how many
> consecutive bits in the dirty bitmap are 1
> 
> 2) zero all those bits in the dirty bitmap
> 
> 3) call bdrv_get_block_status_above() with a size equivalent to the
> number of dirty bits
> 
> 4) if bdrv_get_block_status_above() only returns a partial result, loop
> step (3) until all the dirty bits are processed

Right, you can always implement one iteration with more than one I/O
request. And maybe that would be the time to start a coroutine for the
requests already in the mirror code instead of complicating the AIO
state machine and letting block.c start coroutines.

> For full mirroring, this strategy will probably make the first
> incremental iteration more expensive.

You mean because we issue smaller, interleaved write and write_zeroes
requests now instead of only large writes? That's probably right, but
getting the right result should be more important than speed. :-)

Kevin

  reply	other threads:[~2015-11-10 10:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-06 10:22 [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image Fam Zheng
2015-11-06 18:36 ` [Qemu-devel] [Qemu-block] " Max Reitz
2015-11-09  2:12   ` Fam Zheng
2015-11-09 16:04 ` [Qemu-devel] " Kevin Wolf
2015-11-09 16:18   ` Paolo Bonzini
2015-11-09 16:29     ` Kevin Wolf
2015-11-10  6:14       ` Fam Zheng
2015-11-10  9:01         ` Paolo Bonzini
2015-11-10 10:12           ` Kevin Wolf [this message]
2015-11-10 10:30             ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151110101259.GA4163@noname.redhat.com \
    --to=kwolf@redhat.com \
    --cc=famz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.