All of lore.kernel.org
 help / color / mirror / Atom feed
* Allow read IO during share block breaking in dm-thin
@ 2013-08-23  2:35 Teng-Feng Yang
  2013-08-23  9:59 ` Joe Thornber
  2013-08-23 20:45 ` Mikulas Patocka
  0 siblings, 2 replies; 3+ messages in thread
From: Teng-Feng Yang @ 2013-08-23  2:35 UTC (permalink / raw)
  To: dm-devel

Hi folks,

I have tried to perform some experiments and enhance dm-thin with some
new features for couple of weeks.
I notice that dm-thin uses a dm_deferred_set data structure to record
all the share read IO and inserts new data mappings only when all
share read IO are quiesced.
This method is quite similar to the block tracking mechanism used by
dm-snap, which tries to prevent write IO to the origin device from
overwriting the block when some read IO from snap device has not yet
completed
Although it is reasonable to have something like this in dm-snap, it
looks like an overkill to have the similar mechanism in dm-thin.
Since the "redirect-on-write" nature of dm-thin makes all write IOs to
a share block writes in a new allocated block instead, all preceding
share read IO will still read the correct data even when there are
multiple share write IO on-the-fly.
So here is my question, do we really need to quiesce all share read
IOs before adding a new data mapping, or the share read IO's deferred
set is meant to deal with some other problems?

Any help would be grateful.
Thanks

Dennis

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Allow read IO during share block breaking in dm-thin
  2013-08-23  2:35 Allow read IO during share block breaking in dm-thin Teng-Feng Yang
@ 2013-08-23  9:59 ` Joe Thornber
  2013-08-23 20:45 ` Mikulas Patocka
  1 sibling, 0 replies; 3+ messages in thread
From: Joe Thornber @ 2013-08-23  9:59 UTC (permalink / raw)
  To: device-mapper development

Hi Dennis,

On Fri, Aug 23, 2013 at 10:35:30AM +0800, Teng-Feng Yang wrote:
> So here is my question, do we really need to quiesce all share read
> IOs before adding a new data mapping, or the share read IO's deferred
> set is meant to deal with some other problems?

This is a good question.

As you say we could let the read ios continue to run to the old block.
For large block sizes this could reduce latency of those reads.

I have a suspicion that I originally wrote things this way so we never
complete a read bio with older data after earlier completing a write
bio.  But if the bios were submitted concurrently there's no need to
observe this restriction.

There is one thing you need to consider if you're going to make this
change.  The old block may well have it's reference count reduced to
zero as a result of inserting the new mapping.  This in itself is
fine; we're careful to never free and realloc a block within the same
transaction.  But it does mean you'd need to start quiescing as part
of the commit operation.  I worry that introducing quiescing at the
commit level would impact the latency of ios to all blocks, rather
than just the one that's having it's sharing broken.

You could restructure the REQ_FUA/REQ_FLUSH code (which triggers the
commits), such that only they get stalled waiting for the quiesce,
but that's a chunk more work.

Do some benchmarking, quantify the problem.  Then we can make a call
on whether the development effort is worth it.

- Joe

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Allow read IO during share block breaking in dm-thin
  2013-08-23  2:35 Allow read IO during share block breaking in dm-thin Teng-Feng Yang
  2013-08-23  9:59 ` Joe Thornber
@ 2013-08-23 20:45 ` Mikulas Patocka
  1 sibling, 0 replies; 3+ messages in thread
From: Mikulas Patocka @ 2013-08-23 20:45 UTC (permalink / raw)
  To: device-mapper development; +Cc: Edward Thornber, Teng-Feng Yang



On Fri, 23 Aug 2013, Teng-Feng Yang wrote:

> Hi folks,
> 
> I have tried to perform some experiments and enhance dm-thin with some
> new features for couple of weeks.
> I notice that dm-thin uses a dm_deferred_set data structure to record
> all the share read IO and inserts new data mappings only when all
> share read IO are quiesced.
> This method is quite similar to the block tracking mechanism used by
> dm-snap, which tries to prevent write IO to the origin device from
> overwriting the block when some read IO from snap device has not yet
> completed
> Although it is reasonable to have something like this in dm-snap, it
> looks like an overkill to have the similar mechanism in dm-thin.
> Since the "redirect-on-write" nature of dm-thin makes all write IOs to
> a share block writes in a new allocated block instead, all preceding
> share read IO will still read the correct data even when there are
> multiple share write IO on-the-fly.
> So here is my question, do we really need to quiesce all share read
> IOs before adding a new data mapping, or the share read IO's deferred
> set is meant to deal with some other problems?
> 
> Any help would be grateful.
> Thanks
> 
> Dennis
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 

The problem is this:
(1) you have a block with refcount 2, shared by two logical volumes
(2) you submit a read to the 1st logical volume to this block, the read
        waits in i/o queue
(3) you submit a write to the 1st logical volume to this block, this
        triggers reallocation - suppose that the i/o scheduler
        decides that this reallocation is performed before the
        previous read
(4) you submit a write to the 2nd logical volume to this block - reference
        count is 1 (because we dropped it at step(3)), so the write goes
        through and the data are written to the disk
(5) the i/o scheduler decides to perform the read request submitted in
        the step (2) => it incorrectly reads data written to the 2nd
        logical volume in step (4)

Original snapshot implementation had this bug and I fixed it in commit
a8d41b59f3f5a7ac19452ef442a7fc1b5fa17366.

As Joe said, if you want to avoid this scenario, you would have to wait 
for the read request to finish before doing comitting in step (3).

Mikulas

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2013-08-23 20:45 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-23  2:35 Allow read IO during share block breaking in dm-thin Teng-Feng Yang
2013-08-23  9:59 ` Joe Thornber
2013-08-23 20:45 ` Mikulas Patocka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.