* [RFC PATCH] block: warn if blk_stack_limits() undermines atomicity
@ 2010-02-22 20:49 Mike Snitzer
  2010-02-23 17:10 ` Martin K. Petersen
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Snitzer @ 2010-02-22 20:49 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Christoph Hellwig, linux-kernel

Linux Device Mapper (DM) and Software Raid (MD) device drivers can be
used to arbitrarily combine devices with different "I/O Limits".  The
kernel's block layer goes to great lengths to reasonably combine the
"I/O Limits" of the individual devices.  The kernel will not prevent
combining heterogeneous devices but the user should be aware of the risk
associated with doing so.

For instance, a 512 byte device and a 4K device may be combined into a
single logical DM device; the resulting DM device would have a
logical_block_size of 4K.  Filesystems layered on such a hybrid device
assume that 4K will be written atomically but in reality that 4K will be
split into 8 512 byte IOs when issued to the 512 byte device.  Using a
4K logical_block_size for the higher-level DM device increases potential
for a partial write to the 512b device if there is a system crash.
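
To make the scenario concrete, here is a minimal userspace sketch of the
stacking step, assuming a cut-down stand-in for struct queue_limits.  The
names limits_sketch and stack_logical_block_size() are hypothetical and
exist only for illustration; the real logic lives in blk_stack_limits()
and is changed by the diff below.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the single queue_limits field used here. */
struct limits_sketch {
	unsigned int logical_block_size;	/* in bytes; 0 means "not set yet" */
};

/*
 * Fold the bottom device's logical_block_size into the top device's,
 * taking the larger of the two.  Returns -1 if a previously set top
 * value had to grow, 0 otherwise.
 */
static int stack_logical_block_size(struct limits_sketch *t,
				    const struct limits_sketch *b)
{
	unsigned int top;

	assert(t != NULL && b != NULL);

	/* Remember the value the top device had before stacking. */
	top = t->logical_block_size;

	if (b->logical_block_size > t->logical_block_size)
		t->logical_block_size = b->logical_block_size;

	if (top && top < t->logical_block_size) {
		fprintf(stderr,
			"logical_block_size of top device grew from %u to %u\n",
			top, t->logical_block_size);
		return -1;
	}

	return 0;
}
```

Stacking a 4096 byte bottom device under a top device that already
advertises 512 bytes raises the top value to 4096 and returns -1; stacking
onto a top device whose value is still 0 returns 0.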

If combining multiple devices' "I/O Limits" results in a conflict, the
block layer will warn that the device is more susceptible to partial
writes and will mark it as misaligned. [NOTE: setting "misaligned" for this
warning is somewhat awkward but blk_stack_limits() return of -1 can be
viewed as there was an "alignment inconsistency".  Would it be better to
return -1 but avoid setting t->misaligned?]

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 5eeb9e0..33bebe7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -566,8 +566,16 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		}
 	}
 
+	top = t->logical_block_size;
 	t->logical_block_size = max(t->logical_block_size,
 				    b->logical_block_size);
+	if (top && top < t->logical_block_size) {
+		printk(KERN_NOTICE "Warning: changing logical_block_size of top device "
+		       "(from %u to %u) increases potential for partial writes\n",
+		       top, t->logical_block_size);
+		t->misaligned = 1;
+		ret = -1;
+	}
 
 	t->physical_block_size = max(t->physical_block_size,
 				     b->physical_block_size);


* Re: [RFC PATCH] block: warn if blk_stack_limits() undermines atomicity
  2010-02-22 20:49 [RFC PATCH] block: warn if blk_stack_limits() undermines atomicity Mike Snitzer
@ 2010-02-23 17:10 ` Martin K. Petersen
  2010-02-23 19:32   ` Mike Snitzer
  0 siblings, 1 reply; 4+ messages in thread
From: Martin K. Petersen @ 2010-02-23 17:10 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Martin K. Petersen, Christoph Hellwig, linux-kernel

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> For instance, a 512 byte device and a 4K device may be combined
Mike> into a single logical DM device; the resulting DM device would
Mike> have a logical_block_size of 4K.  Filesystems layered on such a
Mike> hybrid device assume that 4K will be written atomically but in
Mike> reality that 4K will be split into 8 512 byte IOs when issued to
Mike> the 512 byte device.

Not really.  It'll be issued as one I/O with a smaller LBA count but an
identical data payload.


Mike> Using a 4K logical_block_size for the higher-level DM device
Mike> increases potential for a partial write to the 512b device if
Mike> there is a system crash.

That's a definite maybe :)


Mike> [NOTE: setting "misaligned" for this warning is somewhat awkward
Mike> but blk_stack_limits() return of -1 can be viewed as there was an
Mike> "alignment inconsistency".  Would it be better to return -1 but
Mike> avoid setting t->misaligned?]

I don't have a problem with printing a warning but I don't think this
qualifies as misalignment on the grounds that the error scenario is in
the hypothetical bucket and not a deterministic thing.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [RFC PATCH] block: warn if blk_stack_limits() undermines atomicity
  2010-02-23 17:10 ` Martin K. Petersen
@ 2010-02-23 19:32   ` Mike Snitzer
  2010-02-24  0:12     ` Martin K. Petersen
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Snitzer @ 2010-02-23 19:32 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Christoph Hellwig, linux-kernel

On Tue, Feb 23 2010 at 12:10pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> For instance, a 512 byte device and a 4K device may be combined
> Mike> into a single logical DM device; the resulting DM device would
> Mike> have a logical_block_size of 4K.  Filesystems layered on such a
> Mike> hybrid device assume that 4K will be written atomically but in
> Mike> reality that 4K will be split into 8 512 byte IOs when issued to
> Mike> the 512 byte device.
> 
> Not really.  It'll be issued as one I/O with a smaller LBA count but an
> identical data payload.

Can you expand on that a bit?  How does a smaller LBA count relate to
this?  On a 512b device the 4K data payload would touch more LBAs.

In any case, a 4K write to a 512b device is not atomic.

> Mike> Using a 4K logical_block_size for the higher-level DM device
> Mike> increases potential for a partial write to the 512b device if
> Mike> there is a system crash.
> 
> That's a definite maybe :)

If you think what I've raised here is overblown then I'd like to
understand why in more detail.

> Mike> [NOTE: setting "misaligned" for this warning is somewhat awkward
> Mike> but blk_stack_limits() return of -1 can be viewed as there was an
> Mike> "alignment inconsistency".  Would it be better to return -1 but
> Mike> avoid setting t->misaligned?]
> 
> I don't have a problem with printing a warning but I don't think this
> qualifies as misalignment on the grounds that the error scenario is in
> the hypothetical bucket and not a deterministic thing.

OK, I was relying on returning -1 so the blk_stack_limits() caller could
provide additional context (via existing warnings) for which device
"increases potential for partial writes" when it gets stacked.

Otherwise all you have is a largely generic warning (as blk_stack_limits
knows nothing about which devices the provided limits belong to).

Mike


* Re: [RFC PATCH] block: warn if blk_stack_limits() undermines atomicity
  2010-02-23 19:32   ` Mike Snitzer
@ 2010-02-24  0:12     ` Martin K. Petersen
  0 siblings, 0 replies; 4+ messages in thread
From: Martin K. Petersen @ 2010-02-24  0:12 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Martin K. Petersen, Christoph Hellwig, linux-kernel

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

>> Not really.  It'll be issued as one I/O with a smaller LBA count but
>> an identical data payload.

Mike> Can you expand on that a bit?  How does a smaller LBA count relate
Mike> to this?  On a 512b device the 4K data payload would touch more
Mike> LBAs.

Sorry, I had my head stuck in the 4KB case.  More blocks.  My point
being that regardless of the logical block size we'll be issuing a
single command.  The only difference between the two cases is the LBA
count. I.e. the protocol encoding of how much data to transfer.
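
As a rough illustration of that arithmetic (a sketch only, with a
hypothetical helper name, not code from the block layer): the same 4K
payload maps to a different LBA count depending on the device's logical
block size, while the number of commands stays at one.

```c
#include <assert.h>

/*
 * Number of logical blocks covered by one command that transfers
 * payload_bytes to a device with the given logical block size.
 * Assumes the payload is a whole multiple of the block size.
 */
static unsigned int lba_count(unsigned int payload_bytes,
			      unsigned int logical_block_size)
{
	assert(logical_block_size != 0);
	assert(payload_bytes % logical_block_size == 0);

	return payload_bytes / logical_block_size;
}
```

For a 4096 byte payload this gives 8 blocks on a 512 byte device and 1
block on a 4096 byte device.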


Mike> In any case, a 4K write to a 512b device is not atomic.

Mike> If you think what I've raised here is overblown then I'd like to
Mike> understand why in more detail.

I'm just playing devil's advocate here.  We have no guarantees that a
512-byte write to a 512-byte device is atomic either.  None.  We've been
trying very hard to get any guarantees out of storage vendors for years
without any luck.  I know that a lot of our stuff operates on the
assumption that sector writes are atomic.  But in a lot of cases they
are not.


Mike> Otherwise all you have is a largely generic warning (as
Mike> blk_stack_limits knows nothing about which devices the provided
Mike> limits belong to).

Yeah, I had a patch at some point that distinguished between the various
error conditions instead of returning -1.  I'll see if I can dig that
up...

-- 
Martin K. Petersen	Oracle Linux Engineering

