* all rbd users: set 'filestore fiemap = false'
@ 2012-06-18  4:02 Sage Weil
  2012-06-18  8:29 ` Oliver Francke
  2012-06-18  8:57 ` Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread
From: Sage Weil @ 2012-06-18  4:02 UTC (permalink / raw)
  To: ceph-devel

If you are using RBD, and want to avoid potential image corruption, add

	filestore fiemap = false

to the [osd] section of your ceph.conf and restart your OSDs.
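
A minimal sketch of how one might check that the setting is actually present in a given ceph.conf. The helper name, the accepted "false" spellings, and the assumption that spaces and underscores are interchangeable in option names are illustrative only; none of this is part of Ceph itself.

```python
FALSE_VALUES = {"false", "0"}  # assumption: spellings treated as "off"

def fiemap_disabled(conf_text, section="osd"):
    """Return True if [section] explicitly sets filestore fiemap to false."""
    current, disabled = None, False
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        if current != section or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: spaces and underscores are interchangeable in option names.
        name = "_".join(key.replace("_", " ").split())
        if name == "filestore_fiemap":
            disabled = value.strip().lower() in FALSE_VALUES  # last setting wins
    return disabled
```

It could be run against the text of the config file on each OSD host, for example fiemap_disabled(open(path).read()), where path is wherever your ceph.conf lives. A True result only says the file contains the setting; the OSDs still need a restart to pick it up.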

We've tracked down the source of some corruption to racy/buggy FIEMAP 
ioctl behavior.  The RBD client (when caching is disabled--the default) 
uses a 'sparse read' operation that the OSD implements by doing an fsync 
on the object file, mapping which extents are allocated, and sending only 
that data over the wire.  We have observed incorrect/changing FIEMAP on 
both btrfs:

	fsync
	fiemap returns mapping
	<time passes, no modifications to file>
	fiemap returns different mapping
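
The sequence above can be sketched as a small check that fsyncs a file, asks for its extent map twice, and compares the results. This is only an illustration under stated assumptions: it is not the OSD code and not test_librbd_fsx, the helper names are made up, the struct layouts and the FS_IOC_FIEMAP request number are transcribed from the Linux headers (linux/fs.h, linux/fiemap.h), and a file with more than max_extents extents would be truncated in the result.

```python
import fcntl
import os
import struct
import time

FS_IOC_FIEMAP = 0xC020660B                # _IOWR('f', 11, struct fiemap)
FIEMAP_HDR = struct.Struct("=QQIIII")     # struct fiemap header, 32 bytes
FIEMAP_EXT = struct.Struct("=QQQQQIIII")  # struct fiemap_extent, 56 bytes

def parse_fiemap(buf):
    """Decode a filled-in fiemap buffer into (logical, length, flags) tuples."""
    mapped = FIEMAP_HDR.unpack_from(buf, 0)[3]  # fm_mapped_extents
    extents = []
    for i in range(mapped):
        f = FIEMAP_EXT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXT.size)
        extents.append((f[0], f[2], f[5]))  # fe_logical, fe_length, fe_flags
    return extents

def fiemap(fd, max_extents=256):
    """Return the extent map of the whole file via the FIEMAP ioctl."""
    # fm_start=0, fm_length=max, fm_flags=0, room for max_extents entries.
    hdr = FIEMAP_HDR.pack(0, 2**64 - 1, 0, 0, max_extents, 0)
    buf = bytearray(hdr) + bytearray(max_extents * FIEMAP_EXT.size)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    return parse_fiemap(buf)

def mapping_is_stable(path, delay=1.0):
    """fsync, map, wait without touching the file, map again, compare."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
        first = fiemap(fd)
        time.sleep(delay)
        return first == fiemap(fd)
    finally:
        os.close(fd)
```

On a file system that does not implement FIEMAP the ioctl raises OSError, so a caller would need to handle that case. A False result from mapping_is_stable() on an otherwise idle file would correspond to the behavior described here.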

Josh is still tracking down which kernels and file systems are affected; 
fortunately it is relatively easy to reproduce with the test_librbd_fsx 
tool.  In the meantime, the (mis)feature can be safely disabled. It will 
default to off in 0.48. It is unclear whether it's really much of a 
performance win anyway.

Thanks!
sage


* Re: all rbd users: set 'filestore fiemap = false'
  2012-06-18  4:02 all rbd users: set 'filestore fiemap = false' Sage Weil
@ 2012-06-18  8:29 ` Oliver Francke
  2012-06-18  8:57 ` Christoph Hellwig
  1 sibling, 0 replies; 5+ messages in thread
From: Oliver Francke @ 2012-06-18  8:29 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

On 06/18/2012 06:02 AM, Sage Weil wrote:
> If you are using RBD, and want to avoid potential image corruption, add
>
> 	filestore fiemap = false
>
> to the [osd] section of your ceph.conf and restart your OSDs.

it's good if this heals some trouble, but I really don't understand...

>
> We've tracked down the source of some corruption to racy/buggy FIEMAP
> ioctl behavior.  The RBD client (when caching is disabled--the default)
> uses a 'sparse read' operation that the OSD implements by doing an fsync
> on the object file, mapping which extents are allocated, and sending only
> that data over the wire.  We have observed incorrect/changing FIEMAP on
> both btrfs:
>
> 	fsync
> 	fiemap returns mapping
> 	<time passes, no modifications to file>
> 	fiemap returns different mapping

... how even an initial start of a VM leads to corruption of the read data?

I get something like:

--- 8-< ---

Loading, please wait
/sbin/init: relocation error: ...
  not defined in file libc.so.6...
[     0.81...] Kernel panic - not syncing: Attempted to kill init!

--- 8-< ---

The host kernel is now 3.4.1 with qemu-1.0.1, but the failures also show 
up with other kernel/qemu versions.

Keeping fingers crossed for Josh, though ;-)
Give me a shout if I can do some debugging,

regards,

Oliver.

>
> Josh is still tracking down which kernels and file systems are affected;
> fortunately it is relatively easy to reproduce with the test_librbd_fsx
> tool.  In the meantime, the (mis)feature can be safely disabled. It will
> default to off in 0.48. It is unclear whether it's really much of a
> performance win anyway.
>
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 

Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing directors: S.Grewing | J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh



* Re: all rbd users: set 'filestore fiemap = false'
  2012-06-18  4:02 all rbd users: set 'filestore fiemap = false' Sage Weil
  2012-06-18  8:29 ` Oliver Francke
@ 2012-06-18  8:57 ` Christoph Hellwig
  2012-06-18 15:32   ` Sage Weil
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2012-06-18  8:57 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Sun, Jun 17, 2012 at 09:02:15PM -0700, Sage Weil wrote:
> that data over the wire.  We have observed incorrect/changing FIEMAP on 
> both btrfs:

both btrfs and?

Btw, btrfs has SEEK_HOLE/SEEK_DATA support, which is a lot more useful for
this kind of operation, and xfs has added support for it as well now.
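
For illustration, a minimal sketch of how a sparse read could be built on SEEK_DATA/SEEK_HOLE instead of FIEMAP. It assumes a Linux system where Python exposes os.SEEK_DATA and os.SEEK_HOLE; the function names are hypothetical and this is not the OSD implementation.

```python
import errno
import os

def data_extents(fd):
    """Return [(offset, length)] for the regions lseek() reports as data."""
    end = os.fstat(fd).st_size
    extents, pos = [], 0
    while pos < end:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # nothing but a hole from pos to EOF
                break
            raise
        stop = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
        extents.append((start, stop - start))
        pos = stop
    return extents

def sparse_read(fd):
    """Read only the data regions, returning (offset, bytes) pairs."""
    return [(off, os.pread(fd, length, off)) for off, length in data_extents(fd)]
```

A file system without real hole tracking may simply report the whole file as one data extent, which is slower but still correct: the receiver rebuilds the object by writing each chunk at its offset into a zero-filled buffer.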



* Re: all rbd users: set 'filestore fiemap = false'
  2012-06-18  8:57 ` Christoph Hellwig
@ 2012-06-18 15:32   ` Sage Weil
  2012-06-22 15:16     ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Sage Weil @ 2012-06-18 15:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: ceph-devel

On Mon, 18 Jun 2012, Christoph Hellwig wrote:
> On Sun, Jun 17, 2012 at 09:02:15PM -0700, Sage Weil wrote:
> > that data over the wire.  We have observed incorrect/changing FIEMAP on 
> > both btrfs:
> 
> both btrfs and?

Whoops, it was XFS.  :/ 

> Btw, btrfs has SEEK_HOLE/SEEK_DATA support, which is a lot more useful for
> this kind of operation, and xfs has added support for it as well now.

Yeah, started looking at that last night.  (This code predates SEEK_HOLE.)

sage


* Re: all rbd users: set 'filestore fiemap = false'
  2012-06-18 15:32   ` Sage Weil
@ 2012-06-22 15:16     ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2012-06-22 15:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: Christoph Hellwig, ceph-devel

On Mon, Jun 18, 2012 at 08:32:50AM -0700, Sage Weil wrote:
> On Mon, 18 Jun 2012, Christoph Hellwig wrote:
> > On Sun, Jun 17, 2012 at 09:02:15PM -0700, Sage Weil wrote:
> > > that data over the wire.  We have observed incorrect/changing FIEMAP on 
> > > both btrfs:
> > 
> > both btrfs and?
> 
> Whoops, it was XFS.  :/

If you manage to extract a minimal test case I'd love to see it.  FIEMAP
is a complete mess, although most of the time the errors actually are on
the user's side due to its complicated semantics.


