From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.com>, Matthew Wilcox <willy@linux.intel.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	XFS Developers <xfs@oss.sgi.com>
Subject: Re: [PATCH v2 2/2] dax: move writeback calls into the filesystems
Date: Fri, 12 Feb 2016 10:44:15 +1100
Message-ID: <20160211234415.GM19486@dastard>
In-Reply-To: <CAPcyv4hR60bahtQq68SgSG2uT9zP4H8u3zbUqtqndnx=ogwVtA@mail.gmail.com>

On Thu, Feb 11, 2016 at 02:59:14PM -0800, Dan Williams wrote:
> On Thu, Feb 11, 2016 at 2:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Feb 11, 2016 at 12:58:38PM -0800, Dan Williams wrote:
> >> On Thu, Feb 11, 2016 at 12:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> Maybe I don't need to worry because it's already the case that a
> >> mmap of the raw device may not see the most up to date data for a
> >> file that has dirty fs-page-cache data.
> >
> > It goes both ways. What happens if mkfs or fsck modifies the
> > block device via mmap+DAX and then the filesystem mounts the block
> > device and tries to read that metadata via the block device page
> > cache?
> >
> > Quite frankly, DAX on the block device is a can of worms we really
> > don't need to deal with right now. IMO it's a solution looking for a
> > problem to solve,
> 
> Virtualization use cases want to give large ranges to guest-VMs, and
> it is currently the only way to reliably get 1GiB mappings.

Precisely my point - block devices are not the best way to solve
this problem.

A file on XFS with a 1GB extent size hint, preallocated so that it is
aligned to 1GB addresses (i.e. mkfs.xfs -d su=1G,sw=1 on the host
filesystem), will give reliable 1GB-aligned blocks for DAX mappings,
just like a block device will. Performance-wise it's little different
to using the block device directly. Management-wise it's way more
flexible, especially as such image files can be recycled for new VMs
almost instantly via FALLOC_FL_ZERO_RANGE.
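The per-file part of that setup might look roughly like this from host userspace. This is a minimal sketch, not code from this thread. It assumes a Linux host whose kernel headers provide FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR and FALLOC_FL_ZERO_RANGE. It also assumes the 1GB physical alignment comes from the filesystem geometry (mkfs.xfs -d su=1G,sw=1), which per-file code cannot control. The helper names (set_extsize_hint, preallocate_image, recycle_image, create_vm_image) are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* for fallocate() in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* open(), fallocate() */
#include <linux/falloc.h> /* FALLOC_FL_ZERO_RANGE */
#include <linux/fs.h>     /* struct fsxattr, FS_IOC_FSGETXATTR/FSSETXATTR */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#define ONE_GIB (1ULL << 30)

/* Set a per-file extent size hint so allocations are made in units of
 * `bytes`. XFS honours FS_XFLAG_EXTSIZE but refuses to change the hint once
 * the file has data extents; other filesystems may reject the request.
 * Returns 0 or -errno. */
static int set_extsize_hint(int fd, uint32_t bytes)
{
    struct fsxattr fsx;

    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
        return -errno;
    fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
    fsx.fsx_extsize = bytes;
    if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0)
        return -errno;
    return 0;
}

/* Preallocate [0, len) and extend i_size to len without writing data. */
static int preallocate_image(int fd, off_t len)
{
    return fallocate(fd, 0, 0, len) < 0 ? -errno : 0;
}

/* Recycle an image for a new guest: make [0, len) read back as zeroes.
 * The filesystem chooses how (preferably unwritten extents) instead of
 * writing zeroes; filesystems without support return -EOPNOTSUPP. */
static int recycle_image(int fd, off_t len)
{
    return fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, len) < 0 ? -errno : 0;
}

/* Hypothetical helper: create an image, try to set the hint (skipped when
 * hint == 0), then preallocate. A rejected hint is reported in *hint_err
 * rather than treated as fatal. Returns an open fd or -errno. */
static int create_vm_image(const char *path, off_t len, uint32_t hint,
                           int *hint_err)
{
    int fd, r;

    assert(hint_err != NULL);
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -errno;
    /* The hint must be in place before the first allocation. */
    *hint_err = hint ? set_extsize_hint(fd, hint) : 0;
    r = preallocate_image(fd, len);
    if (r < 0) {
        close(fd);
        unlink(path);
        return r;
    }
    return fd;
}
```

For example, a host might call create_vm_image("/mnt/pmem/vm1.img", 16 * ONE_GIB, (uint32_t)ONE_GIB, &err) when provisioning a guest (path and size are made up), then call recycle_image() on the same fd before handing the image to the next VM. The hint is set before fallocate() because XFS will not change it once the file has extents.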

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.com>, Matthew Wilcox <willy@linux.intel.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	XFS Developers <xfs@oss.sgi.com>
Subject: Re: [PATCH v2 2/2] dax: move writeback calls into the filesystems
Date: Fri, 12 Feb 2016 10:44:15 +1100	[thread overview]
Message-ID: <20160211234415.GM19486@dastard> (raw)
In-Reply-To: <CAPcyv4hR60bahtQq68SgSG2uT9zP4H8u3zbUqtqndnx=ogwVtA@mail.gmail.com>

On Thu, Feb 11, 2016 at 02:59:14PM -0800, Dan Williams wrote:
> On Thu, Feb 11, 2016 at 2:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Feb 11, 2016 at 12:58:38PM -0800, Dan Williams wrote:
> >> On Thu, Feb 11, 2016 at 12:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> Maybe I don't need to worry because it's already the case that a
> >> mmap of the raw device may not see the most up to date data for a
> >> file that has dirty fs-page-cache data.
> >
> > It goes both ways. What happens if mkfs or fsck modifies the
> > block device via mmap+DAX and then the filesystem mounts the block
> > device and tries to read that metadata via the block device page
> > cache?
> >
> > Quite frankly, DAX on the block device is a can of worms we really
> > don't need to deal with right now. IMO it's a solution looking for a
> > problem to solve,
> 
> Virtualization use cases want to give large ranges to guest-VMs, and
> it is currently the only way to reliably get 1GiB mappings.

Precisely my point - block devices are not the best way to solve
this problem.

A file, on XFS, with a 1GB extent size hint and preallocated to be
aligned to 1GB addresses (i.e. mkfs.xfs -d su=1G,sw=1 on the host
filesystem) will give reliable 1GB aligned blocks for DAX mappings,
just like a block device will. Peformance wise it's little different
to using the block device directly. Management wise it's way more
flexible, especially as such image files can be recycled for new VMs
almost instantly via FALLOC_FL_FLAG_ZERO_RANGE.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <willy@linux.intel.com>,
	XFS Developers <xfs@oss.sgi.com>, Linux MM <linux-mm@kvack.org>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 2/2] dax: move writeback calls into the filesystems
Date: Fri, 12 Feb 2016 10:44:15 +1100	[thread overview]
Message-ID: <20160211234415.GM19486@dastard> (raw)
In-Reply-To: <CAPcyv4hR60bahtQq68SgSG2uT9zP4H8u3zbUqtqndnx=ogwVtA@mail.gmail.com>

On Thu, Feb 11, 2016 at 02:59:14PM -0800, Dan Williams wrote:
> On Thu, Feb 11, 2016 at 2:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Feb 11, 2016 at 12:58:38PM -0800, Dan Williams wrote:
> >> On Thu, Feb 11, 2016 at 12:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> Maybe I don't need to worry because it's already the case that a
> >> mmap of the raw device may not see the most up to date data for a
> >> file that has dirty fs-page-cache data.
> >
> > It goes both ways. What happens if mkfs or fsck modifies the
> > block device via mmap+DAX and then the filesystem mounts the block
> > device and tries to read that metadata via the block device page
> > cache?
> >
> > Quite frankly, DAX on the block device is a can of worms we really
> > don't need to deal with right now. IMO it's a solution looking for a
> > problem to solve,
> 
> Virtualization use cases want to give large ranges to guest-VMs, and
> it is currently the only way to reliably get 1GiB mappings.

Precisely my point - block devices are not the best way to solve
this problem.

A file, on XFS, with a 1GB extent size hint and preallocated to be
aligned to 1GB addresses (i.e. mkfs.xfs -d su=1G,sw=1 on the host
filesystem) will give reliable 1GB aligned blocks for DAX mappings,
just like a block device will. Peformance wise it's little different
to using the block device directly. Management wise it's way more
flexible, especially as such image files can be recycled for new VMs
almost instantly via FALLOC_FL_FLAG_ZERO_RANGE.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

