linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Mahoney <jeffm@suse.com>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 00/10] btrfs: Support for DAX devices
Date: Wed, 5 Dec 2018 16:37:29 -0500	[thread overview]
Message-ID: <6a5b4bb8-896f-14c8-fc71-71b7122dd836@suse.com> (raw)
In-Reply-To: <20181205122835.19290-1-rgoldwyn@suse.de>

On 12/5/18 7:28 AM, Goldwyn Rodrigues wrote:
> This is a support for DAX in btrfs. I understand there have been
> previous attempts at it. However, I wanted to make sure copy-on-write
> (COW) works on dax as well.
> 
> Before I present this to the FS folks I wanted to run this through the
> btrfs. Even though I wish, I cannot get it correct the first time
> around :/.. Here are some questions for which I need suggestions:
> 
> Questions:
> 1. I have been unable to do checksumming for DAX devices. While
> checksumming can be done for reads and writes, it is a problem when mmap
> is involved because btrfs kernel module does not get back control after
> an mmap() writes. Any ideas are appreciated, or we would have to set
> nodatasum when dax is enabled.

Yep.  It has to be nodatasum, at least within the confines of datasum 
today.  DAX mmap writes are essentially in the same situation as with 
direct i/o when another thread modifies the buffer being submitted. 
Except rather than it being a race, it happens every time.  An 
alternative here could be to add the ability to mark a crc as unreliable 
and then go back and update them once the last DAX mmap reference is 
dropped on a range.  There's no reason to make this a requirement of the 
initial implementation, though.

> 2. Currently, a user can continue writing on "old" extents of an mmaped file
> after a snapshot has been created. How can we enforce writes to be directed
> to new extents after snapshots have been created? Do we keep a list of
> all mmap()s, and re-mmap them after a snapshot?

It's the second question that's the hard part.  As Adam describes later, 
setting each pfn read-only will ensure page faults cause the remapping.

The high level idea that Jan Kara and I came up with in our conversation 
at Labs conf is pretty expensive.  We'd need to set a flag that pauses 
new page faults, set the WP bit on affected ranges, do the snapshot, 
commit, clear the flag, and wake up the waiting threads.  Neither of us 
had any concrete idea of how well that would perform and it still 
depends on finding a good way to resolve all open mmap ranges on a 
subvolume.  Perhaps using the address_space->private_list anchored on 
each root would work.

-Jeff

> Tested by creating a pmem device in RAM with "memmap=2G!4G" kernel
> command line parameter.
> 
> 
> [PATCH 01/10] btrfs: create a mount option for dax
> [PATCH 02/10] btrfs: basic dax read
> [PATCH 03/10] btrfs: dax: read zeros from holes
> [PATCH 04/10] Rename __endio_write_update_ordered() to
> [PATCH 05/10] btrfs: Carve out btrfs_get_extent_map_write() out of
> [PATCH 06/10] btrfs: dax write support
> [PATCH 07/10] dax: export functions for use with btrfs
> [PATCH 08/10] btrfs: dax add read mmap path
> [PATCH 09/10] btrfs: dax support for cow_page/mmap_private and shared
> [PATCH 10/10] btrfs: dax mmap write
> 
>   fs/btrfs/Makefile   |    1
>   fs/btrfs/ctree.h    |   17 ++
>   fs/btrfs/dax.c      |  303 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>   fs/btrfs/file.c     |   29 ++++
>   fs/btrfs/inode.c    |   54 +++++----
>   fs/btrfs/ioctl.c    |    5
>   fs/btrfs/super.c    |   15 ++
>   fs/dax.c            |   35 ++++--
>   include/linux/dax.h |   16 ++
>   9 files changed, 430 insertions(+), 45 deletions(-)
> 
> 

-- 
Jeff Mahoney
SUSE Labs


  parent reply	other threads:[~2018-12-05 21:37 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-05 12:28 [PATCH 00/10] btrfs: Support for DAX devices Goldwyn Rodrigues
2018-12-05 12:28 ` [PATCH 01/10] btrfs: create a mount option for dax Goldwyn Rodrigues
2018-12-05 12:42   ` Johannes Thumshirn
2018-12-05 12:43   ` Nikolay Borisov
2018-12-05 14:59     ` Adam Borowski
2018-12-05 12:28 ` [PATCH 02/10] btrfs: basic dax read Goldwyn Rodrigues
2018-12-05 13:11   ` Nikolay Borisov
2018-12-05 13:22   ` Johannes Thumshirn
2018-12-05 12:28 ` [PATCH 03/10] btrfs: dax: read zeros from holes Goldwyn Rodrigues
2018-12-05 13:26   ` Nikolay Borisov
2018-12-05 12:28 ` [PATCH 04/10] Rename __endio_write_update_ordered() to btrfs_update_ordered_extent() Goldwyn Rodrigues
2018-12-05 13:35   ` Nikolay Borisov
2018-12-05 12:28 ` [PATCH 05/10] btrfs: Carve out btrfs_get_extent_map_write() out of btrfs_get_blocks_write() Goldwyn Rodrigues
2018-12-05 12:28 ` [PATCH 06/10] btrfs: dax write support Goldwyn Rodrigues
2018-12-05 13:56   ` Johannes Thumshirn
2018-12-05 12:28 ` [PATCH 07/10] dax: export functions for use with btrfs Goldwyn Rodrigues
2018-12-05 13:59   ` Johannes Thumshirn
2018-12-05 14:52   ` Christoph Hellwig
2018-12-06 11:46     ` Goldwyn Rodrigues
2018-12-12  8:07       ` Christoph Hellwig
2019-03-26 19:36   ` Dan Williams
2019-03-27 11:10     ` Goldwyn Rodrigues
2018-12-05 12:28 ` [PATCH 08/10] btrfs: dax add read mmap path Goldwyn Rodrigues
2018-12-05 12:28 ` [PATCH 09/10] btrfs: dax support for cow_page/mmap_private and shared Goldwyn Rodrigues
2018-12-05 12:28 ` [PATCH 10/10] btrfs: dax mmap write Goldwyn Rodrigues
2018-12-05 13:03 ` [PATCH 00/10] btrfs: Support for DAX devices Qu Wenruo
2018-12-05 21:36   ` Jeff Mahoney
2018-12-05 13:57 ` Adam Borowski
2018-12-05 21:37 ` Jeff Mahoney [this message]
2018-12-06  7:40   ` Robert White
2018-12-06 10:07 ` Johannes Thumshirn
2018-12-06 11:47   ` Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6a5b4bb8-896f-14c8-fc71-71b7122dd836@suse.com \
    --to=jeffm@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=rgoldwyn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).