From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
linux-block@vger.kernel.org, linux-mm@kvack.org
Subject: [LSF/MM TOPIC] Future direction of DAX
Date: Fri, 13 Jan 2017 17:20:08 -0700
Message-ID: <20170114002008.GA25379@linux.intel.com>
This past year has seen a lot of new DAX development. We have added support
for fsync/msync, moved to the new iomap I/O data structure, introduced radix
tree based locking, re-enabled PMD support (twice!), and have fixed a bunch of
bugs.
We still have a lot of work to do, though, and I'd like to propose a discussion
around what features people would like to see enabled in the coming year as
well as what use cases their customers have that we might not be aware of.
Here are a few topics to start the conversation:
- The current plan to allow users to safely flush dirty data from userspace is
built around the PMEM_IMMUTABLE feature [1]. I'm hoping that by LSF/MM we
will have at least started work on PMEM_IMMUTABLE, but I'm guessing there
will be more to discuss.
- The DAX fsync/msync model was built for platforms that need to flush dirty
processor cache lines in order to make data durable on NVDIMMs. There exist
platforms, however, that are set up so that the processor caches are
effectively part of the ADR safe zone. This means that dirty data can be
assumed to be durable even in the processor cache, obviating the need to
manually flush the cache during fsync/msync. These platforms still need to
call fsync/msync to ensure that filesystem metadata updates are properly
written to media. Our first idea on how to properly support these platforms
would be for DAX to be made aware that in some cases it doesn't need to keep
metadata about dirty cache lines. A similar issue exists for volatile uses
of DAX such as with BRD or with PMEM and the memmap command line parameter,
and we'd like a solution that covers them all.
- If I recall correctly, at one point Dave Chinner suggested that we change
DAX so that I/O would use cached stores instead of the non-temporal stores
that it currently uses. We would then track pages that were written to by
DAX in the radix tree so that they would be flushed later during
fsync/msync. Does this sound like a win? Also, assuming that we can find a
solution for platforms where the processor cache is part of the ADR safe
zone (above topic) this would be a clear improvement, moving us from using
non-temporal stores to faster cached stores with no downside.
- Jan suggested [2] that we could use the radix tree as a cache to service DAX
faults without needing to call into the filesystem. Are there any issues
with this approach, and should we move forward with it as an optimization?
- Whenever you mount a filesystem with DAX, it spits out a message that says
"DAX enabled. Warning: EXPERIMENTAL, use at your own risk". What criteria
need to be met for DAX to no longer be considered experimental?
- When we msync() a huge page, if the range is less than the entire huge page,
should we flush the entire huge page and mark it clean in the radix tree, or
should we only flush the requested range and leave the radix tree entry
dirty?
- Should we enable 1 GiB huge pages in filesystem DAX? Does anyone have any
specific customer requests for this or performance data suggesting it would
be a win? If so, what work needs to be done to get 1 GiB sized and aligned
filesystem block allocations, to get the required enabling in the MM layer,
etc.?
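
To make the partial-msync question above concrete, here is a minimal sketch in
C of the two policies being weighed. It is illustrative only and not taken from
the kernel: struct flush_plan, plan_flush_whole_entry() and plan_flush_range()
are hypothetical names, and the 2 MiB huge page and 4 KiB base page sizes are
assumptions (typical for x86-64). It only computes which byte range would be
flushed and whether the tracking entry could be marked clean afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes (x86-64 style); other platforms may differ. */
#define PMD_SIZE_BYTES  (2ULL * 1024 * 1024)  /* one huge page */
#define PAGE_SIZE_BYTES 4096ULL               /* one base page */

/* Hypothetical description of what a partial msync() would flush. */
struct flush_plan {
    uint64_t start;      /* first byte to flush */
    uint64_t len;        /* number of bytes to flush */
    bool     mark_clean; /* can the huge page's dirty entry be cleared? */
};

/*
 * Policy A: ignore the requested sub-range, flush the whole huge page,
 * and mark its tracking entry clean.
 */
struct flush_plan plan_flush_whole_entry(uint64_t entry_start)
{
    assert(entry_start % PMD_SIZE_BYTES == 0);
    struct flush_plan p = { entry_start, PMD_SIZE_BYTES, true };
    return p;
}

/*
 * Policy B: flush only the requested range (rounded up to whole base
 * pages, since msync() works at page granularity). The entry can only
 * be marked clean if the range happens to cover the whole huge page;
 * otherwise it stays dirty and a later sync flushes it again.
 */
struct flush_plan plan_flush_range(uint64_t entry_start,
                                   uint64_t off, uint64_t len)
{
    assert(entry_start % PMD_SIZE_BYTES == 0);
    assert(off % PAGE_SIZE_BYTES == 0);
    assert(len > 0 && off < PMD_SIZE_BYTES && len <= PMD_SIZE_BYTES - off);

    uint64_t end = (off + len + PAGE_SIZE_BYTES - 1) & ~(PAGE_SIZE_BYTES - 1);
    struct flush_plan p = {
        entry_start + off,
        end - off,
        (end - off) == PMD_SIZE_BYTES,
    };
    return p;
}
```

Under these assumptions, policy A always flushes 2 MiB per call but never
revisits the entry, while policy B flushes less per call but may flush the same
lines again on every later sync until the full huge page has been covered.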
Thanks,
- Ross
[1] https://lkml.org/lkml/2016/12/19/571
[2] https://lkml.org/lkml/2016/10/12/70