* [LSF/MM TOPIC] Future direction of DAX
@ 2017-01-14  0:20 ` Ross Zwisler
  0 siblings, 0 replies; 50+ messages in thread
From: Ross Zwisler @ 2017-01-14  0:20 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-fsdevel, linux-block, linux-mm, linux-nvdimm

This past year has seen a lot of new DAX development.  We have added support
for fsync/msync, moved to the new iomap I/O data structure, introduced radix
tree based locking, re-enabled PMD support (twice!), and have fixed a bunch of
bugs.

We still have a lot of work to do, though, and I'd like to propose a discussion
around what features people would like to see enabled in the coming year, as
well as what use cases their customers have that we might not be aware of.

Here are a few topics to start the conversation:

- The current plan to allow users to safely flush dirty data from userspace is
  built around the PMEM_IMMUTABLE feature [1].  I'm hoping that by LSF/MM we
  will have at least started work on PMEM_IMMUTABLE, but I'm guessing there
  will be more to discuss.
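
  As a rough sketch (illustrative only, not the proposed kernel API), the
  userspace flush loop that PMEM_IMMUTABLE would make safe might look like
  this on x86-64, where CLFLUSH is baseline; real code would prefer
  CLWB/CLFLUSHOPT where available:

```c
/* Hypothetical userspace persist helper for a DAX mmap()ed region.
 * Safe only once something like PMEM_IMMUTABLE guarantees the file's
 * block map cannot change underneath us. */
#include <emmintrin.h>  /* _mm_clflush, _mm_sfence (SSE2 baseline) */
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64UL

static void pmem_persist(const void *addr, size_t len)
{
	/* Flush every cache line covering [addr, addr + len)... */
	uintptr_t p   = (uintptr_t)addr & ~(CACHELINE - 1);
	uintptr_t end = (uintptr_t)addr + len;

	for (; p < end; p += CACHELINE)
		_mm_clflush((const void *)p);

	/* ...then fence so the flushes complete before we claim durability. */
	_mm_sfence();
}
```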

- The DAX fsync/msync model was built for platforms that need to flush dirty
  processor cache lines in order to make data durable on NVDIMMs.  There exist
  platforms, however, that are set up so that the processor caches are
  effectively part of the ADR safe zone.  This means that dirty data can be
  assumed to be durable even in the processor cache, obviating the need to
  manually flush the cache during fsync/msync.  These platforms still need to
  call fsync/msync to ensure that filesystem metadata updates are properly
  written to media.  Our first idea on how to properly support these platforms
  would be for DAX to be made aware that in some cases it doesn't need to keep
  metadata about dirty cache lines.  A similar issue exists for volatile uses
  of DAX such as with BRD or with PMEM and the memmap command line parameter,
  and we'd like a solution that covers them all.
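
  A minimal sketch of that idea (all names illustrative; none of this is
  existing kernel code): DAX would only record dirty-entry metadata when
  the platform actually requires manual flushing:

```c
/* Illustrative sketch only -- not existing kernel structures.  The
 * point: platforms whose caches sit inside the ADR safe zone, and
 * volatile backends (BRD, memmap=), have no dirty-line metadata to keep. */
enum dax_flush_mode {
	DAX_FLUSH_REQUIRED,	/* NVDIMM, caches outside ADR: track + flush */
	DAX_FLUSH_UNNEEDED,	/* caches inside the ADR safe zone */
	DAX_VOLATILE,		/* BRD or memmap=: durability is moot */
};

struct dax_mapping_sketch {
	enum dax_flush_mode mode;
	unsigned long dirty_entries;	/* stand-in for radix-tree dirty tags */
};

/* Write-fault path: only record a dirty entry if fsync/msync will later
 * have to flush it by hand. */
static void dax_note_write(struct dax_mapping_sketch *m)
{
	if (m->mode == DAX_FLUSH_REQUIRED)
		m->dirty_entries++;
}

/* fsync/msync path: how many entries would we actually have to flush? */
static unsigned long dax_entries_to_flush(const struct dax_mapping_sketch *m)
{
	return m->mode == DAX_FLUSH_REQUIRED ? m->dirty_entries : 0;
}
```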

- If I recall correctly, at one point Dave Chinner suggested that we change
  DAX so that I/O would use cached stores instead of the non-temporal stores
  that it currently uses.  We would then track pages that were written to by
  DAX in the radix tree so that they would be flushed later during
  fsync/msync.  Does this sound like a win?  Also, assuming that we can find a
  solution for platforms where the processor cache is part of the ADR safe
  zone (above topic), this would be a clear improvement, moving us from using
  non-temporal stores to faster cached stores with no downside.
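
  For reference, the two store styles look roughly like this (illustrative
  C with SSE2 intrinsics; the dirty flag is a stand-in for the radix-tree
  tagging described above):

```c
#include <emmintrin.h>	/* _mm_stream_si32, _mm_sfence (SSE2 baseline) */

/* Current DAX behavior (sketch): a non-temporal store bypasses the
 * cache, so nothing dirty is left behind and fsync/msync has no cache
 * line to flush. */
static void write_nt(int *dst, int val)
{
	_mm_stream_si32(dst, val);
	_mm_sfence();	/* make the streamed store globally visible */
}

/* Proposed alternative (sketch): a plain cached store is faster for
 * data that is re-read soon, but leaves a dirty line that must be
 * remembered -- here a flag stands in for tagging the radix tree --
 * and flushed later during fsync/msync. */
static void write_cached(int *dst, int val, int *page_dirty)
{
	*dst = val;
	*page_dirty = 1;
}
```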

- Jan suggested [2] that we could use the radix tree as a cache to service DAX
  faults without needing to call into the filesystem.  Are there any issues
  with this approach, and should we move forward with it as an optimization?

- Whenever you mount a filesystem with DAX, it spits out a message that says
  "DAX enabled. Warning: EXPERIMENTAL, use at your own risk".  What criteria
  need to be met for DAX to no longer be considered experimental?

- When we msync() a huge page, if the range is less than the entire huge page,
  should we flush the entire huge page and mark it clean in the radix tree, or
  should we only flush the requested range and leave the radix tree entry
  dirty?
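
  The trade-off can be sketched as arithmetic over bytes flushed (assuming
  a 2 MiB PMD entry and 64-byte cache lines; illustrative only):

```c
#include <stddef.h>

#define CACHELINE 64UL
#define PMD_SIZE  (2UL * 1024 * 1024)

/* Option A: flush only the cache lines covering [off, off + len);
 * the radix-tree entry must then stay dirty.  Returns bytes flushed. */
static size_t flush_range_only(size_t off, size_t len)
{
	size_t start = off & ~(CACHELINE - 1);
	size_t end   = (off + len + CACHELINE - 1) & ~(CACHELINE - 1);
	return end - start;
}

/* Option B: flush the whole 2 MiB huge page containing the range, so
 * the entry can be marked clean.  Returns bytes flushed. */
static size_t flush_whole_pmd(void)
{
	return PMD_SIZE;
}
```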

- Should we enable 1 GiB huge pages in filesystem DAX?  Does anyone have any
  specific customer requests for this or performance data suggesting it would
  be a win?  If so, what work needs to be done to get 1 GiB sized and aligned
  filesystem block allocations, to get the required enabling in the MM layer,
  etc?
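
  A toy eligibility check shows what the allocator would have to
  guarantee (illustrative; PUD_SIZE is just 1 GiB here, and the names
  are made up for this sketch):

```c
/* Illustrative check: a 1 GiB DAX mapping needs the file extent to be
 * PUD-sized, and PUD-aligned both in the file and on the device. */
#define PUD_SIZE (1UL << 30)

static int extent_fits_pud(unsigned long file_off,
			   unsigned long dev_addr,
			   unsigned long len)
{
	return (file_off & (PUD_SIZE - 1)) == 0 &&
	       (dev_addr & (PUD_SIZE - 1)) == 0 &&
	       len >= PUD_SIZE;
}
```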

Thanks,
- Ross

[1] https://lkml.org/lkml/2016/12/19/571
[2] https://lkml.org/lkml/2016/10/12/70

* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-14  0:20 ` Ross Zwisler
@ 2017-01-14  8:26     ` Darrick J. Wong
  -1 siblings, 0 replies; 50+ messages in thread
From: Darrick J. Wong @ 2017-01-14  8:26 UTC (permalink / raw)
  To: Ross Zwisler, lsf-pc, linux-fsdevel, linux-nvdimm, linux-block, linux-mm

On Fri, Jan 13, 2017 at 05:20:08PM -0700, Ross Zwisler wrote:
> This past year has seen a lot of new DAX development.  We have added support
> for fsync/msync, moved to the new iomap I/O data structure, introduced radix
> tree based locking, re-enabled PMD support (twice!), and have fixed a bunch of
> bugs.
> 
> We still have a lot of work to do, though, and I'd like to propose a discussion
> around what features people would like to see enabled in the coming year as
> well as what use cases their customers have that we might not be aware of.
> 
> Here are a few topics to start the conversation:
> 
> - The current plan to allow users to safely flush dirty data from userspace is
>   built around the PMEM_IMMUTABLE feature [1].  I'm hoping that by LSF/MM we
>   will have at least started work on PMEM_IMMUTABLE, but I'm guessing there
>   will be more to discuss.

Yes, probably. :)

> - The DAX fsync/msync model was built for platforms that need to flush dirty
>   processor cache lines in order to make data durable on NVDIMMs.  There exist
>   platforms, however, that are set up so that the processor caches are
>   effectively part of the ADR safe zone.  This means that dirty data can be
>   assumed to be durable even in the processor cache, obviating the need to
>   manually flush the cache during fsync/msync.  These platforms still need to
>   call fsync/msync to ensure that filesystem metadata updates are properly
>   written to media.  Our first idea on how to properly support these platforms
>   would be for DAX to be made aware that in some cases it doesn't need to keep
>   metadata about dirty cache lines.  A similar issue exists for volatile uses
>   of DAX such as with BRD or with PMEM and the memmap command line parameter,
>   and we'd like a solution that covers them all.
> 
> - If I recall correctly, at one point Dave Chinner suggested that we change
>   DAX so that I/O would use cached stores instead of the non-temporal stores
>   that it currently uses.  We would then track pages that were written to by
>   DAX in the radix tree so that they would be flushed later during
>   fsync/msync.  Does this sound like a win?  Also, assuming that we can find a
>   solution for platforms where the processor cache is part of the ADR safe
>   zone (above topic) this would be a clear improvement, moving us from using
>   non-temporal stores to faster cached stores with no downside.
> 
> - Jan suggested [2] that we could use the radix tree as a cache to service DAX
>   faults without needing to call into the filesystem.  Are there any issues
>   with this approach, and should we move forward with it as an optimization?
> 
> - Whenever you mount a filesystem with DAX, it spits out a message that says
>   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk".  What criteria
>   need to be met for DAX to no longer be considered experimental?

For XFS I'd like to get reflink working with it, for starters.  We
probably need a bunch more verification work to show that file IO
doesn't adopt any bad quirks having turned on the per-inode DAX flag.

Some day we'll start designing a pmem-native fs, I guess. :P

> - When we msync() a huge page, if the range is less than the entire huge page,
>   should we flush the entire huge page and mark it clean in the radix tree, or
>   should we only flush the requested range and leave the radix tree entry
>   dirty?
> 
> - Should we enable 1 GiB huge pages in filesystem DAX?  Does anyone have any
>   specific customer requests for this or performance data suggesting it would
>   be a win?  If so, what work needs to be done to get 1 GiB sized and aligned
>   filesystem block allocations, to get the required enabling in the MM layer,
>   etc?

<giggle> :)

--D

> 
> Thanks,
> - Ross
> 
> [1] https://lkml.org/lkml/2016/12/19/571
> [2] https://lkml.org/lkml/2016/10/12/70

* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-14  8:26     ` Darrick J. Wong
@ 2017-01-16  0:19       ` Viacheslav Dubeyko
  -1 siblings, 0 replies; 50+ messages in thread
From: Viacheslav Dubeyko @ 2017-01-16  0:19 UTC (permalink / raw)
  To: Darrick J. Wong, Ross Zwisler, lsf-pc, linux-fsdevel,
	linux-nvdimm, linux-block, linux-mm

On Sat, 2017-01-14 at 00:26 -0800, Darrick J. Wong wrote:

<skipped>

> Some day we'll start designing a pmem-native fs, I guess. :P

There are already research efforts in this direction ([1]-[15]); the
latest one is NOVA, as far as I can see.  But, frankly speaking, I
believe we need a new hardware paradigm/architecture and a new OS
paradigm for the next generation of NVM memory.  DAX is a simple
palliative, a temporary solution.  From my point of view, a pmem-native
fs is also not a good direction, because the memory subsystem will be
affected significantly in any case.  And, finally, the evolution of the
memory subsystem will reveal something completely different from what
we can imagine right now.

Thanks,
Vyacheslav Dubeyko. 

[1] http://pages.cs.wisc.edu/~swift/papers/eurosys14-aerie.pdf
[2] https://www.researchgate.net/publication/282792714_A_User-Level_File_System_for_Fast_Storage_Devices
[3] https://people.eecs.berkeley.edu/~dcoetzee/publications/Better%20IO%20Through%20Byte-Addressable,%20Persistent%20Memory.pdf
[4] https://www.computer.org/csdl/proceedings/msst/2013/0217/00/06558440.pdf
[5] https://users.soe.ucsc.edu/~scott/papers/MASCOTS04b.pdf
[6] http://ieeexplore.ieee.org/document/4142472/
[7] https://cseweb.ucsd.edu/~swanson/papers/FAST2016NOVA.pdf
[8] http://cesg.tamu.edu/wp-content/uploads/2012/02/MSST13.pdf
[9] http://ieeexplore.ieee.org/document/5487498/
[10] https://pdfs.semanticscholar.org/544c/1ddf24b90c3dfba7b1934049911b869c99b4.pdf
[11] http://pramfs.sourceforge.net/tech.html
[12] https://pdfs.semanticscholar.org/2981/b5abcbe1023b9f3cd962b0be7ef8bd45acfd.pdf
[13] http://ieeexplore.ieee.org/document/6232378/
[14] http://ieeexplore.ieee.org/document/7304365/
[15] http://ieeexplore.ieee.org/document/6272446/

* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-14  8:26     ` Darrick J. Wong
@ 2017-01-16 20:00       ` Jeff Moyer
  -1 siblings, 0 replies; 50+ messages in thread
From: Jeff Moyer @ 2017-01-16 20:00 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Ross Zwisler, lsf-pc, linux-fsdevel, linux-nvdimm, linux-block, linux-mm

"Darrick J. Wong" <darrick.wong@oracle.com> writes:

>> - Whenever you mount a filesystem with DAX, it spits out a message that says
>>   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk".  What criteria
>>   needs to be met for DAX to no longer be considered experimental?
>
> For XFS I'd like to get reflink working with it, for starters.

What do you mean by this, exactly?  When Dave outlined the requirements
for PMEM_IMMUTABLE, it was very clear that metadata updates would not be
possible.  And would you really consider this a barrier to marking DAX
fully supported?  I wouldn't.

> We probably need a bunch more verification work to show that file IO
> doesn't adopt any bad quirks having turned on the per-inode DAX flag.

Can you be more specific?  We have ltp and xfstests.  If you have some
mkfs/mount options that you think should be tested, speak up.  Beyond
that, if it passes ./check -g auto and ltp, are we good?

-Jeff

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-16 20:00       ` Jeff Moyer
@ 2017-01-17  1:50         ` Darrick J. Wong
  -1 siblings, 0 replies; 50+ messages in thread
From: Darrick J. Wong @ 2017-01-17  1:50 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Ross Zwisler, lsf-pc, linux-fsdevel, linux-nvdimm, linux-block, linux-mm

On Mon, Jan 16, 2017 at 03:00:41PM -0500, Jeff Moyer wrote:
> "Darrick J. Wong" <darrick.wong@oracle.com> writes:
> 
> >> - Whenever you mount a filesystem with DAX, it spits out a message that says
> >>   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk".  What criteria
> >>   needs to be met for DAX to no longer be considered experimental?
> >
> > For XFS I'd like to get reflink working with it, for starters.
> 
> What do you mean by this, exactly?  When Dave outlined the requirements
> for PMEM_IMMUTABLE, it was very clear that metadata updates would not be
> possible.  And would you really consider this a barrier to marking DAX
> fully supported?  I wouldn't.

For PMEM_IMMUTABLE files, yes, reflink cannot be supported.

I'm talking about supporting reflink for DAX files that are /not/
PMEM_IMMUTABLE, where user programs can mmap pmem directly but write
activity still must use fsync/msync to ensure that everything's on disk.

I wouldn't consider it a barrier in general (since ext4 also prints
EXPERIMENTAL warnings for DAX), merely one for XFS.  I don't even think
it's that big of a hurdle -- afaict XFS ought to be able to achieve this
by modifying iomap_begin to allocate new pmem blocks, memcpy the
contents, and update the memory mappings.  I think.

> > We probably need a bunch more verification work to show that file IO
> > doesn't adopt any bad quirks having turned on the per-inode DAX flag.
> 
> Can you be more specific?  We have ltp and xfstests.  If you have some
> mkfs/mount options that you think should be tested, speak up.  Beyond
> that, if it passes ./check -g auto and ltp, are we good?

That's probably good -- I simply wanted to know if we'd at least gotten
to the point that someone had run both suites with and without DAX and
not seen any major regressions between the two.

--D

> 
> -Jeff
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-17  1:50         ` Darrick J. Wong
@ 2017-01-17  2:42           ` Dan Williams
  -1 siblings, 0 replies; 50+ messages in thread
From: Dan Williams @ 2017-01-17  2:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jeff Moyer, linux-nvdimm, linux-block, Linux MM, linux-fsdevel, lsf-pc

On Mon, Jan 16, 2017 at 5:50 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Mon, Jan 16, 2017 at 03:00:41PM -0500, Jeff Moyer wrote:
>> "Darrick J. Wong" <darrick.wong@oracle.com> writes:
>>
>> >> - Whenever you mount a filesystem with DAX, it spits out a message that says
>> >>   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk".  What criteria
>> >>   needs to be met for DAX to no longer be considered experimental?
>> >
>> > For XFS I'd like to get reflink working with it, for starters.
>>
>> What do you mean by this, exactly?  When Dave outlined the requirements
>> for PMEM_IMMUTABLE, it was very clear that metadata updates would not be
>> possible.  And would you really cosider this a barrier to marking dax
>> fully supported?  I wouldn't.
>
> For PMEM_IMMUTABLE files, yes, reflink cannot be supported.
>
> I'm talking about supporting reflink for DAX files that are /not/
> PMEM_IMMUTABLE, where user programs can mmap pmem directly but write
> activity still must use fsync/msync to ensure that everything's on disk.
>
> I wouldn't consider it a barrier in general (since ext4 also prints
> EXPERIMENTAL warnings for DAX), merely one for XFS.  I don't even think
> it's that big of a hurdle -- afaict XFS ought to be able to achieve this
> by modifying iomap_begin to allocate new pmem blocks, memcpy the
> contents, and update the memory mappings.  I think.
>
>> > We probably need a bunch more verification work to show that file IO
>> > doesn't adopt any bad quirks having turned on the per-inode DAX flag.
>>
>> Can you be more specific?  We have ltp and xfstests.  If you have some
>> mkfs/mount options that you think should be tested, speak up.  Beyond
>> that, if it passes ./check -g auto and ltp, are we good?
>
> That's probably good -- I simply wanted to know if we'd at least gotten
> to the point that someone had run both suites with and without DAX and
> not seen any major regressions between the two.

Yes, xfstests is part of the DAX development flow.  The hard part has
been maintaining a blacklist of tests that fail in both the DAX and
non-DAX cases, or that produce false negatives due to DAX disabling
delayed allocation.

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-17  1:50         ` Darrick J. Wong
  (?)
@ 2017-01-17  7:57             ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2017-01-17  7:57 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jeff Moyer, Ross Zwisler, lsf-pc, linux-fsdevel, linux-nvdimm,
	linux-block, linux-mm

On Mon, Jan 16, 2017 at 05:50:33PM -0800, Darrick J. Wong wrote:
> I wouldn't consider it a barrier in general (since ext4 also prints
> EXPERIMENTAL warnings for DAX), merely one for XFS.  I don't even think
> it's that big of a hurdle -- afaict XFS ought to be able to achieve this
> by modifying iomap_begin to allocate new pmem blocks, memcpy the
> contents, and update the memory mappings.  I think.

Yes, and I have a working prototype for that.  I'm just way too busy
with lots of bug fixing at the moment, but I plan to get to it in this
merge window.  I also agree that we can't mark a feature as fully
supported while it still conflicts with other features.

And I'm not going to get started on the PMEM_IMMUTABLE bullshit.  Please
don't even go there, folks; it's a dead end.

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-17  7:57             ` Christoph Hellwig
@ 2017-01-17 14:54               ` Jeff Moyer
  -1 siblings, 0 replies; 50+ messages in thread
From: Jeff Moyer @ 2017-01-17 14:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Ross Zwisler, lsf-pc, linux-fsdevel,
	linux-nvdimm, linux-block, linux-mm

Christoph Hellwig <hch@infradead.org> writes:

> On Mon, Jan 16, 2017 at 05:50:33PM -0800, Darrick J. Wong wrote:
>> I wouldn't consider it a barrier in general (since ext4 also prints
>> EXPERIMENTAL warnings for DAX), merely one for XFS.  I don't even think
>> it's that big of a hurdle -- afaict XFS ought to be able to achieve this
>> by modifying iomap_begin to allocate new pmem blocks, memcpy the
>> contents, and update the memory mappings.  I think.

Ah, I wasn't even thinking about non-PMEM_IMMUTABLE usage.

> Yes, and I have a working prototype for that.  I'm just way to busy
> with lots of bugfixing at the moment but I plan to get to it in this
> merge window.  I also agree that we can't mark a feature as fully
> supported until it doesn't conflict with other features.

Fair enough.

> And I'm not going to get start on the PMEM_IMMUTABLE bullshit, please
> don't even go there folks, it's a dead end.

I spoke with Dave before the holidays, and he indicated that
PMEM_IMMUTABLE would be an acceptable solution to allowing applications
to flush data completely from userspace.  I know this subject has been
beaten to death, but would you mind just summarizing your opinion on
this one more time?  I'm guessing this will be something more easily
hashed out at LSF, though.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-17 14:54               ` Jeff Moyer
  (?)
@ 2017-01-17 15:06                   ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2017-01-17 15:06 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Christoph Hellwig, Darrick J. Wong, Ross Zwisler, lsf-pc,
	linux-fsdevel, linux-nvdimm, linux-block, linux-mm

On Tue, Jan 17, 2017 at 09:54:27AM -0500, Jeff Moyer wrote:
> I spoke with Dave before the holidays, and he indicated that
> PMEM_IMMUTABLE would be an acceptable solution to allowing applications
> to flush data completely from userspace.  I know this subject has been
> beaten to death, but would you mind just summarizing your opinion on
> this one more time?  I'm guessing this will be something more easily
> hashed out at LSF, though.

Come up with a prototype that doesn't suck and allows all fs features to
actually work.  And show an application that actually cares and shows
benefits on publicly available real hardware.  Until then, go away and
stop wasting everyone's time.

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [Lsf-pc] [LSF/MM TOPIC] Future direction of DAX
  2017-01-14  0:20 ` Ross Zwisler
  (?)
@ 2017-01-17 15:59   ` Jan Kara
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Kara @ 2017-01-17 15:59 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: linux-fsdevel, linux-block, linux-nvdimm, lsf-pc, linux-mm

On Fri 13-01-17 17:20:08, Ross Zwisler wrote:
> - The DAX fsync/msync model was built for platforms that need to flush dirty
>   processor cache lines in order to make data durable on NVDIMMs.  There exist
>   platforms, however, that are set up so that the processor caches are
>   effectively part of the ADR safe zone.  This means that dirty data can be
>   assumed to be durable even in the processor cache, obviating the need to
>   manually flush the cache during fsync/msync.  These platforms still need to
>   call fsync/msync to ensure that filesystem metadata updates are properly
>   written to media.  Our first idea on how to properly support these platforms
>   would be for DAX to be made aware that in some cases it doesn't need to keep
>   metadata about dirty cache lines.  A similar issue exists for volatile uses
>   of DAX such as with BRD or with PMEM and the memmap command line parameter,
>   and we'd like a solution that covers them all.

Well, we still need the radix tree entries for locking.  And you still need
to keep track of which file offsets are writeably mapped (which we
currently track implicitly via dirty radix tree entries) so that you can
write-protect them if needed (during filesystem freezing, for reflink, ...).
So I think the biggest gain by far is simply to avoid doing the writeback
at all in such situations.

> - If I recall correctly, at one point Dave Chinner suggested that we change
>   DAX so that I/O would use cached stores instead of the non-temporal stores
>   that it currently uses.  We would then track pages that were written to by
>   DAX in the radix tree so that they would be flushed later during
>   fsync/msync.  Does this sound like a win?  Also, assuming that we can find a
>   solution for platforms where the processor cache is part of the ADR safe
>   zone (above topic) this would be a clear improvement, moving us from using
>   non-temporal stores to faster cached stores with no downside.

I guess this needs measurements. But it is worth a try.

> - Jan suggested [2] that we could use the radix tree as a cache to service DAX
>   faults without needing to call into the filesystem.  Are there any issues
>   with this approach, and should we move forward with it as an optimization?

Yup, I'm still for it.

> - Whenever you mount a filesystem with DAX, it spits out a message that says
>   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk".  What criteria
>   needs to be met for DAX to no longer be considered experimental?

So from my POV I'd be OK with removing the warning but still the code is
new so there are clearly bugs lurking ;).

> - When we msync() a huge page, if the range is less than the entire huge page,
>   should we flush the entire huge page and mark it clean in the radix tree, or
>   should we only flush the requested range and leave the radix tree entry
>   dirty?

If you do a partial msync(), then you have the problem that msync(0, x)
followed by msync(x, EOF) will not yield a clean file, which may surprise
somebody.  So I'm slightly skeptical.
 
> - Should we enable 1 GiB huge pages in filesystem DAX?  Does anyone have any
>   specific customer requests for this or performance data suggesting it would
>   be a win?  If so, what work needs to be done to get 1 GiB sized and aligned
>   filesystem block allocations, to get the required enabling in the MM layer,
>   etc?

I'm not convinced it is worth it now. Maybe later...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-17 15:06                   ` Christoph Hellwig
  (?)
@ 2017-01-17 16:07                       ` Jeff Moyer
  -1 siblings, 0 replies; 50+ messages in thread
From: Jeff Moyer @ 2017-01-17 16:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> writes:

> On Tue, Jan 17, 2017 at 09:54:27AM -0500, Jeff Moyer wrote:
>> I spoke with Dave before the holidays, and he indicated that
>> PMEM_IMMUTABLE would be an acceptable solution to allowing applications
>> to flush data completely from userspace.  I know this subject has been
>> beaten to death, but would you mind just summarizing your opinion on
>> this one more time?  I'm guessing this will be something more easily
>> hashed out at LSF, though.
>
> Come up with a prototype that doesn't suck and allows all fs features to
> actually work.

OK, I'll take this to mean that PMEM_IMMUTABLE is a non-starter.
Perhaps synchronous page faults (or whatever you want to call it) would
work, but...

> And show an application that actually cares and shows benefits on
> publicly available real hardware.

This is the crux of the issue.

> Until then go away and stop wasting everyones time.

Fair enough.  It seems fairly likely that this sort of functionality
would provide a big benefit.  But I agree we should have a real-world
use case as proof.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] Future direction of DAX
  2017-01-17 15:59   ` Jan Kara
  (?)
@ 2017-01-17 16:56     ` Dan Williams
  -1 siblings, 0 replies; 50+ messages in thread
From: Dan Williams @ 2017-01-17 16:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-block, Linux MM, linux-fsdevel, lsf-pc

On Tue, Jan 17, 2017 at 7:59 AM, Jan Kara <jack@suse.cz> wrote:
> On Fri 13-01-17 17:20:08, Ross Zwisler wrote:
>> - The DAX fsync/msync model was built for platforms that need to flush dirty
>>   processor cache lines in order to make data durable on NVDIMMs.  There exist
>>   platforms, however, that are set up so that the processor caches are
>>   effectively part of the ADR safe zone.  This means that dirty data can be
>>   assumed to be durable even in the processor cache, obviating the need to
>>   manually flush the cache during fsync/msync.  These platforms still need to
>>   call fsync/msync to ensure that filesystem metadata updates are properly
>>   written to media.  Our first idea on how to properly support these platforms
>>   would be for DAX to be made aware that in some cases it doesn't need to keep
>>   metadata about dirty cache lines.  A similar issue exists for volatile uses
>>   of DAX such as with BRD or with PMEM and the memmap command line parameter,
>>   and we'd like a solution that covers them all.
>
> Well, we still need the radix tree entries for locking. And you still need
> to keep track of which file offsets are writeably mapped (which we
> currently implicitly keep via dirty radix tree entries) so that you can
> writeprotect them if needed (during filesystem freezing, for reflink, ...).
> So I think what is going to gain the most by far is simply to avoid doing
> the writeback at all in such situations.

I came to the same conclusion when taking a look at this. I have some
patches that simply make the writeback optional, but do not touch any
of the other dirty tracking infrastructure. I'll send them out shortly
after a bit more testing. This also dovetails with the request from
Linus to push pmem flushing routines into the driver and stop abusing
__copy_user_nocache.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] Future direction of DAX
  2017-01-17 15:59   ` Jan Kara
  (?)
@ 2017-01-18  0:03     ` Kani, Toshimitsu
  -1 siblings, 0 replies; 50+ messages in thread
From: Kani, Toshimitsu @ 2017-01-18  0:03 UTC (permalink / raw)
  To: ross.zwisler, jack
  Cc: linux-block, linux-mm, lsf-pc, linux-fsdevel, linux-nvdimm

On Tue, 2017-01-17 at 16:59 +0100, Jan Kara wrote:
> On Fri 13-01-17 17:20:08, Ross Zwisler wrote:
 :
> > - If I recall correctly, at one point Dave Chinner suggested that we
> > change DAX so that I/O would use cached stores instead of the
> > non-temporal stores that it currently uses.  We would then track pages
> > that were written to by DAX in the radix tree so that they would be
> > flushed later during fsync/msync.  Does this sound like a win?  Also,
> > assuming that we can find a solution for platforms where the processor
> > cache is part of the ADR safe zone (above topic) this would be a clear
> > improvement, moving us from using non-temporal stores to faster cached
> > stores with no downside.
> 
> I guess this needs measurements. But it is worth a try.

Brian Boylston did some measurement before.
http://oss.sgi.com/archives/xfs/2016-08/msg00239.html

I updated his test program to skip pmem_persist() for the cached copy
case.

                        dst = dstbase;
+ #if 0
                        /* see note above */
                        if (mode == 'c')
                                pmem_persist(dst, dstsz);
+ #endif
                }

Here are sample runs:

$ numactl -N0 time -p ./memcpyperf c /mnt/pmem0/file 1000000
INFO: dst 0x7f1d00000000 src 0x601200 dstsz 2756509696 cpysz 16384
real 3.28
user 3.27
sys 0.00

$ numactl -N0 time -p ./memcpyperf n /mnt/pmem0/file 1000000
INFO: dst 0x7f6080000000 src 0x601200 dstsz 2756509696 cpysz 16384
real 1.01
user 1.01
sys 0.00

$ numactl -N1 time -p ./memcpyperf c /mnt/pmem0/file 1000000
INFO: dst 0x7fe900000000 src 0x601200 dstsz 2756509696 cpysz 16384
real 4.06
user 4.06
sys 0.00

$ numactl -N1 time -p ./memcpyperf n /mnt/pmem0/file 1000000
INFO: dst 0x7f7640000000 src 0x601200 dstsz 2756509696 cpysz 16384
real 1.27
user 1.27
sys 0.00

In this simple test, using non-temporal copy is still faster than using
cached copy.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-14  0:20 ` Ross Zwisler
  (?)
@ 2017-01-18  5:25   ` willy
  -1 siblings, 0 replies; 50+ messages in thread
From: willy @ 2017-01-18  5:25 UTC (permalink / raw)
  To: Ross Zwisler, lsf-pc, linux-fsdevel, linux-nvdimm, linux-block, linux-mm

On Fri, Jan 13, 2017 at 05:20:08PM -0700, Ross Zwisler wrote:
> We still have a lot of work to do, though, and I'd like to propose a discussion
> around what features people would like to see enabled in the coming year as
> well as what use cases their customers have that we might not be aware of.

+1 to the discussion

> - Jan suggested [2] that we could use the radix tree as a cache to service DAX
>   faults without needing to call into the filesystem.  Are there any issues
>   with this approach, and should we move forward with it as an optimization?

Ahem.  I believe I proposed this at last year's LSFMM.  And I sent
patches to start that work.  And Dan blocked it.  So I'm not terribly
amused to see somebody else given credit for the idea.

It's not just an optimisation.  It's also essential for supporting
filesystems which don't have block devices.  I'm aware of at least two
customer demands for this in different domains.

1. Embedded uses with NOR flash
2. Cloud/virt uses with multiple VMs on a single piece of hardware


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-18  5:25   ` willy
  (?)
@ 2017-01-18  6:01     ` Dan Williams
  -1 siblings, 0 replies; 50+ messages in thread
From: Dan Williams @ 2017-01-18  6:01 UTC (permalink / raw)
  To: willy; +Cc: linux-nvdimm, linux-block, Linux MM, linux-fsdevel, lsf-pc

On Tue, Jan 17, 2017 at 9:25 PM,  <willy@bombadil.infradead.org> wrote:
> On Fri, Jan 13, 2017 at 05:20:08PM -0700, Ross Zwisler wrote:
>> We still have a lot of work to do, though, and I'd like to propose a discussion
>> around what features people would like to see enabled in the coming year as
>> well as what use cases their customers have that we might not be aware of.
>
> +1 to the discussion
>
>> - Jan suggested [2] that we could use the radix tree as a cache to service DAX
>>   faults without needing to call into the filesystem.  Are there any issues
>>   with this approach, and should we move forward with it as an optimization?
>
> Ahem.  I believe I proposed this at last year's LSFMM.  And I sent
> patches to start that work.  And Dan blocked it.  So I'm not terribly
> amused to see somebody else given credit for the idea.
>

I "blocked" moving the phys to virt translation out of the driver
since that mapping lifetime is device specific.

However, I think caching the file offset to physical sector/address
result is a great idea.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-18  6:01     ` Dan Williams
  (?)
@ 2017-01-18  6:07       ` willy
  -1 siblings, 0 replies; 50+ messages in thread
From: willy @ 2017-01-18  6:07 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, linux-block, Linux MM, linux-fsdevel, lsf-pc

On Tue, Jan 17, 2017 at 10:01:30PM -0800, Dan Williams wrote:
> >> - Jan suggested [2] that we could use the radix tree as a cache to service DAX
> >>   faults without needing to call into the filesystem.  Are there any issues
> >>   with this approach, and should we move forward with it as an optimization?
> >
> > Ahem.  I believe I proposed this at last year's LSFMM.  And I sent
> > patches to start that work.  And Dan blocked it.  So I'm not terribly
> > amused to see somebody else given credit for the idea.
> 
> I "blocked" moving the phys to virt translation out of the driver
> since that mapping lifetime is device specific.

The problem is that DAX currently assumes that there *is* a block driver,
and it might be a char device or no device at all (the two examples I
gave earlier).

> However, I think caching the file offset to physical sector/address
> result is a great idea.

OK, great.  The lifetime problem I think you care about (hotplug) can be
handled by removing all the cached entries for every file on that block
device ... I know there were prototype patches for that; did they ever
get merged?

^ permalink raw reply	[flat|nested] 50+ messages in thread
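The bulk invalidation described above — dropping every cached translation for every file on a departing block device — looks roughly like this toy model. The structures and names are invented for illustration; the real kernel would walk the per-inode radix trees of the affected filesystem.

```c
#include <string.h>

#define MAX_FILES 4
#define MAX_PAGES 8

/* Toy model: each file carries a small offset->sector cache and a
 * tag for the block device it lives on. 0 means "no cached entry". */
struct file_cache {
    int dev;                          /* owning block device id */
    unsigned long sector[MAX_PAGES];  /* cached translations */
};

static struct file_cache files[MAX_FILES];

/* Hotplug path: on device removal, walk every file on the dying device
 * and drop its cached entries so nothing points at vanished media. */
static void invalidate_device(int dev)
{
    for (int i = 0; i < MAX_FILES; i++)
        if (files[i].dev == dev)
            memset(files[i].sector, 0, sizeof files[i].sector);
}
```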

* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-18  6:07       ` willy
  (?)
@ 2017-01-18  6:25         ` Dan Williams
  -1 siblings, 0 replies; 50+ messages in thread
From: Dan Williams @ 2017-01-18  6:25 UTC (permalink / raw)
  To: willy; +Cc: linux-nvdimm, linux-block, Linux MM, linux-fsdevel, lsf-pc

On Tue, Jan 17, 2017 at 10:07 PM,  <willy@bombadil.infradead.org> wrote:
> On Tue, Jan 17, 2017 at 10:01:30PM -0800, Dan Williams wrote:
>> >> - Jan suggested [2] that we could use the radix tree as a cache to service DAX
>> >>   faults without needing to call into the filesystem.  Are there any issues
>> >>   with this approach, and should we move forward with it as an optimization?
>> >
>> > Ahem.  I believe I proposed this at last year's LSFMM.  And I sent
>> > patches to start that work.  And Dan blocked it.  So I'm not terribly
>> > amused to see somebody else given credit for the idea.
>>
>> I "blocked" moving the phys to virt translation out of the driver
>> since that mapping lifetime is device specific.
>
> The problem is that DAX currently assumes that there *is* a block driver,
> and it might be a char device or no device at all (the two examples I
> gave earlier).
>
>> However, I think caching the file offset to physical sector/address
>> result is a great idea.
>
> OK, great.  The lifetime problem I think you care about (hotplug) can be
> handled by removing all the cached entries for every file on that block
> device ... I know there were prototype patches for that; did they ever
> get merged?

No, they didn't. The last review comment was from Al: he wanted the
mechanism converted from explicit calls at del_gendisk() time into a
notifier chain, since it's not just filesystems that may want to
register for a block-device end-of-life event.

^ permalink raw reply	[flat|nested] 50+ messages in thread
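The notifier-chain conversion Al asked for amounts to letting any interested subsystem subscribe to a device end-of-life event instead of being called explicitly from del_gendisk(). A minimal userspace sketch of the pattern follows — greatly simplified; the kernel's own struct notifier_block / blocking_notifier_call_chain machinery is more involved, and the subscriber here is a stub invented for illustration.

```c
#include <stddef.h>

/* Minimal notifier-chain pattern: subscribers register a callback,
 * and the publisher (here, a stand-in for del_gendisk()) walks the
 * chain instead of calling each consumer explicitly. */
typedef void (*notifier_fn)(int dev);

struct notifier {
    notifier_fn fn;
    struct notifier *next;
};

static struct notifier *chain;

static void notifier_register(struct notifier *n)
{
    n->next = chain;
    chain = n;
}

static void notifier_call_chain(int dev)
{
    for (struct notifier *n = chain; n; n = n->next)
        n->fn(dev);
}

/* Example subscriber: a filesystem reacting to device removal, e.g.
 * by dropping its DAX translation cache (stubbed for illustration). */
static int fs_saw_removal;
static void fs_on_device_removal(int dev)
{
    fs_saw_removal = dev;
}
```

The design win is decoupling: the block layer fires one event, and filesystems, DAX, or anything else can register without del_gendisk() knowing about them.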

* Re: [LSF/MM TOPIC] Future direction of DAX
  2017-01-18  5:25   ` willy
  (?)
@ 2017-01-18 17:22     ` Ross Zwisler
  -1 siblings, 0 replies; 50+ messages in thread
From: Ross Zwisler @ 2017-01-18 17:22 UTC (permalink / raw)
  To: willy; +Cc: linux-nvdimm, linux-block, linux-mm, linux-fsdevel, lsf-pc

On Tue, Jan 17, 2017 at 09:25:33PM -0800, willy@bombadil.infradead.org wrote:
> On Fri, Jan 13, 2017 at 05:20:08PM -0700, Ross Zwisler wrote:
> > We still have a lot of work to do, though, and I'd like to propose a discussion
> > around what features people would like to see enabled in the coming year as
> > well as what use cases their customers have that we might not be aware of.
> 
> +1 to the discussion
> 
> > - Jan suggested [2] that we could use the radix tree as a cache to service DAX
> >   faults without needing to call into the filesystem.  Are there any issues
> >   with this approach, and should we move forward with it as an optimization?
> 
> Ahem.  I believe I proposed this at last year's LSFMM.  And I sent
> patches to start that work.  And Dan blocked it.  So I'm not terribly
> amused to see somebody else given credit for the idea.
> 
> It's not just an optimisation.  It's also essential for supporting
> filesystems which don't have block devices.  I'm aware of at least two
> customer demands for this in different domains.
> 
> 1. Embedded uses with NOR flash
> 2. Cloud/virt uses with multiple VMs on a single piece of hardware

Yeah, I didn't mean the full move to having PFNs in the tree, just using the
sector number in the radix tree instead of calling into the filesystem.

My apologies if you feel I didn't give you proper credit.

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2017-01-18 17:24 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-14  0:20 [LSF/MM TOPIC] Future direction of DAX Ross Zwisler
2017-01-14  8:26 ` Darrick J. Wong
2017-01-16  0:19   ` Viacheslav Dubeyko
2017-01-16 20:00   ` Jeff Moyer
2017-01-17  1:50     ` Darrick J. Wong
2017-01-17  2:42       ` Dan Williams
2017-01-17  7:57       ` Christoph Hellwig
2017-01-17 14:54         ` Jeff Moyer
2017-01-17 15:06           ` Christoph Hellwig
2017-01-17 16:07             ` Jeff Moyer
2017-01-17 15:59 ` [Lsf-pc] " Jan Kara
2017-01-17 16:56   ` Dan Williams
2017-01-18  0:03   ` Kani, Toshimitsu
2017-01-18  5:25 ` willy
2017-01-18  6:01   ` Dan Williams
2017-01-18  6:07     ` willy
2017-01-18  6:25       ` Dan Williams
2017-01-18 17:22   ` Ross Zwisler
