From: Dan Williams <dan.j.williams@intel.com>
To: david <david@fromorbit.com>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux API <linux-api@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps
Date: Mon, 29 Oct 2018 23:30:41 -0700	[thread overview]
Message-ID: <CAPcyv4ixoAh7HEMfm+B4sRDx1Qwm6SHGjtQ+5r3EKsxreRydrA@mail.gmail.com> (raw)
In-Reply-To: <20181019004303.GI6311@dastard>

On Thu, Oct 18, 2018 at 5:58 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Oct 18, 2018 at 04:55:55PM +0200, Jan Kara wrote:
> > On Thu 18-10-18 11:25:10, Dave Chinner wrote:
> > > On Wed, Oct 17, 2018 at 04:23:50PM -0400, Jeff Moyer wrote:
> > > > MAP_SYNC
> > > > - file system guarantees that metadata required to reach faulted-in file
> > > >   data is consistent on media before a write fault is completed.  A
> > > >   side-effect is that the page cache will not be used for
> > > >   writably-mapped pages.
> > >
> > > I think you are conflating current implementation with API
> > > requirements - MAP_SYNC doesn't guarantee anything about page cache
> > > use. The man page definition simply says "supported only for files
> > > supporting DAX" and that it provides certain data integrity
> > > guarantees. It does not define the implementation.
> > >
> > > We've /implemented MAP_SYNC/ as O_DSYNC page fault behaviour,
> > > because that's the only way we can currently provide the required
> > > behaviour to userspace. However, if a filesystem can use the page
> > > cache to provide the required functionality, then it's free to do
> > > so.
> > >
> > > i.e. if someone implements a pmem-based page cache, MAP_SYNC data
> > > integrity could be provided /without DAX/ by any filesystem using
> > > that persistent page cache. i.e. MAP_SYNC really only requires
> > > mmap() of CPU addressable persistent memory - it does not require
> > > DAX. Right now, however, the only way to get this functionality is
> > > through a DAX capable filesystem on dax capable storage.
> > >
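For concreteness, here is roughly what the MAP_SYNC contract looks like
from userspace today. This is a minimal sketch: the pmem mount path is
made up, and the final cache flush is indicated only in a comment
instead of pulling in libpmem.

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

int main(void)
{
        int fd = open("/mnt/pmem/data", O_RDWR);        /* made-up path */
        if (fd < 0)
                return 1;

        /*
         * MAP_SYNC must be paired with MAP_SHARED_VALIDATE so that a
         * filesystem that cannot honour it fails the mmap() with
         * EOPNOTSUPP instead of silently ignoring the flag.
         */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (p == MAP_FAILED)
                return 1;       /* no MAP_SYNC here; fall back to msync() */

        memcpy(p, "hello", 5);

        /*
         * With MAP_SYNC the metadata needed to reach this page was made
         * durable at fault time, so flushing CPU caches is sufficient to
         * persist the store, e.g. pmem_persist(p, 5) from libpmem.
         */

        munmap(p, 4096);
        close(fd);
        return 0;
}
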
> > > And, FWIW, this is pretty much how NOVA maintains DAX w/ COW - it
> > > COWs new pages in pmem and attaches them to a special per-inode cache
> > > on clean->dirty transition. Then on data sync, background writeback
> > > or crash recovery, it migrates them from the cache into the file map
> > > proper via atomic metadata pointer swaps.
> > >
> > > IOWs, NOVA provides the correct MAP_SYNC semantics by using a
> > > separate persistent per-inode write cache to provide the correct
> > > crash recovery semantics for MAP_SYNC.
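
A heavily simplified sketch of that scheme, to make the mechanism
concrete. Every identifier below is invented for illustration; this is
not NOVA's actual code:

/* Illustration only -- all names are invented. */

struct wcache_entry {
        unsigned long pgoff;    /* file offset of the COW'd block */
        void *pmem_copy;        /* freshly allocated pmem block */
};

/* clean->dirty transition: COW into the persistent per-inode cache */
static void cow_write_fault(struct pmemfs_inode *inode, unsigned long pgoff)
{
        void *copy = pmem_alloc_block();

        memcpy(copy, block_of(inode, pgoff), BLOCK_SIZE);
        persist(copy, BLOCK_SIZE);
        wcache_insert(inode, pgoff, copy);
        /* the app now has direct access to 'copy' until writeback */
}

/* data sync, background writeback or crash recovery: publish the copy */
static void wcache_writeback(struct pmemfs_inode *inode,
                             struct wcache_entry *e)
{
        /*
         * A single aligned pointer store is atomic, so the file map
         * always refers to either the old or the new block; this is
         * the "atomic metadata pointer swap".
         */
        WRITE_ONCE(inode->block_map[e->pgoff], e->pmem_copy);
        persist(&inode->block_map[e->pgoff], sizeof(void *));
}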
> >
> > Correct. NOVA would be able to provide MAP_SYNC semantics without DAX. But
> > effectively it will also be able to provide MAP_DIRECT semantics, right?
>
> Yes, I think so. It still needs to do COW on first write fault,
> but then the app has direct access to the data buffer until it is
> cleaned and put back in place. The "put back in place" is just an
> atomic swap of metadata pointers, so it doesn't need the page cache
> at all...
>
> > Because there won't be DRAM between app and persistent storage and I don't
> > think COW tricks or other data integrity methods are that interesting for
> > the application.
>
> Not for the application, but the filesystem still wants to support
> snapshots and other such functionality that requires COW. And NOVA
> doesn't have write-in-place functionality at all - it always COWs
> on the clean->dirty transition.
>
> > Most users of O_DIRECT are concerned about getting close
> > to media speed performance and low DRAM usage...
>
> *nod*
>
> > > > and what I think Dan had proposed:
> > > >
> > > > mmap flag, MAP_DIRECT
> > > > - file system guarantees that page cache will not be used to front storage.
> > > >   Storage MUST be directly addressable.  This *almost* implies MAP_SYNC.
> > > >   The subtle difference is that a write fault /may/ not result in metadata
> > > >   being written back to media.
> > >
> > > Similar to O_DIRECT, these semantics do not allow userspace apps to
> > > replace msync/fsync with CPU cache flush operations. So any
> > > application that uses this mode still needs to use either MAP_SYNC
> > > or issue msync/fsync for data integrity.
> > >
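In practice that would look something like the sketch below. Note that
MAP_DIRECT has no mainline definition, so the flag value is invented
purely for illustration:

#include <string.h>
#include <sys/mman.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
/* MAP_DIRECT is only a proposal; this value is made up for the sketch */
#ifndef MAP_DIRECT
#define MAP_DIRECT 0x100000
#endif

/* write 'len' bytes at the start of a MAP_DIRECT mapping of 'fd' */
int direct_write(int fd, const void *buf, size_t len)
{
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED_VALIDATE | MAP_DIRECT, fd, 0);
        if (p == MAP_FAILED)
                return -1;

        memcpy(p, buf, len);

        /*
         * MAP_DIRECT without MAP_SYNC gives no metadata durability
         * promise, so CPU cache flushes do not replace msync()/fsync();
         * the app must still issue a sync call for data integrity.
         */
        if (msync(p, len, MS_SYNC) < 0) {
                munmap(p, len);
                return -1;
        }
        return munmap(p, len);
}
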
> > > If the app is using MAP_DIRECT, then what do we do if the filesystem
> > > can't provide the required semantics for that specific operation? In
> > > the case of O_DIRECT, we fall back to buffered IO because it has the
> > > same data integrity semantics as O_DIRECT and will always work. It's
> > > just slower and consumes more memory, but the app continues to work
> > > just fine.
> > >
> > > Sending SIGBUS to apps when we can't perform MAP_DIRECT operations
> > > without using the pagecache seems extremely problematic to me.  e.g.
> > > an app already has an open MAP_DIRECT file, and a third party
> > > reflinks it or dedupes it and the fs has to fall back to buffered IO
> > > to do COW operations. This isn't the app's fault - the kernel should
> > > just fall back transparently to using the page cache for the
> > > MAP_DIRECT app and just keep working, just like it would if it was
> > > using O_DIRECT read/write.
> >
> > There's another option of failing reflink / dedupe with EBUSY if the file
> > is mapped with MAP_DIRECT and the filesystem cannot support reflink &
> > MAP_DIRECT together. But there are downsides to that as well.
>
> Yup, not the least that setting MAP_DIRECT can race with a
> reflink....
>
> > > The point I'm trying to make here is that O_DIRECT is a /hint/, not
> > > a guarantee, and it's done that way to prevent applications from
> > > being presented with transient, potentially fatal error cases
> > > because a filesystem implementation can't do a specific operation
> > > through the direct IO path.
> > >
> > > IMO, MAP_DIRECT needs to be a hint like O_DIRECT and not a
> > > guarantee. Over time we'll end up with filesystems that can
> > > guarantee that MAP_DIRECT is always going to use DAX, in the same
> > > way we have filesystems that guarantee O_DIRECT will always be
> > > O_DIRECT (e.g. XFS). But if we decide that MAP_DIRECT must guarantee
> > > no page cache will ever be used, then we are basically saying
> > > "filesystems won't provide MAP_DIRECT even in common, useful cases
> > > because they can't provide MAP_DIRECT in all cases." And that
> > > doesn't seem like a very good solution to me.
> >
> > These are good points. I'm just somewhat wary of the situation where users
> > will map files with MAP_DIRECT and then the machine starts thrashing
> > because the file got reflinked and thus pagecache gets used suddenly.
>
> It's still better than apps randomly getting SIGBUS.
>
> FWIW, this suggests that we probably need to be able to host both
> DAX pages and page cache pages on the same file at the same time,
> and be able to handle page faults based on the type of page being
> mapped (different sets of fault ops for different page types?)
> and have fallback paths when the page type needs to be changed
> between direct and cached during the fault....
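
i.e. something like a per-page-type dispatch in the fault path. This is
a pure sketch; mixed_fault() and all three helpers below are invented:

/* kernel context: dispatch the fault on the backing type of the block */
static vm_fault_t mixed_fault(struct vm_fault *vmf)
{
        struct inode *inode = file_inode(vmf->vma->vm_file);

        if (extent_is_dax(inode, vmf->pgoff))   /* hypothetical predicate */
                return dax_fault_path(vmf);     /* map the pmem page directly */

        return pagecache_fault_path(vmf);       /* serve a cached page */
}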
>
> > With O_DIRECT the fallback to buffered IO is quite rare (at least for major
> > filesystems) so usually people just won't notice. If fallback for
> > MAP_DIRECT will be easy to hit, I'm not sure it would be very useful.
>
> Which is just like the situation where O_DIRECT on ext3 was not very
> useful, but on other filesystems like XFS it was fully functional.
>
> IMO, the fact that a specific filesystem has a suboptimal fallback
> path for an uncommon behaviour isn't an argument against MAP_DIRECT
> as a hint - it's actually a feature. If MAP_DIRECT can't be used
> until it's always direct access, then most filesystems wouldn't be
> able to provide any faster paths at all. It's much better to have
> partial functionality now than it is to never have the functionality
> at all, and so we need to design in the flexibility we need to
> iteratively improve implementations without needing API changes that
> will break applications.

The hard-guarantee requirement still remains, though, because an
application that expects combined MAP_SYNC|MAP_DIRECT semantics will
be surprised if the MAP_DIRECT property silently disappears. I think
MAP_DIRECT still makes sense as a hint for apps that want to minimize
page cache usage, but for applications with a flush-from-userspace
model I think that wants to be an F_SETLEASE / F_DIRECTLCK operation.
That still gives the filesystem the option to inject page cache at
will, but with an application coordination point.
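
To sketch what that coordination point might look like: F_SETLEASE and
SIGIO-based lease breaks are real today, but F_DIRECTLCK does not
exist, so both its value and the exact fcntl() call shape below are
assumptions:

#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>

/* F_DIRECTLCK is only a proposal; this value is made up for the sketch */
#ifndef F_DIRECTLCK
#define F_DIRECTLCK 1033
#endif

static volatile sig_atomic_t direct_revoked;

static void lease_break(int sig)
{
        /*
         * The fs wants to inject page cache (e.g. for a reflink): the
         * app's main loop checks this flag, stops assuming CPU flushes
         * persist data, and falls back to msync()/fsync() before
         * releasing the lease.
         */
        direct_revoked = 1;
}

int take_direct_lease(int fd)
{
        /* lease breaks are delivered as SIGIO by default */
        signal(SIGIO, lease_break);

        /* like F_SETLEASE/F_WRLCK, but asserting "no page cache" */
        return fcntl(fd, F_SETLEASE, F_DIRECTLCK);
}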