From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>, Arnd Bergmann <arnd@arndb.de>,
	linux-nvdimm@lists.01.org, linux-api@vger.kernel.org,
	darrick.wong@oracle.com, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	luto@kernel.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v4 0/3] MAP_DIRECT and block-map sealed files
Date: Tue, 15 Aug 2017 19:01:16 +1000
Message-ID: <20170815090116.GL21024@dastard>
In-Reply-To: <150277752553.23945.13932394738552748440.stgit@dwillia2-desk3.amr.corp.intel.com>

On Mon, Aug 14, 2017 at 11:12:05PM -0700, Dan Williams wrote:
> Changes since v3 [1]:
> * Move from an fallocate(2) interface to a new mmap(2) flag and rename
>   'immutable' to 'sealed'.
> 
> * Do not record the sealed state in permanent metadata; it is now purely
>   a temporary state for as long as a MAP_DIRECT vma is referencing the
>   inode (Christoph)
> 
> * Drop the CAP_IMMUTABLE requirement, but do require a PROT_WRITE
>   mapping.
> 
> [1]: https://lwn.net/Articles/730570/
> 
> ---
> 
> This is the next revision of a patch series that aims to enable
> applications that otherwise need to resort to DAX mapping a raw device
> file to instead move to a filesystem.
> 
> In the course of reviewing a previous posting, Christoph said:
> 
>     That being said I think we absolutely should support RDMA memory
>     registrations for DAX mappings.  I'm just not sure how S_IOMAP_IMMUTABLE
>     helps with that.  We'll want a MAP_SYNC | MAP_POPULATE to make sure all
>     the blocks are populated and all ptes are set up.  Second we need to
>     make sure get_user_page works, which for now means we'll need a struct
>     page mapping for the region (which will be really annoying for PCIe
>     mappings, like the upcoming NVMe persistent memory region), and we need
>     to guarantee that the extent mapping won't change while the
>     get_user_pages holds the pages inside it.  I think that is true due to
>     side effects even with the current DAX code, but we'll need to make it
>     explicit.  And maybe that's where we need to converge - "sealing" the
>     extent map makes sense as such a temporary measure that is not persisted
>     on disk, which automatically gets released when the holding process
>     exits, because we sort of already do this implicitly.  It might also
>     make sense to have explicitly breakable seals similar to what I do for
>     the pNFS blocks kernel server, as any userspace RDMA file server would
>     also need those semantics.
> 
> So, this is an attempt to converge on the idea that we need an explicit
> and process-lifetime-temporary mechanism for a process to be able to
> make assumptions about the physical-page to dax-file-offset
> relationship. The "explicitly breakable seals" aspect is not addressed
> in these patches, but I wonder if it might be a voluntary mechanism that
> can be implemented via userfaultfd.
> 
> These pass a basic smoke test and are meant to just gauge 'right track'
> / 'wrong track'. The main question it seems is whether the pinning done
> in this patchset is too early (applies before get_user_pages()) and too
> coarse (applies to the whole file). Perhaps this is where I discarded
> too easily Jan's suggestion to look at Peter Z's mm_mpin() syscall [2]? On
> the other hand, the coarseness and simple lifetime rules of MAP_DIRECT
> make it an easy mechanism to implement and explain.
> 
> Another reason I kept the scope of S_IOMAP_SEALED coarsely defined was
> to support Dave's desired use case of sealing for operating on reflinked
> files [3].

Which really needs a fcntl() interface to set/clear iomap seals.

Which, now that I look at it, already has a bunch of "file sealing"
commands defined, which arrived in 3.17. It appears to be a
special-purpose access control interface for memfd_create() to manage
shared access to anonymous tmpfs files, and it will return EINVAL on
any fd that points to a real file.

Oh, even more problematic:

	Seals are a property of an inode. [....] Furthermore, seals
	can never be removed, only added.

That seems somewhat difficult to reconcile with how I need
F_SEAL_IOMAP to operate.

/me calls it a day and goes looking for the hard liquor.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


