From: Dan Williams <dan.j.williams@intel.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Linux API <linux-api@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Dave Chinner <david@fromorbit.com>,
	linux-xfs@vger.kernel.org, Linux MM <linux-mm@kvack.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jeff Layton <jlayton@poochiereds.net>,
	Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
	Jeff Moyer <jmoyer@redhat.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [PATCH v9 0/6] MAP_DIRECT for DAX userspace flush
Date: Thu, 12 Oct 2017 10:41:39 -0700
Message-ID: <CAPcyv4gTON__Ohop0B5R2gsKXC71bycTBozqGmF3WmwG9C6LVA@mail.gmail.com>
In-Reply-To: <20171012142319.GA11254@lst.de>

On Thu, Oct 12, 2017 at 7:23 AM, Christoph Hellwig <hch@lst.de> wrote:
> Sorry for chiming in so late, been extremely busy lately.
>
> From quickly glancing over what the now finally described use case is
> (which contradicts the subject btw - it's not about flushing, it's
> about not removing block mapping under a MR) and the previous comments
> I think that mmap is simply the wrong kind of interface for this.
>
> What we want is support for a new kind of userspace memory registration in the
> RDMA code that uses the pnfs export interface, both getting the block (or
> rather byte in this case) mapping, and also gets the FL_LAYOUT lease for the
> memory registration.
>
> That btw is exactly what I do for the pNFS RDMA layout, just in-kernel.

...and this is exactly my plan.

So, you're jumping into this review at v9, where I've split the patches
that take an initial MAP_DIRECT lease out from the patches that take
FL_LAYOUT leases at memory registration time. You can see a previous
attempt in "[PATCH v8 00/14] MAP_DIRECT for DAX RDMA and userspace
flush", which should be in your inbox.

I'm not proposing mmap as the memory registration interface; it's the
"register for notification of lease break" interface. Here's my
proposed sequence:

addr = mmap(..., MAP_DIRECT..., fd, ...); <- register a vma for "direct"
memory registrations with an FL_LAYOUT lease that, on a lease break
event, sends SIGIO on the fd used for mmap.

ibv_reg_mr(..., addr, ...); <- check for a valid MAP_DIRECT vma and
take out another FL_LAYOUT lease. This lease force-revokes the RDMA
mapping when it expires, and it relies on the process receiving SIGIO
as the 'break' notification.

fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, ...) <- breaks
all the FL_LAYOUT leases; the vma owner is notified via the fd.
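To make the ordering above concrete, here is a small userspace model of
the three steps. It is not kernel code and not the v9 patches: struct
layout_lease and the model_* functions are hypothetical stand-ins, a
counter stands in for SIGIO delivery on the mmap fd, and a flag stands
in for the device's RDMA access. It assumes, as described above, that a
lease break first notifies the vma owner and that the MR lease only
revokes device access once the break times out.

```c
/*
 * Userspace model of the proposed MAP_DIRECT lease flow.
 * Hypothetical: none of these names exist in the kernel; each
 * function only mirrors one step of the sequence above.
 */
#include <assert.h>
#include <stdbool.h>

enum lease_kind { LEASE_VMA, LEASE_MR };

struct layout_lease {
	enum lease_kind kind;
	bool active;      /* lease currently held */
	bool breaking;    /* break started, waiting for expiry */
	bool rdma_access; /* device may DMA to the range (MR leases only) */
};

/* Counts notifications that would arrive as SIGIO on the mmap fd. */
static int sigio_count;

/* mmap(..., MAP_DIRECT...): the vma takes a layout lease. */
static void model_mmap_direct(struct layout_lease *vma)
{
	*vma = (struct layout_lease){ .kind = LEASE_VMA, .active = true };
}

/* ibv_reg_mr(): refuse unless a live MAP_DIRECT vma backs the range,
 * then take a second lease on behalf of the memory registration. */
static bool model_reg_mr(const struct layout_lease *vma,
			 struct layout_lease *mr)
{
	if (vma->kind != LEASE_VMA || !vma->active)
		return false;
	*mr = (struct layout_lease){ .kind = LEASE_MR, .active = true,
				     .rdma_access = true };
	return true;
}

/* fallocate(PUNCH_HOLE): start breaking every layout lease on the file.
 * Only the vma lease notifies the process; the MR lease relies on it. */
static void model_punch_hole(struct layout_lease *leases[], int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (!leases[i]->active || leases[i]->breaking)
			continue;
		leases[i]->breaking = true;
		if (leases[i]->kind == LEASE_VMA)
			sigio_count++;
	}
}

/* Lease-break timeout: broken leases go away, and an MR lease
 * force-revokes device access to the memory. */
static void model_lease_expire(struct layout_lease *leases[], int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (!leases[i]->breaking)
			continue;
		leases[i]->active = false;
		leases[i]->breaking = false;
		if (leases[i]->kind == LEASE_MR)
			leases[i]->rdma_access = false;
	}
}
```

The split between model_punch_hole() and model_lease_expire() reflects
that the MR lease does not notify anyone itself: it relies on the vma
lease's SIGIO and simply revokes device access at expiry.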

Al rightly points out that the fd may be closed by the time the event
fires, since the lease follows the vma lifetime. I see two ways to
solve this: document that the process may get notifications on a stale
fd if close() happens before munmap(), or, similar to how we call
locks_remove_posix() in filp_close(), add a routine to disable any
lease notifiers on close(). I'll investigate the second option, because
this seems to be a general problem with leases.

For RDMA I am presently reworking the implementation [1]. Inspired by
a discussion with Jason [2], I am going to add something like
ib_umem_ops to allow drivers to override the default policy for what
happens when a lease expires. The default action is to invalidate
device access to the memory with iommu_unmap(), but I want to allow
drivers to do something smarter, or to choose not to support DAX
mappings at all.
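A sketch of the ops-table idea, again as a userspace model under stated
assumptions: struct model_umem_ops and its two callbacks are
hypothetical and do not describe what ib_umem_ops will actually look
like. Clearing iommu_mapped stands in for the default iommu_unmap()
invalidation, and a driver refuses DAX mappings by returning
-EOPNOTSUPP from its registration hook.

```c
/*
 * Hypothetical model of a per-driver lease-expiry policy table,
 * in the spirit of the proposed ib_umem_ops. Not kernel code.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_umem;

struct model_umem_ops {
	/* Return 0 to accept DAX-backed memory, negative errno to refuse. */
	int (*dax_register)(struct model_umem *umem);
	/* Called when the layout lease expires; NULL selects the default. */
	void (*lease_expired)(struct model_umem *umem);
};

struct model_umem {
	const struct model_umem_ops *ops; /* NULL means use defaults */
	bool iommu_mapped;                /* device can reach the memory */
};

/* Default policy: stand-in for invalidating access via iommu_unmap(). */
static void model_default_lease_expired(struct model_umem *umem)
{
	umem->iommu_mapped = false;
}

/* Registration: let the driver veto DAX, otherwise map for the device. */
static int model_umem_register_dax(struct model_umem *umem)
{
	if (umem->ops && umem->ops->dax_register) {
		int rc = umem->ops->dax_register(umem);
		if (rc)
			return rc;
	}
	umem->iommu_mapped = true;
	return 0;
}

/* Lease expiry: driver override if present, else the default. */
static void model_umem_lease_expired(struct model_umem *umem)
{
	if (umem->ops && umem->ops->lease_expired)
		umem->ops->lease_expired(umem);
	else
		model_default_lease_expired(umem);
}

/* Example driver that opts out of DAX mappings entirely. */
static int no_dax_register(struct model_umem *umem)
{
	(void)umem;
	return -EOPNOTSUPP;
}

static const struct model_umem_ops no_dax_ops = {
	.dax_register = no_dax_register,
};

/* Example driver that handles expiry itself; here it only counts
 * the call and then drops device access. */
static int custom_expiry_calls;

static void custom_lease_expired(struct model_umem *umem)
{
	custom_expiry_calls++;
	umem->iommu_mapped = false;
}

static const struct model_umem_ops custom_ops = {
	.lease_expired = custom_lease_expired,
};
```

Leaving the ops pointer NULL keeps the default policy, so only drivers
that want different behavior need to supply a table.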

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-October/012785.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-October/012793.html
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
