From: Jan Kara <jack@suse.cz>
To: Ira Weiny <ira.weiny@intel.com>
Cc: "Jan Kara" <jack@suse.cz>,
	linux-nvdimm@lists.01.org, "Dave Chinner" <david@fromorbit.com>,
	"Jeff Layton" <jlayton@kernel.org>,
	linux-kernel@vger.kernel.org,
	"Matthew Wilcox" <willy@infradead.org>,
	linux-xfs@vger.kernel.org, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	linux-mm@kvack.org, "John Hubbard" <jhubbard@nvidia.com>,
	linux-fsdevel@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
	linux-ext4@vger.kernel.org,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [PATCH RFC 00/10] RDMA/FS DAX truncate proposal
Date: Wed, 12 Jun 2019 12:29:17 +0200
Message-ID: <20190612102917.GB14578@quack2.suse.cz>
In-Reply-To: <20190607145213.GB14559@iweiny-DESK2.sc.intel.com>

On Fri 07-06-19 07:52:13, Ira Weiny wrote:
> On Fri, Jun 07, 2019 at 09:17:29AM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 07, 2019 at 12:36:36PM +0200, Jan Kara wrote:
> > 
> > > Because the pins would be invisible to sysadmin from that point on. 
> > 
> > It is not invisible, it just shows up in a rdma specific kernel
> > interface. You have to use rdma netlink to see the kernel object
> > holding this pin.
> > 
> > If this visibility is the main sticking point I suggest just enhancing
> > the existing MR reporting to include the file info for current GUP
> > pins and teaching lsof to collect information from there as well so it
> > is easy to use.
> > 
> > If the ownership of the lease transfers to the MR, and we report that
> > ownership to userspace in a way lsof can find, then I think all the
> > concerns that have been raised are met, right?
> 
> I was contemplating some new lsof feature yesterday.  But what I don't
> think we want is sysadmins to have multiple tools for multiple
> subsystems.  Or even have to teach lsof something new for every potential
> new subsystem user of GUP pins.

Agreed.

> I was thinking more along the lines of reporting files which have GUP
> pins on them directly somewhere (dare I say procfs?) and teaching lsof to
> report that information.  That would cover any subsystem which does a
> longterm pin.

So lsof already parses /proc/<pid>/maps to learn about files held open by
memory mappings. It could parse some other file as well, I guess. The good
thing about that would be that the "longterm pin" structure would then just
hold a struct file reference. That would avoid any need for special behavior
on file close (the file reference in the "longterm pin" structure would make
sure the struct file, and thus the lease, stays around; we'd just need to
make an explicit lease unlock block until the "longterm pin" structure is
freed). The bad thing is that it requires us to come up with a sane new proc
interface for reporting "longterm pins" and the associated struct file. Also,
we need to define what this interface shows if the pinned pages are in DRAM
(either page cache or anon) and not on NVDIMM.
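
For illustration only, here is a rough sketch of the kind of parsing lsof
does on /proc/<pid>/maps today. It is not lsof's actual code; the function
name and the sample lines (including the /mnt/pmem path) are made up for the
example. A new "longterm pin" file would presumably need a similar parser,
but its format is not defined anywhere yet, so only the existing maps format
is shown.

```python
def file_backed_mappings(maps_text):
    """Return the set of file paths mapped according to /proc/<pid>/maps text.

    Each line has the fields: address perms offset dev inode [pathname].
    Anonymous mappings have no pathname; pseudo-entries such as [heap] or
    [stack] have inode 0 and a bracketed name.
    """
    paths = set()
    for line in maps_text.splitlines():
        # Split into at most 6 fields so a pathname with spaces stays whole.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, nothing to report
        inode, path = fields[4], fields[5].strip()
        if inode == "0" or not path.startswith("/"):
            continue  # [heap], [stack], [vdso] and similar
        paths.add(path)
    return paths


# Hypothetical sample in the documented maps format.
SAMPLE = """\
00400000-0040c000 r-xp 00000000 08:01 1234 /usr/bin/cat
7f0000000000-7f0000200000 rw-s 00000000 00:1b 5678 /mnt/pmem/data.bin
7ffd1000-7ffd3000 rw-p 00000000 00:00 0 [stack]
7f12000-7f13000 rw-p 00000000 00:00 0
"""

if __name__ == "__main__":
    for path in sorted(file_backed_mappings(SAMPLE)):
        print(path)
```

On a live system the input would come from reading /proc/<pid>/maps for the
process of interest instead of the sample string.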

> > > ugly to live so we have to come up with something better. The best I can
> > > currently come up with is to have a method associated with the lease that
> > > would invalidate the RDMA context that holds the pins in the same way that
> > > a file close would do it.
> > 
> > This is back to requiring all RDMA HW to have some new behavior they
> > currently don't have..
> > 
> > The main objection to the current ODP & DAX solution is that very
> > little HW can actually implement it, having the alternative still
> > require HW support doesn't seem like progress.
> > 
> > I think we will eventually start seeing some HW be able to do this
> > invalidation, but it won't be universal, and I'd rather leave it
> > optional, for recovery from truly catastrophic errors (i.e. my DAX is
> > on fire, I need to unplug it).
> 
> Agreed.  I think software wise there is not much some of the devices can do
> with such an "invalidate".

So out of curiosity: What does the RDMA driver do when userspace just closes
the file pointing to the RDMA object? It has to handle that somehow by
aborting everything that's going on... And I wanted similar behavior here.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
