All of lore.kernel.org
 help / color / mirror / Atom feed
From: ira.weiny@intel.com
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, "Dan Williams" <dan.j.williams@intel.com>,
	"Jan Kara" <jack@suse.cz>, "Jérôme Glisse" <jglisse@redhat.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Ira Weiny" <ira.weiny@intel.com>
Subject: [RFC PATCH 00/10] RDMA/FS DAX "LONGTERM" lease proposal
Date: Sun, 28 Apr 2019 21:53:49 -0700	[thread overview]
Message-ID: <20190429045359.8923-1-ira.weiny@intel.com> (raw)

From: Ira Weiny <ira.weiny@intel.com>

In order to support RDMA to File system pages[*] without On Demand Paging a
number of things need to be done.

1) GUP "longterm"[1] users need to inform the other subsystems that they have
   taken a pin on a page which may remain pinned for a very "long time".[1]

2) Any page which is "controlled" by a file system such needs to have special
   handling.  The details of the handling depends on if the page is page cache
   backed or not.

   2a) A page cache backed page which has been pinned by GUP Longterm can use a
   bounce buffer to allow the file system to write back snap shots of the page.
   This is handled by the FS recognizing the GUP longterm pin and making a copy
   of the page to be written back.
   	NOTE: this patch set does not address this path.

   2b) A FS "controlled" page which is not page cache backed is either easier
   to deal with or harder depending on the operation the filesystem is trying
   to do.
   
	2ba) [Hard case] If the FS operation _is_ a truncate or hole punch the
	FS can no longer use the pages in question until the pin has been
	removed.  This patch set presents a solution to this by introducing
	some reasonable restrictions on user space applications.

	2bb) [Easy case] If the FS operation is _not_ a truncate or hole punch
	then there is nothing which need be done.  Data is Read or Written
	directly to the page.  This is an easy case which would currently work
	if not for GUP longterm pins being disabled.  Therefore this patch set
	need not change access to the file data but does allow for GUP pins
	after 2ba above is dealt with.


The architecture of this series is to introduce a F_LONGTERM file lease
mechanism which applications use in one of 2 ways.

1) Applications which may require hole punch or truncation operations on files
   they intend to mmmapping and pinning for long periods.  Can take a
   F_LONGTERM lease on the file.  When a file system operation needs truncate
   access to this file the lease is broken and the application gets a SIGIO.
   Upon catching SIGIO the application can un-pin (note munmap is not required)
   the memory associated with that file.  At that point the truncating user can
   proceed.  Re-pinning the memory is entirely left up to the application.  In
   some cases a new mmap will be required (as with a truncation) or a SIGBUS
   would be experienced anyway.

   Failure to respond to a SIGIO lease break within the system configured
   lease-break-time will result in a SIGBUS.

   WIP: SIGBUS could be caught and ignored...  what danger does this present...
   should this be SIGKILL  or should we wait another lease-break-time and then
   send SIGKILL?

2) Applications which don't require hold punch or truncate operations can use
   pinning without taking a F_LONGTERM lease.  However, applications such as
   this are expected to have considered the access to the files they are
   mmaping and are expected to be controlling them in a way that other users on
   a system can't truncate a file and cause a DOS on the application.  These
   applications will be sent a SIGBUS if someone attempts to truncate or hole
   punch a file.

	ALTERNATIVE WIP patch in series: If the F_LONGTERM lease is not taken
	fail the GUP.

The patches compile and have been tested to a first degree.

NOTES:
Can we deal with the lease/pin at the VFS layer?  or for all FSs?
LONGTERM seems like a bad name.  Suggestions?

[1] The definition of long time is debatable but it has been established
that RDMAs use of pages, minutes or hours after the pin is the extreme case
which makes this problem most severe.

[*] Not all file system pages are Page Cache pages.  FS DAX bypasses the page
cache.


Ira Weiny (10):
  fs/locks: Add trace_leases_conflict
  fs/locks: Introduce FL_LONGTERM file lease
  mm/gup: Pass flags down to __gup_device_huge* calls
  WIP: mm/gup: Ensure F_LONGTERM lease is held on GUP pages
  mm/gup: Take FL_LONGTERM lease if not set by user
  fs/locks: Add longterm lease traces
  fs/dax: Create function dax_mapping_is_dax()
  mm/gup: fs: Send SIGBUS on truncate of active file
  fs/locks: Add tracepoint for SIGBUS on LONGTERM expiration
  mm/gup: Remove FOLL_LONGTERM DAX exclusion

 fs/dax.c                         |  23 ++-
 fs/ext4/inode.c                  |   4 +
 fs/locks.c                       | 301 +++++++++++++++++++++++++++++--
 fs/xfs/xfs_file.c                |   4 +
 include/linux/dax.h              |   6 +
 include/linux/fs.h               |  18 ++
 include/linux/mm.h               |   2 +
 include/trace/events/filelock.h  |  74 +++++++-
 include/uapi/asm-generic/fcntl.h |   2 +
 mm/gup.c                         | 107 ++++-------
 mm/huge_memory.c                 |  18 ++
 11 files changed, 468 insertions(+), 91 deletions(-)

-- 
2.20.1


             reply	other threads:[~2019-04-29  4:54 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-29  4:53 ira.weiny [this message]
2019-04-29  4:53 ` [RFC PATCH 01/10] fs/locks: Add trace_leases_conflict ira.weiny
2019-04-29  4:53 ` [RFC PATCH 02/10] fs/locks: Introduce FL_LONGTERM file lease ira.weiny
2019-04-29  4:53 ` [RFC PATCH 03/10] mm/gup: Pass flags down to __gup_device_huge* calls ira.weiny
2019-04-29  4:53 ` [RFC PATCH 04/10] WIP: mm/gup: Ensure F_LONGTERM lease is held on GUP pages ira.weiny
2019-04-29  4:53 ` [RFC PATCH 05/10] mm/gup: Take FL_LONGTERM lease if not set by user ira.weiny
2019-04-29  4:53 ` [RFC PATCH 06/10] fs/locks: Add longterm lease traces ira.weiny
2019-04-29  4:53 ` [RFC PATCH 07/10] fs/dax: Create function dax_mapping_is_dax() ira.weiny
2019-04-29  4:53 ` [RFC PATCH 08/10] mm/gup: fs: Send SIGBUS on truncate of active file ira.weiny
2019-04-29  4:53 ` [RFC PATCH 09/10] fs/locks: Add tracepoint for SIGBUS on LONGTERM expiration ira.weiny
2019-04-29  4:53 ` [RFC PATCH 10/10] mm/gup: Remove FOLL_LONGTERM DAX exclusion ira.weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190429045359.8923-1-ira.weiny@intel.com \
    --to=ira.weiny@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.cz \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=mhocko@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.