Linux-RDMA Archive on lore.kernel.org
From: Ira Weiny <ira.weiny@intel.com>
To: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
	linux-mm@kvack.org
Cc: Jeff Layton <jlayton@kernel.org>,
	Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
	Theodore Ts'o <tytso@mit.edu>, John Hubbard <jhubbard@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Jason Gunthorpe <jgg@ziepe.ca>
Subject: Lease semantic proposal
Date: Mon, 23 Sep 2019 12:08:53 -0700
Message-ID: <20190923190853.GA3781@iweiny-DESK2.sc.intel.com>


Since the last RFC patch set[1], much of the discussion about supporting RDMA
with FS DAX has centered on the semantics of the lease mechanism.[2]  Within
that thread it was suggested that I write some documentation and/or tests for
the new mechanism being proposed.  I have created a foundation for testing
lease functionality within xfstests.[3]  This should be close to being
accepted.  Before writing additional lease tests, or changing lots of kernel
code, this email presents documentation for the proposed "layout lease"
semantic.

At Linux Plumbers[4] just over a week ago, I presented the current state of the
patch set and the outstanding issues.  Based on the discussion there, as well
as follow-up emails, I propose the following addition to the fcntl() man page.

Thank you,
Ira

[1] https://lkml.org/lkml/2019/8/9/1043
[2] https://lkml.org/lkml/2019/8/9/1062
[3] https://www.spinics.net/lists/fstests/msg12620.html
[4] https://linuxplumbersconf.org/event/4/contributions/368/


<fcntl man page addition>
Layout Leases
-------------

Layout (F_LAYOUT) leases are special leases which can be used to control and/or
be informed about the manipulation of the underlying layout of a file.

A layout is defined as the logical file block -> physical file block mapping
including the file size and sharing of physical blocks among files.  Note that
the unwritten state of a block is not considered part of file layout.
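
For illustration only (this is not part of the proposed man page text): the
logical -> physical mapping described above can already be inspected from user
space with the existing FIEMAP ioctl.  The sketch below assumes a file system
that implements FIEMAP; the helper name count_extents() and the MAX_EXTENTS
limit are made up for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>	/* struct fiemap, FIEMAP_* */
#include <linux/fs.h>		/* FS_IOC_FIEMAP */

#define MAX_EXTENTS 32		/* arbitrary limit for this sketch */

/* Print up to MAX_EXTENTS extents of fd; return the count or -errno. */
int count_extents(int fd)
{
	struct fiemap *fm;
	unsigned int i;
	int n;

	fm = calloc(1, sizeof(*fm) + MAX_EXTENTS * sizeof(struct fiemap_extent));
	if (!fm)
		return -ENOMEM;

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* sync the file before mapping */
	fm->fm_extent_count = MAX_EXTENTS;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		n = -errno;
		free(fm);
		return n;
	}

	for (i = 0; i < fm->fm_mapped_extents; i++) {
		struct fiemap_extent *fe = &fm->fm_extents[i];

		/* Sharing is part of the layout; the unwritten state is not. */
		printf("logical %llu -> physical %llu, length %llu%s%s\n",
		       (unsigned long long)fe->fe_logical,
		       (unsigned long long)fe->fe_physical,
		       (unsigned long long)fe->fe_length,
		       (fe->fe_flags & FIEMAP_EXTENT_SHARED) ? " [shared]" : "",
		       (fe->fe_flags & FIEMAP_EXTENT_UNWRITTEN) ? " [unwritten]" : "");
	}

	n = (int)fm->fm_mapped_extents;
	free(fm);
	return n;
}
```

Calling count_extents() before and after a truncate() or fallocate() is one way
to see whether an operation changed the layout in the sense used here.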

**Read layout lease (F_RDLCK | F_LAYOUT)**

Read layout leases can be used to be informed of layout changes made by the
system or other users.  This lease is similar to the standard read (F_RDLCK)
lease in that any attempt to change the _layout_ of the file will be reported
to the lease holder through the standard lease break process.  It differs in
that the file can be opened for write, and data can be read from and/or
written to the file, as long as the underlying layout of the file does not
change.  Therefore, the lease is not broken if the file is simply open for
write, but _may_ be broken if an operation such as truncate(), fallocate() or
write() results in changing the underlying layout.
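
For illustration only: a minimal sketch of how a process might request a read
layout lease and catch the break notification.  F_SETLEASE and the SIGIO
notification are the existing lease interface; F_LAYOUT is only proposed here,
so the value below is a placeholder assumption, and the helper names are made
up.  On a kernel without the proposal the request is expected to fail,
presumably with EINVAL.

```c
#define _GNU_SOURCE		/* F_SETLEASE */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>

/* ASSUMPTION: placeholder value, F_LAYOUT is not in mainline headers. */
#ifndef F_LAYOUT
#define F_LAYOUT 16
#endif

/* Set by the handler when a lease break notification arrives. */
volatile sig_atomic_t lease_break_seen;

static void on_lease_break(int sig)
{
	(void)sig;
	lease_break_seen = 1;
}

/* Lease breaks are signalled with SIGIO by default; return 0 or -errno. */
int install_break_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_lease_break;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGIO, &sa, NULL) < 0 ? -errno : 0;
}

/*
 * Request a read layout lease on fd; return 0 or -errno.  Under the
 * proposal, fd may be open for write, unlike with a plain F_RDLCK lease.
 */
int take_read_layout_lease(int fd)
{
	if (fcntl(fd, F_SETLEASE, F_RDLCK | F_LAYOUT) < 0)
		return -errno;
	return 0;
}
```

A caller would install the handler first, take the lease, and treat
lease_break_seen becoming non-zero as the signal that the layout is about to
change.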

**Write layout lease (F_WRLCK | F_LAYOUT)**

Write layout leases can be used to break read layout leases to indicate that
the process intends to change the underlying layout of the file.

A process which has taken a write layout lease has exclusive ownership of the
file layout and can modify that layout as long as the lease is held.
Operations by that process which change the layout are allowed, but operations
from other file descriptors which attempt to change the layout will break the
lease through the standard lease break process.  The F_LAYOUT flag
distinguishes this lease from a regular F_WRLCK lease: with F_LAYOUT, opens
for write do not break the lease, although operations which change the
underlying layout may.

The distinction between read layout leases and write layout leases is that
write layout leases can change the layout without breaking the lease within the
owning process.  This is useful to guarantee a layout prior to specifying the
unbreakable flag described below.


**Unbreakable Layout Leases (F_UNBREAK)**

In order to support the pinning of file pages by users accessing them directly
from user space, an unbreakable flag (F_UNBREAK) can be used to modify the read
and write layout leases.  When specified, F_UNBREAK indicates that any user
attempting to break the lease will fail with ETXTBSY rather than follow the
normal breaking procedure.

Both read and write layout leases can have the unbreakable flag (F_UNBREAK)
specified.  The difference between an unbreakable read layout lease and an
unbreakable write layout lease is that an unbreakable read layout lease is
_not_ exclusive.  This means that once a layout is established on a file,
multiple unbreakable read layout leases can be taken by multiple processes and
used to pin the underlying pages of that file.
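
For illustration only: the rules described so far can be summarised as a small
decision function.  This is a model of the text above, not kernel code; the
type and function names are invented, and it treats "may be broken" as "is
broken" for simplicity.  It also assumes that a layout change made by the
holder of a read layout lease breaks that lease, which the text implies but
does not state outright.

```c
#include <stdbool.h>

enum lease_outcome {
	LEASE_KEPT,		/* operation proceeds, lease untouched */
	LEASE_BROKEN,		/* standard lease break process runs */
	OP_FAILS_ETXTBSY,	/* operation is refused, lease untouched */
};

struct layout_lease {
	bool write;		/* F_WRLCK | F_LAYOUT, else F_RDLCK | F_LAYOUT */
	bool unbreakable;	/* F_UNBREAK was also specified */
};

/* What happens to lease l when an operation is attempted on the file? */
enum lease_outcome layout_op_outcome(struct layout_lease l,
				     bool changes_layout, bool by_holder)
{
	/* Opens for write and plain data I/O leave the layout alone. */
	if (!changes_layout)
		return LEASE_KEPT;

	/* The holder of a write layout lease may change the layout. */
	if (l.write && by_holder)
		return LEASE_KEPT;

	/* Anything else would break the lease, unless it is unbreakable. */
	if (l.unbreakable)
		return OP_FAILS_ETXTBSY;

	return LEASE_BROKEN;
}
```

For example, a truncate() from another process against an unbreakable read
layout lease maps to OP_FAILS_ETXTBSY, while the same call against a plain read
layout lease maps to LEASE_BROKEN.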

Care must therefore be taken to ensure that the layout of the file is as the
user wants prior to using the unbreakable read layout lease.  A safe mechanism
to do this would be to take a write layout lease and use fallocate() to set the
layout of the file.  The layout lease can then be "downgraded" to an
unbreakable read layout lease, provided no other user has broken the write
layout lease in the meantime.
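
For illustration only: the safe sequence above could look roughly like the
sketch below.  F_LAYOUT and F_UNBREAK are only proposed, so their values here
are placeholder assumptions, and pin_layout() is a made-up helper name.  The
sketch also assumes the kernel would refuse the downgrade if the write layout
lease had been broken in the meantime.  On a kernel without the proposal the
first fcntl() call is expected to fail, presumably with EINVAL.

```c
#define _GNU_SOURCE		/* F_SETLEASE, fallocate() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* ASSUMPTION: placeholder values, neither flag is in mainline headers. */
#ifndef F_LAYOUT
#define F_LAYOUT 16
#endif
#ifndef F_UNBREAK
#define F_UNBREAK 32
#endif

/* Establish a layout of len bytes on fd and pin it; return 0 or -errno. */
int pin_layout(int fd, off_t len)
{
	int err;

	/* 1. Take exclusive ownership of the layout. */
	if (fcntl(fd, F_SETLEASE, F_WRLCK | F_LAYOUT) < 0)
		return -errno;

	/* 2. Set the layout while holding the write layout lease. */
	if (fallocate(fd, 0, 0, len) < 0)
		goto drop;

	/* 3. Downgrade to an unbreakable read layout lease. */
	if (fcntl(fd, F_SETLEASE, F_RDLCK | F_LAYOUT | F_UNBREAK) < 0)
		goto drop;

	return 0;

drop:
	err = errno;			/* save before the cleanup call */
	fcntl(fd, F_SETLEASE, F_UNLCK);	/* release whatever lease is held */
	return -err;
}
```

Once pin_layout() returns 0, other processes could take their own unbreakable
read layout leases on the same file and pin its pages against the layout
established in step 2.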

</fcntl man page addition>
