linux-nfs.vger.kernel.org archive mirror
From: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
To: Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org,
	Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>,
	watanabe.hiroyuki@lab.ntt.co.jp
Subject: [PATCH] pNFS: Avoid read-modify-write for page-aligned full page write
Date: Thu, 7 Feb 2019 17:12:53 +0900
Message-ID: <37261782-eebb-b9c5-a480-7ced59b3703f@lab.ntt.co.jp>

As the block and SCSI layouts can only read/write fixed-length
blocks, we must perform read-modify-write when data to be written is
not aligned to a block boundary or smaller than the block size.
(612aa983a0410 pnfs: add flag to force read-modify-write in ->write_begin)
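
As a rough user-space illustration of the rule above (not kernel code;
needs_block_rmw() and its parameters are hypothetical names, and the
sketch ignores file size and holes), a write can skip read-modify-write
only when it both starts and ends on a block boundary:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns true when a write
 * of `len` bytes at file offset `pos` does not cover whole blocks of
 * size `blksz`, so the untouched part of a block must be read first.
 * Assumes blksz > 0 and len > 0.
 */
static bool needs_block_rmw(uint64_t pos, uint64_t len, uint64_t blksz)
{
	bool start_unaligned = (pos % blksz) != 0;
	bool end_unaligned = ((pos + len) % blksz) != 0;

	return start_unaligned || end_unaligned;
}
```

A write shorter than one block always has an unaligned start or end,
so it is covered by the same test.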

The current code decides whether read-modify-write is needed on
block-oriented pNFS layouts by checking only !PageUptodate(page).
That condition also holds when overwriting any uncached portion of
an existing file, which makes such overwrites excessively slow even
when they are block-aligned.

The change does not affect the optimization for modify-write-read
cases (38c73044f5f4d NFS: read-modify-write page updating),
because partial update of !PageUptodate() pages can only happen
in layouts that can do arbitrary length read/write and never
in block-based ones.

Testing results:

We ran fio on one of the pNFS clients running a 4.20 kernel
(vanilla and patched) in the configuration below to read, write, and
overwrite files on the storage array, which the server exports as a
pNFS share.

  pNFS clients ---1G Ethernet--- pNFS server
  (HP DL360 G8)                  (HP DL360 G8)
        |                              |
        |                              |
        +------8G Fiber Channel--------+
                      |
                Storage Array
                  (HP P6350)

Buffered overwrite throughput improves roughly 11-fold at every block
size tested. O_SYNC overwrite improves by about 1.2x at 4 KiB, 2.7x at
32 KiB, and 8x at 512 KiB.

Ops.     |block size|   Throughput   |
         |  (KiB)   |    (MiB/s)     |
         |          |  4.20 | patched|
---------+----------+----------------+
buffered |         4|  21.3 |  233   |
overwrite|        32|  22.2 |  254   |
         |       512|  22.4 |  258   |
---------+----------+----------------+
O_SYNC   |         4|   3.84|    4.68|
overwrite|        32|  12.2 |   32.9 |
         |       512|  18.5 |  151   |
---------+----------+----------------+

Read and write throughput (buffered and O_SYNC) on the same client is
unaffected by the patch, as expected.

Ops.     |block size|   Throughput   |
         |  (KiB)   |    (MiB/s)     |
         |          |  4.20 | patched|
---------+----------+----------------+
read     |         4| 548   |  551   |
         |        32| 547   |  552   |
         |       512| 548   |  549   |
---------+----------+----------------+
buffered |         4| 237   |  243   |
write    |        32| 261   |  267   |
         |       512| 265   |  271   |
---------+----------+----------------+
O_SYNC   |         4|   0.46|    0.46|
write    |        32|   3.60|    3.61|
         |       512| 105   |  108   |
---------+----------+----------------+

Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Tested-by: Hiroyuki Watanabe <watanabe.hiroyuki@lab.ntt.co.jp>

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 29553fdba8af..e80954c96ec1 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -276,6 +276,12 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync);
  * then a modify/write/read cycle when writing to a page in the
  * page cache.
  *
+ * Some pNFS layout drivers can only read/write at a certain block
+ * granularity like all block devices and therefore we must perform
+ * read/modify/write whenever a page hasn't been read yet and the data
+ * to be written there is not aligned to a block boundary and/or
+ * smaller than the block size.
+ *
  * The modify/write/read cycle may occur if a page is read before
  * being completely filled by the writer.  In this situation, the
  * page must be completely written to stable storage on the server
@@ -299,8 +305,10 @@ static int nfs_want_read_modify_write(struct file *file, struct page *page,
 	unsigned int end = offset + len;

 	if (pnfs_ld_read_whole_page(file->f_mapping->host)) {
-		if (!PageUptodate(page))
-			return 1;
+		if (!PageUptodate(page)) {
+			if (pglen && (end < pglen || offset))
+				return 1;
+		}
 		return 0;
 	}
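
The effect of the hunk above can be modeled in user space roughly as
follows. This is a sketch, not the kernel function: rmw_before() and
rmw_after() are hypothetical names, `uptodate` stands in for
PageUptodate(page), and `pglen` is assumed to be the number of bytes of
file data that fall within the page (its definition is not shown in the
diff context).

```c
#include <stdbool.h>

/* Before the patch: any page that is not up to date is read first. */
static bool rmw_before(bool uptodate, unsigned int pglen,
		       unsigned int offset, unsigned int len)
{
	(void)pglen;
	(void)offset;
	(void)len;
	return !uptodate;
}

/*
 * After the patch: read first only if the page holds file data
 * (pglen != 0) and the write leaves some of it untouched, i.e. the
 * write starts past the beginning of the page or ends before pglen.
 */
static bool rmw_after(bool uptodate, unsigned int pglen,
		      unsigned int offset, unsigned int len)
{
	unsigned int end = offset + len;

	if (uptodate)
		return false;
	return pglen && (end < pglen || offset);
}
```

For an uncached 4096-byte page that is overwritten in full (offset 0,
len 4096, pglen 4096), rmw_before() asks for a read while rmw_after()
does not; a partial write such as offset 512, len 1024 still triggers
the read in both versions.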

Thread overview: 5+ messages
2019-02-07  8:12 Kazuo Ito [this message]
2019-02-07 13:37 ` [PATCH] pNFS: Avoid read-modify-write for page-aligned full page write Benjamin Coddington
2019-02-08  7:54   ` 伊藤和夫
2019-02-08 14:58     ` Trond Myklebust
2019-02-12  4:34       ` Kazuo Ito