From: Amir Goldstein <amir73il@gmail.com>
To: Dave Chinner
Wong" , Christoph Hellwig , Matthew Wilcox , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [POC][PATCH] xfs: reduce ilock contention on buffered randrw workload Date: Thu, 4 Apr 2019 19:57:37 +0300 Message-Id: <20190404165737.30889-1-amir73il@gmail.com> X-Mailer: git-send-email 2.17.1 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch improves performance of mixed random rw workload on xfs without relaxing the atomic buffered read/write guaranty that xfs has always provided. We achieve that by calling generic_file_read_iter() twice. Once with a discard iterator to warm up page cache before taking the shared ilock and once again under shared ilock. Since this is a POC patch it also includes a separate fix to the copy_page_to_iter() helper when called with discard iterator. There is no other caller in the kernel to this method with a discard iterator as far as I could see. Signed-off-by: Amir Goldstein --- Hi Folks, With this experimenital patch I was able to bring performance of random rw workload benchmark on xfs much closer to ext4. Ext4, as most Linux filesystems doesn't take the shared inode lock on buffered reads and does not provide atomic buffered reads w.r.t buffered writes. Following are the numbers I got with filebench randomrw workload [1] on a VM with 4 CPUs and spindles. Note that this improvement is unrelated to the rw_semaphore starvation issue that was observed when running the same benchmark on fast SDD drive [2]. === random read/write - cold page cache === --- EXT4 --- filebench randomrw (8 read threads, 8 write threads) kernel 5.1.0-rc2, ext4 rand-write1 862304ops 14235ops/s 111.2mb/s 0.5ms/op rand-read1 22065ops 364ops/s 2.8mb/s 21.5ms/op --- XFS --- filebench randomrw (8 read threads, 8 write threads) kernel 5.1.0-rc2, xfs rand-write1 39451ops 657ops/s 5.1mb/s 12.0ms/op rand-read1 2035ops 34ops/s 0.3mb/s 232.7ms/op --- XFS+ --- filebench randomrw (8 read threads, 8 write threads) kernel 5.1.0-rc2+ (xfs page cache warmup patch) rand-write1 935597ops 15592ops/s 121.8mb/s 0.5ms/op rand-read1 4446ops 74ops/s 0.6mb/s 107.6ms/op To measure the effects of two passes of generic_file_read_iter(), I ran a random read [2] benchmark on 5GB file with warm and cold page cache. === random read - cold page cache === --- EXT4 --- filebench randomread (8 read threads) - cold page cache kernel 5.1.0-rc2 rand-read1 23589ops 393ops/s 3.1mb/s 20.3ms/op --- XFS --- filebench randomread (8 read threads) - cold page cache kernel 5.1.0-rc2 rand-read1 20578ops 343ops/s 2.7mb/s 23.3ms/op --- XFS+ --- filebench randomread (8 read threads) - cold page cache kernel 5.1.0-rc2+ (xfs page cache warmup patch) rand-read1 20476ops 341ops/s 2.7mb/s 23.4ms/op === random read - warm page cache === --- EXT4 --- filebench randomread (8 read threads) - warm page cache kernel 5.1.0-rc2 rand-read1 58168696ops 969410ops/s 7573.5mb/s 0.0ms/op --- XFS --- filebench randomread (8 read threads) - warm page cache kernel 5.1.0-rc2 rand-read1 52748818ops 878951ops/s 6866.8mb/s 0.0ms/op --- XFS+ --- filebench randomread (8 read threads) - warm page cache kernel 5.1.0-rc2+ (xfs page cache warmup patch) rand-read1 52770537ops 879445ops/s 6870.7mb/s 0.0ms/op The numbers of this benchmark do not show and measurable difference for readers only workload with either cold or warm page cache. If needed I can provide more measurments with fio or with different workloads and different drives. Fire away! Thanks, Amir. 
[1] https://github.com/amir73il/filebench/blob/overlayfs-devel/workloads/randomrw.f
[2] https://marc.info/?l=linux-xfs&m=155347265016053&w=2
[3] https://github.com/amir73il/filebench/blob/overlayfs-devel/workloads/randomread.f

 fs/xfs/xfs_file.c | 14 ++++++++++++++
 lib/iov_iter.c    |  5 +++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 1f2e284..4e5f88a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -240,6 +240,20 @@ xfs_file_buffered_aio_read(
 		if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED))
 			return -EAGAIN;
 	} else {
+		/*
+		 * Warm up page cache to minimize time spent under
+		 * shared ilock.
+		 */
+		struct iov_iter iter;
+		loff_t pos = iocb->ki_pos;
+
+		iov_iter_discard(&iter, READ, iov_iter_count(to));
+		ret = generic_file_read_iter(iocb, &iter);
+		if (ret <= 0)
+			return ret;
+
+		iocb->ki_pos = pos;
+
 		xfs_ilock(ip, XFS_IOLOCK_SHARED);
 	}
 	ret = generic_file_read_iter(iocb, to);

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ea36dc3..b22e433 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -893,9 +893,10 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 		size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
 		kunmap_atomic(kaddr);
 		return wanted;
-	} else if (unlikely(iov_iter_is_discard(i)))
+	} else if (unlikely(iov_iter_is_discard(i))) {
+		i->count -= bytes;
 		return bytes;
-	else if (likely(!iov_iter_is_pipe(i)))
+	} else if (likely(!iov_iter_is_pipe(i)))
 		return copy_page_to_iter_iovec(page, offset, bytes, i);
 	else
 		return copy_page_to_iter_pipe(page, offset, bytes, i);
-- 
2.7.4
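P.S. for reviewers of the lib/iov_iter.c hunk: callers drive their
copy loops off iov_iter_count(), so the discard case must shrink
i->count as bytes are consumed. A simplified caller-side sketch (my
own illustration, loosely modeled on the accounting in the buffered
read path, not code from this patch):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/uio.h>

/*
 * Sketch only: stop when the iterator drains. Before the fix, the
 * ITER_DISCARD branch of copy_page_to_iter() returned `bytes`
 * without decrementing i->count, so iov_iter_count() never reached
 * zero and a loop like this made no visible progress.
 */
static size_t drain_into_iter(struct page **pages, unsigned int nr_pages,
			      struct iov_iter *i)
{
	size_t total = 0;
	unsigned int n;

	for (n = 0; n < nr_pages && iov_iter_count(i); n++) {
		size_t bytes = min_t(size_t, iov_iter_count(i), PAGE_SIZE);

		total += copy_page_to_iter(pages[n], 0, bytes, i);
	}
	return total;
}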