From mboxrd@z Thu Jan 1 00:00:00 1970
From: Amir Goldstein
Date: Fri, 5 Apr 2019 17:02:33 +0300
Subject: Re: [POC][PATCH] xfs: reduce ilock contention on buffered randrw workload
To: Dave Chinner
Cc: "Darrick J . Wong", Christoph Hellwig, Matthew Wilcox, linux-xfs, linux-fsdevel
In-Reply-To: <20190404211730.GD26298@dastard>
References: <20190404165737.30889-1-amir73il@gmail.com> <20190404211730.GD26298@dastard>
List-ID: linux-fsdevel@vger.kernel.org

On Fri, Apr 5, 2019 at 12:17 AM Dave Chinner wrote:
>
> On Thu, Apr 04, 2019 at 07:57:37PM +0300, Amir Goldstein wrote:
> > This patch improves performance of a mixed random rw workload
> > on xfs without relaxing the atomic buffered read/write guarantee
> > that xfs has always provided.
> >
> > We achieve that by calling generic_file_read_iter() twice.
> > Once with a discard iterator to warm up the page cache before taking
> > the shared ilock, and once again under the shared ilock.
> This will race with things like truncate, hole punching, etc. that
> serialise IO and invalidate the page cache for data integrity
> reasons under the IOLOCK. These rely on there being no IO to the
> inode in progress at all to work correctly, which this patch
> violates. IOWs, while this is fast, it is not safe and so not a
> viable approach to solving the problem.
>

This statement leaves me wondering: if ext4 does not take i_rwsem in
generic_file_read_iter(), how does ext4 (or any other fs for that
matter) guarantee that buffered reads are synchronized with truncate,
hole punching, etc.?

The answer in ext4's case is i_mmap_sem, which is read-locked in the
page fault handler. And xfs does the same type of synchronization with
MMAPLOCK. So while my patch may not be safe, I cannot follow why from
your explanation alone; please explain if I am missing something.

One thing that Darrick mentioned earlier was that the IOLOCK is also
used by xfs to synchronize pNFS leases (probably listed under 'etc.'
in your explanation). I concede that my patch does not look safe
w.r.t. pNFS leases, but that can be sorted out with a hammer
#ifndef CONFIG_EXPORTFS_BLOCK_OPS or with finer instruments.

> FYI, I'm working on a range lock implementation that should both
> solve the performance issue and the reader starvation issue at the
> same time by allowing concurrent buffered reads and writes to
> different file ranges.
>
> IO range locks will allow proper exclusion for other extent
> manipulation operations like fallocate and truncate, and eventually
> even allow truncate, hole punch, file extension, etc to run
> concurrently with other non-overlapping IO. They solve more than
> just the performance issue you are seeing....
>

I'm glad to hear that. IO range locks are definitely a more complete
solution to the problem going forward. However, I am still interested
in continuing the discussion on my POC patch.
One reason is that I am guessing it would be much easier for distros
to backport and pick up to solve performance issues.

Even if my patch doesn't get applied upstream nor picked up by
distros, I would still like to understand its flaws and limitations.
I know... if I break it, I get to keep the pieces, but the information
that you provide helps me make my risk assessments.

Thanks,
Amir.