From: Linus Torvalds
Date: Thu, 5 Aug 2021 10:27:05 -0700
Subject: Re: Canvassing for network filesystem write size vs page size
To: David Howells
Cc: Anna Schumaker, Trond Myklebust, Jeff Layton, Steve French,
    Dominique Martinet, Mike Marshall, Miklos Szeredi,
    "Matthew Wilcox (Oracle)", Shyam Prasad N,
    linux-cachefs@redhat.com, linux-afs@lists.infradead.org,
    "open list:NFS, SUNRPC, AND...", CIFS,
    ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net,
    devel@lists.orangefs.org, Linux-MM, linux-fsdevel,
    Linux Kernel Mailing List
References: <1017390.1628158757@warthog.procyon.org.uk>
    <1170464.1628168823@warthog.procyon.org.uk>
    <1186271.1628174281@warthog.procyon.org.uk>
    <1219713.1628181333@warthog.procyon.org.uk>
In-Reply-To: <1219713.1628181333@warthog.procyon.org.uk>

On Thu, Aug 5, 2021 at 9:36 AM David Howells wrote:
>
> Some network filesystems, however, currently keep track of which byte ranges
> are modified within a dirty page (AFS does; NFS seems to also) and only write
> out the modified data.

NFS definitely does. I haven't used NFS in two decades, but I worked
on some of the code (read: I made nfs use the page cache both for
reading and writing) back in my Transmeta days, because NFSv2 was the
default filesystem setup back then.

See fs/nfs/write.c, although I have to admit that I don't recognize
that code any more.

It's fairly important to be able to do streaming writes without having
to read the old contents for some loads. And read-modify-write cycles
are death for performance, so you really want to coalesce writes until
you have the whole page.

That said, I suspect it's also *very* filesystem-specific, to the
point where it might not be worth trying to do in some generic manner.

In particular, NFS had things like interesting credential issues, so
if you have multiple concurrent writers that used different 'struct
file *' to write to the file, you can't just mix the writes.
You have to sync the writes from one writer before you start the writes
for the next one, because one might succeed and the other not.

So you can't just treat it as some random "page cache with dirty byte
extents". You really have to be careful about credentials, timeouts,
etc, and the pending writes have to keep a fair amount of state
around.

At least that was the case two decades ago.

[ goes off and looks. See "nfs_write_begin()" and friends in
fs/nfs/file.c for some of the examples of these things, although it
looks like the code is less aggressive about avoiding the
read-modify-write case than I thought I remembered, and only does it
for write-only opens ]

              Linus
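
As a rough illustration of the points above (and explicitly not how
fs/nfs does it), here is a minimal Python sketch. It keeps a list of
dirty byte ranges per page, merges overlapping or adjacent writes so a
page can fill up without a read-modify-write, and flushes a page's
pending ranges before accepting a write from a different writer.
`PageCacheSketch`, `PendingPage`, the string `owner` standing in for
credentials, and the 4096-byte `PAGE_SIZE` are all hypothetical names
and values made up for the example.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class PendingPage:
    """Dirty state for one cached page: who wrote it, which bytes are dirty."""
    owner: str                                  # stand-in for writer credentials
    ranges: list = field(default_factory=list)  # sorted, disjoint (start, end)

    def add(self, start, end):
        """Merge the half-open byte range [start, end) into the dirty list."""
        merged = []
        for s, e in sorted(self.ranges + [(start, end)]):
            if merged and s <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous range.
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.ranges = merged

    def is_full(self):
        """True once the dirty ranges cover the whole page."""
        return self.ranges == [(0, PAGE_SIZE)]


class PageCacheSketch:
    def __init__(self):
        self.pending = {}   # page index -> PendingPage
        self.flushed = []   # log of (page index, owner, ranges) written out

    def flush(self, index):
        """Pretend to write a page's dirty ranges to the server."""
        page = self.pending.pop(index, None)
        if page is not None:
            self.flushed.append((index, page.owner, page.ranges))

    def write(self, owner, index, start, end):
        """Record a write; returns True if the page is now fully dirty."""
        page = self.pending.get(index)
        if page is not None and page.owner != owner:
            # Different writer: sync the earlier writes first, because one
            # writer's request might succeed where the other's fails.
            self.flush(index)
            page = None
        if page is None:
            page = self.pending[index] = PendingPage(owner)
        page.add(start, end)
        return page.is_full()
```

Two back-to-back writes from the same owner collapse into one range,
while a write from a second owner forces the first owner's ranges out
before anything new is recorded. Real code would also need the
timeouts and other per-request state mentioned above, which this
sketch leaves out.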