From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=ZY7O=PU=vger.kernel.org=linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS,URIBL_BLOCKED
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 4E80EC43387
	for <linux-fsdevel@archiver.kernel.org>; Sat, 12 Jan 2019 00:06:38 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 138792183F
	for <linux-fsdevel@archiver.kernel.org>; Sat, 12 Jan 2019 00:06:38 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="SYHbrJcx"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726467AbfALAGh (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Fri, 11 Jan 2019 19:06:37 -0500
Received: from mail-pg1-f194.google.com ([209.85.215.194]:38989 "EHLO
        mail-pg1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1725927AbfALAGg (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Fri, 11 Jan 2019 19:06:36 -0500
Received: by mail-pg1-f194.google.com with SMTP id w6so6972088pgl.6;
        Fri, 11 Jan 2019 16:06:36 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=I1nIkiX2Tc0fkHb2d3oi30EcJ5EKDLXYgwIE4qsSO1U=;
        b=SYHbrJcxws5IH1/bObn9UX5FfQMLdLmPpIZ4BqAW7TRPmrXWva4+MIqo/cYl8Cl9D+
         sC2ftCpz095nAlWotr4ZRcnrxYdzfWVhoN7RQlvH4WUYY66W3Wp/fEiGVKcz3W7fdr6P
         XjpEg154xE7cD+9wre0l/3CiJiIjPQpYfKpnaX/WZ4Knf/IOuIZAEhNzPhZXIGmu/Cir
         fNzEH9owIcZkB8EMZ/Oz4IzlOGa15se/j5+sPPHNaqfK9sZRC1twcR1AuiauWfYPfWek
         zWaMApWrFPj5XkdpktL7YnCPOvN8+F1uLxlx+u3jjG336y8g8RABRvojjhnJDJgZ7z/i
         m5YQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=I1nIkiX2Tc0fkHb2d3oi30EcJ5EKDLXYgwIE4qsSO1U=;
        b=Ez8LTJnl68C7dSb74gVubD4geiUkM6Mikkm/pfLCrSKzkZp4TlMUklQ0xUSTPMlfAK
         fNhp3RHvexVEVOUp7EZtMlSLVUu9WZMCwxRUDtHu18S77eZ6PwJK/4REUiaUkOapg6oG
         mgb5CewBBphmOwoRQw2Y2Vu+CtoPA2YsOdGs91dPG//H00ys6W3qCrv3yOr9DfHMT4VB
         T0rawTJPCDYy6ltpOD3kTsG+ZIo2JSaHsK0qYZva2hhr3hjmA5t4L+vlUUUfPZrc7ou7
         i7piGUi2hQXSc7ctZSFoNNOo9aTQW3rnAtmhl3mTPbKqPSCUuGMfaPmwgahVdyB1xZ43
         IfIw==
X-Gm-Message-State: AJcUukdZfRQ7V+dk22yaL9ZJJexwC5tQpFW4n25ZB5rCPv6Tei/RBzNa
        0IkyjI9YmDKAI6iSrRra+T65ZcIlKaGTLbPn39I=
X-Google-Smtp-Source: ALg8bN64s+smcL6DSPQ8YTXNWRjEWZgK/cgGRZqAYJPc1PB+Y0UCBrtrxp+lmHmCPEcMCJj1QA6TB8teXf1NGubPtMs=
X-Received: by 2002:a65:6684:: with SMTP id b4mr15225053pgw.55.1547251595388;
 Fri, 11 Jan 2019 16:06:35 -0800 (PST)
MIME-Version: 1.0
References: <CAH2r5mtQTu1mwh0Zpep71A5oHHYboLv5qD=8n_LeASzmncNx4g@mail.gmail.com>
 <20190111132817.GE6310@bombadil.infradead.org> <CAH2r5mveiLSTS6W_yQzhEbEdVb0AuEkaP1XO9WxXLWpLXE7JAA@mail.gmail.com>
 <F5A48F1A-1501-477D-BF83-1150A0FAA34D@dilger.ca> <CAH2r5mshisW=B-=qaVCbj2AF+BBr5n7O+4ATa+pH5FJr6bZw1A@mail.gmail.com>
 <2E820BDF-2984-4D02-A5A9-E25635348211@dilger.ca>
In-Reply-To: <2E820BDF-2984-4D02-A5A9-E25635348211@dilger.ca>
From:   Steve French <smfrench@gmail.com>
Date:   Fri, 11 Jan 2019 18:06:23 -0600
Message-ID: <CAH2r5mtarJZxte3tZb_=xH9U-ysPYsd6o0VM9ymYmju=wjSaHw@mail.gmail.com>
Subject: Re: scp bug due to progress indicator when copying from remote to
 local on Linux
To:     Andreas Dilger <adilger@dilger.ca>
Cc:     Matthew Wilcox <willy@infradead.org>,
        CIFS <linux-cifs@vger.kernel.org>,
        linux-fsdevel <linux-fsdevel@vger.kernel.org>,
        Pavel Shilovsky <piastryyy@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, Jan 11, 2019 at 5:51 PM Andreas Dilger <adilger@dilger.ca> wrote:
>
> On Jan 11, 2019, at 3:02 PM, Steve French <smfrench@gmail.com> wrote:
> >
> > On Fri, Jan 11, 2019 at 3:48 PM Andreas Dilger <adilger@dilger.ca> wrote:
> >>
> >> On Jan 11, 2019, at 2:13 PM, Steve French <smfrench@gmail.com> wrote:
> >>>
> >>> On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@infradead.org> wrote:
> >>>>
> >>>> On Fri, Jan 11, 2019 at 12:11:00AM -0600, Steve French wrote:
> >>>>> In discussing an interesting problem with scp recently with Pavel, we
> >>>>> found a fairly obvious bug in scp when it is run with a progress
> >>>>> indicator (which is the default when the source file is remote).
> >>>>>
> >>>>> scp triggers "SIGALRM" probably from update_progress_meter() in
> >>>>> progressmeter.c when executed with "scp localhost:somelargefile /mnt"
> >>>>> ie with an ssh source path but a local path for the target, as long as
> > <snip>
> >
> >> My typical response for issues like this is to find the github repo for
> >> the code and push a patch to fix the code myself.  If I can't find the
> >> upstream project/source repo easily, download the .src.rpm and make a
> >> patch and submit it to the major distro package maintainers (RHEL, SLES,
> >> Ubuntu or Debian), so that at least it is fixed for the vast majority
> >> of users, and those maintainers can handle pushing the patch further
> >> upstream (and smaller distros will copy patches/sources from them).
> >
> > I have built the scp code as an experiment and by the way there are many,
> > many things that could be changed (other copy tools e.g. robocopy even rsync
> > and cp) look much more mature (with lots more useful performance options).
> > My earlier experiments e.g. changing the scp i/o size to something larger
> > did help as expected for various cases (16K is suboptimal in many cases).
>
> Years ago I submitted a patch to fix GNU fileutils so that "cp" could use
> st_blksize to determine the data IO size for the copy.  Originally it was
> allocate its copy buffer on the stack, but when Lustre returned st_blksize
> of 8MB it exploded.  2MB was safe at the time, and what we used for many
> years but fixing "cp" allowed us to increase st_blksize over time. Probably
> scp should be modified to do the same.
>
> It is often just a case of the developer didn't know any better, or the code
> was written so long ago that 16KB transfers were seen as a huge improvement
> over 4KB or 1KB transfers, and nobody has looked at it again.  Since we all
> depend on other developers to know their side of the house (e.g. I'm wholly
> dependent on others for the crypto implementation of SSH/scp), we may as
> well provide some help when there is something as widely used as scp that
> we can improve in some useful way.
>
> > But in this case it seems much simpler - the code path is basically
> > read 16K over ssh
> > write 16K to local/network file system (cached)
> > (repeated 1000s of times)
> > then ftruncate
>
> Even if the filesystem is transferring large IOs from disk/network, doing
> many thousands of excess system calls is pure overhead that could be avoided.
> Also, even if scp could only do 16KB network transfers, staying in tight
> "scp copies data over ssh" loop is better for icache/dcache/context, then
> doing a single large write call to/from the filesystem is most efficient
> on that side of the house.
>
> >> ... why would the truncate() take a long time even if the file
> >> is large, since it shouldn't actually be doing any work to change the
> >> size of the file (I'm assuming that the target file is either empty or
> >> about the same size during an overwrite, you didn't mention anything
> >> unusual as a requirement).
> >
> > ftruncate if it were simply setting the file size would be very fast,
> > except that setting the file size needs to have the data written out
> > (writes otherwise could risk extending the file size) so triggers a
> > call to writepages (filemap_write_and_wait) as a sideeffect
> > so all the cached data is finally written out which can take a few
> > seconds and cause the scp progress indicator to malfunction
> > and trigger the SIGALRM.  This is normally not a problem
> > fsync etc. can take longer than a single i/o operation.
>
> This seems more like a kernel/filesystem issue.  We shouldn't have to flush
> the entire file to disk just to change the file size.  In the exceedingly
> common case of the current size == truncate() size, the truncate should be
> a no-op unless the on-disk size is > new size?  In the unlikely case that
> file size > new size then it could truncate only the pages and on-disk
> blocks beyond the new size?
>
> I don't see how pages in cache < new size could cause the file size to be
> extended.  Is this a VFS/MM thing, or CIFS-specific?

I suspect that for various other network/cluster file systems, flushing (not
to disk but at least to the server, ie across the network) is considered safer
to do before certain metadata updates (including setting file size)


-- 
Thanks,

Steve