* scp bug due to progress indicator when copying from remote to local on Linux
From: Steve French @ 2019-01-11  6:11 UTC
To: CIFS, linux-fsdevel
Cc: Pavel Shilovsky

In discussing an interesting problem with scp recently with Pavel, we found a fairly obvious bug in scp when it is run with a progress indicator (which is the default when the source file is remote).

scp triggers SIGALRM, probably from update_progress_meter() in progressmeter.c, when executed as "scp localhost:somelargefile /mnt", i.e. with an ssh source path but a local path for the target, whenever flushing the large amount of cached data to disk (which seems to be triggered by the ftruncate call in scp.c) takes more than a few seconds, which can be common depending on disk or network speed.

Interestingly, this can easily be avoided by running in quiet mode ("-q"), which disables the progress meter in scp. But it seems very, very broken that scp of a large file can 'fail' when using a progress meter (unless caching were disabled on the file system) because of the slight delay in ftruncate caused by flushing a large amount of cached data to disk sequentially.

Any thoughts on whether scp is actively maintained, and on the best approach to fix progressmeter.c so it doesn't break on Linux?

-- 
Thanks,

Steve
* Fwd: scp bug due to progress indicator when copying from remote to local on Linux
From: Steve French @ 2019-01-11  6:13 UTC
To: LKML

---------- Forwarded message ---------
From: Steve French <smfrench@gmail.com>
Date: Fri, Jan 11, 2019 at 12:11 AM
Subject: scp bug due to progress indicator when copying from remote to local on Linux
To: CIFS <linux-cifs@vger.kernel.org>, linux-fsdevel <linux-fsdevel@vger.kernel.org>
Cc: CIFS <linux-cifs@vger.kernel.org>, Pavel Shilovsky <piastryyy@gmail.com>

In discussing an interesting problem with scp recently with Pavel, we found a fairly obvious bug in scp when it is run with a progress indicator (which is the default when the source file is remote).

scp triggers "SIGALRM" probably from update_progress_meter() in progressmeter.c when executed with "scp localhost:somelargefile /mnt" ie with an ssh source path but a local path for the target, as long as the flush of the large amount of cached data to disk (which seems to be triggered by the ftruncate call in scp.c) takes more than a few seconds (which can be common depending on disk or network speed).

It is interesting that this can be avoided easily by running in quiet mode ("-q") which disables the progress meter in scp, but it seems very, very broken that scp of a large file can 'fail' when using a progress meter (unless caching were disabled on the file system) due to the slight delay in ftruncate due to triggering a flush of a large amount of cached data to disk sequentially.

Any thoughts of whether scp is actively maintained and best approach to fix progressmeter.c so it doesn't break on Linux?

-- 
Thanks,

Steve
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Dr. Bernd Feige @ 2019-01-11  8:50 UTC
To: Steve French, CIFS, linux-fsdevel
Cc: Pavel Shilovsky

Dear Steve,

The most commonly used scp is part of OpenSSH, which is developed as part of OpenBSD; the Linux version is maintained as the "portable" version:
https://www.openssh.com/portable.html

The primary SCM is at https://anongit.mindrot.org/openssh.git/

Best regards,
Bernd

On Friday, 2019-01-11 at 00:11 -0600, Steve French wrote:
> In discussing an interesting problem with scp recently with Pavel, we found a fairly obvious bug in scp when it is run with a progress indicator (which is the default when the source file is remote).
>
> scp triggers "SIGALRM" probably from update_progress_meter() in progressmeter.c when executed with "scp localhost:somelargefile /mnt" ie with an ssh source path but a local path for the target, as long as the flush of the large amount of cached data to disk (which seems to be triggered by the ftruncate call in scp.c) takes more than a few seconds (which can be common depending on disk or network speed).
>
> It is interesting that this can be avoided easily by running in quiet mode ("-q") which disables the progress meter in scp, but it seems very, very broken that scp of a large file can 'fail' when using a progress meter (unless caching were disabled on the file system) due to the slight delay in ftruncate due to triggering a flush of a large amount of cached data to disk sequentially.
>
> Any thoughts of whether scp is actively maintained and best approach to fix progressmeter.c so it doesn't break on Linux?
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Matthew Wilcox @ 2019-01-11 13:28 UTC
To: Steve French
Cc: CIFS, linux-fsdevel, Pavel Shilovsky

On Fri, Jan 11, 2019 at 12:11:00AM -0600, Steve French wrote:
> In discussing an interesting problem with scp recently with Pavel, we found a fairly obvious bug in scp when it is run with a progress indicator (which is the default when the source file is remote).
>
> scp triggers "SIGALRM" probably from update_progress_meter() in progressmeter.c when executed with "scp localhost:somelargefile /mnt" ie with an ssh source path but a local path for the target, as long as the flush of the large amount of cached data to disk (which seems to be triggered by the ftruncate call in scp.c) takes more than a few seconds (which can be common depending on disk or network speed).

Are you saying the SIGALRM interrupts ftruncate() and causes the ftruncate to fail?

> It is interesting that this can be avoided easily by running in quiet mode ("-q") which disables the progress meter in scp, but it seems very, very broken that scp of a large file can 'fail' when using a progress meter (unless caching were disabled on the file system) due to the slight delay in ftruncate due to triggering a flush of a large amount of cached data to disk sequentially.
>
> Any thoughts of whether scp is actively maintained and best approach to fix progressmeter.c so it doesn't break on Linux?
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Steve French @ 2019-01-11 21:13 UTC
To: Matthew Wilcox
Cc: CIFS, linux-fsdevel, Pavel Shilovsky

On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@infradead.org> wrote:
> Are you saying the SIGALRM interrupts ftruncate() and causes the ftruncate to fail?

So ftruncate does not really fail (the file contents and size match on source and target after the copy), but the scp 'fails' and the user would be quite confused (and presumably the network stack doesn't like this signal, which can cause disconnects etc., which in theory could cause reconnect/data loss issues in some corner cases).

Here is sample output from rerunning scp a few minutes ago (in tracing it earlier I can see scp triggering the SIGALRM in the progress indicator code). scp returned a non-zero value (failure), but the scp 'succeeded' (the file was copied fine). If I run with -q (quiet mode, no progress indicator), there is no problem. For a relatively small file size like this it fails less than half of the time, but if the file size is much larger (gigabytes) the ftruncate obviously takes longer and SIGALRM will always trigger.

smfrench@smf-copy-test3:~$ scp localhost:250M /mnt/trgtfile
smfrench@localhost's password:
250M                                    100%  250MB  18.6MB/s   00:13
/mnt/trgtfile: truncate: Interrupted system call
smfrench@smf-copy-test3:~$ echo $?
1
smfrench@smf-copy-test3:~$ diff 250M /mnt/trgtfile

Here is sample strace output from another run. Note a very long series of crazy small write sizes (16K), which complete quite quickly since cached by the Linux mm, then a relatively long (few second) pause as ftruncate triggers the flush, which causes the SIGALRM timeout in the scp progress indicator.

<snip>
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
fcntl(7, F_GETFL)                       = 0x800 (flags O_RDONLY|O_NONBLOCK)
fcntl(7, F_SETFL, O_RDONLY)             = 0
ftruncate(3, 262144000)                 = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[ALRM]})             = 0
ioctl(1, TIOCGWINSZ, {ws_row=51, ws_col=156, ws_xpixel=0, ws_ypixel=0}) = 0
getpgrp()                               = 82563

> > It is interesting that this can be avoided easily by running in quiet mode ("-q") which disables the progress meter in scp

-- 
Thanks,

Steve
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Matthew Wilcox @ 2019-01-11 21:22 UTC
To: Steve French
Cc: CIFS, linux-fsdevel, Pavel Shilovsky

On Fri, Jan 11, 2019 at 03:13:05PM -0600, Steve French wrote:
> On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@infradead.org> wrote:
> > Are you saying the SIGALRM interrupts ftruncate() and causes the ftruncate to fail?
>
> So ftruncate does not really fail (the file contents and size match on source and target after the copy) but the scp 'fails' and the user would be quite confused (and presumably the network stack doesn't like this signal, which can cause disconnects etc. which in theory could cause reconnect/data loss issues in some corner cases).

You've run into the problem that userspace simply doesn't check the return value from syscalls. It's not just scp, it's every program. Looking through cifs, you seem to do a lot of wait_event_interruptible() where you maybe should be doing wait_event_killable()?

> ftruncate(3, 262144000)                 = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
> rt_sigreturn({mask=[ALRM]})             = 0
> ioctl(1, TIOCGWINSZ, {ws_row=51, ws_col=156, ws_xpixel=0, ws_ypixel=0}) = 0
> getpgrp()                               = 82563

Right ... so the code never calls ftruncate() again. Changing all of userspace is just not going to happen; maybe you could get stuff fixed in libc, but really ftruncate() should only be interrupted by a fatal signal and not by SIGALRM.
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Steve French @ 2019-01-11 21:50 UTC
To: Matthew Wilcox
Cc: CIFS, linux-fsdevel, Pavel Shilovsky

On Fri, Jan 11, 2019 at 3:22 PM Matthew Wilcox <willy@infradead.org> wrote:
> You've run into the problem that userspace simply doesn't check the return value from syscalls. It's not just scp, it's every program. Looking through cifs, you seem to do a lot of wait_event_interruptible() where you maybe should be doing wait_event_killable()?
>
> Right ... so the code never calls ftruncate() again. Changing all of userspace is just not going to happen; maybe you could get stuff fixed in libc, but really ftruncate() should only be interrupted by a fatal signal and not by SIGALRM.

Looking at the places where wait_event_interruptible is called, I didn't see code in fs/cifs that would match the (presumed) code path; most of those calls are in the smbdirect (RDMA) code. For example, cifs_setattr does call filemap_write_and_wait, but as that goes down into the mm layer and then to cifs_writepages and the SMB3 write code, I didn't spot a wait_event_interruptible in that path (I might have missed something in the mm layer). I do see one in the cifs reconnect path, but that is not what we are typically hitting. Any ideas how to determine what we are blocked in when we get the annoying SIGALRM?

Another vague thought: is it possible to block SIGALRM across all of cifs_setattr? If it is, why do so few file systems (only three: ceph, jffs2, ocfs2) ever call sigprocmask?

-- 
Thanks,

Steve
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Matthew Wilcox @ 2019-01-11 23:05 UTC
To: Steve French
Cc: CIFS, linux-fsdevel, Pavel Shilovsky

On Fri, Jan 11, 2019 at 03:50:02PM -0600, Steve French wrote:
> On Fri, Jan 11, 2019 at 3:22 PM Matthew Wilcox <willy@infradead.org> wrote:
> > Right ... so the code never calls ftruncate() again. Changing all of userspace is just not going to happen; maybe you could get stuff fixed in libc, but really ftruncate() should only be interrupted by a fatal signal and not by SIGALRM.
>
> Looking at the places where wait_event_interruptible is called, I didn't see code in fs/cifs that would match the (presumed) code path; most of those calls are in the smbdirect (RDMA) code. For example, cifs_setattr does call filemap_write_and_wait, but as that goes down into the mm layer and then to cifs_writepages and the SMB3 write code, I didn't spot a wait_event_interruptible in that path (I might have missed something in the mm layer). I do see one in the cifs reconnect path, but that is not what we are typically hitting. Any ideas how to determine what we are blocked in when we get the annoying SIGALRM? Another vague thought: is it possible to block SIGALRM across all of cifs_setattr? If it is, why do so few file systems (only three: ceph, jffs2, ocfs2) ever call sigprocmask?

You can see where a task is currently sleeping with 'cat /proc/$pid/stack'. If you can provoke a long duration ftruncate, that'd be a good place to start looking.
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Steve French @ 2019-01-11 23:18 UTC
To: Matthew Wilcox
Cc: CIFS, linux-fsdevel, Pavel Shilovsky

On Fri, Jan 11, 2019 at 5:05 PM Matthew Wilcox <willy@infradead.org> wrote:
> You can see where a task is currently sleeping with 'cat /proc/$pid/stack'. If you can provoke a long duration ftruncate, that'd be a good place to start looking.

Not surprisingly, it is waiting in mm code:

root@smf-copy-test3:~# cat /proc/92189/stack
[<0>] io_schedule+0x16/0x40
[<0>] wait_on_page_bit_common+0x14f/0x350
[<0>] __filemap_fdatawait_range+0x104/0x160
[<0>] filemap_write_and_wait+0x4d/0x90
[<0>] cifs_setattr+0xc9/0xe80 [cifs]
[<0>] notify_change+0x2d2/0x460
[<0>] do_truncate+0x78/0xc0
[<0>] do_sys_ftruncate+0x14c/0x1c0
[<0>] __x64_sys_ftruncate+0x1b/0x20
[<0>] do_syscall_64+0x5a/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

So that brings me back to thinking about whether it is practical to mask (non-killable) signals in a few places in cifs.ko, as apparently at least a few file systems (ceph, jffs2 and ocfs2) do - in particular, masking SIGALRM across calls to filemap_write_and_wait.

-- 
Thanks,

Steve
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Pavel Shilovsky @ 2019-01-14 19:48 UTC
To: Matthew Wilcox
Cc: Steve French, CIFS, linux-fsdevel

On Fri, Jan 11, 2019 at 1:22 PM Matthew Wilcox <willy@infradead.org> wrote:
> You've run into the problem that userspace simply doesn't check the return value from syscalls. It's not just scp, it's every program. Looking through cifs, you seem to do a lot of wait_event_interruptible() where you maybe should be doing wait_event_killable()?

We are doing wait_event_interruptible() mostly in places where we are waiting on a blocking byte-range lock or when we are waiting for a TCP connection to be established (e.g. after a reconnect).

> > ftruncate(3, 262144000)                 = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> > --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
> > rt_sigreturn({mask=[ALRM]})             = 0
> > ioctl(1, TIOCGWINSZ, {ws_row=51, ws_col=156, ws_xpixel=0, ws_ypixel=0}) = 0
> > getpgrp()                               = 82563
>
> Right ... so the code never calls ftruncate() again. Changing all of userspace is just not going to happen; maybe you could get stuff fixed in libc, but really ftruncate() should only be interrupted by a fatal signal and not by SIGALRM.

It seems that SA_RESTART is just not set for scp. What do you think about returning ERESTARTNOINTR instead for this specific case - filemap_write_and_wait during ftruncate? It should force the syscall to be restarted regardless of the userspace program's settings.

-- 
Best regards,
Pavel Shilovsky
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Andreas Dilger @ 2019-01-11 21:48 UTC
To: Steve French
Cc: Matthew Wilcox, CIFS, linux-fsdevel, Pavel Shilovsky

On Jan 11, 2019, at 2:13 PM, Steve French <smfrench@gmail.com> wrote:
> So ftruncate does not really fail (the file contents and size match on source and target after the copy) but the scp 'fails' and the user would be quite confused (and presumably the network stack doesn't like this signal, which can cause disconnects etc. which in theory could cause reconnect/data loss issues in some corner cases).
>
> Here is sample output from rerunning scp a few minutes ago. (in tracing it earlier I can see scp triggering the SIGALRM in the progress indicator code). scp returned non-zero value (failure), but scp 'succeeded' (file was copied fine). If I run with -q (quiet mode, no progress indicator, there is no problem). For a relatively small file size like this it fails less than 1/2 of the time, but if the file size is much larger (gigabytes) the ftruncate obviously takes longer and SIGALRM will always trigger.

I'm guessing this is writing to a CIFS filesystem as the target, or some other filesystem that doesn't handle sparse files? AFAIK, at one point there was a heuristic like "truncate-to-non-zero triggers file preallocation" before fallocate() was available.

Otherwise, why would the truncate() take a long time even if the file is large, since it shouldn't actually be doing any work to change the size of the file (I'm assuming that the target file is either empty or about the same size during an overwrite; you didn't mention anything unusual as a requirement).

Other than the duration of the truncate() call being excessive, it seems to me that everything is working as expected, or at least as the code asked for? It registered SIGALRM, so when a syscall returns -ERESTARTSYS it should actually restart that call.

My typical response for issues like this is to find the github repo for the code and push a patch to fix the code myself. If I can't find the upstream project/source repo easily, I download the .src.rpm, make a patch and submit it to the major distro package maintainers (RHEL, SLES, Ubuntu or Debian), so that at least it is fixed for the vast majority of users, and those maintainers can handle pushing the patch further upstream (and smaller distros will copy patches/sources from them).

Cheers, Andreas

> smfrench@smf-copy-test3:~$ scp localhost:250M /mnt/trgtfile
> smfrench@localhost's password:
> 250M                                    100%  250MB  18.6MB/s   00:13
> /mnt/trgtfile: truncate: Interrupted system call
> smfrench@smf-copy-test3:~$ echo $?
> 1
> smfrench@smf-copy-test3:~$ diff 250M /mnt/trgtfile
>
> <snip>
* Re: scp bug due to progress indicator when copying from remote to local on Linux
From: Steve French @ 2019-01-11 22:02 UTC
To: Andreas Dilger
Cc: Matthew Wilcox, CIFS, linux-fsdevel, Pavel Shilovsky

On Fri, Jan 11, 2019 at 3:48 PM Andreas Dilger <adilger@dilger.ca> wrote:
> I'm guessing this is writing to a CIFS filesystem as the target, or some other filesystem that doesn't handle sparse files? AFAIK, at one point there was a heuristic like "truncate-to-non-zero triggers file preallocation" before fallocate() was available.

cifs can handle sparse files (it has had protocol support for various fallocate equivalents for many, many years), and I think this would fail on other file systems too, depending on how long flushing the cached data takes (presumably a function of disk or network speed, size of cached data, etc.)

> Otherwise, why would the truncate() take a long time even if the file is large, since it shouldn't actually be doing any work to change the size of the file (I'm assuming that the target file is either empty or about the same size during an overwrite; you didn't mention anything unusual as a requirement).

Note that scp (unlike rsync, robocopy and some other tools) doesn't appear to offer an option to 'set the file size first' and then do the writes (which helps with some types of file systems), but setting the file size is not the issue in this case (perhaps it would be in some cloud or cluster file systems, but not here).

> My typical response for issues like this is to find the github repo for the code and push a patch to fix the code myself. If I can't find the upstream project/source repo easily, download the .src.rpm and make a patch and submit it to the major distro package maintainers (RHEL, SLES, Ubuntu or Debian), so that at least it is fixed for the vast majority of users, and those maintainers can handle pushing the patch further upstream (and smaller distros will copy patches/sources from them).

I have built the scp code as an experiment, and by the way there are many, many things that could be changed; other copy tools (e.g. robocopy, even rsync and cp) look much more mature, with lots more useful performance options. My earlier experiments, e.g. changing the scp I/O size to something larger, did help as expected in various cases (16K is suboptimal in many cases).

But in this case it seems much simpler - the code path is basically:

read 16K over ssh
write 16K to local/network file system (cached)
(repeated 1000s of times)
then ftruncate

ftruncate, if it were simply setting the file size, would be very fast - except that setting the file size requires the cached data to be written out first (writes could otherwise extend the file size), so it triggers a call to writepages (filemap_write_and_wait) as a side effect, and all the cached data is finally written out. That can take a few seconds and cause the scp progress indicator to malfunction and trigger the SIGALRM. This is normally not a problem; fsync etc. can take longer than a single I/O operation.

-- 
Thanks,

Steve
* Re: scp bug due to progress indicator when copying from remote to local on Linux
  2019-01-11 22:02         ` Steve French
@ 2019-01-11 23:51           ` Andreas Dilger
  2019-01-12  0:06             ` Steve French
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Dilger @ 2019-01-11 23:51 UTC (permalink / raw)
  To: Steve French; +Cc: Matthew Wilcox, CIFS, linux-fsdevel, Pavel Shilovsky

[-- Attachment #1: Type: text/plain, Size: 4713 bytes --]

On Jan 11, 2019, at 3:02 PM, Steve French <smfrench@gmail.com> wrote:
>
> On Fri, Jan 11, 2019 at 3:48 PM Andreas Dilger <adilger@dilger.ca> wrote:
>>
>> On Jan 11, 2019, at 2:13 PM, Steve French <smfrench@gmail.com> wrote:
>>>
>>> On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@infradead.org> wrote:
>>>>
>>>> On Fri, Jan 11, 2019 at 12:11:00AM -0600, Steve French wrote:
>>>>> In discussing an interesting problem with scp recently with Pavel, we
>>>>> found a fairly obvious bug in scp when it is run with a progress
>>>>> indicator (which is the default when the source file is remote).
>>>>>
>>>>> scp triggers "SIGALRM" probably from update_progress_meter() in
>>>>> progressmeter.c when executed with "scp localhost:somelargefile /mnt"
>>>>> ie with an ssh source path but a local path for the target, as long as
> <snip>
>
>> My typical response for issues like this is to find the github repo for
>> the code and push a patch to fix the code myself. If I can't find the
>> upstream project/source repo easily, download the .src.rpm and make a
>> patch and submit it to the major distro package maintainers (RHEL, SLES,
>> Ubuntu or Debian), so that at least it is fixed for the vast majority
>> of users, and those maintainers can handle pushing the patch further
>> upstream (and smaller distros will copy patches/sources from them).
>
> I have built the scp code as an experiment and by the way there are many,
> many things that could be changed (other copy tools, e.g. robocopy and even
> rsync and cp, look much more mature, with lots more useful performance options).
> My earlier experiments, e.g. changing the scp i/o size to something larger,
> did help as expected for various cases (16K is suboptimal in many cases).

Years ago I submitted a patch to fix GNU fileutils so that "cp" could use
st_blksize to determine the data IO size for the copy. Originally it would
allocate its copy buffer on the stack, but when Lustre returned st_blksize
of 8MB it exploded. 2MB was safe at the time, and what we used for many
years, but fixing "cp" allowed us to increase st_blksize over time. Probably
scp should be modified to do the same.

It is often just a case of the developer not knowing any better, or the code
was written so long ago that 16KB transfers were seen as a huge improvement
over 4KB or 1KB transfers, and nobody has looked at it again. Since we all
depend on other developers to know their side of the house (e.g. I'm wholly
dependent on others for the crypto implementation of SSH/scp), we may as
well provide some help when there is something as widely used as scp that
we can improve in some useful way.

> But in this case it seems much simpler - the code path is basically
> read 16K over ssh
> write 16K to local/network file system (cached)
> (repeated 1000s of times)
> then ftruncate

Even if the filesystem is transferring large IOs from disk/network, doing
many thousands of excess system calls is pure overhead that could be avoided.
Also, even if scp could only do 16KB network transfers, staying in a tight
"scp copies data over ssh" loop is better for icache/dcache/context, and
doing a single large write call to/from the filesystem is most efficient
on that side of the house.

>> ... why would the truncate() take a long time even if the file
>> is large, since it shouldn't actually be doing any work to change the
>> size of the file (I'm assuming that the target file is either empty or
>> about the same size during an overwrite, you didn't mention anything
>> unusual as a requirement).
>
> ftruncate, if it were simply setting the file size, would be very fast,
> except that setting the file size needs to have the data written out
> (writes otherwise could risk extending the file size), so it triggers a
> call to writepages (filemap_write_and_wait) as a side effect,
> so all the cached data is finally written out, which can take a few
> seconds and cause the scp progress indicator to malfunction
> and trigger the SIGALRM. This is normally not a problem;
> fsync etc. can take longer than a single i/o operation.

This seems more like a kernel/filesystem issue. We shouldn't have to flush
the entire file to disk just to change the file size. In the exceedingly
common case of the current size == truncate() size, the truncate should be
a no-op unless the on-disk size is > new size. In the unlikely case that
the file size > new size, it could truncate only the pages and on-disk
blocks beyond the new size.

I don't see how pages in cache < new size could cause the file size to be
extended. Is this a VFS/MM thing, or CIFS-specific?

Cheers, Andreas

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: scp bug due to progress indicator when copying from remote to local on Linux
  2019-01-11 23:51           ` Andreas Dilger
@ 2019-01-12  0:06             ` Steve French
  0 siblings, 0 replies; 14+ messages in thread
From: Steve French @ 2019-01-12 0:06 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Matthew Wilcox, CIFS, linux-fsdevel, Pavel Shilovsky

On Fri, Jan 11, 2019 at 5:51 PM Andreas Dilger <adilger@dilger.ca> wrote:
>
> On Jan 11, 2019, at 3:02 PM, Steve French <smfrench@gmail.com> wrote:
<snip>
> > ftruncate, if it were simply setting the file size, would be very fast,
> > except that setting the file size needs to have the data written out
> > (writes otherwise could risk extending the file size), so it triggers a
> > call to writepages (filemap_write_and_wait) as a side effect,
> > so all the cached data is finally written out, which can take a few
> > seconds and cause the scp progress indicator to malfunction
> > and trigger the SIGALRM.
>
> This seems more like a kernel/filesystem issue. We shouldn't have to flush
> the entire file to disk just to change the file size. In the exceedingly
> common case of the current size == truncate() size, the truncate should be
> a no-op unless the on-disk size is > new size. In the unlikely case that
> the file size > new size, it could truncate only the pages and on-disk
> blocks beyond the new size.
>
> I don't see how pages in cache < new size could cause the file size to be
> extended. Is this a VFS/MM thing, or CIFS-specific?

I suspect that for various other network/cluster file systems, flushing
(not to disk, but at least to the server, i.e. across the network) is
considered safer to do before certain metadata updates (including setting
the file size).

--
Thanks,

Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-01-14 19:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-11  6:11 scp bug due to progress indicator when copying from remote to local on Linux Steve French
2019-01-11  6:13 ` Fwd: " Steve French
2019-01-11  8:50 ` Dr. Bernd Feige
2019-01-11 13:28 ` Matthew Wilcox
2019-01-11 21:13   ` Steve French
2019-01-11 21:22     ` Matthew Wilcox
2019-01-11 21:50       ` Steve French
2019-01-11 23:05         ` Matthew Wilcox
2019-01-11 23:18           ` Steve French
2019-01-14 19:48             ` Pavel Shilovsky
2019-01-11 21:48   ` Andreas Dilger
2019-01-11 22:02     ` Steve French
2019-01-11 23:51       ` Andreas Dilger
2019-01-12  0:06         ` Steve French