linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] copy_file_range with sysfs file as input
@ 2021-01-25  7:54 Nicolas Boichat
  2021-01-25 19:44 ` Ian Lance Taylor
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Nicolas Boichat @ 2021-01-25  7:54 UTC (permalink / raw)
  To: Darrick J. Wong, linux-fsdevel, lkml, Amir Goldstein, Dave Chinner
  Cc: Luis Lozano, iant

Hi copy_file_range experts,

We hit this interesting issue when upgrading Go compiler from 1.13 to
1.15 [1]. Basically we use Go's `io.Copy` to copy the content of
`/sys/kernel/debug/tracing/trace` to a temporary file.

Under the hood, Go now uses `copy_file_range` syscall to optimize the
copy operation. However, that fails to copy any content when the input
file is from sysfs/tracefs, with an apparent size of 0 (but there is
still content when you `cat` it, of course).

A repro case is available in comment7 (adapted from the man page),
also copied below [2].

Output looks like this (on kernels 5.4.89 (chromeos), 5.7.17 and
5.10.3 (chromeos))
$ ./copyfrom /sys/kernel/debug/tracing/trace x
0 bytes copied
$ cat x
$ cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:8
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |

I can try to dig further, but thought you'd like to get a bug report
as soon as possible.

Thanks,

Nicolas

[1] http://issuetracker.google.com/issues/178332739
[2]
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        int fd_in, fd_out;
        loff_t ret;

        if (argc != 3) {
                fprintf(stderr, "Usage: %s <source> <destination>\n", argv[0]);
                exit(EXIT_FAILURE);
        }

        fd_in = open(argv[1], O_RDONLY);
        if (fd_in == -1) {
                perror("open (argv[1])");
                exit(EXIT_FAILURE);
        }

        fd_out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd_out == -1) {
                perror("open (argv[2])");
                exit(EXIT_FAILURE);
        }

        ret = copy_file_range(fd_in, NULL, fd_out, NULL, 1024, 0);
        if (ret == -1) {
                perror("copy_file_range");
                exit(EXIT_FAILURE);
        }
        printf("%d bytes copied\n", (int)ret);

        close(fd_in);
        close(fd_out);
        exit(EXIT_SUCCESS);
}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] copy_file_range with sysfs file as input
  2021-01-25  7:54 [BUG] copy_file_range with sysfs file as input Nicolas Boichat
@ 2021-01-25 19:44 ` Ian Lance Taylor
  2021-01-26  1:34 ` Dave Chinner
  2021-02-03  9:04 ` Greg KH
  2 siblings, 0 replies; 7+ messages in thread
From: Ian Lance Taylor @ 2021-01-25 19:44 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Darrick J. Wong, linux-fsdevel, lkml, Amir Goldstein,
	Dave Chinner, Luis Lozano

Thanks for the note.  I'm not a kernel developer, but to me this
sounds like a kernel bug.  It seems particularly unfortunate that
copy_file_range returns 0 in this case.  From the perspective of the
Go standard library, what we would need is some mechanism to detect
when the copy_file_range system call will not or did not work
correctly.  As the biggest hammer, we currently only call
copy_file_range on kernel versions 5.3 and newer.  We can bump that
requirement if necessary.

Please feel free to open a bug about this at https://golang.org/issue,
but we'll need guidance as to what we should do to avoid the problem.
Thanks.

Ian

On Sun, Jan 24, 2021 at 11:54 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
>
> Hi copy_file_range experts,
>
> We hit this interesting issue when upgrading Go compiler from 1.13 to
> 1.15 [1]. Basically we use Go's `io.Copy` to copy the content of
> `/sys/kernel/debug/tracing/trace` to a temporary file.
>
> Under the hood, Go now uses `copy_file_range` syscall to optimize the
> copy operation. However, that fails to copy any content when the input
> file is from sysfs/tracefs, with an apparent size of 0 (but there is
> still content when you `cat` it, of course).
>
> A repro case is available in comment7 (adapted from the man page),
> also copied below [2].
>
> Output looks like this (on kernels 5.4.89 (chromeos), 5.7.17 and
> 5.10.3 (chromeos))
> $ ./copyfrom /sys/kernel/debug/tracing/trace x
> 0 bytes copied
> $ cat x
> $ cat /sys/kernel/debug/tracing/trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 0/0   #P:8
> #
> #                                _-----=> irqs-off
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| /     delay
> #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
> #              | |         |   ||||      |         |
>
> I can try to dig further, but thought you'd like to get a bug report
> as soon as possible.
>
> Thanks,
>
> Nicolas
>
> [1] http://issuetracker.google.com/issues/178332739
> [2]
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <unistd.h>
>
> int
> main(int argc, char **argv)
> {
>         int fd_in, fd_out;
>         loff_t ret;
>
>         if (argc != 3) {
>                 fprintf(stderr, "Usage: %s <source> <destination>\n", argv[0]);
>                 exit(EXIT_FAILURE);
>         }
>
>         fd_in = open(argv[1], O_RDONLY);
>         if (fd_in == -1) {
>                 perror("open (argv[1])");
>                 exit(EXIT_FAILURE);
>         }
>
>         fd_out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0644);
>         if (fd_out == -1) {
>                 perror("open (argv[2])");
>                 exit(EXIT_FAILURE);
>         }
>
>         ret = copy_file_range(fd_in, NULL, fd_out, NULL, 1024, 0);
>         if (ret == -1) {
>                 perror("copy_file_range");
>                 exit(EXIT_FAILURE);
>         }
>         printf("%d bytes copied\n", (int)ret);
>
>         close(fd_in);
>         close(fd_out);
>         exit(EXIT_SUCCESS);
> }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] copy_file_range with sysfs file as input
  2021-01-25  7:54 [BUG] copy_file_range with sysfs file as input Nicolas Boichat
  2021-01-25 19:44 ` Ian Lance Taylor
@ 2021-01-26  1:34 ` Dave Chinner
  2021-01-26  3:50   ` Nicolas Boichat
  2021-02-03  9:04 ` Greg KH
  2 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2021-01-26  1:34 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Darrick J. Wong, linux-fsdevel, lkml, Amir Goldstein,
	Dave Chinner, Luis Lozano, iant

On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote:
> Hi copy_file_range experts,
> 
> We hit this interesting issue when upgrading Go compiler from 1.13 to
> 1.15 [1]. Basically we use Go's `io.Copy` to copy the content of
> `/sys/kernel/debug/tracing/trace` to a temporary file.
> 
> Under the hood, Go now uses `copy_file_range` syscall to optimize the
> copy operation. However, that fails to copy any content when the input
> file is from sysfs/tracefs, with an apparent size of 0 (but there is
> still content when you `cat` it, of course).
> 
> A repro case is available in comment7 (adapted from the man page),
> also copied below [2].
> 
> Output looks like this (on kernels 5.4.89 (chromeos), 5.7.17 and
> 5.10.3 (chromeos))
> $ ./copyfrom /sys/kernel/debug/tracing/trace x
> 0 bytes copied

That's basically telling you that copy_file_range() was unable to
copy anything. The man page says:

RETURN VALUE
       Upon  successful  completion,  copy_file_range() will return
       the number of bytes copied between files.  This could be less
       than the length originally requested.  If the file offset
       of fd_in is at or past the end of file, no bytes are copied,
       and copy_file_range() returns zero.

THe man page explains it perfectly. Look at the trace file you are
trying to copy:

$ ls -l /sys/kernel/debug/tracing/trace
-rw-r--r-- 1 root root 0 Jan 19 12:17 /sys/kernel/debug/tracing/trace
$ cat /sys/kernel/debug/tracing/trace
tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |

Yup, the sysfs file reports it's size as zero length, so the CFR
syscall is saying "there's nothing to copy from this empty file" and
so correctly is returning zero without even trying to copy anything
because the file offset is at EOF...

IOWs, there's no copy_file_range() bug here - it's behaving as
documented.

'cat' "works" in this situation because it doesn't check the file
size and just attempts to read unconditionally from the file. Hence
it happily returns non-existent stale data from busted filesystem
implementations that allow data to be read from beyond EOF...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] copy_file_range with sysfs file as input
  2021-01-26  1:34 ` Dave Chinner
@ 2021-01-26  3:50   ` Nicolas Boichat
  2021-01-26  7:49     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Boichat @ 2021-01-26  3:50 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, linux-fsdevel, lkml, Amir Goldstein,
	Dave Chinner, Luis Lozano, iant

On Tue, Jan 26, 2021 at 9:34 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote:
> > Hi copy_file_range experts,
> >
> > We hit this interesting issue when upgrading Go compiler from 1.13 to
> > 1.15 [1]. Basically we use Go's `io.Copy` to copy the content of
> > `/sys/kernel/debug/tracing/trace` to a temporary file.
> >
> > Under the hood, Go now uses `copy_file_range` syscall to optimize the
> > copy operation. However, that fails to copy any content when the input
> > file is from sysfs/tracefs, with an apparent size of 0 (but there is
> > still content when you `cat` it, of course).
> >
> > A repro case is available in comment7 (adapted from the man page),
> > also copied below [2].
> >
> > Output looks like this (on kernels 5.4.89 (chromeos), 5.7.17 and
> > 5.10.3 (chromeos))
> > $ ./copyfrom /sys/kernel/debug/tracing/trace x
> > 0 bytes copied
>
> That's basically telling you that copy_file_range() was unable to
> copy anything. The man page says:
>
> RETURN VALUE
>        Upon  successful  completion,  copy_file_range() will return
>        the number of bytes copied between files.  This could be less
>        than the length originally requested.  If the file offset
>        of fd_in is at or past the end of file, no bytes are copied,
>        and copy_file_range() returns zero.
>
> THe man page explains it perfectly.

I'm not that confident the explanation is perfect ,-)

How does one define "EOF"? The read manpage
(https://man7.org/linux/man-pages/man2/read.2.html) defines it as a
zero return value. I don't think using the inode file size is
standard. Seems like the kernel is not even trying to read from the
source file here.

In any case, I can fix this issue by dropping the count check here:
https://elixir.bootlin.com/linux/latest/source/fs/read_write.c#L1445 .
I'll send a patch so that we can discuss based on that.

> Look at the trace file you are
> trying to copy:
>
> $ ls -l /sys/kernel/debug/tracing/trace
> -rw-r--r-- 1 root root 0 Jan 19 12:17 /sys/kernel/debug/tracing/trace
> $ cat /sys/kernel/debug/tracing/trace
> tracer: nop
> #
> # entries-in-buffer/entries-written: 0/0   #P:8
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>
> Yup, the sysfs file reports it's size as zero length, so the CFR
> syscall is saying "there's nothing to copy from this empty file" and
> so correctly is returning zero without even trying to copy anything
> because the file offset is at EOF...
>
> IOWs, there's no copy_file_range() bug here - it's behaving as
> documented.
>
> 'cat' "works" in this situation because it doesn't check the file
> size and just attempts to read unconditionally from the file. Hence
> it happily returns non-existent stale data from busted filesystem
> implementations that allow data to be read from beyond EOF...

`cp` also works, so does `dd` and basically any other file operation.

(and I wouldn't call procfs, sysfs, debugfs and friends "busted", they
are just... special)


>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] copy_file_range with sysfs file as input
  2021-01-26  3:50   ` Nicolas Boichat
@ 2021-01-26  7:49     ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2021-01-26  7:49 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Darrick J. Wong, linux-fsdevel, lkml, Amir Goldstein,
	Dave Chinner, Luis Lozano, iant

On Tue, Jan 26, 2021 at 11:50:50AM +0800, Nicolas Boichat wrote:
> On Tue, Jan 26, 2021 at 9:34 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote:
> > > Hi copy_file_range experts,
> > >
> > > We hit this interesting issue when upgrading Go compiler from 1.13 to
> > > 1.15 [1]. Basically we use Go's `io.Copy` to copy the content of
> > > `/sys/kernel/debug/tracing/trace` to a temporary file.
> > >
> > > Under the hood, Go now uses `copy_file_range` syscall to optimize the
> > > copy operation. However, that fails to copy any content when the input
> > > file is from sysfs/tracefs, with an apparent size of 0 (but there is
> > > still content when you `cat` it, of course).
> > >
> > > A repro case is available in comment7 (adapted from the man page),
> > > also copied below [2].
> > >
> > > Output looks like this (on kernels 5.4.89 (chromeos), 5.7.17 and
> > > 5.10.3 (chromeos))
> > > $ ./copyfrom /sys/kernel/debug/tracing/trace x
> > > 0 bytes copied
> >
> > That's basically telling you that copy_file_range() was unable to
> > copy anything. The man page says:
> >
> > RETURN VALUE
> >        Upon  successful  completion,  copy_file_range() will return
> >        the number of bytes copied between files.  This could be less
> >        than the length originally requested.  If the file offset
> >        of fd_in is at or past the end of file, no bytes are copied,
> >        and copy_file_range() returns zero.
> >
> > THe man page explains it perfectly.
> 
> I'm not that confident the explanation is perfect ,-)
> 
> How does one define "EOF"? The read manpage
> (https://man7.org/linux/man-pages/man2/read.2.html) defines it as a
> zero return value.

And so does copy_file_range(). That's the -API definition-, it does
not define the kernel implementation of how to decide when the file
is at EOF.

> I don't think using the inode file size is
> standard.

It is the standard VFS filesystem definition of EOF.

Indeed:

copy_file_range()
  vfs_copy_file_range()
    generic_copy_file_checks()
    .....

        /* Shorten the copy to EOF */
        size_in = i_size_read(inode_in);
        if (pos_in >= size_in)
                count = 0;
        else
                count = min(count, size_in - (uint64_t)pos_in);

That inode size check for EOF is -exactly- what is triggering here,
and a copy of zero length returns 0 bytes having done nothing.

The page cache read path does similar things in
generic_file_buffered_read() to avoid truncate races exposing
stale/bad data to userspace:


                /*
                 * i_size must be checked after we know the pages are Uptodate.
                 *
                 * Checking i_size after the check allows us to calculate
                 * the correct value for "nr", which means the zero-filled
                 * part of the page is not copied back to userspace (unless
                 * another truncate extends the file - this is desired though).
                 */
                isize = i_size_read(inode);
                if (unlikely(iocb->ki_pos >= isize))
                        goto put_pages;

> > 'cat' "works" in this situation because it doesn't check the file
> > size and just attempts to read unconditionally from the file. Hence
> > it happily returns non-existent stale data from busted filesystem
> > implementations that allow data to be read from beyond EOF...
> 
> `cp` also works, so does `dd` and basically any other file operation.

They do not use a syscall interface that can offload work to
filesystems, low level block layer software, hardware and/or remote
systems. copy_file_range() is restricted to regular files and does
complex stuff that read() and friends will never do, so we have
strictly enforced rules to prevent people from playing fast and
loose and silently corrupting user data with it....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] copy_file_range with sysfs file as input
  2021-01-25  7:54 [BUG] copy_file_range with sysfs file as input Nicolas Boichat
  2021-01-25 19:44 ` Ian Lance Taylor
  2021-01-26  1:34 ` Dave Chinner
@ 2021-02-03  9:04 ` Greg KH
  2021-02-03  9:18   ` Nicolas Boichat
  2 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2021-02-03  9:04 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Darrick J. Wong, linux-fsdevel, lkml, Amir Goldstein,
	Dave Chinner, Luis Lozano, iant

On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote:
> Hi copy_file_range experts,
> 
> We hit this interesting issue when upgrading Go compiler from 1.13 to
> 1.15 [1]. Basically we use Go's `io.Copy` to copy the content of
> `/sys/kernel/debug/tracing/trace` to a temporary file.

Nit, the above file is NOT a sysfs file.  Odds are it is either a
debugfs, or a tracefs file, please check your mounts to determine which
it is, as that matters a lot on the kernel backend side for trying to
figure out what is going on here :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] copy_file_range with sysfs file as input
  2021-02-03  9:04 ` Greg KH
@ 2021-02-03  9:18   ` Nicolas Boichat
  0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Boichat @ 2021-02-03  9:18 UTC (permalink / raw)
  To: Greg KH
  Cc: Darrick J. Wong, linux-fsdevel, lkml, Amir Goldstein,
	Dave Chinner, Luis Lozano, Ian Lance Taylor

On Wed, Feb 3, 2021 at 5:04 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote:
> > Hi copy_file_range experts,
> >
> > We hit this interesting issue when upgrading Go compiler from 1.13 to
> > 1.15 [1]. Basically we use Go's `io.Copy` to copy the content of
> > `/sys/kernel/debug/tracing/trace` to a temporary file.
>
> Nit, the above file is NOT a sysfs file.  Odds are it is either a
> debugfs, or a tracefs file, please check your mounts to determine which
> it is, as that matters a lot on the kernel backend side for trying to
> figure out what is going on here :)

Yes yes it's tracefs ,-) But, from the other thread
https://lkml.org/lkml/2021/1/26/2029 sysfs (and any other fs that
generates files dynamically like procfs) would likely hit issues as
well (but maybe in more rare circumstances).

Thanks!

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-02-03  9:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-25  7:54 [BUG] copy_file_range with sysfs file as input Nicolas Boichat
2021-01-25 19:44 ` Ian Lance Taylor
2021-01-26  1:34 ` Dave Chinner
2021-01-26  3:50   ` Nicolas Boichat
2021-01-26  7:49     ` Dave Chinner
2021-02-03  9:04 ` Greg KH
2021-02-03  9:18   ` Nicolas Boichat

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).