linux-fsdevel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Jann Horn <jannh@google.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>, Karel Zak <kzak@redhat.com>,
	David Howells <dhowells@redhat.com>, Ian Kent <raven@themaw.net>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	Steven Whitehouse <swhiteho@redhat.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <christian@brauner.io>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Linux API <linux-api@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]
Date: Tue, 3 Mar 2020 21:15:16 +0100
Message-ID: <20200303201516.GA1136381@kroah.com>
In-Reply-To: <20200303165103.GA731597@kroah.com>

On Tue, Mar 03, 2020 at 05:51:03PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Mar 03, 2020 at 03:40:24PM +0100, Jann Horn wrote:
> > On Tue, Mar 3, 2020 at 3:30 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > On Tue, Mar 03, 2020 at 03:10:50PM +0100, Miklos Szeredi wrote:
> > > > On Tue, Mar 3, 2020 at 2:43 PM Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > On Tue, Mar 03, 2020 at 02:34:42PM +0100, Miklos Szeredi wrote:
> > > >
> > > > > > If buffer is too small to fit the whole file, return error.
> > > > >
> > > > > Why?  What's wrong with just returning the bytes asked for?  If someone
> > > > > only wants 5 bytes from the front of a file, it should be fine to give
> > > > > that to them, right?
> > > >
> > > > I think we need to signal in some way to the caller that the result
> > > > was truncated (see readlink(2), getxattr(2), getcwd(2)), otherwise the
> > > > caller might be surprised.
> > >
> > > But that's not the way a "normal" read works.  Short reads are fine, if
> > > the file isn't big enough.  That's how char device nodes work all the
> > > time as well, and this kind of is like that, or some kind of "stream" to
> > > read from.
> > >
> > > If you think the file is bigger, then you, as the caller, can just pass
> > > in a bigger buffer if you want to (i.e. you can stat the thing and
> > > determine the size beforehand.)
> > >
> > > Think of the "normal" use case here, a sysfs read with a PAGE_SIZE
> > > buffer.  That way userspace "knows" it will always read all of the data
> > > it can from the file, we don't have to do any seeking or determining
> > > real file size, or anything else like that.
> > >
> > > We return the number of bytes read as well, so we "know" if we did a
> > > short read, and also, you could imply, if the number of bytes read are
> > > the exact same as the number of bytes of the buffer, maybe the file is
> > > either that exact size, or bigger.
> > >
> > > This should be "simple", let's not make it complex if we can help it :)
> > >
> > > > > > Verify that the number of bytes read matches the file size, otherwise
> > > > > > return error (may need to loop?).
> > > > >
> > > > > No, we can't "match file size" as sysfs files do not really have a sane
> > > > > "size".  So I don't want to loop at all here, one-shot, that's all you
> > > > > get :)
> > > >
> > > > Hmm.  I understand the no-size thing.  But looping until EOF (i.e.
> > > > until read return zero) might be a good idea regardless, because short
> > > > reads are allowed.
> > >
> > > If you want to loop, then do a userspace open/read-loop/close cycle.
> > > That's not what this syscall should be for.
> > >
> > > Should we call it: readfile-only-one-try-i-hope-my-buffer-is-big-enough()?  :)
> > 
> > So how is this supposed to work in e.g. the following case?
> > 
> > ========================================
> > $ cat map_lots_and_read_maps.c
> > #include <sys/mman.h>
> > #include <fcntl.h>
> > #include <unistd.h>
> > 
> > int main(void) {
> >   for (int i=0; i<1000; i++) {
> >     mmap(NULL, 0x1000, (i&1)?PROT_READ:PROT_NONE,
> >          MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> >   }
> >   int maps = open("/proc/self/maps", O_RDONLY);
> >   static char buf[0x100000];
> >   int res;
> >   do {
> >     res = read(maps, buf, sizeof(buf));
> >   } while (res > 0);
> > }
> > $ gcc -o map_lots_and_read_maps map_lots_and_read_maps.c
> > $ strace -e trace='!mmap' ./map_lots_and_read_maps
> > execve("./map_lots_and_read_maps", ["./map_lots_and_read_maps"], 0x7ffebd297ac0 /* 51 vars */) = 0
> > brk(NULL)                               = 0x563a1184f000
> > access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
> > openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> > fstat(3, {st_mode=S_IFREG|0644, st_size=208479, ...}) = 0
> > close(3)                                = 0
> > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> > read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320l\2\0\0\0\0\0"..., 832) = 832
> > fstat(3, {st_mode=S_IFREG|0755, st_size=1820104, ...}) = 0
> > mprotect(0x7fb5c2d1a000, 1642496, PROT_NONE) = 0
> > close(3)                                = 0
> > arch_prctl(ARCH_SET_FS, 0x7fb5c2eb6500) = 0
> > mprotect(0x7fb5c2eab000, 12288, PROT_READ) = 0
> > mprotect(0x563a103e4000, 4096, PROT_READ) = 0
> > mprotect(0x7fb5c2f12000, 4096, PROT_READ) = 0
> > munmap(0x7fb5c2eb7000, 208479)          = 0
> > openat(AT_FDCWD, "/proc/self/maps", O_RDONLY) = 3
> > read(3, "563a103e1000-563a103e2000 r--p 0"..., 1048576) = 4075
> > read(3, "7fb5c2985000-7fb5c2986000 ---p 0"..., 1048576) = 4067
> > read(3, "7fb5c29d8000-7fb5c29d9000 r--p 0"..., 1048576) = 4067
> > read(3, "7fb5c2a2b000-7fb5c2a2c000 ---p 0"..., 1048576) = 4067
> > read(3, "7fb5c2a7e000-7fb5c2a7f000 r--p 0"..., 1048576) = 4067
> > read(3, "7fb5c2ad1000-7fb5c2ad2000 ---p 0"..., 1048576) = 4067
> > read(3, "7fb5c2b24000-7fb5c2b25000 r--p 0"..., 1048576) = 4067
> > read(3, "7fb5c2b77000-7fb5c2b78000 ---p 0"..., 1048576) = 4067
> > read(3, "7fb5c2bca000-7fb5c2bcb000 r--p 0"..., 1048576) = 4067
> > read(3, "7fb5c2c1d000-7fb5c2c1e000 ---p 0"..., 1048576) = 4067
> > read(3, "7fb5c2c70000-7fb5c2c71000 r--p 0"..., 1048576) = 4067
> > read(3, "7fb5c2cc3000-7fb5c2cc4000 ---p 0"..., 1048576) = 4078
> > read(3, "7fb5c2eca000-7fb5c2ecb000 r--p 0"..., 1048576) = 2388
> > read(3, "", 1048576)                    = 0
> > exit_group(0)                           = ?
> > +++ exited with 0 +++
> > $
> > ========================================
> > 
> > The kernel is randomly returning short reads *with different lengths*
> > that are vaguely around PAGE_SIZE, no matter how big the buffer
> > supplied by userspace is. And while repeated read() calls will return
> > consistent state thanks to the seqfile magic, repeated readfile()
> > calls will probably return garbage with half-complete lines.
> 
> Ah crap, I forgot about seqfile, I was only considering the "simple"
> cases that sysfs provides.
> 
> Ok, Miklos, you were totally right, I'll loop and read until the end of
> file or buffer, whichever comes first.
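
The quoted proposal to loop until EOF or a full buffer, whichever comes first, can be sketched in userspace C. This is a minimal sketch under stated assumptions and not the readfile() implementation. The helper name read_until_eof_or_full() is hypothetical. The helper keeps calling read() after short reads, such as the roughly page-sized chunks in the strace above. In the spirit of readlink(2), it also reports through *truncated whether the buffer filled up before EOF.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel or libc API): read from fd until
 * read() returns 0 (EOF) or buf is full, whichever comes first.
 * Short reads are simply followed by another read().
 * Returns the number of bytes stored, or -1 with errno set.
 * *truncated is set when the buffer filled up and more data remained.
 */
static ssize_t read_until_eof_or_full(int fd, char *buf, size_t bufsize,
                                      bool *truncated)
{
    size_t total = 0;

    *truncated = false;
    while (total < bufsize) {
        ssize_t n = read(fd, buf + total, bufsize - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return (ssize_t)total;   /* EOF reached: everything fit */
        total += (size_t)n;
    }
    assert(total == bufsize);

    /* Buffer is full: probe one byte to tell an exact fit from truncation. */
    char probe;
    ssize_t n;
    do {
        n = read(fd, &probe, 1);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return -1;
    *truncated = (n > 0);
    return (ssize_t)total;
}
```

The extra one-byte probe costs a read() call. In exchange, the caller can tell an exact fit from real truncation, which a bare "bytes read == buffer size" check cannot do.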

Hm, nope, this works just fine with the single "read" call.  I can read
/proc/self/maps with a single buffer, also larger files like
/sys/kernel/debug/usb/devices work just fine.

So maybe it is all sane without a loop.
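
For comparison, the one-shot semantics argued for earlier in the thread can be emulated in userspace: a single read, with the caller passing a bigger buffer if it suspects more data. Both function names below are hypothetical, and the 4096-byte starting size is an arbitrary assumption. A result that exactly fills the buffer is treated as "maybe bigger", so the caller doubles the buffer and retries from the start of the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical one-shot read: open, issue exactly one read(), close.
 * No looping, so whatever the single read() returns is the result.
 */
static ssize_t read_file_once(const char *path, char *buf, size_t bufsize)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, bufsize);
    int saved = errno;   /* keep read()'s errno across close() */
    close(fd);
    errno = saved;
    return n;
}

/*
 * Hypothetical caller-side policy: if the one-shot read filled the whole
 * buffer, the file may be bigger, so double the buffer and try again.
 * Returns a malloc()ed buffer (caller frees it) and stores its length
 * in *len, or returns NULL with errno set on failure.
 */
static char *read_file_growing(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    size_t size = 4096;   /* assumed starting size */
    char *buf = NULL;

    for (;;) {
        char *nbuf = realloc(buf, size);
        if (nbuf == NULL) {
            free(buf);
            return NULL;
        }
        buf = nbuf;

        ssize_t n = read_file_once(path, buf, size);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if ((size_t)n < size) {   /* short read: assume we got it all */
            *len = (size_t)n;
            return buf;
        }
        if (size > SIZE_MAX / 2) {   /* refuse to overflow size_t */
            free(buf);
            errno = EFBIG;
            return NULL;
        }
        size *= 2;   /* buffer filled exactly: file may be larger */
    }
}
```

The "short read means complete" test is only as good as the assumption that one read() returns all available data. The quoted strace of /proc/self/maps shows seq_file handing out roughly page-sized pieces whatever the buffer size. For such files, this userspace heuristic alone may therefore stop early.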

I'll try to get rid of the fd now, and despite the interest in io_uring,
this might be a lot more "simple" overall.

thanks,

greg k-h
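
Jann's concern about half-complete lines could also be handled on the consumer side. This is a minimal sketch, assuming a line-oriented file such as /proc/self/maps. The hypothetical helper complete_lines_len() returns how many leading bytes of a one-shot read end in a newline, so that a possibly truncated last line is left unparsed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given n bytes from a single read of a
 * line-oriented file, return the length of the prefix made of complete,
 * newline-terminated lines. Bytes after it may be a half-complete line.
 * Returns 0 if the buffer contains no newline at all.
 */
static size_t complete_lines_len(const char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    while (n > 0 && buf[n - 1] != '\n')
        n--;
    return n;
}
```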
