All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	criu@openvz.org, bpf@vger.kernel.org,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"Cyrill Gorcunov" <gorcunov@gmail.com>,
	"Jann Horn" <jann@thejh.net>, "Kees Cook" <keescook@chromium.org>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Jeff Layton" <jlayton@redhat.com>,
	"Miklos Szeredi" <miklos@szeredi.hu>,
	"Matthew Wilcox" <willy@infradead.org>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	"Trond Myklebust" <trond.myklebust@hammerspace.com>,
	"Chris Wright" <chrisw@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>
Subject: Re: [PATCH v2 09/24] file: Replace fcheck_files with files_lookup_fd_rcu
Date: Wed, 9 Dec 2020 09:44:59 -0800	[thread overview]
Message-ID: <20201209174459.GD2657@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20201209165411.GA16743@ZenIV.linux.org.uk>

On Wed, Dec 09, 2020 at 04:54:11PM +0000, Al Viro wrote:
> [paulmck Cc'd]
> 
> On Mon, Dec 07, 2020 at 10:49:04PM +0000, Al Viro wrote:
> > On Mon, Dec 07, 2020 at 10:46:57PM +0000, Al Viro wrote:
> > > On Fri, Nov 20, 2020 at 05:14:26PM -0600, Eric W. Biederman wrote:
> > > 
> > > >  /*
> > > >   * Check whether the specified fd has an open file.
> > > >   */
> > > > -#define fcheck(fd)	fcheck_files(current->files, fd)
> > > > +#define fcheck(fd)	files_lookup_fd_rcu(current->files, fd)
> > > 
> > > Huh?
> > > fs/file.c:1113: file = fcheck(oldfd);
> > > 	dup3(), under ->file_lock, no rcu_read_lock() in sight
> > > 
> > > fs/locks.c:2548:                f = fcheck(fd);
> > > 	fcntl_setlk(), ditto
> > > 
> > > fs/locks.c:2679:                f = fcheck(fd);
> > > 	fcntl_setlk64(), ditto
> > > 
> > > fs/notify/dnotify/dnotify.c:330:        f = fcheck(fd);
> > > 	fcntl_dirnotify(); this one _is_ under rcu_read_lock().
> > > 
> > > 
> > > IOW, unless I've missed something earlier in the series, this is wrong.
> > 
> > I have missed something, all right.  Ignore that comment...
> 
> Actually, I take that back - the use of fcheck() in dnotify _is_ interesting.
> At the very least it needs to be commented upon; what that code is trying
> to prevent is a race between fcntl_dirnotify() and close(2)/dup2(2) closing
> the descriptor in question.  The problem is, dnotify marks are removed
> when we detach from descriptor table; that's done by filp_close() calling
> dnotify_flush().
> 
> Suppose fcntl(fd, F_NOTIFY, 0) in one thread races with close(fd) in another
> (both sharing the same descriptor table).  If the former had created and
> inserted a mark *after* the latter has gotten past dnotify_flush(), there
> would be nothing to evict that mark.
> 
> That's the reason that fcheck() is there.  rcu_read_lock() used to be
> sufficient, but the locking has changed since then and even if it is
> still enough, that's not at all obvious.
> 
> Exclusion is not an issue; barriers, OTOH...  Back then we had
> ->i_lock taken both by dnotify_flush() before any checks and
> by fcntl_dirnotify() around the fcheck+insertion.  So on close
> side we had
> 	store NULL into descriptor table
> 	acquire ->i_lock
> 	fetch ->i_dnotify
> 	...
> 	release ->i_lock
> while on fcntl() side we had
> 	acquire ->i_lock
> 	fetch from descriptor table, sod off if not our file
> 	...
> 	store ->i_dnotify
> 	...
> 	release ->i_lock
> Storing NULL into descriptor table could get reordered into
> ->i_lock-protected area in dnotify_flush(), but it could not
> get reordered past releasing ->i_lock.  So fcntl_dirnotify()
> either grabbed ->i_lock before dnotify_flush() (in which case
> missing the store of NULL into descriptor table wouldn't
> matter, since dnotify_flush() would've observed the mark
> we'd inserted) or it would've seen that store to descriptor
> table.
> 
> Nowadays it's nowhere near as straightforward; in fcntl_dirnotify()
> we have
>         /* this is needed to prevent the fcntl/close race described below */
>         mutex_lock(&dnotify_group->mark_mutex);
> and it would appear to be similar to the original situation, with
> ->mark_mutex serving in place of ->i_lock.  However, dnotify_flush()
> might not take that mutex at all - it has
>         fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group);
>         if (!fsn_mark)
>                 return;
> before grabbing that thing.  So the things are trickier - we actually
> rely upon the barriers in fsnotify_find_mark().  And those are complicated.
> The case when it returns non-NULL is not a problem - there we have that
> mutex providing the barriers we need.  NULL can be returned in two cases:
> 	a) ->i_fsnotify_marks is not empty, but it contains no
> dnotify marks.  In that case we have ->i_fsnotify_marks.lock acquired
> and released.  By the time it gets around to fcheck(), fcntl_dirnotify() has
> either found or created and inserted a dnotify mark into that list, with
> ->i_fsnotify_marks.lock acquired/released around the insertion, so we
> are fine - either fcntl_dirnotify() gets there first (in which case
> dnotify_flush() will observe it), or release of that lock by
> fsnotify_find_mark() called by dnotify_flush() will serve as a barrier,
> making sure that store to descriptor table will be observed.
> 	b) fsnotify_find_mark() (fsnotify_grab_connector() it calls,
> actually) finds ->i_fsnotify_marks empty.  That's where the things
> get interesting; we have
>         idx = srcu_read_lock(&fsnotify_mark_srcu);
>         conn = srcu_dereference(*connp, &fsnotify_mark_srcu);
>         if (!conn)
>                 goto out;
> on dnotify_flush() side.  The matching store of fcntl_dirnotify()
> side would be in fsnotify_attach_connector_to_object(), where
> we have
>         /*
>          * cmpxchg() provides the barrier so that readers of *connp can see
>          * only initialized structure
>          */
>         if (cmpxchg(connp, NULL, conn)) {
>                 /* Someone else created list structure for us */
> 
> So we have
> A:
> 	store NULL to descriptor table
> 	srcu_read_lock
> 	srcu_dereference fetches NULL from ->i_fsnotify_marks
> vs.
> B:
> 	cmpxchg replaces NULL with non-NULL in ->i_fsnotify_marks
> 	fetch from descriptor table, can't miss the store done by A
> 
> Which might be safe, but the whole thing *RELLY* needs to be discussed
> in fcntl_dirnotify() in more details.  fs/notify/* guts are convoluted
> enough to confuse anyone unfamiliar with them.

This ordering relies on the full barrier in srcu_read_lock().  There was
a time when srcu_read_lock() did not have a full barrier, and there
might well be a time in the future when srcu_read_lock() no longer has a
full barrier.  So please add smp_mb__after_srcu_read_unlock() immediately
after the srcu_read_lock() if you are relying on that full barrier.

							Thanx, Paul

  reply	other threads:[~2020-12-09 17:45 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <87r1on1v62.fsf@x220.int.ebiederm.org>
2020-11-20 23:14 ` [PATCH v2 01/24] exec: Move unshare_files to fix posix file locking during exec Eric W. Biederman
2020-11-20 23:58   ` Linus Torvalds
2020-12-07 22:22   ` Al Viro
     [not found]     ` <87zh2pusdw.fsf@x220.int.ebiederm.org>
2020-12-07 23:35       ` Al Viro
2020-11-20 23:14 ` [PATCH v2 02/24] exec: Simplify unshare_files Eric W. Biederman
2020-11-23 17:50   ` Oleg Nesterov
2020-11-23 18:25     ` Linus Torvalds
     [not found]       ` <87im9vx08i.fsf@x220.int.ebiederm.org>
     [not found]         ` <87pn42r0n7.fsf@x220.int.ebiederm.org>
2020-11-24 19:58           ` Linus Torvalds
2020-11-24 20:14             ` Arnd Bergmann
2020-11-24 23:44               ` Geoff Levand
2020-11-25  1:16                 ` Eric W. Biederman
2020-11-27 20:29                   ` Arnd Bergmann
2020-11-30 21:37                     ` Eric W. Biederman
2020-12-01  9:46                     ` Michael Ellerman
2020-11-25 21:51                 ` [RFC][PATCH] coredump: Document coredump code exclusively used by cell spufs Eric W. Biederman
2020-12-02 15:20                   ` Eric W. Biederman
2020-12-02 15:58                     ` Arnd Bergmann
2020-12-07 22:32       ` [PATCH v2 02/24] exec: Simplify unshare_files Al Viro
2020-11-20 23:14 ` [PATCH v2 03/24] exec: Remove reset_files_struct Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 04/24] kcmp: In kcmp_epoll_target use fget_task Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 05/24] bpf: In bpf_task_fd_query " Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 06/24] proc/fd: In proc_fd_link " Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 07/24] file: Rename __fcheck_files to files_lookup_fd_raw Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 08/24] file: Factor files_lookup_fd_locked out of fcheck_files Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 09/24] file: Replace fcheck_files with files_lookup_fd_rcu Eric W. Biederman
2020-12-07 22:46   ` Al Viro
2020-12-07 22:49     ` Al Viro
2020-12-09 16:54       ` Al Viro
2020-12-09 17:44         ` Paul E. McKenney [this message]
2020-11-20 23:14 ` [PATCH v2 10/24] file: Rename fcheck lookup_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 11/24] file: Implement task_lookup_fd_rcu Eric W. Biederman
2020-11-21 18:19   ` Cyrill Gorcunov
     [not found]     ` <87blfp1r8b.fsf@x220.int.ebiederm.org>
2020-11-22 13:52       ` Cyrill Gorcunov
2020-11-20 23:14 ` [PATCH v2 12/24] proc/fd: In tid_fd_mode use task_lookup_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 13/24] kcmp: In get_file_raw_ptr " Eric W. Biederman
2020-11-23 21:05   ` Cyrill Gorcunov
2020-11-20 23:14 ` [PATCH v2 14/24] file: Implement task_lookup_next_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 15/24] proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu Eric W. Biederman
2020-12-07 23:29   ` Al Viro
     [not found]     ` <877dprvs8e.fsf@x220.int.ebiederm.org>
2020-12-09  4:07       ` Al Viro
     [not found]         ` <877dprtxly.fsf@x220.int.ebiederm.org>
2020-12-09 14:23           ` Al Viro
2020-12-09 18:04             ` [PATCH] files: rcu free files_struct Eric W. Biederman
2020-12-09 19:13               ` Linus Torvalds
2020-12-09 19:50                 ` Al Viro
2020-12-09 21:32                   ` Eric W. Biederman
2020-12-10  6:13                     ` Al Viro
2020-12-10 19:29                       ` Eric W. Biederman
2020-12-10 21:36                         ` Al Viro
2020-12-10 22:30                           ` Christian Brauner
2020-12-10 22:54                             ` Al Viro
2020-12-10 23:10                               ` Al Viro
2020-12-09 21:42                   ` [PATCH -1/24] exec: Don't open code get_close_on_exec Eric W. Biederman
2020-12-09 22:04                 ` [PATCH] files: rcu free files_struct Eric W. Biederman
2020-12-09 19:49               ` Matthew Wilcox
2020-12-09 22:06                 ` Eric W. Biederman
2020-12-09 22:58                 ` Al Viro
2020-12-09 23:01                   ` Linus Torvalds
2020-12-09 23:04                     ` Linus Torvalds
2020-12-09 23:07                     ` Matthew Wilcox
2020-12-09 23:26                       ` Paul E. McKenney
2020-12-09 23:29                       ` Linus Torvalds
2020-12-10  0:39                         ` Paul E. McKenney
2020-12-10  0:41                           ` Linus Torvalds
2020-12-10  0:57                             ` Paul E. McKenney
2020-11-20 23:14 ` [PATCH v2 16/24] bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu Eric W. Biederman
2020-11-23 17:06   ` Yonghong Song
2020-11-20 23:14 ` [PATCH v2 17/24] proc/fd: In fdinfo seq_show don't use get_files_struct Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 18/24] file: Merge __fd_install into fd_install Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 19/24] file: In f_dupfd read RLIMIT_NOFILE once Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 20/24] file: Merge __alloc_fd into alloc_fd Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 21/24] file: Rename __close_fd to close_fd and remove the files parameter Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 22/24] file: Replace ksys_close with close_fd Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 23/24] file: Rename __close_fd_get_file close_fd_get_file Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 24/24] file: Remove get_files_struct Eric W. Biederman
2020-11-21  0:05 ` [PATCH v2 00/24] exec: Move unshare_files and guarantee files_struct.count is correct Linus Torvalds
2020-11-28  5:12   ` Al Viro
2020-12-07 23:56     ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201209174459.GD2657@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=berrange@redhat.com \
    --cc=bfields@fieldses.org \
    --cc=bpf@vger.kernel.org \
    --cc=christian.brauner@ubuntu.com \
    --cc=chrisw@redhat.com \
    --cc=criu@openvz.org \
    --cc=daniel@iogearbox.net \
    --cc=ebiederm@xmission.com \
    --cc=gorcunov@gmail.com \
    --cc=jann@thejh.net \
    --cc=jlayton@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=keescook@chromium.org \
    --cc=kpsingh@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=oleg@redhat.com \
    --cc=songliubraving@fb.com \
    --cc=torvalds@linux-foundation.org \
    --cc=trond.myklebust@hammerspace.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.