From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS, USER_AGENT_MUTT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE179C282CC for ; Thu, 7 Feb 2019 09:23:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AA3B92080F for ; Thu, 7 Feb 2019 09:23:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="key not found in DNS" (0-bit key) header.d=szeredi.hu header.i=@szeredi.hu header.b="JELjyQ3y" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726655AbfBGJXA (ORCPT ); Thu, 7 Feb 2019 04:23:00 -0500 Received: from mail-wm1-f65.google.com ([209.85.128.65]:53214 "EHLO mail-wm1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726561AbfBGJXA (ORCPT ); Thu, 7 Feb 2019 04:23:00 -0500 Received: by mail-wm1-f65.google.com with SMTP id m1so5298180wml.2 for ; Thu, 07 Feb 2019 01:22:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=j2scvD+tXKod9KY4Tp0JnlAlCLc3ovezukIRv7Om5mo=; b=JELjyQ3ygXn/s8SjbpyX81WJTZIUy5t3dcGV9R0KpjPOPebai1Kb3uXFm9M7J288Bi 5myYBQghDsQxEWrX1QF5S9b+juRgxNitGU2pUeuh2QobDn74WeQVzK/CyBIs8uqg9QLK Y/w+0oijH36NecNTi9j34Vx2MPXp+pdIc429w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=j2scvD+tXKod9KY4Tp0JnlAlCLc3ovezukIRv7Om5mo=; b=c/wIY4Q+JzHzlXuHXq73omHiqWb+beBBA6XbCH3aj7IGPaH4IoJRNPs2ZRK0QIlj8f fkE3KJPOISMl/qZSDzMooMxZTaJj0/WCb+EigAd1odm8H0Fn7gEnoNIN71hZnLgv5V5O mfpaKzJ70FxYgqzuun5LXgCxCL4e+asmMZs/xG8uhS9YWLm/q1zH8LNTOW4b4HSFEuTD j+ssj4U23N+W4FUjt0OkcT4TMLvbR5NYH/60XlCwGVuYAtc+xYLjpzIQ6K+TP4CneX0w LVKMNrHlDvSDpxeare5vnSIz+GwS7zFRsF37T9u42aSDJxm5VNGjJ7QlwPqqKv8hSPQ+ khkw== X-Gm-Message-State: AHQUAua6GWnlFaN4vCyaPxVa/foCF0NVAntg6rFXW2T1CslEC+5rDDeZ 8A2WZro+vx0Fi51ST1BYH7P59g== X-Google-Smtp-Source: AHgI3IbWsyfI3AJsk0KyazzNfxdlM7gofVperoV7s0J+N8oRhKsQsk9nxXGTm6qn309z8ijDtkvNYA== X-Received: by 2002:a1c:9692:: with SMTP id y140mr6689370wmd.67.1549531377015; Thu, 07 Feb 2019 01:22:57 -0800 (PST) Received: from veci.piliscsaba.redhat.com (catv-212-96-48-140.catv.broadband.hu. [212.96.48.140]) by smtp.gmail.com with ESMTPSA id w14sm3731969wrt.95.2019.02.07.01.22.55 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 07 Feb 2019 01:22:56 -0800 (PST) Date: Thu, 7 Feb 2019 10:22:53 +0100 From: Miklos Szeredi To: Al Viro Cc: Jens Axboe , Jann Horn , linux-aio@kvack.org, linux-block@vger.kernel.org, Linux API , hch@lst.de, jmoyer@redhat.com, avi@scylladb.com, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 13/18] io_uring: add file set registration Message-ID: <20190207092253.GD19821@veci.piliscsaba.redhat.com> References: <20190129192702.3605-1-axboe@kernel.dk> <20190129192702.3605-14-axboe@kernel.dk> <20190204025612.GR2217@ZenIV.linux.org.uk> <785c6db4-095e-65b0-ded5-72b41af5174e@kernel.dk> <2b2137ed-8107-f7b6-f0ca-202dcfb87c97@kernel.dk> <40b27e78-9ee8-1395-feb3-a73aac87c9a7@kernel.dk> <20190206005638.GU2217@ZenIV.linux.org.uk> <8f124de2-d6da-d656-25e4-b4d9e58f880e@kernel.dk> <20190207040058.GW2217@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190207040058.GW2217@ZenIV.linux.org.uk> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Thu, Feb 07, 2019 at 04:00:59AM +0000, Al Viro wrote: > So in theory it would be possible to have > * thread A: sendmsg() has SCM_RIGHTS created and populated, > complete with file refcount and ->inflight increments implied, > at which point it gets preempted and loses the timeslice. > * thread B: gets to run and removes all references > from descriptor table it shares with thread A. > * on another CPU we have garbage collector triggered; > it determines the set of potentially unreachable unix_sock and > everything in our SCM_RIGHTS _is_ in that set, now that no > other references remain. > * on the first CPU, thread A regains the timeslice > and inserts its SCM_RIGHTS into queue. And it does contain > references to sockets from the candidate set of running > garbage collector, confusing the hell out of it. Reminds me: long time ago there was a bug report, and based on that I found a bug in MSG_PEEK handling (not confirmed to have fixed the reported bug). This fix, although pretty simple, got lost somehow. While unix gc code is in your head, can you please review and I'll resend through davem? Thanks, Miklos --- From: Miklos Szeredi Subject: af_unix: fix garbage collect vs. MSG_PEEK Gc assumes that in-flight sockets that don't have an external ref can't gain one while unix_gc_lock is held. That is true because unix_notinflight() will be called before detaching fds, which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That one also clones the fds, also keeping them in the skb. But through MSG_PEEK an external reference can definitely be gained without ever touching unix_gc_lock. This patch adds unix_gc_barrier() that waits for a garbage collect run to finish (if there is one), before actually installing the peeked in-flight files to file descriptors. This prevents problems from a pure in-flight socket having its buffers modified while the garbage collect is taking place. Signed-off-by: Miklos Szeredi Cc: --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 15 +++++++++++++-- net/unix/garbage.c | 6 ++++++ 3 files changed, 20 insertions(+), 2 deletions(-) --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -12,6 +12,7 @@ void unix_inflight(struct user_struct *u void unix_notinflight(struct user_struct *user, struct file *fp); void unix_gc(void); void wait_for_unix_gc(void); +void unix_gc_barrier(void); struct sock *unix_get_socket(struct file *filp); struct sock *unix_peer_get(struct sock *sk); --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1547,6 +1547,17 @@ static int unix_attach_fds(struct scm_co return 0; } +static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + scm->fp = scm_fp_dup(UNIXCB(skb).fp); + /* + * During garbage collection it is assumed that in-flight sockets don't + * get a new external reference. So we need to wait until current run + * finishes. + */ + unix_gc_barrier(); +} + static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds) { int err = 0; @@ -2171,7 +2182,7 @@ static int unix_dgram_recvmsg(struct soc sk_peek_offset_fwd(sk, size); if (UNIXCB(skb).fp) - scm.fp = scm_fp_dup(UNIXCB(skb).fp); + unix_peek_fds(&scm, skb); } err = (flags & MSG_TRUNC) ? skb->len - skip : size; @@ -2412,7 +2423,7 @@ static int unix_stream_read_generic(stru /* It is questionable, see note in unix_dgram_recvmsg. */ if (UNIXCB(skb).fp) - scm.fp = scm_fp_dup(UNIXCB(skb).fp); + unix_peek_fds(&scm, skb); sk_peek_offset_fwd(sk, chunk); --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -267,6 +267,12 @@ void wait_for_unix_gc(void) wait_event(unix_gc_wait, gc_in_progress == false); } +void unix_gc_barrier(void) +{ + spin_lock(&unix_gc_lock); + spin_unlock(&unix_gc_lock); +} + /* The external entry point: unix_gc() */ void unix_gc(void) {