From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 695EBC433F5 for ; Fri, 15 Oct 2021 23:09:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4BE9E611C2 for ; Fri, 15 Oct 2021 23:09:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243448AbhJOXLg (ORCPT ); Fri, 15 Oct 2021 19:11:36 -0400 Received: from mail.kernel.org ([198.145.29.99]:58972 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235755AbhJOXLd (ORCPT ); Fri, 15 Oct 2021 19:11:33 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id D99DA6023D; Fri, 15 Oct 2021 23:09:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1634339366; bh=ovHjeXArni+e/8sBgT90JbZw6qCc/HGFvHPvAnnbBTg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=RzDyi0ZMBJb11CVJtYqNyt/Y8Dox7F7bk8TknKTgD347hA4Jaztzzr7DsVQWjAwg5 ln39eHgOZjhzL5xNtRhXY6XwbzQH/cRKHcQyYZ8Ea864wCD/kLs3IgjPHn9PQEnmY/ hWAXj9VEEucoHc4FjyYV4aaPAHrB7pUZMI6sN8v5jeTtXN5CC+7GUHw/iFcbfW/xxE mPPc098tt5j9zWX3KHal6++1L8rQe+2f+n9jQwDKneBJ2RFJKRr8iiI7XLjq6zVIBM hobsBcPdaX71qEVGcKjCHgyCvLPTsovsGw9IqMVJEnI3r+K1fIP4a2UwN/etMxdLtI U0w6LQ7ig6+kg== Date: Sat, 16 Oct 2021 01:09:22 +0200 From: Alexey Gladkov To: "Eric W. Biederman" Cc: Rune Kleveland , Yu Zhao , Jordan Glover , LKML , linux-mm@kvack.org, containers@lists.linux-foundation.org Subject: Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting Message-ID: <20211015230922.7s7ab37k2sioa5vg@example.org> References: <878rzw77i3.fsf@disp2133> <20210929173611.fo5traia77o63gpw@example.org> <20210930130640.wudkpmn3cmah2cjz@example.org> <878rz8wwb6.fsf@disp2133> <87v92cvhbf.fsf@disp2133> <87mtnavszx.fsf_-_@disp2133> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87mtnavszx.fsf_-_@disp2133> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote: > > In commit fda31c50292a ("signal: avoid double atomic counter > increments for user accounting") Linus made a clever optimization to > how rlimits and the struct user_struct. Unfortunately that > optimization does not work in the obvious way when moved to nested > rlimits. The problem is that the last decrement of the per user > namespace per user sigpending counter might also be the last decrement > of the sigpending counter in the parent user namespace as well. Which > means that simply freeing the leaf ucount in __free_sigqueue is not > enough. > > Maintain the optimization and handle the tricky cases by introducing > inc_rlimit_get_ucounts and dec_rlimit_put_ucounts. > > By moving the entire optimization into functions that perform all of > the work it becomes possible to ensure that every level is handled > properly. > > I wish we had a single user across all of the threads whose rlimit > could be charged so we did not need this complexity. > > Cc: stable@vger.kernel.org > Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") > Signed-off-by: "Eric W. Biederman" > --- > > With a lot of help from Alex who found a way I could reproduce this > I believe I have found the issue. > > Could people who are seeing this issue test and verify this solves the > problem for them? > > include/linux/user_namespace.h | 2 ++ > kernel/signal.c | 25 +++++---------------- > kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++ > 3 files changed, 49 insertions(+), 19 deletions(-) > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index eb70cabe6e7f..33a4240e6a6f 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t > > long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v); > bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v); > +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type); > +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type); > bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max); > > static inline void set_rlimit_ucount_max(struct user_namespace *ns, > diff --git a/kernel/signal.c b/kernel/signal.c > index a3229add4455..762de58c6e76 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, > */ > rcu_read_lock(); > ucounts = task_ucounts(t); > - sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1); > - switch (sigpending) { > - case 1: > - if (likely(get_ucounts(ucounts))) > - break; > - fallthrough; > - case LONG_MAX: > - /* > - * we need to decrease the ucount in the userns tree on any > - * failure to avoid counts leaking. > - */ > - dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1); > - rcu_read_unlock(); > - return NULL; > - } > + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); > rcu_read_unlock(); > + if (sigpending == LONG_MAX) > + return NULL; > > if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) { > q = kmem_cache_alloc(sigqueue_cachep, gfp_flags); > @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, > } > > if (unlikely(q == NULL)) { > - if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) > - put_ucounts(ucounts); > + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); > } else { > INIT_LIST_HEAD(&q->list); > q->flags = sigqueue_flags; > @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q) > { > if (q->flags & SIGQUEUE_PREALLOC) > return; > - if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) { > - put_ucounts(q->ucounts); > + if (q->ucounts) { > + dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING); > q->ucounts = NULL; > } > kmem_cache_free(sigqueue_cachep, q); > diff --git a/kernel/ucount.c b/kernel/ucount.c > index 3b7e176cf7a2..687d77aa66bb 100644 > --- a/kernel/ucount.c > +++ b/kernel/ucount.c > @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v) > return (new == 0); > } > > +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts, > + struct ucounts *last, enum ucount_type type) > +{ > + struct ucounts *iter; > + for (iter = ucounts; iter != last; iter = iter->ns->ucounts) { > + long dec = atomic_long_add_return(-1, &iter->ucount[type]); > + WARN_ON_ONCE(dec < 0); > + if (dec == 0) > + put_ucounts(iter); > + } > +} > + > +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type) > +{ > + do_dec_rlimit_put_ucounts(ucounts, NULL, type); > +} > + > +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type) > +{ > + struct ucounts *iter; > + long dec, ret = 0; > + > + for (iter = ucounts; iter; iter = iter->ns->ucounts) { > + long max = READ_ONCE(iter->ns->ucount_max[type]); > + long new = atomic_long_add_return(1, &iter->ucount[type]); > + if (new < 0 || new > max) > + goto unwind; > + else if (iter == ucounts) > + ret = new; > + if ((new == 1) && (get_ucounts(iter) != iter)) get_ucounts can do put_ucounts. Are you sure it's correct to use get_ucounts here? > + goto dec_unwind; > + } > + return ret; > +dec_unwind: > + dec = atomic_long_add_return(1, &iter->ucount[type]); Should be -1 ? > + WARN_ON_ONCE(dec < 0); > +unwind: > + do_dec_rlimit_put_ucounts(ucounts, iter, type); > + return LONG_MAX; > +} > + > bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max) > { > struct ucounts *iter; > -- > 2.20.1 > -- Rgrds, legion