From: Yu Zhao
Date: Sun, 17 Oct 2021 13:35:37 -0600
Subject: Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
To: "Eric W. Biederman"
Cc: Alexey Gladkov, Rune Kleveland, Jordan Glover, LKML, Linux-MM,
    "containers@lists.linux-foundation.org"
In-Reply-To: <87zgr8vpop.fsf@disp2133>
References: <878rzw77i3.fsf@disp2133>
    <20210929173611.fo5traia77o63gpw@example.org>
    <20210930130640.wudkpmn3cmah2cjz@example.org>
    <878rz8wwb6.fsf@disp2133> <87v92cvhbf.fsf@disp2133>
    <87mtnavszx.fsf_-_@disp2133>
    <20211015230922.7s7ab37k2sioa5vg@example.org>
    <87zgr8vpop.fsf@disp2133>
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 16, 2021 at 11:35 AM Eric W. Biederman wrote:
>
> Alexey Gladkov writes:
>
> > On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
> >>
> >> In commit fda31c50292a ("signal: avoid double atomic counter
> >> increments for user accounting") Linus made a clever optimization to
> >> how rlimits and the struct user_struct are handled.  Unfortunately
> >> that optimization does not work in the obvious way when moved to
> >> nested rlimits.  The problem is that the last decrement of the per
> >> user namespace per user sigpending counter might also be the last
> >> decrement of the sigpending counter in the parent user namespace as
> >> well.  Which means that simply freeing the leaf ucount in
> >> __sigqueue_free is not enough.
> >>
> >> Maintain the optimization and handle the tricky cases by introducing
> >> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> >>
> >> By moving the entire optimization into functions that perform all of
> >> the work it becomes possible to ensure that every level is handled
> >> properly.
> >>
> >> I wish we had a single user across all of the threads whose rlimit
> >> could be charged so we did not need this complexity.
> >>
> >> Cc: stable@vger.kernel.org
> >> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
> >> Signed-off-by: "Eric W. Biederman"
> >> ---
> >>
> >> With a lot of help from Alex, who found a way I could reproduce this,
> >> I believe I have found the issue.
> >>
> >> Could people who are seeing this issue test and verify this solves the
> >> problem for them?
> >>
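[Illustration, not part of Eric's patch: the tricky case the commit
message describes is that one decrement at the leaf can simultaneously
be the final decrement at one or more ancestor levels, so the reference
pinning a level has to be dropped at every level whose count reaches
zero, not only at the leaf.  Below is a toy user-space sketch of that
per-level get/put pairing; all names (toy_ucounts, toy_inc, toy_dec)
are made up and the limit checks are omitted.

#include <stdio.h>

struct toy_ucounts {
        struct toy_ucounts *parent;  /* NULL at the root user namespace */
        long sigpending;             /* per-level RLIMIT_SIGPENDING charge */
        long refs;                   /* stands in for the ucounts refcount */
};

/* Charge every level; pin each level whose count goes 0 -> 1. */
static void toy_inc(struct toy_ucounts *uc)
{
        for (struct toy_ucounts *iter = uc; iter; iter = iter->parent)
                if (++iter->sigpending == 1)
                        iter->refs++;
}

/* Uncharge every level; unpin each level whose count goes 1 -> 0. */
static void toy_dec(struct toy_ucounts *uc)
{
        for (struct toy_ucounts *iter = uc; iter; iter = iter->parent)
                if (--iter->sigpending == 0)
                        iter->refs--;
}

int main(void)
{
        struct toy_ucounts parent = { .refs = 1 };
        struct toy_ucounts child = { .parent = &parent, .refs = 1 };

        toy_inc(&child);
        toy_dec(&child);  /* last dec for the child AND for the parent */
        printf("parent refs %ld, child refs %ld\n", parent.refs, child.refs);
        return 0;
}

Pairing the put only at the leaf cannot notice that an ancestor's count
hit zero on the same decrement, which is what the
inc_rlimit_get_ucounts/dec_rlimit_put_ucounts walk below handles.]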
> >>  include/linux/user_namespace.h |  2 ++
> >>  kernel/signal.c                | 25 +++++----------------
> >>  kernel/ucount.c                | 41 ++++++++++++++++++++++++++++++++++
> >>  3 files changed, 49 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> >> index eb70cabe6e7f..33a4240e6a6f 100644
> >> --- a/include/linux/user_namespace.h
> >> +++ b/include/linux/user_namespace.h
> >> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
> >>
> >>  long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> >>  bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> >> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
> >> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
> >>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
> >>
> >>  static inline void set_rlimit_ucount_max(struct user_namespace *ns,
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index a3229add4455..762de58c6e76 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> >>           */
> >>          rcu_read_lock();
> >>          ucounts = task_ucounts(t);
> >> -        sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> >> -        switch (sigpending) {
> >> -        case 1:
> >> -                if (likely(get_ucounts(ucounts)))
> >> -                        break;
> >> -                fallthrough;
> >> -        case LONG_MAX:
> >> -                /*
> >> -                 * we need to decrease the ucount in the userns tree on any
> >> -                 * failure to avoid counts leaking.
> >> -                 */
> >> -                dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> >> -                rcu_read_unlock();
> >> -                return NULL;
> >> -        }
> >> +        sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >>          rcu_read_unlock();
> >> +        if (sigpending == LONG_MAX)
> >> +                return NULL;
> >>
> >>          if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
> >>                  q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
> >> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> >>          }
> >>
> >>          if (unlikely(q == NULL)) {
> >> -                if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
> >> -                        put_ucounts(ucounts);
> >> +                dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >>          } else {
> >>                  INIT_LIST_HEAD(&q->list);
> >>                  q->flags = sigqueue_flags;
> >> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
> >>  {
> >>          if (q->flags & SIGQUEUE_PREALLOC)
> >>                  return;
> >> -        if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
> >> -                put_ucounts(q->ucounts);
> >> +        if (q->ucounts) {
> >> +                dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >>                  q->ucounts = NULL;
> >>          }
> >>          kmem_cache_free(sigqueue_cachep, q);
> >> diff --git a/kernel/ucount.c b/kernel/ucount.c
> >> index 3b7e176cf7a2..687d77aa66bb 100644
> >> --- a/kernel/ucount.c
> >> +++ b/kernel/ucount.c
> >> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
> >>          return (new == 0);
> >>  }
> >>
> >> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
> >> +                struct ucounts *last, enum ucount_type type)
> >> +{
> >> +        struct ucounts *iter;
> >> +        for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
> >> +                long dec = atomic_long_add_return(-1, &iter->ucount[type]);
> >> +                WARN_ON_ONCE(dec < 0);
> >> +                if (dec == 0)
> >> +                        put_ucounts(iter);
> >> +        }
> >> +}
> >> +
> >> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
> >> +{
> >> +        do_dec_rlimit_put_ucounts(ucounts, NULL, type);
> >> +}
> >> +
> >> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
> >> +{
> >> +        struct ucounts *iter;
> >> +        long dec, ret = 0;
> >> +
> >> +        for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> >> +                long max = READ_ONCE(iter->ns->ucount_max[type]);
> >> +                long new = atomic_long_add_return(1, &iter->ucount[type]);
> >> +                if (new < 0 || new > max)
> >> +                        goto unwind;
> >> +                else if (iter == ucounts)
> >> +                        ret = new;
> >> +                if ((new == 1) && (get_ucounts(iter) != iter))
> >
> > get_ucounts can do put_ucounts. Are you sure it's correct to use
> > get_ucounts here?
>
> My only concern would be if inc_rlimit_get_ucounts would not be safe
> to call under rcu_read_lock().  I don't see anything in get_ucounts or
> put_ucounts that would not be safe under rcu_read_lock().
>
> For get_ucounts we do need to test to see if it fails.  Either by
> testing for NULL or testing to see if it does not return the expected
> ucount.
>
> Does that make sense or do you have another concern?
>
>
> >> +                        goto dec_unwind;
> >> +        }
> >> +        return ret;
> >> +dec_unwind:
> >> +        dec = atomic_long_add_return(1, &iter->ucount[type]);
> >
> > Should be -1 ?
>
> Yes it should.  I will fix and resend.

Or just atomic_long_dec_return().
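(That is, as one way to write it, the dec_unwind decrement above could
become

        dec = atomic_long_dec_return(&iter->ucount[type]);

which keeps the sign out of the constant.)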
> >> +        WARN_ON_ONCE(dec < 0);
> >> +unwind:
> >> +        do_dec_rlimit_put_ucounts(ucounts, iter, type);
> >> +        return LONG_MAX;
> >> +}
> >> +
> >>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
> >>  {
> >>          struct ucounts *iter;
> >> --
> >> 2.20.1
> >>
>
> Eric