From: ebiederm@xmission.com (Eric W. Biederman)
To: Hillf Danton <hdanton@sina.com>
Cc: Rune Kleveland <rune.kleveland@infomedia.dk>,
Yu Zhao <yuzhao@google.com>, Alexey Gladkov <legion@kernel.org>,
Jordan Glover <Golden_Miller83@protonmail.ch>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, containers@lists.linux-foundation.org
Subject: Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
Date: Sat, 16 Oct 2021 13:00:49 -0500 [thread overview]
Message-ID: <87czo4voha.fsf@disp2133> (raw)
In-Reply-To: <20211016020833.1538-1-hdanton@sina.com> (Hillf Danton's message of "Sat, 16 Oct 2021 10:08:33 +0800")
Hillf Danton <hdanton@sina.com> writes:
> On Fri, 15 Oct 2021 17:10:58 -0500 Eric W. Biederman wrote:
>>
>> In commit fda31c50292a ("signal: avoid double atomic counter
>> increments for user accounting") Linus made a clever optimization to
>> how rlimits and the struct user_struct. Unfortunately that
>> optimization does not work in the obvious way when moved to nested
>> rlimits. The problem is that the last decrement of the per user
>> namespace per user sigpending counter might also be the last decrement
>> of the sigpending counter in the parent user namespace as well. Which
>> means that simply freeing the leaf ucount in __free_sigqueue is not
>> enough.
>>
>> Maintain the optimization and handle the tricky cases by introducing
>> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
>>
>> By moving the entire optimization into functions that perform all of
>> the work it becomes possible to ensure that every level is handled
>> properly.
>>
>> I wish we had a single user across all of the threads whose rlimit
>> could be charged so we did not need this complexity.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>
>> With a lot of help from Alex who found a way I could reproduce this
>> I believe I have found the issue.
>>
>> Could people who are seeing this issue test and verify this solves the
>> problem for them?
>>
>> include/linux/user_namespace.h | 2 ++
>> kernel/signal.c | 25 +++++----------------
>> kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 49 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index eb70cabe6e7f..33a4240e6a6f 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>>
>> long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>> bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
>> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>>
>> static inline void set_rlimit_ucount_max(struct user_namespace *ns,
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index a3229add4455..762de58c6e76 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>> */
>> rcu_read_lock();
>> ucounts = task_ucounts(t);
>> - sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> - switch (sigpending) {
>> - case 1:
>> - if (likely(get_ucounts(ucounts)))
>> - break;
>> - fallthrough;
>> - case LONG_MAX:
>> - /*
>> - * we need to decrease the ucount in the userns tree on any
>> - * failure to avoid counts leaking.
>> - */
>> - dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> - rcu_read_unlock();
>> - return NULL;
>> - }
>> + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> rcu_read_unlock();
>> + if (sigpending == LONG_MAX)
>> + return NULL;
>>
>> if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
>> q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
>> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>> }
>>
>> if (unlikely(q == NULL)) {
>> - if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
>> - put_ucounts(ucounts);
>> + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> } else {
>> INIT_LIST_HEAD(&q->list);
>> q->flags = sigqueue_flags;
>> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
>> {
>> if (q->flags & SIGQUEUE_PREALLOC)
>> return;
>> - if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
>> - put_ucounts(q->ucounts);
>> + if (q->ucounts) {
>> + dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> q->ucounts = NULL;
>> }
>> kmem_cache_free(sigqueue_cachep, q);
>> diff --git a/kernel/ucount.c b/kernel/ucount.c
>> index 3b7e176cf7a2..687d77aa66bb 100644
>> --- a/kernel/ucount.c
>> +++ b/kernel/ucount.c
>> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
>> return (new == 0);
>> }
>>
>> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
>> + struct ucounts *last, enum ucount_type type)
>> +{
>> + struct ucounts *iter;
>> + for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
>> + long dec = atomic_long_add_return(-1, &iter->ucount[type]);
>> + WARN_ON_ONCE(dec < 0);
>> + if (dec == 0)
>> + put_ucounts(iter);
>> + }
>
> Given kfree in put_ucounts(), this has difficulty surviving tests like
> kasan if the put pairs with the get in the below
> inc_rlimit_get_ucounts().
I don't know if this is what you are thinking about but there is indeed
a bug in that loop caused by kfree.
The problem is that iter->ns->ucounts is read after put_ucounts. It
just needs to be read before hand.
>> +}
>> +
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> + do_dec_rlimit_put_ucounts(ucounts, NULL, type);
>> +}
>> +
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> + struct ucounts *iter;
>> + long dec, ret = 0;
>> +
>> + for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> + long max = READ_ONCE(iter->ns->ucount_max[type]);
>> + long new = atomic_long_add_return(1, &iter->ucount[type]);
>> + if (new < 0 || new > max)
>> + goto unwind;
>> + else if (iter == ucounts)
>> + ret = new;
>> + if ((new == 1) && (get_ucounts(iter) != iter))
>> + goto dec_unwind;
>
> Add a line of comment for get to ease readers.
/* you are not expected to understand this */
I think that is the classic comment from unix source. Seriously I can't
think of any comment that will make the situation more comprehensible.
> Hillf
>
>> + }
>> + return ret;
>> +dec_unwind:
>> + dec = atomic_long_add_return(1, &iter->ucount[type]);
>> + WARN_ON_ONCE(dec < 0);
>> +unwind:
>> + do_dec_rlimit_put_ucounts(ucounts, iter, type);
>> + return LONG_MAX;
>> +}
>> +
>> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
>> {
>> struct ucounts *iter;
>> --
>> 2.20.1
Eric
next prev parent reply other threads:[~2021-10-16 18:01 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-15 19:49 linux 5.14.3: free_user_ns causes NULL pointer dereference Jordan Glover
2021-09-15 21:02 ` Eric W. Biederman
2021-09-15 22:42 ` Jordan Glover
2021-09-15 23:44 ` Yu Zhao
2021-09-17 16:15 ` Eric W. Biederman
2021-09-17 18:45 ` Yu Zhao
2021-09-15 23:47 ` Jordan Glover
2021-09-16 17:30 ` Eric W. Biederman
2021-09-16 19:14 ` Alexey Gladkov
2021-09-28 13:40 ` Jordan Glover
2021-09-29 17:36 ` Alexey Gladkov
2021-09-29 21:39 ` Jordan Glover
2021-09-30 13:06 ` Alexey Gladkov
2021-09-30 22:27 ` Yu Zhao
2021-10-04 17:10 ` Eric W. Biederman
2021-10-04 17:19 ` Eric W. Biederman
2021-10-04 21:34 ` Yu Zhao
2021-10-11 13:39 ` Alexey Gladkov
[not found] ` <ccbccf82-dc50-00b2-1cfd-3da5e2c81dbf@infomedia.dk>
2021-10-12 17:31 ` Eric W. Biederman
2021-10-15 22:10 ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
2021-10-15 23:09 ` Alexey Gladkov
2021-10-16 17:34 ` Eric W. Biederman
2021-10-17 19:35 ` Yu Zhao
2021-10-18 15:35 ` Eric W. Biederman
2021-10-17 16:47 ` Rune Kleveland
2021-10-18 6:25 ` Yu Zhao
2021-10-18 10:31 ` Jordan Glover
2021-10-18 16:06 ` [PATCH v2] " Eric W. Biederman
2021-10-18 17:21 ` [PATCH 0/3] ucounts: misc fixes Eric W. Biederman
2021-10-18 17:23 ` [PATCH 1/3] ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds Eric W. Biederman
2021-10-18 17:23 ` [PATCH 2/3] ucounts: Proper error handling in set_cred_ucounts Eric W. Biederman
2021-10-18 17:24 ` [PATCH 3/3] ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring Eric W. Biederman
2021-10-18 17:54 ` [PATCH 0/4] ucounts: misc cleanups Eric W. Biederman
2021-10-18 17:55 ` [PATCH 1/4] ucounts: In set_cred_ucounts assume new->ucounts is non-NULL Eric W. Biederman
2021-10-18 17:56 ` [PATCH 2/4] ucounts: Remove unnecessary test for NULL ucount in get_ucounts Eric W. Biederman
2021-10-18 17:56 ` [PATCH 3/4] ucounts: Add get_ucounts_or_wrap for clarity Eric W. Biederman
2021-10-18 17:57 ` [PATCH 4/4] ucounts: Use atomic_long_sub_return " Eric W. Biederman
2021-10-18 22:29 ` [PATCH 0/4] ucounts: misc cleanups Yu Zhao
2021-10-18 22:28 ` [PATCH 0/3] ucounts: misc fixes Yu Zhao
2021-10-18 22:26 ` [PATCH v2] ucounts: Fix signal ucount refcounting Yu Zhao
[not found] ` <20211016020833.1538-1-hdanton@sina.com>
2021-10-16 18:00 ` Eric W. Biederman [this message]
[not found] ` <20211006021219.2010-1-hdanton@sina.com>
2021-10-06 6:22 ` linux 5.14.3: free_user_ns causes NULL pointer dereference Yu Zhao
2021-10-07 13:28 ` Jordan Glover
2021-10-03 19:37 ` Jordan Glover
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87czo4voha.fsf@disp2133 \
--to=ebiederm@xmission.com \
--cc=Golden_Miller83@protonmail.ch \
--cc=containers@lists.linux-foundation.org \
--cc=hdanton@sina.com \
--cc=legion@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rune.kleveland@infomedia.dk \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).