All of lore.kernel.org
 help / color / mirror / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: Munehisa Kamata <kamatam@amazon.com>
Cc: ebiggers@kernel.org, hannes@cmpxchg.org, hdanton@sina.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mengcc@amazon.com, stable@vger.kernel.org
Subject: Re: [PATCH v2] sched/psi: fix use-after-free in ep_remove_wait_queue()
Date: Tue, 14 Feb 2023 09:10:58 -0800	[thread overview]
Message-ID: <CAJuCfpGiktjjPZYPp8LNtbmvYhkxh_icEWXOVgsq9qeq+w6s+g@mail.gmail.com> (raw)
In-Reply-To: <20230214070429.3613260-1-kamatam@amazon.com>

Thanks!
Overall LGTM, just a couple of nits (simplifications):

On Mon, Feb 13, 2023 at 11:04 PM Munehisa Kamata <kamatam@amazon.com> wrote:
>
> If a non-root cgroup gets removed when there is a thread that registered
> trigger and is polling on a pressure file within the cgroup, the polling
> waitqueue gets freed without clearing the queue and reference in the
> following path.

Let's remove "without clearing the queue and reference" in the above
sentence. The next section explains why this is problematic, therefore
mentioning that here is unnecessary IMHO.

>
>  do_rmdir
>    cgroup_rmdir
>      kernfs_drain_open_files
>        cgroup_file_release
>          cgroup_pressure_release
>            psi_trigger_destroy
>
> However, the polling thread can keep having the last reference to the
> pressure file that is tied to the freed waitqueue until explicit close or
> exit later.

Suggest replacing: However, the polling thread still has a reference
to the pressure file it is polling and will access the freed waitqueue
when file is closed or upon exit:

>
>  fput
>    ep_eventpoll_release
>      ep_free
>        ep_remove_wait_queue
>          remove_wait_queue
>
> Then, the thread accesses to the already-freed waitqueue when dropping the
> reference and results in use-after-free as pasted below.

Suggest replacing: This results is use-after-free as pasted below.

>
> The fundamental problem here is that the lifetime of the waitqueue is not
> tied to the file's real lifetime as shown above.

The fundamental problem here is that cgroup_file_release() (and
consequently waitqueue's lifetime) is not tied to the file's real
lifetime.

> Using wake_up_pollfree()
> here might be less than ideal, but it also is not fully contradicting the
> comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()") since the
> waitqueue's lifetime is not tied to file's one and can be considered as
> another special case. While this would be fixable by somehow making
> cgroup_file_release() be tied to the fput(), it would require sizable
> refactoring at cgroups or higher layer which might be more justifiable if
> we identify more cases like this.
>
>  BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0
>  Write of size 4 at addr ffff88810e625328 by task a.out/4404
>
>  CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38
>  Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017
>  Call Trace:
>  <TASK>
>  dump_stack_lvl+0x73/0xa0
>  print_report+0x16c/0x4e0
>  ? _printk+0x59/0x80
>  ? __virt_addr_valid+0xb8/0x130
>  ? _raw_spin_lock_irqsave+0x60/0xc0
>  kasan_report+0xc3/0xf0
>  ? _raw_spin_lock_irqsave+0x60/0xc0
>  kasan_check_range+0x2d2/0x310
>  _raw_spin_lock_irqsave+0x60/0xc0
>  remove_wait_queue+0x1a/0xa0
>  ep_free+0x12c/0x170
>  ep_eventpoll_release+0x26/0x30
>  __fput+0x202/0x400
>  task_work_run+0x11d/0x170
>  do_exit+0x495/0x1130
>  ? update_cfs_rq_load_avg+0x2c2/0x2e0
>  do_group_exit+0x100/0x100
>  get_signal+0xd67/0xde0
>  ? finish_task_switch+0x15f/0x3a0
>  arch_do_signal_or_restart+0x2a/0x2b0
>  exit_to_user_mode_prepare+0x94/0x100
>  syscall_exit_to_user_mode+0x20/0x40
>  do_syscall_64+0x52/0x90
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>  RIP: 0033:0x7f8e392bfb91
>  Code: Unable to access opcode bytes at 0x7f8e392bfb67.
>  RSP: 002b:00007fff261e08d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000022
>  RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: 00007f8e392bfb91
>  RDX: 0000000000000001 RSI: 00007fff261e08e8 RDI: 0000000000000004
>  RBP: 00007fff261e0920 R08: 0000000000400780 R09: 00007f8e3960f240
>  R10: 00000000000003df R11: 0000000000000246 R12: 00000000004005a0
>  R13: 00007fff261e0a00 R14: 0000000000000000 R15: 0000000000000000
>  </TASK>
>
>  Allocated by task 4404:
>  kasan_set_track+0x3d/0x60
>  __kasan_kmalloc+0x85/0x90
>  psi_trigger_create+0x113/0x3e0
>  pressure_write+0x146/0x2e0
>  cgroup_file_write+0x11c/0x250
>  kernfs_fop_write_iter+0x186/0x220
>  vfs_write+0x3d8/0x5c0
>  ksys_write+0x90/0x110
>  do_syscall_64+0x43/0x90
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
>  Freed by task 4407:
>  kasan_set_track+0x3d/0x60
>  kasan_save_free_info+0x27/0x40
>  ____kasan_slab_free+0x11d/0x170
>  slab_free_freelist_hook+0x87/0x150
>  __kmem_cache_free+0xcb/0x180
>  psi_trigger_destroy+0x2e8/0x310
>  cgroup_file_release+0x4f/0xb0
>  kernfs_drain_open_files+0x165/0x1f0
>  kernfs_drain+0x162/0x1a0
>  __kernfs_remove+0x1fb/0x310
>  kernfs_remove_by_name_ns+0x95/0xe0
>  cgroup_addrm_files+0x67f/0x700
>  cgroup_destroy_locked+0x283/0x3c0
>  cgroup_rmdir+0x29/0x100
>  kernfs_iop_rmdir+0xd1/0x140
>  vfs_rmdir+0xfe/0x240
>  do_rmdir+0x13d/0x280
>  __x64_sys_rmdir+0x2c/0x30
>  do_syscall_64+0x43/0x90
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> v2: updated commit message
>
> Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/
> Fixes: 0e94682b73bf ("psi: introduce psi monitor")
> Cc: stable@vger.kernel.org
> Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
> Signed-off-by: Mengchi Cheng <mengcc@amazon.com>
> ---
>  kernel/sched/psi.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 8ac8b81bfee6..6e66c15f6450 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -1343,10 +1343,11 @@ void psi_trigger_destroy(struct psi_trigger *t)
>
>         group = t->group;
>         /*
> -        * Wakeup waiters to stop polling. Can happen if cgroup is deleted
> -        * from under a polling process.
> +        * Wakeup waiters to stop polling and clear the queue to prevent it from
> +        * being accessed later. Can happen if cgroup is deleted from under a
> +        * polling process otherwise.

This "otherwise" at the end seems extra. Was there a continuation of
this comment which was removed without removing this "otherwise" ?

>          */
> -       wake_up_interruptible(&t->event_wait);
> +       wake_up_pollfree(&t->event_wait);
>
>         mutex_lock(&group->trigger_lock);
>
> --
> 2.38.1
>

  reply	other threads:[~2023-02-14 17:11 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06 22:48 another use-after-free in ep_remove_wait_queue() Munehisa Kamata
2023-01-06 23:34 ` Suren Baghdasaryan
2023-01-07  8:07 ` Hillf Danton
2023-01-08 22:25   ` Munehisa Kamata
2023-01-08 23:49     ` Hillf Danton
2023-01-10  1:33       ` Suren Baghdasaryan
2023-01-10  3:06         ` Suren Baghdasaryan
2023-01-12 22:01           ` Suren Baghdasaryan
2023-01-13  2:25             ` Munehisa Kamata
2023-01-13 17:52               ` Suren Baghdasaryan
2023-01-19  3:06                 ` Suren Baghdasaryan
2023-01-19 21:01                   ` Suren Baghdasaryan
2023-01-19 22:25                     ` Johannes Weiner
2023-01-20  1:30                     ` Hillf Danton
2023-01-20  1:37                       ` Suren Baghdasaryan
2023-01-20  2:46                         ` Munehisa Kamata
2023-01-20  2:52                           ` Munehisa Kamata
2023-01-20  9:00                         ` Hillf Danton
2023-01-20 16:28                           ` Suren Baghdasaryan
2023-01-21  5:17                             ` Hillf Danton
2023-01-22  3:01                               ` Suren Baghdasaryan
2023-01-20  1:45                     ` Munehisa Kamata
2023-02-02  3:00                     ` [PATCH] sched/psi: fix " Munehisa Kamata
2023-02-02  4:56                       ` Eric Biggers
2023-02-02 21:11                         ` Suren Baghdasaryan
2023-02-09 17:09                           ` Suren Baghdasaryan
2023-02-09 18:46                             ` Eric Biggers
2023-02-09 19:13                               ` Suren Baghdasaryan
2023-02-13 23:50                                 ` Suren Baghdasaryan
2023-02-14  7:04                                   ` [PATCH v2] " Munehisa Kamata
2023-02-14 17:10                                     ` Suren Baghdasaryan [this message]
2023-02-14 18:13                                       ` [PATCH v3] " Munehisa Kamata
2023-02-14 18:28                                         ` Suren Baghdasaryan
2023-02-14 18:29                                           ` Suren Baghdasaryan
2023-02-14 18:55                                         ` Eric Biggers
2023-02-14 19:13                                           ` Suren Baghdasaryan
2023-02-14 18:37                                       ` [PATCH v2] " Munehisa Kamata

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJuCfpGiktjjPZYPp8LNtbmvYhkxh_icEWXOVgsq9qeq+w6s+g@mail.gmail.com \
    --to=surenb@google.com \
    --cc=ebiggers@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=kamatam@amazon.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mengcc@amazon.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.