From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 649B9C433FE for ; Fri, 24 Sep 2021 18:56:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4402C61241 for ; Fri, 24 Sep 2021 18:56:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345491AbhIXS5n (ORCPT ); Fri, 24 Sep 2021 14:57:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344763AbhIXS5g (ORCPT ); Fri, 24 Sep 2021 14:57:36 -0400 Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE24EC061571 for ; Fri, 24 Sep 2021 11:56:02 -0700 (PDT) Received: by mail-pf1-x432.google.com with SMTP id w14so9694821pfu.2 for ; Fri, 24 Sep 2021 11:56:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=NGoW/Pauk57xfcR+YHAaw7k6fzsKu4UrCFosKDZQNhc=; b=jsHCR5rFzTcloSpo/nmpIFZpciSorWh4Y+oVGCvzGVncHgEBO/G44pcvRIwJrshFKW VjoSHczn2EvtVLXEP8/pEhu/4ZtVKH9YvuBvdrRUqQEyhPp318CJtE9qHIy6oo1wiUs9 gOr5NSF1RSLgoKQhy1gViChfBGi54MpUO+B/c= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=NGoW/Pauk57xfcR+YHAaw7k6fzsKu4UrCFosKDZQNhc=; b=KiojnTrrLQvBEP60YJqD0/InNMqURC9OQkURdQKaFmjR5v68Y8w3yWE9Nz610iaJKM RDp1jDjp10OrjT732pug0LuWyPi1hsxMCkowydgvWi3zF8YwTx0A4PwzLj7G15ILAhgM n0Ah7UIvwk2GWjr9k7AEaXBp1MzFtTM6ahLJeUb7IYGW5yrT97Kws5JuJQioIkE0PWN4 ssI+9obBj3KKflZoTk54CwT+CB7alpzLQQvPosh5ySAxRE75KKiEGsch+kuM8ZD2ADLg uvf/QwGr4UFnrrrORROqzJ1yVI3TJALhLJ9q0ipkhulRyrZ0ThorePodBXPjQkX7If8O gTMg== X-Gm-Message-State: AOAM533WMv/JcPKI81dqXVKvjHrDS9cYorNAQ8F3c5a7ZIf9++7YktLv szn1slbJtrz6xcQtPQlrLD7LXh3tl8v+NA== X-Google-Smtp-Source: ABdhPJyqUfeL746hH0cL1FYdt51gjGj7+Ui07J5qNrocIp8zuNCXfJ5ku/NGqVYCGF2G6hP7AMg8FQ== X-Received: by 2002:a65:40c4:: with SMTP id u4mr5120061pgp.186.1632509761981; Fri, 24 Sep 2021 11:56:01 -0700 (PDT) Received: from www.outflux.net (smtp.outflux.net. [198.145.64.163]) by smtp.gmail.com with ESMTPSA id x19sm9522763pfn.105.2021.09.24.11.56.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Sep 2021 11:56:01 -0700 (PDT) Date: Fri, 24 Sep 2021 11:56:00 -0700 From: Kees Cook To: "Eric W. Biederman" Cc: linux-kernel@vger.kernel.org, Linus Torvalds , Oleg Nesterov , Al Viro , linux-api@vger.kernel.org Subject: Re: [PATCH 6/6] coredump: Limit coredumps to a single thread group Message-ID: <202109241154.A915C488E2@keescook> References: <87v92qx2c6.fsf@disp2133> <87y27mvnke.fsf@disp2133> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87y27mvnke.fsf@disp2133> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 23, 2021 at 07:12:33PM -0500, Eric W. Biederman wrote: > > Today when a signal is delivered with a handler of SIG_DFL whose > default behavior is to generate a core dump not only that process but > every process that shares the mm is killed. > > In the case of vfork this looks like a real world problem. Consider > the following well defined sequence. > > if (vfork() == 0) { > execve(...); > _exit(EXIT_FAILURE); > } > > If a signal that generates a core dump is received after vfork but > before the execve changes the mm the process that called vfork will > also be killed (as the mm is shared). > > Similarly if the execve fails after the point of no return the kernel > delivers SIGSEGV which will kill both the exec'ing process and because > the mm is shared the process that called vfork as well. > > As far as I can tell this behavior is a violation of people's > reasonable expectations, POSIX, and is unnecessarily fragile when the > system is low on memory. > > Solve this by making a userspace visible change to only kill a single > process/thread group. This is possible because Jann Horn recently > modified[1] the coredump code so that the mm can safely be modified > while the coredump is happening. With LinuxThreads long gone I don't > expect anyone to have a notice this behavior change in practice. > > To accomplish this move the core_state pointer from mm_struct to > signal_struct, which allows different thread groups to coredump > simultatenously. > > In zap_threads remove the work to kill anything except for the current > thread group. > > [1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") > Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") > History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git > Signed-off-by: "Eric W. Biederman" This looks correct to me, but depends on the 5/6 not introducing any races. So, to that end: Reviewed-by: Kees Cook If you have some local tools you've been using for testing this series, can you toss them into tools/testing/selftests/ptrace/ ? I can help clean them up if want. -Kees > --- > fs/binfmt_elf.c | 4 +- > fs/binfmt_elf_fdpic.c | 2 +- > fs/coredump.c | 84 ++++-------------------------------- > fs/proc/array.c | 6 +-- > include/linux/mm_types.h | 13 ------ > include/linux/sched/signal.h | 13 ++++++ > kernel/exit.c | 13 ++---- > kernel/fork.c | 1 - > 8 files changed, 32 insertions(+), 104 deletions(-) > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > index 69d900a8473d..796e5327ee7d 100644 > --- a/fs/binfmt_elf.c > +++ b/fs/binfmt_elf.c > @@ -1834,7 +1834,7 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, > /* > * Allocate a structure for each thread. > */ > - for (ct = &dump_task->mm->core_state->dumper; ct; ct = ct->next) { > + for (ct = &dump_task->signal->core_state->dumper; ct; ct = ct->next) { > t = kzalloc(offsetof(struct elf_thread_core_info, > notes[info->thread_notes]), > GFP_KERNEL); > @@ -2024,7 +2024,7 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, > if (!elf_note_info_init(info)) > return 0; > > - for (ct = current->mm->core_state->dumper.next; > + for (ct = current->signal->core_state->dumper.next; > ct; ct = ct->next) { > ets = kzalloc(sizeof(*ets), GFP_KERNEL); > if (!ets) > diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c > index 6d8fd6030cbb..c6f588dc4a9d 100644 > --- a/fs/binfmt_elf_fdpic.c > +++ b/fs/binfmt_elf_fdpic.c > @@ -1494,7 +1494,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) > if (dump_vma_snapshot(cprm, &vma_count, &vma_meta, &vma_data_size)) > goto end_coredump; > > - for (ct = current->mm->core_state->dumper.next; > + for (ct = current->signal->core_state->dumper.next; > ct; ct = ct->next) { > tmp = elf_dump_thread_status(cprm->siginfo->si_signo, > ct->task, &thread_status_size); > diff --git a/fs/coredump.c b/fs/coredump.c > index d576287fb88b..a6b3c196cdef 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -369,99 +369,34 @@ static int zap_process(struct task_struct *start, int exit_code, int flags) > return nr; > } > > -static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, > +static int zap_threads(struct task_struct *tsk, > struct core_state *core_state, int exit_code) > { > - struct task_struct *g, *p; > - unsigned long flags; > int nr = -EAGAIN; > > spin_lock_irq(&tsk->sighand->siglock); > if (!signal_group_exit(tsk->signal)) { > - mm->core_state = core_state; > + tsk->signal->core_state = core_state; > tsk->signal->group_exit_task = tsk; > nr = zap_process(tsk, exit_code, 0); > clear_tsk_thread_flag(tsk, TIF_SIGPENDING); > + tsk->flags |= PF_DUMPCORE; > + atomic_set(&core_state->nr_threads, nr); > } > spin_unlock_irq(&tsk->sighand->siglock); > - if (unlikely(nr < 0)) > - return nr; > - > - tsk->flags |= PF_DUMPCORE; > - if (atomic_read(&mm->mm_users) == nr + 1) > - goto done; > - /* > - * We should find and kill all tasks which use this mm, and we should > - * count them correctly into ->nr_threads. We don't take tasklist > - * lock, but this is safe wrt: > - * > - * fork: > - * None of sub-threads can fork after zap_process(leader). All > - * processes which were created before this point should be > - * visible to zap_threads() because copy_process() adds the new > - * process to the tail of init_task.tasks list, and lock/unlock > - * of ->siglock provides a memory barrier. > - * > - * do_exit: > - * The caller holds mm->mmap_lock. This means that the task which > - * uses this mm can't pass coredump_task_exit(), so it can't exit > - * or clear its ->mm. > - * > - * de_thread: > - * It does list_replace_rcu(&leader->tasks, ¤t->tasks), > - * we must see either old or new leader, this does not matter. > - * However, it can change p->sighand, so lock_task_sighand(p) > - * must be used. Since p->mm != NULL and we hold ->mmap_lock > - * it can't fail. > - * > - * Note also that "g" can be the old leader with ->mm == NULL > - * and already unhashed and thus removed from ->thread_group. > - * This is OK, __unhash_process()->list_del_rcu() does not > - * clear the ->next pointer, we will find the new leader via > - * next_thread(). > - */ > - rcu_read_lock(); > - for_each_process(g) { > - if (g == tsk->group_leader) > - continue; > - if (g->flags & PF_KTHREAD) > - continue; > - > - for_each_thread(g, p) { > - if (unlikely(!p->mm)) > - continue; > - if (unlikely(p->mm == mm)) { > - lock_task_sighand(p, &flags); > - nr += zap_process(p, exit_code, > - SIGNAL_GROUP_EXIT); > - unlock_task_sighand(p, &flags); > - } > - break; > - } > - } > - rcu_read_unlock(); > -done: > - atomic_set(&core_state->nr_threads, nr); > return nr; > } > > static int coredump_wait(int exit_code, struct core_state *core_state) > { > struct task_struct *tsk = current; > - struct mm_struct *mm = tsk->mm; > int core_waiters = -EBUSY; > > init_completion(&core_state->startup); > core_state->dumper.task = tsk; > core_state->dumper.next = NULL; > > - if (mmap_write_lock_killable(mm)) > - return -EINTR; > - > - if (!mm->core_state) > - core_waiters = zap_threads(tsk, mm, core_state, exit_code); > - mmap_write_unlock(mm); > - > + core_waiters = zap_threads(tsk, core_state, exit_code); > if (core_waiters > 0) { > struct core_thread *ptr; > > @@ -483,7 +418,7 @@ static int coredump_wait(int exit_code, struct core_state *core_state) > return core_waiters; > } > > -static void coredump_finish(struct mm_struct *mm, bool core_dumped) > +static void coredump_finish(bool core_dumped) > { > struct core_thread *curr, *next; > struct task_struct *task; > @@ -493,9 +428,10 @@ static void coredump_finish(struct mm_struct *mm, bool core_dumped) > current->signal->group_exit_code |= 0x80; > current->signal->group_exit_task = NULL; > current->signal->flags = SIGNAL_GROUP_EXIT; > + next = current->signal->core_state->dumper.next; > + current->signal->core_state = NULL; > spin_unlock_irq(¤t->sighand->siglock); > > - next = mm->core_state->dumper.next; > while ((curr = next) != NULL) { > next = curr->next; > task = curr->task; > @@ -507,8 +443,6 @@ static void coredump_finish(struct mm_struct *mm, bool core_dumped) > curr->task = NULL; > wake_up_process(task); > } > - > - mm->core_state = NULL; > } > > static bool dump_interrupted(void) > @@ -839,7 +773,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) > fail_unlock: > kfree(argv); > kfree(cn.corename); > - coredump_finish(mm, core_dumped); > + coredump_finish(core_dumped); > revert_creds(old_cred); > fail_creds: > put_cred(cred); > diff --git a/fs/proc/array.c b/fs/proc/array.c > index 49be8c8ef555..520c51be1e57 100644 > --- a/fs/proc/array.c > +++ b/fs/proc/array.c > @@ -408,9 +408,9 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task) > cpumask_pr_args(&task->cpus_mask)); > } > > -static inline void task_core_dumping(struct seq_file *m, struct mm_struct *mm) > +static inline void task_core_dumping(struct seq_file *m, struct task_struct *task) > { > - seq_put_decimal_ull(m, "CoreDumping:\t", !!mm->core_state); > + seq_put_decimal_ull(m, "CoreDumping:\t", !!task->signal->core_state); > seq_putc(m, '\n'); > } > > @@ -436,7 +436,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, > > if (mm) { > task_mem(m, mm); > - task_core_dumping(m, mm); > + task_core_dumping(m, task); > task_thp_status(m, mm); > mmput(mm); > } > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 7f8ee09c711f..1039f6ae922c 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -387,17 +387,6 @@ struct vm_area_struct { > struct vm_userfaultfd_ctx vm_userfaultfd_ctx; > } __randomize_layout; > > -struct core_thread { > - struct task_struct *task; > - struct core_thread *next; > -}; > - > -struct core_state { > - atomic_t nr_threads; > - struct core_thread dumper; > - struct completion startup; > -}; > - > struct kioctx_table; > struct mm_struct { > struct { > @@ -518,8 +507,6 @@ struct mm_struct { > > unsigned long flags; /* Must use atomic bitops to access */ > > - struct core_state *core_state; /* coredumping support */ > - > #ifdef CONFIG_AIO > spinlock_t ioctx_lock; > struct kioctx_table __rcu *ioctx_table; > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > index e5f4ce622ee6..a8fe2a593a3a 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -72,6 +72,17 @@ struct multiprocess_signals { > struct hlist_node node; > }; > > +struct core_thread { > + struct task_struct *task; > + struct core_thread *next; > +}; > + > +struct core_state { > + atomic_t nr_threads; > + struct core_thread dumper; > + struct completion startup; > +}; > + > /* > * NOTE! "signal_struct" does not have its own > * locking, because a shared signal_struct always > @@ -110,6 +121,8 @@ struct signal_struct { > int group_stop_count; > unsigned int flags; /* see SIGNAL_* flags below */ > > + struct core_state *core_state; /* coredumping support */ > + > /* > * PR_SET_CHILD_SUBREAPER marks a process, like a service > * manager, to re-parent orphan (double-forking) child processes > diff --git a/kernel/exit.c b/kernel/exit.c > index 774e6b5061b8..2b355e926c13 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -342,23 +342,18 @@ kill_orphaned_pgrp(struct task_struct *tsk, struct task_struct *parent) > static void coredump_task_exit(struct task_struct *tsk) > { > struct core_state *core_state; > - struct mm_struct *mm; > - > - mm = tsk->mm; > - if (!mm) > - return; > > /* > * Serialize with any possible pending coredump. > - * We must hold mmap_lock around checking core_state > + * We must hold siglock around checking core_state > * and setting PF_POSTCOREDUMP. The core-inducing thread > * will increment ->nr_threads for each thread in the > * group without PF_POSTCOREDUMP set. > */ > - mmap_read_lock(mm); > + spin_lock_irq(&tsk->sighand->siglock); > tsk->flags |= PF_POSTCOREDUMP; > - core_state = mm->core_state; > - mmap_read_unlock(mm); > + core_state = tsk->signal->core_state; > + spin_unlock_irq(&tsk->sighand->siglock); > if (core_state) { > struct core_thread self; > > diff --git a/kernel/fork.c b/kernel/fork.c > index 9bd9f2da9e41..c8adb76982f7 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1044,7 +1044,6 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, > seqcount_init(&mm->write_protect_seq); > mmap_init_lock(mm); > INIT_LIST_HEAD(&mm->mmlist); > - mm->core_state = NULL; > mm_pgtables_bytes_init(mm); > mm->map_count = 0; > mm->locked_vm = 0; > -- > 2.20.1 > -- Kees Cook