bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma
@ 2022-12-16 22:18 Kui-Feng Lee
  2022-12-16 22:18 ` [PATCH bpf-next v2 1/2] bpf: keep a reference to the mm, in case the task is dead Kui-Feng Lee
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Kui-Feng Lee @ 2022-12-16 22:18 UTC (permalink / raw)
  To: bpf, ast, andrii, kernel-team, song, yhs; +Cc: Kui-Feng Lee

This issue concerns task iterators over VMAs. A system crash can
occur when a task iterator traverses the VMAs of a task, because the
death of the task clears the pointer to its mm even though the
task_struct is still held. The result is an unexpected NULL-pointer
crash. To address this problem, the iterator now keeps a reference to
the mm so that the pointer remains valid. This patch set fixes the
crash by properly taking a reference on the mm in task iterators over
VMAs.
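
For reference, the timing window looks roughly like this (an
illustrative sketch, not code from the patch; exit_mm() is the exit
path that clears task->mm):

	/*
	 *   iterator (old code)                      exiting task
	 *   -------------------                      ------------
	 *   curr_task = <task_struct, refcounted>
	 *                                            do_exit()
	 *   if (!curr_task->mm)                        exit_mm()
	 *           goto next_task;                      task->mm = NULL
	 *   ...
	 *   mmap_read_lock_killable(curr_task->mm)
	 *           ^ NULL pointer dereference
	 */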

The major changes from v1 are:

 - Fix commit logs of the test case.

 - Use reverse Christmas tree coding style.

 - Remove unnecessary error handling for time().

v1: https://lore.kernel.org/bpf/20221216015912.991616-1-kuifeng@meta.com/

Kui-Feng Lee (2):
  bpf: keep a reference to the mm, in case the task is dead.
  selftests/bpf: add a test for iter/task_vma for short-lived processes

 kernel/bpf/task_iter.c                        | 39 +++++++---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 73 +++++++++++++++++++
 2 files changed, 100 insertions(+), 12 deletions(-)

-- 
2.30.2



* [PATCH bpf-next v2 1/2] bpf: keep a reference to the mm, in case the task is dead.
  2022-12-16 22:18 [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma Kui-Feng Lee
@ 2022-12-16 22:18 ` Kui-Feng Lee
  2022-12-16 22:18 ` [PATCH bpf-next v2 2/2] selftests/bpf: add a test for iter/task_vma for short-lived processes Kui-Feng Lee
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Kui-Feng Lee @ 2022-12-16 22:18 UTC (permalink / raw)
  To: bpf, ast, andrii, kernel-team, song, yhs
  Cc: Kui-Feng Lee, Nathan Slingerland, Yonghong Song

Fix the system crash that happens when a task iterator traverses the
VMAs of tasks.

In task iterators, we used to access the mm by following the pointer
on the task_struct; however, the death of a task clears that pointer,
even though we still hold the task_struct.  That can cause an
unexpected NULL-pointer crash when an iterator visits a task that
dies during the visit.  Keeping a reference to the mm on the iterator
ensures we always have a valid pointer to it.
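
In code terms, the fix follows the usual kernel pattern for pinning a
task's mm (a condensed sketch of the hunks below, with error paths
simplified):

	struct mm_struct *mm;

	mm = get_task_mm(task);		/* grabs mm->mm_users; NULL if the task has no mm */
	if (!mm)
		return NULL;		/* task has exited; skip it */
	if (mmap_read_lock_killable(mm)) {
		mmput(mm);		/* drop the mm_users reference on failure */
		return NULL;
	}
	/* ... walk the VMAs with find_vma(mm, addr) ... */
	mmap_read_unlock(mm);
	mmput(mm);			/* release the reference when done */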

Co-developed-by: Song Liu <song@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Reported-by: Nathan Slingerland <slinger@meta.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/task_iter.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index c2a2182ce570..c4ab9d6cdbe9 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -438,6 +438,7 @@ struct bpf_iter_seq_task_vma_info {
 	 */
 	struct bpf_iter_seq_task_common common;
 	struct task_struct *task;
+	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	u32 tid;
 	unsigned long prev_vm_start;
@@ -456,16 +457,19 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 	enum bpf_task_vma_iter_find_op op;
 	struct vm_area_struct *curr_vma;
 	struct task_struct *curr_task;
+	struct mm_struct *curr_mm;
 	u32 saved_tid = info->tid;
 
 	/* If this function returns a non-NULL vma, it holds a reference to
-	 * the task_struct, and holds read lock on vma->mm->mmap_lock.
+	 * the task_struct, holds a refcount on mm->mm_users, and holds
+	 * read lock on vma->mm->mmap_lock.
 	 * If this function returns NULL, it does not hold any reference or
 	 * lock.
 	 */
 	if (info->task) {
 		curr_task = info->task;
 		curr_vma = info->vma;
+		curr_mm = info->mm;
 		/* In case of lock contention, drop mmap_lock to unblock
 		 * the writer.
 		 *
@@ -504,13 +508,15 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 		 *    4.2) VMA2 and VMA2' covers different ranges, process
 		 *         VMA2'.
 		 */
-		if (mmap_lock_is_contended(curr_task->mm)) {
+		if (mmap_lock_is_contended(curr_mm)) {
 			info->prev_vm_start = curr_vma->vm_start;
 			info->prev_vm_end = curr_vma->vm_end;
 			op = task_vma_iter_find_vma;
-			mmap_read_unlock(curr_task->mm);
-			if (mmap_read_lock_killable(curr_task->mm))
+			mmap_read_unlock(curr_mm);
+			if (mmap_read_lock_killable(curr_mm)) {
+				mmput(curr_mm);
 				goto finish;
+			}
 		} else {
 			op = task_vma_iter_next_vma;
 		}
@@ -535,42 +541,47 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 			op = task_vma_iter_find_vma;
 		}
 
-		if (!curr_task->mm)
+		curr_mm = get_task_mm(curr_task);
+		if (!curr_mm)
 			goto next_task;
 
-		if (mmap_read_lock_killable(curr_task->mm))
+		if (mmap_read_lock_killable(curr_mm)) {
+			mmput(curr_mm);
 			goto finish;
+		}
 	}
 
 	switch (op) {
 	case task_vma_iter_first_vma:
-		curr_vma = find_vma(curr_task->mm, 0);
+		curr_vma = find_vma(curr_mm, 0);
 		break;
 	case task_vma_iter_next_vma:
-		curr_vma = find_vma(curr_task->mm, curr_vma->vm_end);
+		curr_vma = find_vma(curr_mm, curr_vma->vm_end);
 		break;
 	case task_vma_iter_find_vma:
 		/* We dropped mmap_lock so it is necessary to use find_vma
 		 * to find the next vma. This is similar to the  mechanism
 		 * in show_smaps_rollup().
 		 */
-		curr_vma = find_vma(curr_task->mm, info->prev_vm_end - 1);
+		curr_vma = find_vma(curr_mm, info->prev_vm_end - 1);
 		/* case 1) and 4.2) above just use curr_vma */
 
 		/* check for case 2) or case 4.1) above */
 		if (curr_vma &&
 		    curr_vma->vm_start == info->prev_vm_start &&
 		    curr_vma->vm_end == info->prev_vm_end)
-			curr_vma = find_vma(curr_task->mm, curr_vma->vm_end);
+			curr_vma = find_vma(curr_mm, curr_vma->vm_end);
 		break;
 	}
 	if (!curr_vma) {
 		/* case 3) above, or case 2) 4.1) with vma->next == NULL */
-		mmap_read_unlock(curr_task->mm);
+		mmap_read_unlock(curr_mm);
+		mmput(curr_mm);
 		goto next_task;
 	}
 	info->task = curr_task;
 	info->vma = curr_vma;
+	info->mm = curr_mm;
 	return curr_vma;
 
 next_task:
@@ -579,6 +590,7 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 
 	put_task_struct(curr_task);
 	info->task = NULL;
+	info->mm = NULL;
 	info->tid++;
 	goto again;
 
@@ -587,6 +599,7 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 		put_task_struct(curr_task);
 	info->task = NULL;
 	info->vma = NULL;
+	info->mm = NULL;
 	return NULL;
 }
 
@@ -658,7 +671,9 @@ static void task_vma_seq_stop(struct seq_file *seq, void *v)
 		 */
 		info->prev_vm_start = ~0UL;
 		info->prev_vm_end = info->vma->vm_end;
-		mmap_read_unlock(info->task->mm);
+		mmap_read_unlock(info->mm);
+		mmput(info->mm);
+		info->mm = NULL;
 		put_task_struct(info->task);
 		info->task = NULL;
 	}
-- 
2.30.2



* [PATCH bpf-next v2 2/2] selftests/bpf: add a test for iter/task_vma for short-lived processes
  2022-12-16 22:18 [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma Kui-Feng Lee
  2022-12-16 22:18 ` [PATCH bpf-next v2 1/2] bpf: keep a reference to the mm, in case the task is dead Kui-Feng Lee
@ 2022-12-16 22:18 ` Kui-Feng Lee
  2022-12-28 22:20 ` [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma patchwork-bot+netdevbpf
  2023-01-04 21:16 ` Kui-Feng Lee
  3 siblings, 0 replies; 6+ messages in thread
From: Kui-Feng Lee @ 2022-12-16 22:18 UTC (permalink / raw)
  To: bpf, ast, andrii, kernel-team, song, yhs; +Cc: Kui-Feng Lee, Yonghong Song

When a task iterator traverses VMAs, task->mm may become invalid in
the middle of the traversal, and this can cause the kernel to
misbehave (e.g., crash).

This test case creates iterators repeatedly and forks short-lived
processes in the background to detect this bug.  The test runs for
3 seconds to give the issue a chance to trigger.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 73 +++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 6f8ed61fc4b4..3af6450763e9 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -1465,6 +1465,77 @@ static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
 	bpf_iter_task_vma__destroy(skel);
 }
 
+static void test_task_vma_dead_task(void)
+{
+	struct bpf_iter_task_vma *skel;
+	int wstatus, child_pid = -1;
+	time_t start_tm, cur_tm;
+	int err, iter_fd = -1;
+	int wait_sec = 3;
+
+	skel = bpf_iter_task_vma__open();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vma__open"))
+		return;
+
+	skel->bss->pid = getpid();
+
+	err = bpf_iter_task_vma__load(skel);
+	if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
+		goto out;
+
+	skel->links.proc_maps = bpf_program__attach_iter(
+		skel->progs.proc_maps, NULL);
+
+	if (!ASSERT_OK_PTR(skel->links.proc_maps, "bpf_program__attach_iter")) {
+		skel->links.proc_maps = NULL;
+		goto out;
+	}
+
+	start_tm = time(NULL);
+	cur_tm = start_tm;
+
+	child_pid = fork();
+	if (child_pid == 0) {
+		/* Fork short-lived processes in the background. */
+		while (cur_tm < start_tm + wait_sec) {
+			system("echo > /dev/null");
+			cur_tm = time(NULL);
+		}
+		exit(0);
+	}
+
+	if (!ASSERT_GE(child_pid, 0, "fork_child"))
+		goto out;
+
+	while (cur_tm < start_tm + wait_sec) {
+		iter_fd = bpf_iter_create(bpf_link__fd(skel->links.proc_maps));
+		if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+			goto out;
+
+		/* Drain all data from iter_fd. */
+		while (cur_tm < start_tm + wait_sec) {
+			err = read_fd_into_buffer(iter_fd, task_vma_output, CMP_BUFFER_SIZE);
+			if (!ASSERT_GE(err, 0, "read_iter_fd"))
+				goto out;
+
+			cur_tm = time(NULL);
+
+			if (err == 0)
+				break;
+		}
+
+		close(iter_fd);
+		iter_fd = -1;
+	}
+
+	check_bpf_link_info(skel->progs.proc_maps);
+
+out:
+	waitpid(child_pid, &wstatus, 0);
+	close(iter_fd);
+	bpf_iter_task_vma__destroy(skel);
+}
+
 void test_bpf_sockmap_map_iter_fd(void)
 {
 	struct bpf_iter_sockmap *skel;
@@ -1586,6 +1657,8 @@ void test_bpf_iter(void)
 		test_task_file();
 	if (test__start_subtest("task_vma"))
 		test_task_vma();
+	if (test__start_subtest("task_vma_dead_task"))
+		test_task_vma_dead_task();
 	if (test__start_subtest("task_btf"))
 		test_task_btf();
 	if (test__start_subtest("tcp4"))
-- 
2.30.2



* Re: [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma
  2022-12-16 22:18 [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma Kui-Feng Lee
  2022-12-16 22:18 ` [PATCH bpf-next v2 1/2] bpf: keep a reference to the mm, in case the task is dead Kui-Feng Lee
  2022-12-16 22:18 ` [PATCH bpf-next v2 2/2] selftests/bpf: add a test for iter/task_vma for short-lived processes Kui-Feng Lee
@ 2022-12-28 22:20 ` patchwork-bot+netdevbpf
  2023-01-04 21:16 ` Kui-Feng Lee
  3 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-12-28 22:20 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, andrii, kernel-team, song, yhs

Hello:

This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Fri, 16 Dec 2022 14:18:53 -0800 you wrote:
> This issue is related to task iterators over vma. A system crash can
> occur when a task iterator travels through vma of tasks as the death
> of a task will clear the pointer to its mm, even though the
> task_struct is still held. As a result, an unexpected crash happens
> due to a null pointer. To address this problem, a reference to mm is
> kept on the iterator to make sure that the pointer is always
> valid. This patch set provides a solution for this crash by properly
> referencing mm on task iterators over vma.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v2,1/2] bpf: keep a reference to the mm, in case the task is dead.
    https://git.kernel.org/bpf/bpf/c/7ff94f276f8e
  - [bpf-next,v2,2/2] selftests/bpf: add a test for iter/task_vma for short-lived processes
    https://git.kernel.org/bpf/bpf/c/b7793c8db7d9

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma
  2022-12-16 22:18 [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma Kui-Feng Lee
                   ` (2 preceding siblings ...)
  2022-12-28 22:20 ` [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma patchwork-bot+netdevbpf
@ 2023-01-04 21:16 ` Kui-Feng Lee
  2023-01-04 21:21   ` Alexei Starovoitov
  3 siblings, 1 reply; 6+ messages in thread
From: Kui-Feng Lee @ 2023-01-04 21:16 UTC (permalink / raw)
  To: Kernel Team, Yonghong Song, ast, andrii, song, Kui-Feng Lee, bpf

Hi everyone,

This patch set seems to have been forgotten during the holiday season.
I hope this message gets some attention.

On Fri, 2022-12-16 at 14:18 -0800, Kui-Feng Lee wrote:
> This issue is related to task iterators over vma. A system crash can
> occur when a task iterator travels through vma of tasks as the death
> of a task will clear the pointer to its mm, even though the
> task_struct is still held. As a result, an unexpected crash happens
> due to a null pointer. To address this problem, a reference to mm is
> kept on the iterator to make sure that the pointer is always
> valid. This patch set provides a solution for this crash by properly
> referencing mm on task iterators over vma.
> 
> The major changes from v1 are:
> 
>  - Fix commit logs of the test case.
> 
>  - Use reverse Christmas tree coding style.
> 
>  - Remove unnecessary error handling for time().
> 
> v1:
> https://lore.kernel.org/bpf/20221216015912.991616-1-kuifeng@meta.com/
> 
> Kui-Feng Lee (2):
>   bpf: keep a reference to the mm, in case the task is dead.
>   selftests/bpf: add a test for iter/task_vma for short-lived
> processes
> 
>  kernel/bpf/task_iter.c                        | 39 +++++++---
>  .../selftests/bpf/prog_tests/bpf_iter.c       | 73
> +++++++++++++++++++
>  2 files changed, 100 insertions(+), 12 deletions(-)
> 



* Re: [PATCH bpf-next v2 0/2] bpf: fix the crash caused by task iterators over vma
  2023-01-04 21:16 ` Kui-Feng Lee
@ 2023-01-04 21:21   ` Alexei Starovoitov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2023-01-04 21:21 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: Kernel Team, Yonghong Song, ast, andrii, song, bpf

On Wed, Jan 4, 2023 at 1:17 PM Kui-Feng Lee <kuifeng@meta.com> wrote:
>
> Hi everyone,
>
> This patchset seems to be forgot during the holiday season.
> Hope this message gets some notice.

It was applied to the bpf tree as
commit 7ff94f276f8e ("bpf: keep a reference to the mm, in case the
task is dead.")

