From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1422856AbcFMLXx (ORCPT );
	Mon, 13 Jun 2016 07:23:53 -0400
Received: from mail-wm0-f66.google.com ([74.125.82.66]:35240 "EHLO
	mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1161164AbcFMLXw (ORCPT );
	Mon, 13 Jun 2016 07:23:52 -0400
Date: Mon, 13 Jun 2016 13:23:49 +0200
From: Michal Hocko
To: linux-mm@kvack.org
Cc: Tetsuo Handa, David Rientjes, Oleg Nesterov, Vladimir Davydov,
	Andrew Morton, LKML
Subject: Re: [PATCH 0/10 -v4] Handle oom bypass more gracefully
Message-ID: <20160613112348.GC6518@dhcp22.suse.cz>
References: <1465473137-22531-1-git-send-email-mhocko@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1465473137-22531-1-git-send-email-mhocko@kernel.org>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 09-06-16 13:52:07, Michal Hocko wrote:
> I would like to explore ways how to remove kthreads (use_mm) special
> case. It shouldn't be that hard, we just have to teach the page fault
> handler to recognize oom victim mm and enforce EFAULT for kthreads
> which have borrowed that mm.

So I was trying to come up with a solution for this which would require
hooking into the page fault handler and enforcing EFAULT when the mm is
being reaped by the oom_reaper. Not hard, but then I checked the current
users and none of them really needs to read from the userspace (aka
copy_from_user/get_user). So we actually do not need to do anything
special. Copying _to_ the userspace should be OK because there is no
risk of corruption. So I believe we should be able to simply do the
following. Or is anybody seeing a reason this would be unsafe?
---
>>From 136eabbee783e3e21ea07b289d38e4f947c84850 Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Fri, 10 Jun 2016 16:27:49 +0200
Subject: [PATCH] oom, oom_reaper: allow to reap mm shared by the kthreads

The oom reaper was skipped for an mm which is shared with a kernel
thread (aka use_mm()). The primary concern was that such a kthread might
want to read from the userspace memory and see a zero page as a result
of the oom reaper action. This seems to be overly conservative because
none of the current use_mm() users need to do copy_from_user or
get_user. The aio code used to rely on copy_from_user but this is long
gone along with the use_mm() usage in fs/aio.c. We currently have only
3 users in the kernel:
- ffs_user_copy_worker, ep_user_copy_worker only do copy_to_iter()
- vhost_worker only copies over to the userspace as well AFAICS

In fact, relying on copy_from_user in the kernel thread context is quite
dubious because it expects active cooperation from the userspace to
have consistent data (e.g. userspace can do MADV_DONTNEED as well).

Add a note to use_mm about the copy_from_user risk and allow the oom
killer to invoke the oom_reaper for mms shared with kthreads. This will
practically cause all the sane use cases to be reapable.

Signed-off-by: Michal Hocko
---
 mm/mmu_context.c |  5 +++++
 mm/oom_kill.c    | 14 +++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index f802c2d216a7..27449747f8de 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -16,6 +16,11 @@
  *	mm context.
  *	(Note: this routine is intended to be called only
  *	from a kernel thread context)
+ *
+ *	Do not use copy_from_user from this context because the
+ *	address space might get reclaimed behind its back by
+ *	the oom_reaper so an unexpected zero page might be
+ *	encountered.
  */
 void use_mm(struct mm_struct *mm)
 {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 6303bc7caeda..b6a7027643b6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -921,13 +921,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 			continue;
 		if (same_thread_group(p, victim))
 			continue;
-		if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p)) {
-			/*
-			 * We cannot use oom_reaper for the mm shared by this
-			 * process because it wouldn't get killed and so the
-			 * memory might be still used. Hide the mm from the oom
-			 * killer to guarantee OOM forward progress.
-			 */
+		if (is_global_init(p)) {
 			can_oom_reap = false;
 			set_bit(MMF_OOM_REAPED, &mm->flags);
 			pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
@@ -935,6 +929,12 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 					task_pid_nr(p), p->comm);
 			continue;
 		}
+		/*
+		 * No use_mm() user needs to read from the userspace so we are
+		 * ok to reap it.
+		 */
+		if (unlikely(p->flags & PF_KTHREAD))
+			continue;
 		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
 	}
 	rcu_read_unlock();
-- 
2.8.1

-- 
Michal Hocko
SUSE Labs