From: Vasily Averin
Subject: Re: [PATCH memcg 0/1] false global OOM triggered by memcg-limited task
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka, Shakeel Butt, Mel Gorman, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel@openvz.org
Date: Tue, 19 Oct 2021 16:26:50 +0300
Message-ID: <6c422150-593f-f601-8f91-914c6c5e82f4@virtuozzo.com>
References: <9d10df01-0127-fb40-81c3-cc53c9733c3e@virtuozzo.com> <6b751abe-aa52-d1d8-2631-ec471975cc3a@virtuozzo.com> <339ae4b5-6efd-8fc2-33f1-2eb3aee71cb2@virtuozzo.com> <687bf489-f7a7-5604-25c5-0c1a09e0905b@virtuozzo.com>

On 19.10.2021 15:04, Michal Hocko wrote:
> On Tue 19-10-21 13:54:42, Michal Hocko wrote:
>> On Tue 19-10-21 13:30:06, Vasily Averin wrote:
>>> On 19.10.2021 11:49, Michal Hocko wrote:
>>>> On Tue 19-10-21 09:30:18, Vasily Averin wrote:
>>>> [...]
>>>>> With my patch ("memcg: prohibit unconditional exceeding the limit of dying tasks") try_charge_memcg() can fail:
>>>>> a) due to fatal signal
>>>>> b) when mem_cgroup_oom -> mem_cgroup_out_of_memory -> out_of_memory() returns false (when select_bad_process() found nothing)
>>>>>
>>>>> To handle a) we can follow your suggestion and skip execution of out_of_memory() in pagefault_out_of_memory()
>>>>> To handle b) we can go to retry: if mem_cgroup_oom() returns OOM_FAILED.
>>>
>>>> How is b) possible without current being killed? Do we allow remote
>>>> charging?
>>>
>>> out_of_memory for memcg_oom
>>>  select_bad_process
>>>   mem_cgroup_scan_tasks
>>>    oom_evaluate_task
>>>     oom_badness
>>>
>>>         /*
>>>          * Do not even consider tasks which are explicitly marked oom
>>>          * unkillable or have been already oom reaped or the are in
>>>          * the middle of vfork
>>>          */
>>>         adj = (long)p->signal->oom_score_adj;
>>>         if (adj == OOM_SCORE_ADJ_MIN ||
>>>                         test_bit(MMF_OOM_SKIP, &p->mm->flags) ||
>>>                         in_vfork(p)) {
>>>                 task_unlock(p);
>>>                 return LONG_MIN;
>>>         }
>>>
>>> This time we handle a userspace page fault, so we cannot be a kernel thread,
>>> and cannot be in_vfork().
>>> However the task can be marked as oom unkillable,
>>> i.e. have p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN
>>
>> You are right. I am not sure there is a way out of this though. The task
>> can only retry forever in this case. There is nothing actionable here.
>> We cannot kill the task and there is no other way to release the memory.
>
> Btw. don't we force the charge in that case?

We should force the charge for allocations made from inside the page fault
handler, to prevent an endless cycle of retried page faults.
However, we should not do so for allocations from task context,
so that memcg-limited vmalloc eaters cannot consume all host memory.

Also I would like to return to the following hunk.
@@ -1575,7 +1575,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = should_force_charge() || out_of_memory(&oc);
+	ret = task_is_dying() || out_of_memory(&oc);
 unlock:
 	mutex_unlock(&oom_lock);

Now I think it's better to keep the task_is_dying() check here.
If the task is dying, it is not necessary to push another task to free memory.
We have already broken the vmalloc cycle, so nothing should prevent us
from returning to userspace, handling the fatal signal, exiting, and freeing the memory.

Thank you,
	Vasily Averin
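
To make the proposed behaviour easier to check, here is a minimal userspace sketch of the decision logic argued for above. It is not kernel code and not the patch itself: the names alloc_ctx, charge_action, oom_model, model_memcg_oom() and model_charge() are hypothetical and exist only for this illustration. The two booleans stand in for task_is_dying() and for the return value of out_of_memory(), and the model assumes that a dying task fails the charge, as in case a) above.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the policy discussed above; not kernel code. */
enum alloc_ctx { CTX_PAGE_FAULT, CTX_TASK };
enum charge_action { CHARGE_FAIL, CHARGE_RETRY, CHARGE_FORCE };

struct oom_model {
	bool task_dying;      /* stands in for task_is_dying() */
	bool victim_found;    /* stands in for out_of_memory() returning true */
	int  oom_invocations; /* how many times the modeled OOM killer ran */
};

/*
 * Models "ret = task_is_dying() || out_of_memory(&oc)": for a dying task
 * the || short-circuits, so no other task is pushed to free memory.
 */
bool model_memcg_oom(struct oom_model *m)
{
	if (m->task_dying)
		return true;
	m->oom_invocations++;
	return m->victim_found;
}

/* What a failed charge does next under the policy proposed in this mail. */
enum charge_action model_charge(struct oom_model *m, enum alloc_ctx ctx)
{
	bool handled = model_memcg_oom(m);

	if (m->task_dying)
		return CHARGE_FAIL;  /* return to userspace, handle the signal, exit */
	if (handled)
		return CHARGE_RETRY; /* a victim was selected, memory should be freed */

	/* No eligible victim, e.g. the task has OOM_SCORE_ADJ_MIN. */
	return ctx == CTX_PAGE_FAULT ? CHARGE_FORCE : CHARGE_FAIL;
}
```

In this model the only case that exceeds the limit is a page fault with no eligible victim; the same situation in task context fails the charge, which is what keeps a memcg-limited vmalloc user from consuming host memory.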