Date: Fri, 21 Aug 2020 19:59:43 +0200
From: Oleg Nesterov
To: Suren Baghdasaryan
Cc: "Eric W.
    Biederman", Michal Hocko, Tetsuo Handa, Christian Brauner, Tim Murray,
    mingo@kernel.org, Peter Zijlstra, Thomas Gleixner, esyr@redhat.com,
    christian@kellner.me, areber@redhat.com, Shakeel Butt, cyphar@cyphar.com,
    adobriyan@gmail.com, Andrew Morton, gladkov.alexey@gmail.com,
    Michel Lespinasse, daniel.m.jordan@oracle.com, avagin@gmail.com,
    bernd.edlinger@hotmail.de, John Johansen, laoar.shao@gmail.com,
    Minchan Kim, kernel-team, LKML, linux-fsdevel@vger.kernel.org, linux-mm
Subject: Re: [PATCH 1/1] mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
Message-ID: <20200821175943.GD19445@redhat.com>
References: <20200820140054.fdkbotd4tgfrqpe6@wittgenstein>
    <637ab0e7-e686-0c94-753b-b97d24bb8232@i-love.sakura.ne.jp>
    <87k0xtv0d4.fsf@x220.int.ebiederm.org>
    <20200820162645.GP5033@dhcp22.suse.cz>
    <87r1s0txxe.fsf@x220.int.ebiederm.org>
    <20200821111558.GG4546@redhat.com>
    <20200821163300.GB19445@redhat.com>
In-Reply-To: <20200821163300.GB19445@redhat.com>

On 08/21, Oleg Nesterov wrote:
>
> On 08/21, Suren Baghdasaryan wrote:
> >
> > On Fri, Aug 21, 2020 at 4:16 AM Oleg Nesterov wrote:
> > >
> > >	bool probably_has_other_mm_users(tsk)
> > >	{
> > >		return	atomic_read_acquire(&tsk->mm->mm_users) >
> > >			atomic_read(&tsk->signal->live);
> > >	}
> > >
> > > The barrier implied by _acquire ensures that if we race with the exiting
> > > task and see the result of exit_mm()->mmput(mm), then we must also see
> > > the result of atomic_dec_and_test(signal->live).
> > >
> > > Either way, if we want to fix the race with clone(CLONE_VM) we need other
> > > changes.
> >
> > The way I understand this condition in __set_oom_adj() sync logic is
> > that we would be ok with false positives (when we loop unnecessarily)
> > but we can't tolerate false negatives (when oom_score_adj gets out of
> > sync).
>
> Yes,
>
> > With the clone(CLONE_VM) race not addressed we are allowing
> > false negatives and IMHO that's not acceptable because it creates a
> > possibility for userspace to get an inconsistent picture. When
> > developing the patch I did think about using (p->mm->mm_users >
> > p->signal->nr_threads) condition and had to reject it due to that
> > reason.
>
> Not sure I understand... I mean, the test_bit(MMF_PROC_SHARED) you propose
> is equally racy and we need copy_oom_score() at the end of copy_process()
> either way?

On second thought, I agree that probably_has_other_mm_users() above can't
work ;) Compared to the test_bit(MMF_PROC_SHARED) check it is not _equally_
racy; it adds _another_ race with clone(CLONE_VM).

Suppose a single-threaded process P does

	clone(CLONE_VM); // creates the child C

	// mm_users == 2; P->signal->live == 1;

	clone(CLONE_THREAD | CLONE_VM);

	// mm_users == 3; P->signal->live == 2;

The problem is that, in theory, clone(CLONE_THREAD | CLONE_VM) can increment
_both_ counters between atomic_read_acquire(mm_users) and atomic_read(live)
in probably_has_other_mm_users(), so the check can observe
mm_users == live == 2 and wrongly conclude that the mm is not shared with
another process, although C still shares it.

Oleg.