References: <20211022014658.263508-1-surenb@google.com>
From: Suren Baghdasaryan
Date: Mon, 1 Nov 2021 12:59:53 -0700
Subject: Re: [PATCH 1/1] mm: prevent a race between process_mrelease and exit_mmap
To: Michal Hocko
Cc: Andrew Morton, David Rientjes, Matthew Wilcox, Johannes Weiner,
    Roman Gushchin, Rik van Riel, Minchan Kim, Christian Brauner,
    Christoph Hellwig, Oleg Nesterov, David Hildenbrand, Jann Horn,
    Shakeel Butt, Andy Lutomirski, Christian Brauner, Florian Weimer,
    Jan Engelhardt, Linux API, linux-mm, LKML, kernel-team,
    "Kirill A. Shutemov", Andrea Arcangeli

On Mon, Nov 1, 2021 at 8:44 AM Suren Baghdasaryan wrote:
>
> On Mon, Nov 1, 2021 at 1:37 AM Michal Hocko wrote:
> >
> > On Fri 29-10-21 09:07:39, Suren Baghdasaryan wrote:
> > > On Fri, Oct 29, 2021 at 6:03 AM Michal Hocko wrote:
> > [...]
> > > > Well, I still do not see why that is a problem. This syscall is meant to
> > > > release the address space not to do it fast.
> > >
> > > It's the same problem for a userspace memory reaper as for the
> > > oom-reaper.
> > > The goal is to release the memory of the victim and to
> > > quickly move on to the next one if needed.
> >
> > The purpose of the oom_reaper is to _guarantee_ a forward progress. It
> > doesn't have to be quick or optimized for speed.
>
> Fair enough. Then the same guarantees should apply to userspace memory
> reapers. I think you clarified that well in your replies in
> https://lore.kernel.org/all/20170725154514.GN26723@dhcp22.suse.cz:
>
> Because there is no _guarantee_ that the final __mmput will release
> the memory in finite time. And we cannot guarantee that longterm.
> ...
> __mmput calls into exit_aio and that can wait for completion and there
> is no way to guarantee this will finish in finite time.
>
> >
> > [...]
> >
> > > > Btw. the above code will not really tell you much on a larger machine
> > > > unless you manage to trigger mmap_sem contention. Otherwise you are
> > > > measuring the mmap_sem writelock fast path and that should be really
> > > > within a noise comparing to the whole address space destruction time. If
> > > > that is not the case then we have a real problem with the locking...
> > >
> > > My understanding of that discussion is that the concern was that even
> > > taking uncontended mmap_sem writelock would regress the exit path.
> > > That was what I wanted to confirm. Am I misreading it?
> >
> > No, your reading matches my recollection. I just think that code
> > robustness in exchange of a rw semaphore write lock fast path is a
> > reasonable price to pay even if that has some effect on micro
> > benchmarks.
>
> I'm with you on this one, that's why I wanted to measure the price we
> would pay.
> Below are the test results:
>
> Test: https://lore.kernel.org/all/20170725142626.GJ26723@dhcp22.suse.cz/
> Compiled: gcc -O2 -static test.c -o test
> Test machine: 128 core / 256 thread 2x AMD EPYC 7B12 64-Core Processor
> (family 17h)
>
> baseline (Linus master, f31531e55495ca3746fb895ffdf73586be8259fa)
> p50 (median)    87412
> p95            168210
> p99            190058
> average         97843.8
> stdev           29.85%
>
> unconditional mmap_write_lock in exit_mmap (last column is the change
> from the baseline)
> p50 (median)    88312      +1.03%
> p95            170797      +1.54%
> p99            191813      +0.92%
> average         97659.5    -0.19%
> stdev           32.41%
>
> unconditional mmap_write_lock in exit_mmap + Matthew's patch (last
> column is the change from the baseline)
> p50 (median)    88807      +1.60%
> p95            167783      -0.25%
> p99            187853      -1.16%
> average         97491.4    -0.36%
> stdev           30.61%
>
> stdev is quite high in all cases, so the test is very noisy.

Need to clarify that what I called here "stdev" is actually stdev /
average in %.

> The impact seems quite low IMHO. WDYT?
>
> > --
> > Michal Hocko
> > SUSE Labs
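
For anyone who wants to recompute the summary columns, here is a minimal sketch of how such numbers could be derived. It is not the script used for the results above. The helper names (percentile, relative_stdev_pct, change_pct) are made up for illustration, and the nearest-rank percentile and sample standard deviation are assumptions, since the thread does not say which definitions were used. The two dictionaries just restate the baseline and the unconditional mmap_write_lock rows from the tables.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile (assumed definition, not confirmed by the thread)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def relative_stdev_pct(samples):
    """stdev / average in %, i.e. what the tables label "stdev".

    Uses the sample standard deviation; that choice is an assumption.
    """
    return statistics.stdev(samples) / statistics.mean(samples) * 100


def change_pct(baseline, value):
    """Change from the baseline in %, as in the last column of the tables."""
    return (value - baseline) / baseline * 100


# Values copied from the baseline and unconditional mmap_write_lock tables.
baseline = {"p50": 87412, "p95": 168210, "p99": 190058, "average": 97843.8}
patched = {"p50": 88312, "p95": 170797, "p99": 191813, "average": 97659.5}

for key in baseline:
    print(f"{key:8s} {change_pct(baseline[key], patched[key]):+.2f}%")
```

With per-run samples in a list, percentile(samples, 50), percentile(samples, 95) and percentile(samples, 99) would give the p50/p95/p99 rows, and relative_stdev_pct(samples) the normalized spread.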