From: Nico Pache
Date: Fri, 8 Apr 2022 04:41:27 -0400
Subject: Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing the robust_list_head
To: Peter Zijlstra
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Rafael Aquini, Waiman Long, Baoquan He, Christoph von Recklinghausen, Don Dutile, "Herton R . Krzesinski", David Rientjes, Michal Hocko, Andrea Arcangeli, Andrew Morton, Davidlohr Bueso, Thomas Gleixner, Ingo Molnar, Joel Savitz, Darren Hart, stable@kernel.org
References: <20220408032809.3696798-1-npache@redhat.com> <20220408081549.GM2731@worktop.programming.kicks-ass.net>
In-Reply-To: <20220408081549.GM2731@worktop.programming.kicks-ass.net>

On 4/8/22 04:15, Peter Zijlstra wrote:
> On Thu, Apr 07, 2022 at 11:28:09PM -0400, Nico Pache wrote:
>> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
>> be targeted by the oom reaper. This mapping is used to store the futex
>> robust list head; the kernel does not keep a copy of the robust list and
>> instead references a userspace address to maintain the robustness during
>> a process death.
>> A race can occur between exit_mm and the oom reaper that
>> allows the oom reaper to free the memory of the futex robust list before
>> the exit path has handled the futex death:
>>
>>     CPU1                               CPU2
>> ------------------------------------------------------------------------
>>     page_fault
>>     do_exit "signal"
>>     wake_oom_reaper
>>                                         oom_reaper
>>                                         oom_reap_task_mm (invalidates mm)
>>     exit_mm
>>     exit_mm_release
>>     futex_exit_release
>>     futex_cleanup
>>     exit_robust_list
>>     get_user (EFAULT- can't access memory)
>>
>> If the get_user EFAULT's, the kernel will be unable to recover the
>> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
>>
>> Use the robust_list address stored in the kernel to skip the VMA that holds
>> it, allowing a successful futex_cleanup.
>>
>> Theoretically a failure can still occur if there are locks mapped as
>> PRIVATE|ANON; however, the robust futexes are a best-effort approach.
>> This patch only strengthens that best-effort.
>>
>> The following case can still fail:
>> robust head (skipped) -> private lock (reaped) -> shared lock (skipped)
>
> This is still all sorts of confused.. it's a list head, the entries can
> be in any random other VMA. You must not remove *any* user memory before
> doing the robust thing. Not removing the VMA that contains the head is
> pointless in the extreme.

Not sure how it's pointless if it fixes all the different reproducers we've
written for it. As for the private lock case we stated here, we haven't been
able to reproduce it, but I could see how it can be a potential issue (which
is why it's noted).

>
> Did you not read the previous discussion?

I did... That's why I added the blurb about best-effort and the case that can
theoretically still fail.

The oom reaper doesn't reap shared memory, but WITHOUT this change it was
reaping the head (PRIVATE|ANON), which is needed to get to those shared
mappings; so shared locks are safe with this added change.
If a user implements private locks, they can only be used within that
process. If that process is OOMing then it doesn't really matter what happens
to the futexes... all of the threads running under that process will
terminate anyway.

Perhaps I'm misunderstanding you.

-- Nico
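
For readers following the thread, the two positions can be illustrated with a small userspace model. This is only a sketch under loud assumptions: `struct region`, `struct node`, `toy_reap`, `toy_walk` and `toy_scenario` are hypothetical names invented here, not kernel code, and the model ignores real details such as the circular layout of the robust list and how a fault on reaped memory is actually reported. It only captures the rule discussed above: shared mappings are never reaped, private ones are, and the walk stops at the first entry whose memory is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: only the properties argued about above. */
struct region {
    bool shared;   /* the toy reaper never touches shared mappings */
    bool reaped;   /* set once the toy reaper has freed the region */
};

/* Hypothetical stand-in for one robust-list entry (the head or a lock). */
struct node {
    struct region *where;  /* region this entry lives in */
    struct node *next;     /* NULL ends the toy list (the real list is circular) */
};

/* Reap every private region, optionally sparing the one that holds the head. */
static void toy_reap(struct region **regions, size_t n,
                     const struct region *head_region, bool skip_head)
{
    for (size_t i = 0; i < n; i++) {
        struct region *r = regions[i];
        if (r->shared)
            continue;
        if (skip_head && r == head_region)
            continue;
        r->reaped = true;
    }
}

/*
 * Walk the list the way an exit path would and stop at the first entry
 * whose region is gone (the model's version of get_user failing).
 * Returns how many locks after the head were reached.
 */
static int toy_walk(const struct node *head)
{
    assert(head != NULL);
    if (head->where->reaped)
        return 0;

    int recovered = 0;
    for (const struct node *n = head->next; n != NULL; n = n->next) {
        if (n->where->reaped)
            break;
        recovered++;
    }
    return recovered;
}

/* Build "head -> [private lock ->] shared lock", reap, then walk. */
static int toy_scenario(bool with_private_lock, bool skip_head)
{
    struct region head_r = { .shared = false, .reaped = false };
    struct region priv_r = { .shared = false, .reaped = false };
    struct region shm_r  = { .shared = true,  .reaped = false };
    struct region *all[] = { &head_r, &priv_r, &shm_r };

    struct node shared_lock  = { &shm_r, NULL };
    struct node private_lock = { &priv_r, &shared_lock };
    struct node head = { &head_r,
                         with_private_lock ? &private_lock : &shared_lock };

    toy_reap(all, sizeof(all) / sizeof(all[0]), &head_r, skip_head);
    return toy_walk(&head);
}
```

In this model, `toy_scenario(false, false)` reaches no locks because the head itself is reaped, and `toy_scenario(false, true)` reaches the shared lock once the head's region is spared, which is the case the patch targets. `toy_scenario(true, true)` again reaches nothing: the private lock between the head and the shared lock is reaped, so the shared lock behind it is never visited, which is the "robust head (skipped) -> private lock (reaped) -> shared lock (skipped)" case from the patch description.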