From: Jann Horn
Date: Tue, 15 Jun 2021 14:09:38 +0200
Subject: Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()
To: John Hubbard, Matthew Wilcox
Cc: Andrew Morton, Linux-MM, kernel list, "Kirill A . Shutemov", Jan Kara, stable
References: <20210615012014.1100672-1-jannh@google.com> <50d828d1-2ce6-21b4-0e27-fb15daa77561@nvidia.com>
In-Reply-To: <50d828d1-2ce6-21b4-0e27-fb15daa77561@nvidia.com>

On Tue, Jun 15, 2021 at 8:37 AM John Hubbard wrote:
> On 6/14/21 6:20 PM, Jann Horn wrote:
> > try_grab_compound_head() is used to grab a reference to a page from
> > get_user_pages_fast(), which is only protected against concurrent
> > freeing of page tables (via local_irq_save()), but not against
> > concurrent TLB flushes, freeing of data pages, or splitting of compound
> > pages.
[...]
> Reviewed-by: John Hubbard

Thanks!

[...]
> > @@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> >         if (WARN_ON_ONCE(page_ref_count(head) < 0))
> >                 return NULL;
> >         if (unlikely(!page_cache_add_speculative(head, refs)))
> >                 return NULL;
> > +
> > +       /*
> > +        * At this point we have a stable reference to the head page; but it
> > +        * could be that between the compound_head() lookup and the refcount
> > +        * increment, the compound page was split, in which case we'd end up
> > +        * holding a reference on a page that has nothing to do with the page
> > +        * we were given anymore.
> > +        * So now that the head page is stable, recheck that the pages still
> > +        * belong together.
> > +        */
> > +       if (unlikely(compound_head(page) != head)) {
>
> I was just wondering about what all could happen here. Such as: page gets split,
> reallocated into a different-sized compound page, one that still has page pointing
> to head. I think that's OK, because we don't look at or change other huge page
> fields.
>
> But I thought I'd mention the idea in case anyone else has any clever ideas about
> how this simple check might be insufficient here. It seems fine to me, but I
> routinely lack enough imagination about concurrent operations. :)

Hmmm... I think the scariest aspect here is probably the interaction
with concurrent allocation of a compound page on architectures with
store-store reordering (like ARM). *If* the page allocator handled
compound pages with lockless, non-atomic percpu freelists, I think it
might be possible that the zeroing of tail_page->compound_head in
put_page() could be reordered after the page has been freed,
reallocated and set to refcount 1 again?

That shouldn't be possible at the moment, but it is still a bit scary.

I think the lockless page cache code also has to deal with somewhat
similar ordering concerns when it uses page_cache_get_speculative(), e.g.
in mapping_get_entry() - first it looks up a page pointer with
xas_load(), and any access to the page later on would be a _dependent
load_, but if the page then gets freed, reallocated, and inserted into
the page cache again before the refcount increment and the re-check
using xas_reload(), then there would be no data dependency from
xas_reload() to the following use of the page...
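
For illustration only, here is a rough userspace sketch of the "look up,
speculatively take a reference, then recheck" pattern that both the
quoted hunk and the xas_load()/xas_reload() sequence rely on. This is
not kernel code: struct toy_page, the toy_* functions and
toy_window_hook are hypothetical names made up for this sketch, and it
uses sequentially consistent C11 atomics, so it only models the logic of
the recheck and does not reproduce the store-store reordering concern
described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct toy_page {
	atomic_int refcount;            /* 0 means the page is free */
	struct toy_page *_Atomic head;  /* head page, or the page itself */
};

/* Test-only hook that runs inside the lookup/increment race window. */
static void (*toy_window_hook)(struct toy_page *page);

static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return atomic_load(&page->head);
}

/* Add refs to the refcount unless it is zero (page already freed). */
static bool toy_add_speculative(struct toy_page *p, int refs)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + refs))
			return true;
	}
	return false;
}

/* Model of a split: the former tail page becomes its own head. */
static void toy_simulate_split(struct toy_page *page)
{
	atomic_store(&page->refcount, 1);
	atomic_store(&page->head, page);
}

static struct toy_page *toy_try_get_compound_head(struct toy_page *page,
						   int refs)
{
	struct toy_page *head = toy_compound_head(page);

	assert(refs > 0);

	/* Race window: a split may happen before we hold a reference. */
	if (toy_window_hook)
		toy_window_hook(page);

	if (!toy_add_speculative(head, refs))
		return NULL;

	/*
	 * The reference on head is now held; recheck that page still
	 * belongs to it, and undo the increment if it does not.
	 */
	if (toy_compound_head(page) != head) {
		atomic_fetch_sub(&head->refcount, refs);
		return NULL;
	}
	return head;
}
```

With no concurrent split, toy_try_get_compound_head() returns the head
and leaves its refcount raised by refs. If toy_window_hook is set to
toy_simulate_split, the recheck fails, the references are dropped again
and NULL is returned, which is the case the recheck in the hunk is
meant to catch.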