From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-23.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,
	USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 02757C49361
	for <linux-kernel@archiver.kernel.org>; Tue, 15 Jun 2021 12:10:11 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id C794E6140C
	for <linux-kernel@archiver.kernel.org>; Tue, 15 Jun 2021 12:10:10 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230052AbhFOMMN (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Tue, 15 Jun 2021 08:12:13 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35348 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229557AbhFOMMM (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 15 Jun 2021 08:12:12 -0400
Received: from mail-lj1-x22b.google.com (mail-lj1-x22b.google.com [IPv6:2a00:1450:4864:20::22b])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6AA9DC06175F
        for <linux-kernel@vger.kernel.org>; Tue, 15 Jun 2021 05:10:08 -0700 (PDT)
Received: by mail-lj1-x22b.google.com with SMTP id k8so7765937lja.4
        for <linux-kernel@vger.kernel.org>; Tue, 15 Jun 2021 05:10:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=WNpdr2S1NDbOx7bwbHaY+rxGjF1pZI1K0hiTte/Oqqs=;
        b=hn6SsOwGYHR0CT0tDtiihzIzNVHWuJ5IrzWQY7kMPAPAkOcNK/166MPvX5oPF0j2qO
         9T/1zTMuqXEJ4b6ZgOC6Knr2m2Yqb2HfB8NrNqGrd4izzEpDRZwQu15NaWtzVMNqbTZ3
         7dJdgKBsPI3lfkwZM1wMH/372Y7ijszwqY2alGxa0ROsPoSE9JOK9BrBKfBi1ucrs+q6
         Y7asH2E5+HafTOB16GdKR1Br5TfolOTf9iW970559LAuX/4rScPQ7EzCAw+yJ8GcPiNj
         XGtWxud65paFus1Ok5qmMCntw4zfOD+Qwpb0VTJslYHXyUYBkuT6DesWQVfZvQ4GEa/3
         i25A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=WNpdr2S1NDbOx7bwbHaY+rxGjF1pZI1K0hiTte/Oqqs=;
        b=cXlQsWxwRBrN02jMG5Mw2pSB/ysd7Hynzj2sSINdsipl7FdjGbx/OSYARhFsm23H3Z
         pPJygaOarLkhbzFNPZN+HQl/pw/BiVIedvcUsRxX7IFxI6JAaUAoiq0Jw8z1MB2AeEeL
         7Xnken7JC1Y91mO7H4Ez/TkcxnJATfKjsPFs8op6/gaEFFklHR1z5r1n+IOFUAer0o6b
         M4voZ6xQe5fcpS54+dLo0anUKpbA5hVy9VD0kGC0rdiAKFqkIEKGzlrm+B9AP1WlENyA
         dscOtjAJKGaf+Y4QgEN4YbrrF7ClrU26ufQUF7qwRP822xZnuOv9G65d74fuHoR9CtyA
         0t6w==
X-Gm-Message-State: AOAM531EfkRwo8Aah87KU1IuVHVLERp4+VDsENjLYnnq0qeGm1bynGfa
        XZlOEjWjfVUDSWjogQtCQfsEPghggfUkkJE5yLC2wg==
X-Google-Smtp-Source: ABdhPJw8A6vOZsaQEDsWTxdTmFkA/8e8Rr9IEnGktDJsMz7Tqoh1ranC3q6kCm2luqXihS9dlvk8IM++pXxR3peUytE=
X-Received: by 2002:a2e:b5ae:: with SMTP id f14mr17742867ljn.94.1623759005720;
 Tue, 15 Jun 2021 05:10:05 -0700 (PDT)
MIME-Version: 1.0
References: <20210615012014.1100672-1-jannh@google.com> <50d828d1-2ce6-21b4-0e27-fb15daa77561@nvidia.com>
In-Reply-To: <50d828d1-2ce6-21b4-0e27-fb15daa77561@nvidia.com>
From:   Jann Horn <jannh@google.com>
Date:   Tue, 15 Jun 2021 14:09:38 +0200
Message-ID: <CAG48ez3Vbcvh4AisU7=ukeJeSjHGTKQVd0NOU6XOpRru7oP_ig@mail.gmail.com>
Subject: Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()
To:     John Hubbard <jhubbard@nvidia.com>,
        Matthew Wilcox <willy@infradead.org>
Cc:     Andrew Morton <akpm@linux-foundation.org>,
        Linux-MM <linux-mm@kvack.org>,
        kernel list <linux-kernel@vger.kernel.org>,
        "Kirill A . Shutemov" <kirill@shutemov.name>,
        Jan Kara <jack@suse.cz>, stable <stable@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 15, 2021 at 8:37 AM John Hubbard <jhubbard@nvidia.com> wrote:
> On 6/14/21 6:20 PM, Jann Horn wrote:
> > try_grab_compound_head() is used to grab a reference to a page from
> > get_user_pages_fast(), which is only protected against concurrent
> > freeing of page tables (via local_irq_save()), but not against
> > concurrent TLB flushes, freeing of data pages, or splitting of compound
> > pages.
[...]
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>

Thanks!

[...]
> > @@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> >       if (WARN_ON_ONCE(page_ref_count(head) < 0))
> >               return NULL;
> >       if (unlikely(!page_cache_add_speculative(head, refs)))
> >               return NULL;
> > +
> > +     /*
> > +      * At this point we have a stable reference to the head page; but it
> > +      * could be that between the compound_head() lookup and the refcount
> > +      * increment, the compound page was split, in which case we'd end up
> > +      * holding a reference on a page that has nothing to do with the page
> > +      * we were given anymore.
> > +      * So now that the head page is stable, recheck that the pages still
> > +      * belong together.
> > +      */
> > +     if (unlikely(compound_head(page) != head)) {
>
> I was just wondering about what all could happen here. Such as: page gets split,
> reallocated into a different-sized compound page, one that still has page pointing
> to head. I think that's OK, because we don't look at or change other huge page
> fields.
>
> But I thought I'd mention the idea in case anyone else has any clever ideas about
> how this simple check might be insufficient here. It seems fine to me, but I
> routinely lack enough imagination about concurrent operations. :)

Hmmm... I think the scariest aspect here is probably the interaction
with concurrent allocation of a compound page on architectures with
store-store reordering (like ARM). *If* the page allocator handled
compound pages with lockless, non-atomic percpu freelists, I think the
zeroing of tail_page->compound_head in put_page() could be reordered
after the page has been freed, reallocated and set to refcount 1
again?

That shouldn't be possible at the moment, but it is still a bit scary.
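
For anyone who wants to poke at the sequence outside the kernel, here
is a rough userspace model of the lookup / refcount increment / recheck
steps from the hunk quoted above. It is only a sketch: struct
model_page, every model_*() helper and the race_hook parameter are
made-up names, not kernel APIs; dropping the references on a mismatch
is an assumption, since the body of the if is not shown in the quote;
and it uses C11 atomics with acquire/release ordering, which is not a
statement about what ordering the kernel primitives provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct model_page {
	atomic_int refcount;                /* 0 means "free" in this model */
	_Atomic(struct model_page *) head;  /* non-NULL only for tail pages */
};

static void model_page_init(struct model_page *page, int refcount,
			    struct model_page *head)
{
	atomic_init(&page->refcount, refcount);
	atomic_init(&page->head, head);
}

/* Model of compound_head(): a tail points at its head, others at themselves. */
static struct model_page *model_compound_head(struct model_page *page)
{
	struct model_page *head =
		atomic_load_explicit(&page->head, memory_order_acquire);

	return head ? head : page;
}

/* Model of a speculative get: add refs unless the refcount is zero. */
static int model_add_speculative(struct model_page *page, int refs)
{
	int old = atomic_load_explicit(&page->refcount, memory_order_relaxed);

	do {
		if (old == 0)
			return 0;
	} while (!atomic_compare_exchange_weak_explicit(&page->refcount, &old,
							old + refs,
							memory_order_acq_rel,
							memory_order_relaxed));
	return 1;
}

/* Test helper: pretend the compound page was split, detaching this tail. */
static void model_split(struct model_page *tail)
{
	atomic_store_explicit(&tail->head, NULL, memory_order_release);
}

/*
 * Lookup, speculative reference, recheck. race_hook is test-only and runs
 * in the window between the head lookup and the refcount increment.
 */
static struct model_page *
model_try_get_compound_head(struct model_page *page, int refs,
			    void (*race_hook)(struct model_page *))
{
	struct model_page *head = model_compound_head(page);

	if (race_hook)
		race_hook(page);

	if (!model_add_speculative(head, refs))
		return NULL;

	/* The head is pinned now; check that page still belongs to it. */
	if (model_compound_head(page) != head) {
		/* Assumption: on mismatch, give the references back. */
		atomic_fetch_sub_explicit(&head->refcount, refs,
					  memory_order_release);
		return NULL;
	}
	return head;
}
```

Passing model_split as race_hook simulates a split in the window; the
call then returns NULL and leaves the head's refcount unchanged. The
model is single-threaded and deterministic, so it only exercises the
control flow, not the store-store reordering case described above.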


I think the lockless page cache code also has to deal with somewhat
similar ordering concerns when it uses page_cache_get_speculative(),
e.g. in mapping_get_entry(): first it looks up a page pointer with
xas_load(), and any later access to the page would be a _dependent
load_. But if the page then gets freed, reallocated, and inserted into
the page cache again before the refcount increment and the re-check
using xas_reload(), then there would be no data dependency from
xas_reload() to the following use of the page...
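
To make the shape of that lookup concrete, here is a rough userspace
model of the load / speculative get / reload pattern. It is a sketch
under assumptions: pc_slot is a single hypothetical slot standing in
for an XArray entry, struct pc_page and all pc_*() names are invented
for illustration, and the retry-on-mismatch loop is how I would model
it, not a copy of mapping_get_entry(). The loads are acquire loads,
which are stronger than a dependency-ordered load, so the model
sidesteps the ordering concern rather than reproducing it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a page in the page cache. */
struct pc_page {
	atomic_int refcount;  /* 0 means "free" in this model */
};

/* Hypothetical single slot standing in for one XArray entry. */
static _Atomic(struct pc_page *) pc_slot;

/* Test-only: the page that pc_swap_slot() installs into the slot. */
static struct pc_page *pc_replacement;

/* Test helper: simulate the slot being repointed during the race window. */
static void pc_swap_slot(void)
{
	atomic_store_explicit(&pc_slot, pc_replacement, memory_order_release);
}

/* Model of a speculative get: take one reference unless refcount is zero. */
static int pc_get_speculative(struct pc_page *page)
{
	int old = atomic_load_explicit(&page->refcount, memory_order_relaxed);

	do {
		if (old == 0)
			return 0;
	} while (!atomic_compare_exchange_weak_explicit(&page->refcount, &old,
							old + 1,
							memory_order_acq_rel,
							memory_order_relaxed));
	return 1;
}

/*
 * Load the slot, pin the page, then reload the slot and compare.
 * race_hook is test-only and fires once, between the first load and
 * the refcount increment.
 */
static struct pc_page *pc_get_entry(void (*race_hook)(void))
{
	for (;;) {
		struct pc_page *page =
			atomic_load_explicit(&pc_slot, memory_order_acquire);

		if (!page)
			return NULL;

		if (race_hook) {
			race_hook();
			race_hook = NULL;
		}

		if (!pc_get_speculative(page))
			continue;  /* page was on its way out; look again */

		/* Recheck: does the slot still point at the page we pinned? */
		if (atomic_load_explicit(&pc_slot, memory_order_acquire) != page) {
			atomic_fetch_sub_explicit(&page->refcount, 1,
						  memory_order_release);
			continue;
		}
		return page;
	}
}
```

With pc_swap_slot as the hook, the first pass pins the old page,
notices the slot has moved on, drops the reference and retries, so the
caller ends up with the replacement page. In the model the returned
pointer comes from the first load, and the reload only gates whether
it is used, which is the "no data dependency" shape described above.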
