From: Pasha Tatashin <pasha.tatashin@soleen.com>
Date: Tue, 26 Oct 2021 17:24:12 -0400
Subject: Re: [RFC 0/8] Hardening page _refcount
References: <20211026173822.502506-1-pasha.tatashin@soleen.com>
To: Matthew Wilcox
Cc: LKML,
 linux-mm, linux-m68k@lists.linux-m68k.org, Anshuman Khandual,
 Andrew Morton, william.kucharski@oracle.com, Mike Kravetz,
 Vlastimil Babka, Geert Uytterhoeven, schmitzmic@gmail.com,
 Steven Rostedt, Ingo Molnar, Johannes Weiner, Roman Gushchin,
 songmuchun@bytedance.com, weixugc@google.com, Greg Thelen

On Tue, Oct 26, 2021 at 4:14 PM Matthew Wilcox wrote:
>
> On Tue, Oct 26, 2021 at 02:30:25PM -0400, Pasha Tatashin wrote:
> > On Tue, Oct 26, 2021 at 2:24 PM Matthew Wilcox wrote:
> > > I think this is overkill. Won't we get exactly the same protection
> > > by simply testing that page->_refcount == 0 in set_page_count()?
> > > Anything which triggers that BUG_ON would already be buggy because
> > > it can race with speculative gets.
> >
> > We can't, because set_page_count(v) is used for
> > 1. changing _refcount from a current value to an unconstrained v
> > 2. initializing _refcount from an undefined state to v.
> >
> > In this work we forbid the first case, and reduce the second case to
> > initializing only to 1.
>
> Anything that is calling set_page_refcount() on something which is
> not 0 is buggy today

For performance reasons the memblock allocator does not zero struct page
memory; we initialize every field in struct page individually, including
page->_refcount. Most of the time the boot memory is zeroed by firmware,
but that is not guaranteed: non-zero pages can come from the bootloader
or from freed firmware pages. Also, after kexec the memory state is not
zeroed either, and struct pages can contain garbage until their fields
are initialized.

So there is a valid reason to call set_page_count() on a non-zero
_refcount, but the function is too generic: in this case we really only
need to set the count to 1. Hence, rename it to page_ref_init() and have
it set _refcount only to 1.

Another example: in __free_pages_core() and in
init_cma_reserved_pageblock() we call set_page_count() when _refcount is
1 and change it to 0. In this case a dec_return check makes more sense,
in order to verify that the refcount was indeed 1 and that we cannot
have a double free.

> There are several ways to increment the page
> refcount speculatively if it is not 0. eg lockless GUP and page cache
> reads. So we could have:
>
> CPU 0: alloc_page() (refcount now 1)
> CPU 1: get_page_unless_zero() (refcount now 2)
> CPU 0: set_page_refcount(5) (refcount now 5)
> CPU 1: put_page() (refcount now 4)
>
> Now the refcount is wrong. So it is *only* safe to call
> set_page_refcount() if the refcount is 0. If you can find somewhere
> that's calling set_page_refcount() on a non-0 refcount, that's a bug
> that needs to be fixed.

Right: add_return()/sub_return() with a check after the operation ensures
there is no race window in which the refcount could be modified between
observing it as 0 and calling set_page_count().

Thanks,
Pasha
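
P.S. To make the two cases above concrete, here is a rough sketch of the
direction described. This is not code from the series; the helper name
page_ref_dec_to_zero() and the exact assertions are assumptions, shown
only for illustration:

#include <linux/atomic.h>
#include <linux/mm_types.h>	/* struct page */
#include <linux/mmdebug.h>	/* VM_BUG_ON_PAGE() */

/*
 * Case 2 above: first-time initialization of a struct page whose
 * _refcount may contain garbage (memblock boot path, kexec).
 * Always initializes to 1 instead of taking an arbitrary value.
 */
static inline void page_ref_init(struct page *page)
{
	/*
	 * A plain store is enough here: the page is not yet visible to
	 * anyone who could take a speculative reference to it.
	 */
	atomic_set(&page->_refcount, 1);
}

/*
 * Boot/CMA free path: instead of set_page_count(page, 0), drop the one
 * expected reference and verify it really was the last one.
 */
static inline void page_ref_dec_to_zero(struct page *page)
{
	int remaining = atomic_dec_return(&page->_refcount);

	/*
	 * A non-zero result means _refcount was not 1 beforehand,
	 * i.e. a double free or an unexpected extra reference.
	 */
	VM_BUG_ON_PAGE(remaining != 0, page);
}

With something like this, the set_page_count(p, 0) calls in
__free_pages_core() and init_cma_reserved_pageblock() would become
page_ref_dec_to_zero(p), so a page freed twice trips the assertion
instead of silently resetting the counter.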