All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Yu Zhao <yuzhao@google.com>, Andy Lutomirski <luto@kernel.org>,
	Peter Xu <peterx@redhat.com>, Pavel Emelyanov <xemul@openvz.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Minchan Kim <minchan@kernel.org>, Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Kees Cook <keescook@chromium.org>,
	Leon Romanovsky <leonro@nvidia.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Jan Kara <jack@suse.cz>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Nadav Amit <nadav.amit@gmail.com>, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
Date: Wed, 13 Jan 2021 09:52:47 +0100	[thread overview]
Message-ID: <0cbefee2-70e9-9666-2d0c-ee2807e0fef9@redhat.com> (raw)
In-Reply-To: <CAHk-=wjWMieNV3nAJgoG5prEHBEcOZiREmLUr499tA9NMttEqQ@mail.gmail.com>

On 13.01.21 04:31, Linus Torvalds wrote:
> On Tue, Jan 12, 2021 at 6:16 PM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> The thing about the speculative page cache references is that they can
>> temporarily bump a refcount on a page which _used_ to be in the page
>> cache and has now been reallocated as some other kind of page.
> 
> Oh, and thinking about this made me think we might actually have a
> serious bug here, and it has nothing what-so-ever to do with COW, GUP,
> or even the page count itself.
> 
> It's unlikely enough that I think it's mostly theoretical, but tell me
> I'm wrong.
> 
> PLEASE tell me I'm wrong:
> 
> CPU1 does page_cache_get_speculative under RCU lock
> 
> CPU2 frees and re-uses the page
> 
>     CPU1                CPU2
>     ----                ----
> 
>     page = xas_load(&xas);
>     if (!page_cache_get_speculative(page))
>             goto repeat;
>     .. succeeds ..
> 
>                         remove page from XA
>                         release page
>                         reuse for something else
> 
>     .. and then re-check ..
>     if (unlikely(page != xas_reload(&xas))) {
>             put_page(page);
>             goto repeat;
>     }
> 
> ok, the above all looks fine. We got the speculative ref, but then we
> noticed that its' not valid any more, so we put it again. All good,
> right?
> 
> Wrong.
> 
> What if that "reuse for something else" was actually really quick, and
> both allocated and released it?
> 
> That still sounds good, right? Yes, now the "put_page()" will be the
> one that _actually_ releases the page, but we're still fine, right?
> 
> Very very wrong.
> 
> The "reuse for something else" on CPU2 might have gotten not an
> order-0 page, but a *high-order* page. So it allocated (and then
> immediately free'd) maybe an order-2 allocation with _four_ pages, and
> the re-use happened when we had coalesced the buddy pages.
> 
> But when we release the page on CPU1, we will release just _one_ page,
> and the other three pages will be lost forever.
> 
> IOW, we restored the page count perfectly fine, but we screwed up the
> page sizes and buddy information.
> 
> Ok, so the above is so unlikely from a timing standpoint that I don't
> think it ever happens, but I don't see why it couldn't happen in
> theory.
> 
> Please somebody tell me I'm missing some clever thing we do to make
> sure this can actually not happen..

Wasn't that tackled by latest (not merged AFAIKs) __free_pages() changes?

I'm only able to come up with the doc update, not with the oroginal
fix/change

https://lkml.kernel.org/r/20201027025523.3235-1-willy@infradead.org

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2021-01-13  8:54 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-10  0:44 [PATCH 0/1] mm: restore full accuracy in COW page reuse Andrea Arcangeli
2021-01-10  0:44 ` [PATCH 1/1] " Andrea Arcangeli
2021-01-10  2:54   ` Andrea Arcangeli
2021-01-11 14:11     ` Kirill A. Shutemov
2021-01-10  0:55 ` [PATCH 0/1] " Linus Torvalds
2021-01-10  0:55   ` Linus Torvalds
2021-01-10  1:19   ` Linus Torvalds
2021-01-10  1:19     ` Linus Torvalds
2021-01-10  1:37     ` Linus Torvalds
2021-01-10  1:37       ` Linus Torvalds
2021-01-10  3:24       ` Andrea Arcangeli
2021-01-10  2:51     ` Andrea Arcangeli
2021-01-10  3:51       ` Linus Torvalds
2021-01-10  3:51         ` Linus Torvalds
2021-01-10 19:30         ` Linus Torvalds
2021-01-10 19:30           ` Linus Torvalds
2021-01-11  1:18           ` Jason Gunthorpe
2021-01-11  7:26           ` John Hubbard
2021-01-11 12:42             ` Matthew Wilcox
2021-01-11 16:05             ` Jason Gunthorpe
2021-01-11 16:15               ` Michal Hocko
2021-01-11 19:19             ` Linus Torvalds
2021-01-11 19:19               ` Linus Torvalds
2021-01-11 22:18               ` Linus Torvalds
2021-01-11 22:18                 ` Linus Torvalds
2021-01-12 17:07                 ` Andy Lutomirski
2021-01-12 17:07                   ` Andy Lutomirski
2021-01-12 23:51                 ` Jerome Glisse
2021-01-13  2:16                 ` Matthew Wilcox
2021-01-13  2:43                   ` Linus Torvalds
2021-01-13  2:43                     ` Linus Torvalds
2021-01-13  3:31                   ` Linus Torvalds
2021-01-13  3:31                     ` Linus Torvalds
2021-01-13  8:52                     ` David Hildenbrand [this message]
2021-01-13  8:57                       ` David Hildenbrand
2021-01-13 12:32                     ` Kirill A. Shutemov
2021-01-13 12:55                       ` Matthew Wilcox
2021-01-13 19:54                         ` Linus Torvalds
2021-01-13 19:54                           ` Linus Torvalds
2021-01-13 23:54           ` Peter Xu
2021-01-11 15:52       ` Jason Gunthorpe
2021-01-15  8:59 ` David Hildenbrand
2021-01-15 18:37   ` Jason Gunthorpe
2021-01-15 19:46     ` David Hildenbrand
2021-01-15 19:53       ` Jason Gunthorpe
2021-01-16  3:40       ` John Hubbard
2021-01-16 11:42         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0cbefee2-70e9-9666-2d0c-ee2807e0fef9@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=keescook@chromium.org \
    --cc=kirill@shutemov.name \
    --cc=ktkhai@virtuozzo.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xemul@openvz.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.