linux-kernel.vger.kernel.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	akpm@linux-foundation.org, willy@infradead.org, vbabka@suse.cz,
	dhowells@redhat.com, neilb@suse.de, surenb@google.com,
	minchan@kernel.org, sfr@canb.auug.org.au, rcampbell@nvidia.com,
	naoya.horiguchi@nec.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/swapfile: unuse_pte can map random data if swap read fails
Date: Tue, 19 Apr 2022 12:16:35 -0400
Message-ID: <Yl7gY6G8/To1yHOe@xz-m1.local>
In-Reply-To: <21003e7a-01e4-c751-dd41-fce4149d424c@redhat.com>

On Tue, Apr 19, 2022 at 01:14:29PM +0200, David Hildenbrand wrote:
> On 19.04.22 10:08, Alistair Popple wrote:
> > David Hildenbrand <david@redhat.com> writes:
> > 
> >> On 19.04.22 09:29, Miaohe Lin wrote:
> >>> On 2022/4/19 11:51, Alistair Popple wrote:
> >>>> Miaohe Lin <linmiaohe@huawei.com> writes:
> >>>>
> >>>>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
> >>>>> page filled with random data is mapped into user address space. In case
> >>>>> of error, a special swap entry indicating swap read fails is set to the
> >>>>> page table. So the swapcache page can be freed and the user won't end up
> >>>>> with a permanently mounted swap because a sector is bad. And if the page
> >>>>> is accessed later, the user process will be killed so that corrupted data
> >>>>> is never consumed. On the other hand, if the page is never accessed, the
> >>>>> user won't even notice it.
> >>>>
> >>>> Hi Miaohe,
> >>>>
> >>>> It seems we're not actually using the pfn that gets stored in the special swap
> >>>> entry here. Is my understanding correct? If so I think it would be better to use
> >>>
> >>> Yes, you're right. The pfn is not used now. What we need here is a special swap entry
> >>> to do the right things. I think we can change to store some debugging information instead
> >>> of pfn if needed in the future.
> >>>
> >>>> the new PTE markers Peter introduced[1] rather than adding another swap entry
> >>>> type.
> >>>
> >>> IIUC, we should not reuse that swap entry here. From definition:
> >>>
> >>> PTE markers
> >>> `========='
> >>> ...
> >>> PTE marker is a new type of swap entry that is only applicable to file
> >>> backed memories like shmem and hugetlbfs.  It's used to persist some
> >>> pte-level information even if the original present ptes in pgtable are
> >>> zapped.
> >>>
> >>> It's designed for file backed memories while swapin error entry is for anonymous
> >>> memories. And there are some differences in processing. So it's not a good idea
> >>> to reuse pte markers. Or am I missing something?
> >>
> >> I tend to agree. As raised in my other reply, maybe we can simply reuse
> >> hwpoison entries and update the documentation of them accordingly.
> > 
> > Unless I've missed something I don't think PTE markers should be restricted
> > solely to file backed memory. It's true that the only user of them at the moment
> > is UFFD-WP for file backed memory, but PTE markers are just a special swap entry
> > same as what is added here.
> 
> There is a difference.
> 
> What we want here is "there used to be something mapped but it's not
> readable anymore. Please fail hard when userspace tries accessing
> this.". Just like with hwpoison entries.
> 
> What a pte marker expresses is that "here is nothing mapped right now
> but we have additional metadata available here. For file-backed memory,
> it translates to: If we ever touch this page, lookup the pagecache what
> to map here."
> 
> In the anonymous memory world, this would map to "populate the zeropage
> or a fresh anonymous page on access." and keep the metadata around.

So far it's defined like that, but it does not necessarily need to be.  IMHO
a PTE marker could work here for the anonymous use case, as Alistair stated.
Say, it's fairly simple to skip anonymous page handling entirely if we
see this pte marker with the new bit set.  It's indeed tailored for
such a use case, where we don't need to store special data like a pfn.
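
To make the idea above concrete, here is a rough userspace sketch (not
kernel code).  All names (model_*, MODEL_*), the bit layout and the
"swapin error" bit itself are assumptions made up for illustration; they are
not the actual pte marker definitions.  It only shows how a fault path could
bail out before any anonymous page handling when the new bit is set:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only; none of these names are the kernel's definitions. */
#define MODEL_TYPE_SHIFT          58
#define MODEL_TYPE_PTE_MARKER     0x1fULL      /* hypothetical swap type */
#define MODEL_MARKER_UFFD_WP      (1ULL << 0)  /* stands in for the existing user */
#define MODEL_MARKER_SWAPIN_ERROR (1ULL << 1)  /* the assumed "new bit" */

enum model_fault {
    MODEL_FAULT_SWAPIN,    /* ordinary swap entry: read the page back in */
    MODEL_FAULT_POPULATE,  /* marker without error bit: populate, keep metadata */
    MODEL_FAULT_SIGBUS     /* marker with error bit: fail hard */
};

typedef struct { uint64_t val; } model_swp_entry_t;

/* Build a marker entry: type in the high bits, marker bits in the low bits. */
static model_swp_entry_t model_make_marker(uint64_t bits)
{
    model_swp_entry_t e = { (MODEL_TYPE_PTE_MARKER << MODEL_TYPE_SHIFT) | bits };
    return e;
}

static int model_is_marker(model_swp_entry_t e)
{
    return (e.val >> MODEL_TYPE_SHIFT) == MODEL_TYPE_PTE_MARKER;
}

static uint64_t model_marker_bits(model_swp_entry_t e)
{
    return e.val & ((1ULL << MODEL_TYPE_SHIFT) - 1);
}

/* Check the error bit first, so no page handling is entered for it. */
static enum model_fault model_handle_fault(model_swp_entry_t e)
{
    if (!model_is_marker(e))
        return MODEL_FAULT_SWAPIN;
    if (model_marker_bits(e) & MODEL_MARKER_SWAPIN_ERROR)
        return MODEL_FAULT_SIGBUS;
    return MODEL_FAULT_POPULATE;
}
```

Note that the marker carries no pfn at all in this model, which is the point:
only the bit matters, and the check sits in front of everything else.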

A hwpoison entry looks good to me too, but as discussed we may need to
reserve pfn=0 or -1 or anything we're sure is an invalid value, and then
we'll also need to go through the rest of the hwpoison-related code
(carefully, as Miaohe rightly pointed out regarding the difference in the
VM_FAULT_* fields being returned) so that it doesn't mistakenly treat the
"swp device read error" as a general MCE.
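
As a rough illustration of the extra care needed, a userspace sketch (again
not kernel code) of the hwpoison variant could look like the below.  The
sentinel value, the sketch_* names and the two result codes are assumptions
for illustration only; which VM_FAULT_* value each case should really return
is exactly the part that needs auditing:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reserved "pfn = -1" sentinel; any surely-invalid pfn would do. */
#define SKETCH_PFN_SWAPIN_ERROR UINT64_MAX

enum sketch_fault {
    SKETCH_FAULT_HWPOISON,  /* genuine memory error (MCE) on a real pfn */
    SKETCH_FAULT_SIGBUS     /* swap device read error, no real pfn behind it */
};

struct sketch_hwpoison_entry { uint64_t pfn; };

static int sketch_is_swapin_error(struct sketch_hwpoison_entry e)
{
    return e.pfn == SKETCH_PFN_SWAPIN_ERROR;
}

/*
 * Every hwpoison path would need a check like this one, otherwise the
 * sentinel gets treated as a poisoned page with a (bogus) pfn.
 */
static enum sketch_fault sketch_fault_result(struct sketch_hwpoison_entry e)
{
    return sketch_is_swapin_error(e) ? SKETCH_FAULT_SIGBUS
                                     : SKETCH_FAULT_HWPOISON;
}
```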

From that POV it seems pte markers would be slightly cleaner, though we'll
need to touch up the existing pte marker code paths to start accepting
anonymous vmas.  No strong opinion on this.

Btw, is there an error dumped into dmesg when the read error happens (e.g.,
would the block IO path already trigger some warning)?  I'm wondering whether
we should report it to the user somehow, so that the user knows even
earlier than when the bad page is accessed and could potentially do
something useful.

-- 
Peter Xu

