linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* poisoned pages do not play well in the buddy allocator
@ 2019-08-26 10:41 Oscar Salvador
  2019-08-26 11:14 ` Michal Hocko
  2019-08-27  1:34 ` Naoya Horiguchi
  0 siblings, 2 replies; 7+ messages in thread
From: Oscar Salvador @ 2019-08-26 10:41 UTC (permalink / raw)
  To: n-horiguchi; +Cc: mhocko, mike.kravetz, linux-mm, linux-kernel, vbabka

Hi,

When analyzing a problem reported by one of our customers, I stumbbled upon an issue
that origins from the fact that poisoned pages end up in the buddy allocator.

Let me break down the stepts that lie to the problem:

1) We soft-offline a page
2) Page gets flagged as HWPoison and is being sent to the buddy allocator.
   This is done through set_hwpoison_free_buddy_page().
3) Kcompactd wakes up in order to perform some compaction.
4) compact_zone() will call migrate_pages()
5) migrate_pages() will try to get a new page from compaction_alloc() to migrate to
6) if cc->freelist is empty, compaction_alloc() will call isolate_free_pagesblock()
7) isolate_free_pagesblock only checks for PageBuddy() to assume that a page is OK
   to be used to migrate to. Since HWPoisoned page are also PageBuddy, we add
   the page to the list. (same problem exists in fast_isolate_freepages()).

The outcome of that is that we end up happily handing poisoned pages in compaction_alloc,
so if we ever got a fault on that page through *_fault, we will return VM_FAULT_HWPOISON,
and the process will be killed.

I first though that I could get away with it by checking PageHWPoison in
{fast_isolate_freepages/isolate_free_pagesblock}, but not really.
It might be that the page we are checking is an order > 0 page, so the first page
might not be poisoned, but the one the follows might be, and we end up in the
same situation.

After some more thought, I really came to the conclusion that HWPoison pages should not
really be in the buddy allocator, as this is only asking for problems.
In this case it is only compaction code, but it could be happening somewhere else,
and one would expect that the pages you got from the buddy allocator are __ready__ to use.

I __think__ that we thought we were safe to put HWPoison pages in the buddy allocator as we
perform healthy checks when getting a page from there, so we skip poisoned pages

Of course, this is not the end of the story, now that someone got a page, if he frees it,
there is a high chance that this page ends up in a pcplist (I saw that).
Unless we are on CONFIG_VM_DEBUG, we do not check for the health of pages got from pcplist,
as we do when getting a page from the buddy allocator.

I checked [1], and it seems that [2] was going towards fixing this kind of issue.

I think it is about time to revamp the whole thing.

@Naoya: I could give it a try if you are busy.

[1] https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
[2] https://lore.kernel.org/linux-mm/1541746035-13408-9-git-send-email-n-horiguchi@ah.jp.nec.com/

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-08-30 13:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-26 10:41 poisoned pages do not play well in the buddy allocator Oscar Salvador
2019-08-26 11:14 ` Michal Hocko
2019-08-27  1:34 ` Naoya Horiguchi
2019-08-27  7:28   ` Oscar Salvador
2019-08-30 10:45     ` Oscar Salvador
2019-08-30 12:36       ` Michal Hocko
2019-08-30 13:21         ` Oscar Salvador

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).