From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757537AbcDEIU6 (ORCPT ); Tue, 5 Apr 2016 04:20:58 -0400 Received: from mx2.suse.de ([195.135.220.15]:45045 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751195AbcDEIUz (ORCPT ); Tue, 5 Apr 2016 04:20:55 -0400 Subject: Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page To: Naoya Horiguchi References: <1459321935-3655-1-git-send-email-minchan@kernel.org> <1459321935-3655-2-git-send-email-minchan@kernel.org> <56FE706D.7080507@suse.cz> <20160404013917.GC6543@bbox> <20160404044458.GA20250@hori1.linux.bs1.fc.nec.co.jp> <57027E47.7070909@suse.cz> <20160405015402.GA30962@hori1.linux.bs1.fc.nec.co.jp> Cc: Minchan Kim , Andrew Morton , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "jlayton@poochiereds.net" , "bfields@fieldses.org" , Joonsoo Kim , "koct9i@gmail.com" , "aquini@redhat.com" , "virtualization@lists.linux-foundation.org" , Mel Gorman , Hugh Dickins , Sergey Senozhatsky , Rik van Riel , "rknize@motorola.com" , Gioh Kim , Sangseok Lee , Chan Gyun Jeong , Al Viro , YiPing Xu From: Vlastimil Babka Message-ID: <57037562.3040203@suse.cz> Date: Tue, 5 Apr 2016 10:20:50 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 In-Reply-To: <20160405015402.GA30962@hori1.linux.bs1.fc.nec.co.jp> Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/05/2016 03:54 AM, Naoya Horiguchi wrote: > On Mon, Apr 04, 2016 at 04:46:31PM +0200, Vlastimil Babka wrote: >> On 04/04/2016 06:45 AM, Naoya Horiguchi wrote: >>> On Mon, Apr 04, 2016 at 10:39:17AM +0900, Minchan Kim wrote: > ... >>>>> >>>>> Also (but not your fault) the put_page() preceding >>>>> test_set_page_hwpoison(page)) IMHO deserves a comment saying which >>>>> pin we are releasing and which one we still have (hopefully? if I >>>>> read description of da1b13ccfbebe right) otherwise it looks like >>>>> doing something with a page that we just potentially freed. >>>> >>>> Yes, while I read the code, I had same question. I think the releasing >>>> refcount is for get_any_page. >>> >>> As the other callers of page migration do, soft_offline_page expects the >>> migration source page to be freed at this put_page() (no pin remains.) >>> The refcount released here is from isolate_lru_page() in __soft_offline_page(). >>> (the pin by get_any_page is released by put_hwpoison_page just after it.) >>> >>> .. yes, doing something just after freeing page looks weird, but that's >>> how PageHWPoison flag works. IOW, many other page flags are maintained >>> only during one "allocate-free" life span, but PageHWPoison still does >>> its job beyond it. >> >> But what prevents the page from being allocated again between put_page() >> and test_set_page_hwpoison()? In that case we would be marking page >> poisoned while still in use, which is the same as marking it while still >> in use after a failed migration? > > Actually nothing prevents that race. But I think that the result of the race > is that the error page can be reused for allocation, which results in killing > processes at page fault time. Soft offline is kind of mild/precautious thing > (for correctable errors that don't require immediate handling), so killing > processes looks to me an overkill. And marking hwpoison means that we can no > longer do retry from userspace. So you agree that this race is a bug? It may turn a soft-offline attempt into a killed process. In that case we should fix it the same as we are fixing the failed migration case. Maybe it will be just enough to switch the test_set_page_hwpoison() and put_page() calls? > And another practical thing is the race with unpoison_memory() as described > in commit da1b13ccfbebe. unpoison_memory() properly works only for properly > poisoned pages, so doing unpoison for in-use hwpoisoned pages is fragile. > That's why I'd like to avoid setting PageHWPoison for in-use pages if possible. > >> (Also, which part prevents pages with PageHWPoison to be allocated >> again, anyway? I can't find it and test_set_page_hwpoison() doesn't >> remove from buddy freelists). > > check_new_page() in mm/page_alloc.c should prevent reallocation of PageHWPoison. > As you pointed out, memory error handler doens't remove it from buddy freelists. Oh, I see. It's using __PG_HWPOISON wrapper, so I didn't notice it when searching. In any case that results in a bad_page() warning, right? Is it desirable for a soft-offlined page? If we didn't free poisoned pages to buddy system, they wouldn't trigger this warning. > BTW, it might be a bit off-topic, but recently I felt that check_new_page() > might be improvable, because when check_new_page() returns 1, the whole buddy > block (not only the bad page) seems to be leaked from buddy freelist. > For example, if thp (order 9) is requested, and PageHWPoison (or any other > types of bad pages) is found in an order 9 block, all 512 page are discarded. > Unpoison can't bring it back to buddy. > So, some code to split buddy block including bad page (and recovering code from > unpoison) might be helpful, although that's another story ... Hm sounds like another argument for not freeing the page to buddy lists in the first place. Maybe a hook in free_pages_check()? > Thanks, > Naoya Horiguchi >