* [RFC] HWPOISON: soft offlining for non-lru movable page
@ 2017-01-18 4:00 Yisheng Xie
2017-01-18 9:45 ` Naoya Horiguchi
2017-01-18 9:51 ` Michal Hocko
0 siblings, 2 replies; 6+ messages in thread
From: Yisheng Xie @ 2017-01-18 4:00 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: n-horiguchi, mhocko, akpm, minchan, vbabka, guohanjun, qiuxishi
This patch extends the soft offlining framework to support
non-lru movable pages, which have supported migration since
commit bda807d44454 ("mm: migrate: support non-lru movable page
migration").

When corrected memory errors occur on a non-lru movable page,
we can choose to stop using it by migrating its data onto another
page and disabling the original (possibly half-broken) one.
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
---
mm/memory-failure.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 53 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f283c7e..10043a4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1527,7 +1527,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 {
 	int ret = __get_any_page(page, pfn, flags);
 
-	if (ret == 1 && !PageHuge(page) && !PageLRU(page)) {
+	if (ret == 1 && !PageHuge(page) &&
+	    !PageLRU(page) && !__PageMovable(page)) {
 		/*
 		 * Try to free it.
 		 */
@@ -1549,6 +1550,54 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 	return ret;
 }
 
+static int soft_offline_movable_page(struct page *page, int flags)
+{
+	int ret;
+	unsigned long pfn = page_to_pfn(page);
+	LIST_HEAD(pagelist);
+
+	/*
+	 * This double-check of PageHWPoison is to avoid the race with
+	 * memory_failure(). See also comment in __soft_offline_page().
+	 */
+	lock_page(page);
+	if (PageHWPoison(page)) {
+		unlock_page(page);
+		put_hwpoison_page(page);
+		pr_info("soft offline: %#lx movable page already poisoned\n",
+			pfn);
+		return -EBUSY;
+	}
+	unlock_page(page);
+
+	ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
+	/*
+	 * get_any_page() and isolate_movable_page() each take a refcount,
+	 * so we need to drop one here.
+	 */
+	put_hwpoison_page(page);
+	if (!ret) {
+		pr_info("soft offline: %#lx movable page failed to isolate\n",
+			pfn);
+		return -EBUSY;
+	}
+
+	list_add(&page->lru, &pagelist);
+	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
+				MIGRATE_SYNC, MR_MEMORY_FAILURE);
+	if (ret) {
+		if (!list_empty(&pagelist))
+			putback_movable_pages(&pagelist);
+
+		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
+			pfn, ret, page->flags);
+		if (ret > 0)
+			ret = -EIO;
+	}
+
+	return ret;
+}
+
 static int soft_offline_huge_page(struct page *page, int flags)
 {
 	int ret;
@@ -1705,8 +1754,10 @@ static int soft_offline_in_use_page(struct page *page, int flags)
 
 	if (PageHuge(page))
 		ret = soft_offline_huge_page(page, flags);
-	else
+	else if (PageLRU(page))
 		ret = __soft_offline_page(page, flags);
+	else
+		ret = soft_offline_movable_page(page, flags);
 
 	return ret;
 }
--
1.7.12.4
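
The refcount comment in soft_offline_movable_page() is easier to follow with a small model. What follows is a userspace sketch only, not kernel code: struct toy_page and every toy_* helper are hypothetical stand-ins. It assumes the page owner holds one base reference and that the isolation reference is released once migration finishes or the page is put back. Unlike migrate_pages(), which can also return a positive count of pages it failed to migrate, the toy migration step returns only 0 or -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct page; nothing here is kernel code. */
struct toy_page {
	int refcount;	/* models the page reference count */
	bool hwpoison;	/* models PageHWPoison() */
	bool movable;	/* models __PageMovable() */
	bool isolated;	/* true while the page sits on a private migration list */
};

/* Models the pin taken by get_any_page() before the handler runs. */
static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

/* Models put_hwpoison_page(): drop exactly one reference. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Models isolate_movable_page(): on success it takes its own reference. */
static bool toy_isolate_movable(struct toy_page *p)
{
	if (!p->movable || p->isolated)
		return false;
	p->refcount++;
	p->isolated = true;
	return true;
}

/*
 * Models migration plus putback on failure. In this toy the isolation
 * reference is released either way; only the return value differs.
 */
static int toy_migrate(struct toy_page *p, bool migration_ok)
{
	p->isolated = false;
	toy_put_page(p);
	return migration_ok ? 0 : -EIO;
}

/* Mirrors the control flow of soft_offline_movable_page() in the patch. */
static int toy_soft_offline_movable(struct toy_page *p, bool migration_ok)
{
	bool isolated;

	if (p->hwpoison) {
		toy_put_page(p);	/* drop the caller's pin and bail out */
		return -EBUSY;
	}

	isolated = toy_isolate_movable(p);
	toy_put_page(p);		/* the caller's pin is no longer needed */
	if (!isolated)
		return -EBUSY;

	return toy_migrate(p, migration_ok);
}
```

On every path the model returns the refcount to its value from before the pin, which is the invariant the patch comment is protecting: the caller's pin is dropped exactly once, whether or not isolation succeeds.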
* Re: [RFC] HWPOISON: soft offlining for non-lru movable page
2017-01-18 4:00 [RFC] HWPOISON: soft offlining for non-lru movable page Yisheng Xie
@ 2017-01-18 9:45 ` Naoya Horiguchi
2017-01-20 9:52 ` Yisheng Xie
2017-01-18 9:51 ` Michal Hocko
1 sibling, 1 reply; 6+ messages in thread
From: Naoya Horiguchi @ 2017-01-18 9:45 UTC (permalink / raw)
To: Yisheng Xie
Cc: linux-mm, linux-kernel, mhocko, akpm, minchan, vbabka, guohanjun,
qiuxishi
On Wed, Jan 18, 2017 at 12:00:54PM +0800, Yisheng Xie wrote:
> This patch extends the soft offlining framework to support
> non-lru movable pages, which have supported migration since
> commit bda807d44454 ("mm: migrate: support non-lru movable page
> migration").
>
> When corrected memory errors occur on a non-lru movable page,
> we can choose to stop using it by migrating its data onto another
> page and disabling the original (possibly half-broken) one.
>
> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
It looks OK at a quick glance. I'll do some more testing tomorrow.
Thanks,
Naoya Horiguchi
* Re: [RFC] HWPOISON: soft offlining for non-lru movable page
2017-01-18 4:00 [RFC] HWPOISON: soft offlining for non-lru movable page Yisheng Xie
2017-01-18 9:45 ` Naoya Horiguchi
@ 2017-01-18 9:51 ` Michal Hocko
2017-01-19 1:21 ` Yisheng Xie
1 sibling, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2017-01-18 9:51 UTC (permalink / raw)
To: Yisheng Xie
Cc: linux-mm, linux-kernel, n-horiguchi, akpm, minchan, vbabka,
guohanjun, qiuxishi
On Wed 18-01-17 12:00:54, Yisheng Xie wrote:
> This patch extends the soft offlining framework to support
> non-lru movable pages, which have supported migration since
> commit bda807d44454 ("mm: migrate: support non-lru movable page
> migration").
>
> When corrected memory errors occur on a non-lru movable page,
> we can choose to stop using it by migrating its data onto another
> page and disabling the original (possibly half-broken) one.
soft_offline_movable_page duplicates quite a lot from
__soft_offline_page. Would it be better to handle both cases in
__soft_offline_page?
--
Michal Hocko
SUSE Labs
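
One way to picture the merge suggested above is a single handler in which only the isolation step depends on the page kind. This is a hypothetical userspace sketch, not the v2 patch: the toy_* names are invented for illustration. It assumes the LRU isolation helper reports success as 0, while the movable helper reports success as true (as the `if (!ret)` check in the RFC implies), so the two conventions have to be normalized before the shared tail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_kind { TOY_LRU, TOY_MOVABLE, TOY_OTHER };

/* Toy stand-in for struct page; nothing here is kernel code. */
struct toy_page {
	enum toy_kind kind;
	bool hwpoison;
	bool isolated;
};

/* Stand-in for LRU isolation: 0 on success, -EBUSY on failure (assumed). */
static int toy_isolate_lru(struct toy_page *p)
{
	if (p->kind != TOY_LRU || p->isolated)
		return -EBUSY;
	p->isolated = true;
	return 0;
}

/* Stand-in for movable isolation: true on success, false on failure. */
static bool toy_isolate_movable(struct toy_page *p)
{
	if (p->kind != TOY_MOVABLE || p->isolated)
		return false;
	p->isolated = true;
	return true;
}

/* One handler for both page kinds; only the isolation step differs. */
static int toy_soft_offline_page(struct toy_page *p)
{
	int ret;

	if (p->hwpoison)
		return -EBUSY;

	if (p->kind == TOY_LRU)
		ret = toy_isolate_lru(p);
	else
		ret = toy_isolate_movable(p) ? 0 : -EBUSY;
	if (ret)
		return ret;

	/* Shared tail: migration and error reporting would go here. */
	p->isolated = false;
	return 0;
}
```

The hwpoison double-check and everything after isolation are written once, which is the duplication the review comment points at.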
* Re: [RFC] HWPOISON: soft offlining for non-lru movable page
2017-01-18 9:51 ` Michal Hocko
@ 2017-01-19 1:21 ` Yisheng Xie
0 siblings, 0 replies; 6+ messages in thread
From: Yisheng Xie @ 2017-01-19 1:21 UTC (permalink / raw)
To: Michal Hocko
Cc: linux-mm, linux-kernel, n-horiguchi, akpm, minchan, vbabka,
guohanjun, qiuxishi
On 2017/1/18 17:51, Michal Hocko wrote:
> On Wed 18-01-17 12:00:54, Yisheng Xie wrote:
>> This patch extends the soft offlining framework to support
>> non-lru movable pages, which have supported migration since
>> commit bda807d44454 ("mm: migrate: support non-lru movable page
>> migration").
>>
>> When corrected memory errors occur on a non-lru movable page,
>> we can choose to stop using it by migrating its data onto another
>> page and disabling the original (possibly half-broken) one.
>
> soft_offline_movable_page duplicates quite a lot from
> __soft_offline_page. Would it be better to handle both cases in
> __soft_offline_page?
>
Hi Michal,
Thanks for reviewing.
Yes, most of the code in soft_offline_movable_page duplicates
__soft_offline_page. I used a separate function to keep the code clear,
just as soft_offline_huge_page does.
I will prepare a v2 following your suggestion.
Thanks
Yisheng Xie.
* Re: [RFC] HWPOISON: soft offlining for non-lru movable page
2017-01-18 9:45 ` Naoya Horiguchi
@ 2017-01-20 9:52 ` Yisheng Xie
2017-01-23 4:26 ` Naoya Horiguchi
0 siblings, 1 reply; 6+ messages in thread
From: Yisheng Xie @ 2017-01-20 9:52 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, linux-kernel, mhocko, akpm, minchan, vbabka, guohanjun,
qiuxishi
Hi Naoya,
On 2017/1/18 17:45, Naoya Horiguchi wrote:
> On Wed, Jan 18, 2017 at 12:00:54PM +0800, Yisheng Xie wrote:
>> This patch extends the soft offlining framework to support
>> non-lru movable pages, which have supported migration since
>> commit bda807d44454 ("mm: migrate: support non-lru movable page
>> migration").
>>
>> When corrected memory errors occur on a non-lru movable page,
>> we can choose to stop using it by migrating its data onto another
>> page and disabling the original (possibly half-broken) one.
>>
>> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
>
> It looks OK at a quick glance. I'll do some more testing tomorrow.
>
Thanks for reviewing.
I have done some basic tests, such as soft offlining a movable page and
then unpoisoning it. Do you have a test suite or any test suggestions, so
that I can do some more testing to double-check? Thank you very much.
Thanks
Yisheng Xie.
>
* Re: [RFC] HWPOISON: soft offlining for non-lru movable page
2017-01-20 9:52 ` Yisheng Xie
@ 2017-01-23 4:26 ` Naoya Horiguchi
0 siblings, 0 replies; 6+ messages in thread
From: Naoya Horiguchi @ 2017-01-23 4:26 UTC (permalink / raw)
To: Yisheng Xie
Cc: linux-mm, linux-kernel, mhocko, akpm, minchan, vbabka, guohanjun,
qiuxishi
On Fri, Jan 20, 2017 at 05:52:13PM +0800, Yisheng Xie wrote:
> Hi Naoya,
>
> On 2017/1/18 17:45, Naoya Horiguchi wrote:
> > On Wed, Jan 18, 2017 at 12:00:54PM +0800, Yisheng Xie wrote:
> >> This patch extends the soft offlining framework to support
> >> non-lru movable pages, which have supported migration since
> >> commit bda807d44454 ("mm: migrate: support non-lru movable page
> >> migration").
> >>
> >> When corrected memory errors occur on a non-lru movable page,
> >> we can choose to stop using it by migrating its data onto another
> >> page and disabling the original (possibly half-broken) one.
> >>
> >> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
> >
> > It looks OK at a quick glance. I'll do some more testing tomorrow.
> >
> Thanks for reviewing.
> I have done some basic tests, such as soft offlining a movable page and
> then unpoisoning it. Do you have a test suite or any test suggestions, so
> that I can do some more testing to double-check? Thank you very much.
I've tried soft offline on zram pages with your v2 patch, and it works fine.
I have no specific suggestion about other testcases.
Thanks,
Naoya Horiguchi
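
For anyone wanting a quick smoke test of soft offlining from userspace, a minimal sketch along the following lines may help. It is not the test used in this thread. It relies on madvise() with MADV_SOFT_OFFLINE, which needs CAP_SYS_ADMIN and a kernel built with CONFIG_MEMORY_FAILURE, and it targets an anonymous page, so it exercises the existing LRU path rather than the non-lru movable path added by the patch (covering that would need something like the zram-backed pages mentioned above). The function name soft_offline_selftest() is made up for this example. Note that a successful run really does retire one physical page on the machine.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* fallback; value from the Linux uapi headers */
#endif

/*
 * Returns 0 if the request succeeded and the data survived, a negative
 * errno if the kernel refused the request (for example -EPERM without
 * CAP_SYS_ADMIN), and 1 if the page contents changed.
 */
static int soft_offline_selftest(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int ret = 0;
	long i;

	if (psz <= 0)
		return -EINVAL;

	buf = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -errno;

	/* Touch the page so that it is backed by a physical frame. */
	memset(buf, 0xa5, (size_t)psz);

	/* Ask the kernel to migrate the data away and retire the frame. */
	if (madvise(buf, (size_t)psz, MADV_SOFT_OFFLINE) != 0)
		ret = -errno;

	/* Soft offlining must be transparent: the contents may not change. */
	for (i = 0; i < psz; i++) {
		if (buf[i] != 0xa5) {
			ret = 1;
			break;
		}
	}

	munmap(buf, (size_t)psz);
	return ret;
}
```

A return value of 1 would indicate data corruption; any negative value only means the kernel declined the request in this environment.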