linux-kernel.vger.kernel.org archive mirror
* [RFC v2] HWPOISON: soft offlining for non-lru movable page
@ 2017-01-19 14:59 ysxie
  2017-01-23  4:39 ` Naoya Horiguchi
  2017-01-23  5:14 ` Minchan Kim
  0 siblings, 2 replies; 4+ messages in thread
From: ysxie @ 2017-01-19 14:59 UTC
  To: linux-mm, linux-kernel
  Cc: n-horiguchi, mhocko, akpm, minchan, vbabka, guohanjun, qiuxishi

From: Yisheng Xie <xieyisheng1@huawei.com>

This patch extends the soft offlining framework to support
non-lru movable pages, which have supported migration since
commit bda807d44454 ("mm: migrate: support non-lru movable page
migration").

When corrected memory errors occur on a non-lru movable page,
we can choose to stop using it by migrating its data onto another
page and disabling the original (maybe half-broken) one.

Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
---
v2:
 delete function soft_offline_movable_page() and handle non-lru movable
 pages in __soft_offline_page(), as Michal Hocko suggested.

Any comment is more than welcome.

 mm/memory-failure.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f283c7e..74be9e1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1527,7 +1527,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 {
 	int ret = __get_any_page(page, pfn, flags);
 
-	if (ret == 1 && !PageHuge(page) && !PageLRU(page)) {
+	if (ret == 1 && !PageHuge(page) &&
+	    !PageLRU(page) && !__PageMovable(page)) {
 		/*
 		 * Try to free it.
 		 */
@@ -1609,7 +1610,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 
 static int __soft_offline_page(struct page *page, int flags)
 {
-	int ret;
+	int ret = -1;
 	unsigned long pfn = page_to_pfn(page);
 
 	/*
@@ -1619,7 +1620,8 @@ static int __soft_offline_page(struct page *page, int flags)
 	 * so there's no race between soft_offline_page() and memory_failure().
 	 */
 	lock_page(page);
-	wait_on_page_writeback(page);
+	if (PageLRU(page))
+		wait_on_page_writeback(page);
 	if (PageHWPoison(page)) {
 		unlock_page(page);
 		put_hwpoison_page(page);
@@ -1630,7 +1632,8 @@ static int __soft_offline_page(struct page *page, int flags)
 	 * Try to invalidate first. This should work for
 	 * non dirty unmapped page cache pages.
 	 */
-	ret = invalidate_inode_page(page);
+	if (PageLRU(page))
+		ret = invalidate_inode_page(page);
 	unlock_page(page);
 	/*
 	 * RED-PEN would be better to keep it isolated here, but we
@@ -1649,7 +1652,10 @@ static int __soft_offline_page(struct page *page, int flags)
 	 * Try to migrate to a new page instead. migrate.c
 	 * handles a large number of cases for us.
 	 */
-	ret = isolate_lru_page(page);
+	if (PageLRU(page))
+		ret = isolate_lru_page(page);
+	else
+		ret = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
 	/*
 	 * Drop page reference which is came from get_any_page()
 	 * successful isolate_lru_page() already took another one.
@@ -1657,18 +1663,15 @@ static int __soft_offline_page(struct page *page, int flags)
 	put_hwpoison_page(page);
 	if (!ret) {
 		LIST_HEAD(pagelist);
-		inc_node_page_state(page, NR_ISOLATED_ANON +
+		if (PageLRU(page))
+			inc_node_page_state(page, NR_ISOLATED_ANON +
 					page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
 		if (ret) {
-			if (!list_empty(&pagelist)) {
-				list_del(&page->lru);
-				dec_node_page_state(page, NR_ISOLATED_ANON +
-						page_is_file_cache(page));
-				putback_lru_page(page);
-			}
+			if (!list_empty(&pagelist))
+				putback_movable_pages(&pagelist);
 
 			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 				pfn, ret, page->flags);
-- 
1.9.1


* Re: [RFC v2] HWPOISON: soft offlining for non-lru movable page
  2017-01-19 14:59 [RFC v2] HWPOISON: soft offlining for non-lru movable page ysxie
@ 2017-01-23  4:39 ` Naoya Horiguchi
  2017-01-23  5:14 ` Minchan Kim
  1 sibling, 0 replies; 4+ messages in thread
From: Naoya Horiguchi @ 2017-01-23  4:39 UTC
  To: ysxie
  Cc: linux-mm, linux-kernel, mhocko, akpm, minchan, vbabka, guohanjun,
	qiuxishi

On Thu, Jan 19, 2017 at 10:59:03PM +0800, ysxie@foxmail.com wrote:
> From: Yisheng Xie <xieyisheng1@huawei.com>
> 
> This patch extends the soft offlining framework to support
> non-lru movable pages, which have supported migration since
> commit bda807d44454 ("mm: migrate: support non-lru movable page
> migration").
> 
> When corrected memory errors occur on a non-lru movable page,
> we can choose to stop using it by migrating its data onto another
> page and disabling the original (maybe half-broken) one.
> 
> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
> Suggested-by: Michal Hocko <mhocko@kernel.org>

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> ---
> v2:
>  delete function soft_offline_movable_page() and handle non-lru movable
>  pages in __soft_offline_page(), as Michal Hocko suggested.
> 
> Any comment is more than welcome.
> 
>  mm/memory-failure.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index f283c7e..74be9e1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1527,7 +1527,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
>  {
>  	int ret = __get_any_page(page, pfn, flags);
>  
> -	if (ret == 1 && !PageHuge(page) && !PageLRU(page)) {
> +	if (ret == 1 && !PageHuge(page) &&
> +	    !PageLRU(page) && !__PageMovable(page)) {
>  		/*
>  		 * Try to free it.
>  		 */
> @@ -1609,7 +1610,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  
>  static int __soft_offline_page(struct page *page, int flags)
>  {
> -	int ret;
> +	int ret = -1;
>  	unsigned long pfn = page_to_pfn(page);
>  
>  	/*
> @@ -1619,7 +1620,8 @@ static int __soft_offline_page(struct page *page, int flags)
>  	 * so there's no race between soft_offline_page() and memory_failure().
>  	 */
>  	lock_page(page);
> -	wait_on_page_writeback(page);
> +	if (PageLRU(page))
> +		wait_on_page_writeback(page);
>  	if (PageHWPoison(page)) {
>  		unlock_page(page);
>  		put_hwpoison_page(page);
> @@ -1630,7 +1632,8 @@ static int __soft_offline_page(struct page *page, int flags)
>  	 * Try to invalidate first. This should work for
>  	 * non dirty unmapped page cache pages.
>  	 */
> -	ret = invalidate_inode_page(page);
> +	if (PageLRU(page))
> +		ret = invalidate_inode_page(page);
>  	unlock_page(page);
>  	/*
>  	 * RED-PEN would be better to keep it isolated here, but we
> @@ -1649,7 +1652,10 @@ static int __soft_offline_page(struct page *page, int flags)
>  	 * Try to migrate to a new page instead. migrate.c
>  	 * handles a large number of cases for us.
>  	 */
> -	ret = isolate_lru_page(page);
> +	if (PageLRU(page))
> +		ret = isolate_lru_page(page);
> +	else
> +		ret = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
>  	/*
>  	 * Drop page reference which is came from get_any_page()
>  	 * successful isolate_lru_page() already took another one.
> @@ -1657,18 +1663,15 @@ static int __soft_offline_page(struct page *page, int flags)
>  	put_hwpoison_page(page);
>  	if (!ret) {
>  		LIST_HEAD(pagelist);
> -		inc_node_page_state(page, NR_ISOLATED_ANON +
> +		if (PageLRU(page))
> +			inc_node_page_state(page, NR_ISOLATED_ANON +
>  					page_is_file_cache(page));
>  		list_add(&page->lru, &pagelist);
>  		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>  					MIGRATE_SYNC, MR_MEMORY_FAILURE);
>  		if (ret) {
> -			if (!list_empty(&pagelist)) {
> -				list_del(&page->lru);
> -				dec_node_page_state(page, NR_ISOLATED_ANON +
> -						page_is_file_cache(page));
> -				putback_lru_page(page);
> -			}
> +			if (!list_empty(&pagelist))
> +				putback_movable_pages(&pagelist);
>  
>  			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  				pfn, ret, page->flags);
> -- 
> 1.9.1
> 
> 
> 


* Re: [RFC v2] HWPOISON: soft offlining for non-lru movable page
  2017-01-19 14:59 [RFC v2] HWPOISON: soft offlining for non-lru movable page ysxie
  2017-01-23  4:39 ` Naoya Horiguchi
@ 2017-01-23  5:14 ` Minchan Kim
  2017-01-23 12:39   ` Yisheng Xie
  1 sibling, 1 reply; 4+ messages in thread
From: Minchan Kim @ 2017-01-23  5:14 UTC
  To: ysxie
  Cc: linux-mm, linux-kernel, n-horiguchi, mhocko, akpm, vbabka,
	guohanjun, qiuxishi

Hello,

On Thu, Jan 19, 2017 at 10:59:03PM +0800, ysxie@foxmail.com wrote:
> From: Yisheng Xie <xieyisheng1@huawei.com>
> 
> This patch extends the soft offlining framework to support
> non-lru movable pages, which have supported migration since
> commit bda807d44454 ("mm: migrate: support non-lru movable page
> migration").
> 
> When corrected memory errors occur on a non-lru movable page,
> we can choose to stop using it by migrating its data onto another
> page and disabling the original (maybe half-broken) one.
> 
> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> ---
> v2:
>  delete function soft_offline_movable_page() and handle non-lru movable
>  pages in __soft_offline_page(), as Michal Hocko suggested.
> 
> Any comment is more than welcome.
> 
>  mm/memory-failure.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index f283c7e..74be9e1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1527,7 +1527,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
>  {
>  	int ret = __get_any_page(page, pfn, flags);
>  
> -	if (ret == 1 && !PageHuge(page) && !PageLRU(page)) {
> +	if (ret == 1 && !PageHuge(page) &&
> +	    !PageLRU(page) && !__PageMovable(page)) {

Checking __PageMovable() without holding the page lock is racy, so we need
to check whether it is okay to miss offlining some pages because of the race.
From the description of soft_offline_page(), it seems to be okay.
I just wanted to double-check. :)

>  		/*
>  		 * Try to free it.
>  		 */
> @@ -1609,7 +1610,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  
>  static int __soft_offline_page(struct page *page, int flags)
>  {
> -	int ret;
> +	int ret = -1;
>  	unsigned long pfn = page_to_pfn(page);
>  
>  	/*
> @@ -1619,7 +1620,8 @@ static int __soft_offline_page(struct page *page, int flags)
>  	 * so there's no race between soft_offline_page() and memory_failure().
>  	 */
>  	lock_page(page);
> -	wait_on_page_writeback(page);
> +	if (PageLRU(page))
> +		wait_on_page_writeback(page);

I doubt we need to add such a limitation (i.e., that only LRU pages can be
under writeback). Do you have a reason for adding that code?

>  	if (PageHWPoison(page)) {
>  		unlock_page(page);
>  		put_hwpoison_page(page);
> @@ -1630,7 +1632,8 @@ static int __soft_offline_page(struct page *page, int flags)
>  	 * Try to invalidate first. This should work for
>  	 * non dirty unmapped page cache pages.
>  	 */
> -	ret = invalidate_inode_page(page);
> +	if (PageLRU(page))
> +		ret = invalidate_inode_page(page);

Ditto.

>  	unlock_page(page);
>  	/*
>  	 * RED-PEN would be better to keep it isolated here, but we
> @@ -1649,7 +1652,10 @@ static int __soft_offline_page(struct page *page, int flags)
>  	 * Try to migrate to a new page instead. migrate.c
>  	 * handles a large number of cases for us.
>  	 */
> -	ret = isolate_lru_page(page);
> +	if (PageLRU(page))
> +		ret = isolate_lru_page(page);
> +	else
> +		ret = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
>  	/*
>  	 * Drop page reference which is came from get_any_page()
>  	 * successful isolate_lru_page() already took another one.
> @@ -1657,18 +1663,15 @@ static int __soft_offline_page(struct page *page, int flags)
>  	put_hwpoison_page(page);
>  	if (!ret) {
>  		LIST_HEAD(pagelist);
> -		inc_node_page_state(page, NR_ISOLATED_ANON +
> +		if (PageLRU(page))

isolate_lru_page() clears PG_lru, so this check will always be false here.
That is, the isolated page count ends up mismatched.


> +			inc_node_page_state(page, NR_ISOLATED_ANON +
>  					page_is_file_cache(page));
>  		list_add(&page->lru, &pagelist);
>  		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>  					MIGRATE_SYNC, MR_MEMORY_FAILURE);
>  		if (ret) {
> -			if (!list_empty(&pagelist)) {
> -				list_del(&page->lru);
> -				dec_node_page_state(page, NR_ISOLATED_ANON +
> -						page_is_file_cache(page));
> -				putback_lru_page(page);
> -			}
> +			if (!list_empty(&pagelist))
> +				putback_movable_pages(&pagelist);
>  
>  			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  				pfn, ret, page->flags);
> -- 
> 1.9.1
> 
> 
> 


* Re: [RFC v2] HWPOISON: soft offlining for non-lru movable page
  2017-01-23  5:14 ` Minchan Kim
@ 2017-01-23 12:39   ` Yisheng Xie
  0 siblings, 0 replies; 4+ messages in thread
From: Yisheng Xie @ 2017-01-23 12:39 UTC
  To: Minchan Kim, ysxie
  Cc: linux-mm, linux-kernel, n-horiguchi, mhocko, akpm, vbabka,
	guohanjun, qiuxishi

Hi Minchan,
Thanks for reviewing.
On 2017/1/23 13:14, Minchan Kim wrote:
> Hello,
> 
> On Thu, Jan 19, 2017 at 10:59:03PM +0800, ysxie@foxmail.com wrote:
>> From: Yisheng Xie <xieyisheng1@huawei.com>
>>
>> @@ -1527,7 +1527,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
>>  {
>>  	int ret = __get_any_page(page, pfn, flags);
>>  
>> -	if (ret == 1 && !PageHuge(page) && !PageLRU(page)) {
>> +	if (ret == 1 && !PageHuge(page) &&
>> +	    !PageLRU(page) && !__PageMovable(page)) {
> 
> Checking __PageMovable() without holding the page lock is racy, so we need
> to check whether it is okay to miss offlining some pages because of the race.
> From the description of soft_offline_page(), it seems to be okay.
> I just wanted to double-check. :)
Yes, I thought about whether to take the page lock to avoid the race. Since it is
okay to miss some pages because of the race, I did not add it.

> 
>>  		/*
>>  		 * Try to free it.
>>  		 */
>> @@ -1609,7 +1610,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
>>  
>>  static int __soft_offline_page(struct page *page, int flags)
>>  {
>> -	int ret;
>> +	int ret = -1;
>>  	unsigned long pfn = page_to_pfn(page);
>>  
>>  	/*
>> @@ -1619,7 +1620,8 @@ static int __soft_offline_page(struct page *page, int flags)
>>  	 * so there's no race between soft_offline_page() and memory_failure().
>>  	 */
>>  	lock_page(page);
>> -	wait_on_page_writeback(page);
>> +	if (PageLRU(page))
>> +		wait_on_page_writeback(page);
> 
> I doubt we need to add such a limitation (i.e., that only LRU pages can be
> under writeback). Do you have a reason for adding that code?

I added this check because I was not sure whether a non-lru page can be marked
PageWriteback(page). I will drop this unneeded limitation in the next version.

> 
>>  	if (PageHWPoison(page)) {
>>  		unlock_page(page);
>>  		put_hwpoison_page(page);
>> @@ -1630,7 +1632,8 @@ static int __soft_offline_page(struct page *page, int flags)
>>  	 * Try to invalidate first. This should work for
>>  	 * non dirty unmapped page cache pages.
>>  	 */
>> -	ret = invalidate_inode_page(page);
>> +	if (PageLRU(page))
>> +		ret = invalidate_inode_page(page);
> 
> Ditto.
> 
>>  	unlock_page(page);
>>  	/*
>>  	 * RED-PEN would be better to keep it isolated here, but we
>> @@ -1649,7 +1652,10 @@ static int __soft_offline_page(struct page *page, int flags)
>>  	 * Try to migrate to a new page instead. migrate.c
>>  	 * handles a large number of cases for us.
>>  	 */
>> -	ret = isolate_lru_page(page);
>> +	if (PageLRU(page))
>> +		ret = isolate_lru_page(page);
>> +	else
>> +		ret = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
>>  	/*
>>  	 * Drop page reference which is came from get_any_page()
>>  	 * successful isolate_lru_page() already took another one.
>> @@ -1657,18 +1663,15 @@ static int __soft_offline_page(struct page *page, int flags)
>>  	put_hwpoison_page(page);
>>  	if (!ret) {
>>  		LIST_HEAD(pagelist);
>> -		inc_node_page_state(page, NR_ISOLATED_ANON +
>> +		if (PageLRU(page))
> 
> isolate_lru_page() clears PG_lru, so this check will always be false here.
> That is, the isolated page count ends up mismatched.
> 
Really sorry about that. That's my mistake.
I will use !__PageMovable(page) instead in v3.

Thanks
Yisheng Xie.

