From: Ryan Roberts <ryan.roberts@arm.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Mon, 11 Mar 2024 09:01:16 +0000	[thread overview]
Message-ID: <db9cd982-5ce5-4709-8999-fd81e3d437fd@arm.com> (raw)
In-Reply-To: <Ze4rsB6s3hKK_tyC@casper.infradead.org>

On 10/03/2024 21:52, Matthew Wilcox wrote:
> On Sun, Mar 10, 2024 at 08:46:58PM +0000, Matthew Wilcox wrote:
>> On Sun, Mar 10, 2024 at 07:59:46PM +0000, Ryan Roberts wrote:
>>> I've now been able to repro this without any of my code on top - just mm-unstable and your fix for the memcg uncharging ordering issue. So we have a separate, more difficult to repro bug. I've discovered CONFIG_DEBUG_LIST so enabled that. I'll try to bisect in the morning, but I suspect it will be slow going.
>>>
>>> [  390.317982] ------------[ cut here ]------------
>>> [  390.318646] list_del corruption. prev->next should be fffffc00152a9090, but was fffffc002798a490. (prev=fffffc002798a490)
>>
>> Interesting.  So prev->next is pointing to prev, ie prev is an empty
>> list, but it should be pointing to this entry ... this is feeling like
>> another missing lock.
> 
> Let's check that we're not inverting the order of memcg_uncharge and
> removing a folio from the deferred list (build tested only, but only
> one line of this will be new to you):

OK, found it - it's another instance of the same issue...

I applied your patch below (resulting code: mm-unstable (d7182786dd0a) + yesterday's fix ("mm: Remove folio from deferred split list before uncharging it") + the patch below).

The new check triggered:

[  153.459843] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffffd5fc0 pfn:0x4da690
[  153.460667] head: order:4 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  153.461218] memcg:ffff0000c7fa1000
[  153.461519] anon flags: 0xbfffc00000a0048(uptodate|head|mappedtodisk|swapbacked|node=0|zone=2|lastcpupid=0xffff)
[  153.462678] page_type: 0xffffffff()
[  153.463294] raw: 0bfffc00000a0048 dead000000000100 dead000000000122 ffff0000fbfa29c1
[  153.470267] raw: 0000000ffffd5fc0 0000000000000000 00000000ffffffff ffff0000c7fa1000
[  153.471395] head: 0bfffc00000a0048 dead000000000100 dead000000000122 ffff0000fbfa29c1
[  153.472494] head: 0000000ffffd5fc0 0000000000000000 00000000ffffffff ffff0000c7fa1000
[  153.473357] head: 0bfffc0000020204 fffffc001269a401 dead000000000122 00000000ffffffff
[  153.481663] head: 0000001000000000 0000000000000000 00000000ffffffff 0000000000000000
[  153.482438] page dumped because: VM_BUG_ON_FOLIO(folio_order(folio) > 1 && !list_empty(&folio->_deferred_list))
[  153.483464] ------------[ cut here ]------------
[  153.484000] kernel BUG at mm/memcontrol.c:7486!
[  153.484484] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[  153.485249] Modules linked in:
[  153.485621] CPU: 33 PID: 2146 Comm: usemem Not tainted 6.8.0-rc5-00463-gb5100df1d6f3 #5
[  153.486552] Hardware name: linux,dummy-virt (DT)
[  153.487300] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  153.488363] pc : uncharge_folio+0x1d0/0x2c8
[  153.488922] lr : uncharge_folio+0x1d0/0x2c8
[  153.489384] sp : ffff80008ea0b6d0
[  153.489747] x29: ffff80008ea0b6d0 x28: 0000000000000000 x27: 00000000fffffffe
[  153.490626] x26: dead000000000100 x25: dead000000000122 x24: 0000000000000020
[  153.491435] x23: ffff80008ea0b918 x22: ffff0000c7f88850 x21: ffff0000c7f88800
[  153.492255] x20: ffff80008ea0b730 x19: fffffc001269a400 x18: 0000000000000006
[  153.493087] x17: 212026262031203e x16: 20296f696c6f6628 x15: 0720072007200720
[  153.494175] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
[  153.495186] x11: 0720072007200720 x10: ffff0013f5e7b7c0 x9 : ffff800080128e84
[  153.496142] x8 : 00000000ffffbfff x7 : ffff0013f5e7b7c0 x6 : 80000000ffffc000
[  153.497050] x5 : ffff0013a5987d08 x4 : 0000000000000000 x3 : 0000000000000000
[  153.498041] x2 : 0000000000000000 x1 : ffff0000cbc2c500 x0 : 0000000000000063
[  153.499149] Call trace:
[  153.499470]  uncharge_folio+0x1d0/0x2c8
[  153.500045]  __mem_cgroup_uncharge_folios+0x5c/0xb0
[  153.500795]  move_folios_to_lru+0x5bc/0x5e0
[  153.501275]  shrink_lruvec+0x5f8/0xb30
[  153.501833]  shrink_node+0x4d8/0x8b0
[  153.502227]  do_try_to_free_pages+0xe0/0x5a8
[  153.502835]  try_to_free_mem_cgroup_pages+0x128/0x2d0
[  153.503708]  try_charge_memcg+0x114/0x658
[  153.504344]  __mem_cgroup_charge+0x6c/0xd0
[  153.505007]  __handle_mm_fault+0x42c/0x1640
[  153.505684]  handle_mm_fault+0x70/0x290
[  153.506136]  do_page_fault+0xfc/0x4d8
[  153.506659]  do_translation_fault+0xa4/0xc0
[  153.507140]  do_mem_abort+0x4c/0xa8
[  153.507716]  el0_da+0x2c/0x78
[  153.508169]  el0t_64_sync_handler+0xb8/0x130
[  153.508810]  el0t_64_sync+0x190/0x198
[  153.509410] Code: 910c8021 a9025bf5 a90363f7 97fd7bef (d4210000) 
[  153.510309] ---[ end trace 0000000000000000 ]---
[  153.510974] Kernel panic - not syncing: Oops - BUG: Fatal exception
[  153.511727] SMP: stopping secondary CPUs
[  153.513519] Kernel Offset: disabled
[  153.514090] CPU features: 0x0,00000020,7002014a,2140720b
[  153.514960] Memory Limit: none
[  153.515457] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---


move_folios_to_lru+0x5bc/0x5e0 is:

static unsigned int move_folios_to_lru(struct lruvec *lruvec,
		struct list_head *list)
{
	...

	if (free_folios.nr) {
		spin_unlock_irq(&lruvec->lru_lock);
		mem_cgroup_uncharge_folios(&free_folios);  <<<<<<<<<<< HERE
		free_unref_folios(&free_folios);
		spin_lock_irq(&lruvec->lru_lock);
	}

	return nr_moved;
}

And that code is from your commit 29f3843026cf ("mm: free folios directly in move_folios_to_lru()"), another patch in the same series. It suffers from the same problem: the folio is uncharged before being removed from the deferred list, so the wrong lock is used - and there are two sites in this function that do this.

A quick grep over the entire series turns up a lot of hits for "uncharge". I wonder if we need a full audit of the series for other places that could be doing the same thing?


> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bb57b3d0c8cd..61fd1a4b424d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
>  {
>  	if (!folio || !folio_test_large(folio))
>  		return;
> -	if (folio_order(folio) > 1)
> -		INIT_LIST_HEAD(&folio->_deferred_list);
>  	folio_set_large_rmappable(folio);
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index 79d0848c10a5..690c68c18c23 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
>  	atomic_set(&folio->_entire_mapcount, -1);
>  	atomic_set(&folio->_nr_pages_mapped, 0);
>  	atomic_set(&folio->_pincount, 0);
> +	if (order > 1)
> +		INIT_LIST_HEAD(&folio->_deferred_list);
>  }
>  
>  static inline void prep_compound_tail(struct page *head, int tail_idx)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 138bcfa18234..e2334c4ee550 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7483,6 +7483,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
>  	struct obj_cgroup *objcg;
>  
>  	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> +	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
> +			!list_empty(&folio->_deferred_list), folio);
>  
>  	/*
>  	 * Nobody should be changing or seriously looking at
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bdff5c0a7c76..1c1925b92934 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
>  		}
>  		break;
>  	case 2:
> -		/*
> -		 * the second tail page: ->mapping is
> -		 * deferred_list.next -- ignore value.
> -		 */
> +		/* the second tail page: deferred_list overlaps ->mapping */
> +		if (unlikely(!list_empty(&folio->_deferred_list))) {
> +			bad_page(page, "on deferred list");
> +			goto out;
> +		}
>  		break;
>  	default:
>  		if (page->mapping != TAIL_MAPPING) {


