From: Ryan Roberts <ryan.roberts@arm.com>
To: Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Yang Shi <shy828301@gmail.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Fri, 8 Mar 2024 18:09:25 +0000	[thread overview]
Message-ID: <c85bf170-c71b-4e92-9ad8-d504ba874f98@arm.com> (raw)
In-Reply-To: <a1917564-d25f-470d-a431-045c41abc72f@arm.com>

On 08/03/2024 17:13, Ryan Roberts wrote:
> + DavidH
> 
> On 08/03/2024 16:03, Matthew Wilcox wrote:
>> On Fri, Mar 08, 2024 at 03:11:35PM +0000, Matthew Wilcox wrote:
>>> Actually, I have a clue!  The third and fourth word have the same value.
>>> That's indicative of an empty list_head.  And if this were LRU, that would
>>> be the second and third word.  And the PFN is congruent to 2 modulo 4.
>>> So this is the second tail page, and that's an empty deferred_list.
>>> So how do we init a list_head after a folio gets freed?
>>
>> We should probably add this patch anyway, because why wouldn't we want
>> to check this.  Maybe it'll catch your offender?
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 025ad1a7df7b..fc9c7ca24c4c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1007,9 +1007,12 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
>>  		break;
>>  	case 2:
>>  		/*
>> -		 * the second tail page: ->mapping is
>> -		 * deferred_list.next -- ignore value.
>> +		 * the second tail page: ->mapping is deferred_list.next
>>  		 */
>> +		if (unlikely(!list_empty(&folio->_deferred_list))) {
>> +			bad_page(page, "still on deferred list");
>> +			goto out;
>> +		}
>>  		break;
>>  	default:
>>  		if (page->mapping != TAIL_MAPPING) {
>>
>> (thinking about it, this may not be right for all tail pages; will Slab
>> stumble over this?  It doesn't seem to stumble on _entire_mapcount, but
>> then we always initialise _entire_mapcount for all compound pages
>> and we don't initialise _deferred_list for slab ... gah)
> 
> Yeah I'm getting a huge number of hits for this check. Most either have kfree() or free_slab() or page_to_skb() (networking code?) in the stack. Ideally need to filter on anon pages only, but presumably we have already ditched that info? Actually looks like the head page hasn't been nuked yet so should be able to test the low bit of mapping... let me have a play.

I think the world is trying to tell me "it's Friday night. Stop". I can no longer
reproduce the non-NULL mapping oops that I was able to hit reliably this morning.

I do have this one though:

[  197.332914] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  197.334250] Mem abort info:
[  197.334476]   ESR = 0x0000000096000044
[  197.334759]   EC = 0x25: DABT (current EL), IL = 32 bits
[  197.335161]   SET = 0, FnV = 0
[  197.335393]   EA = 0, S1PTW = 0
[  197.335622]   FSC = 0x04: level 0 translation fault
[  197.335985] Data abort info:
[  197.336201]   ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[  197.336606]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  197.336998]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  197.337424] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000215dc0000
[  197.337927] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  197.338585] Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
[  197.339058] Modules linked in:
[  197.339296] CPU: 61 PID: 2369 Comm: usemem Not tainted 6.8.0-rc5-00392-g827ce916aa61 #38
[  197.339920] Hardware name: linux,dummy-virt (DT)
[  197.340273] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  197.340790] pc : deferred_split_scan+0x210/0x260
[  197.341154] lr : deferred_split_scan+0x70/0x260
[  197.341792] sp : ffff80008b453770
[  197.342050] x29: ffff80008b453770 x28: 00000000000000f7 x27: ffff80008b453988
[  197.342618] x26: ffff0000c260e540 x25: 0000000000000080 x24: ffff800081f0fe38
[  197.343170] x23: 0000000000000000 x22: 00000000000000f8 x21: ffff80008b453988
[  197.343703] x20: ffff0000ca897bd8 x19: ffff0000ca897b98 x18: 0000000000000000
[  197.344245] x17: 0000000000000000 x16: 0000000000000000 x15: 00000000041557f9
[  197.344783] x14: 00000000041557f8 x13: 00000000041557f9 x12: 0000000000000000
[  197.345343] x11: 0000000000000040 x10: ffff800083cfed48 x9 : ffff80008b4537c0
[  197.345895] x8 : ffff800083cb2d10 x7 : 0000000000001b48 x6 : fffffc001a4a9090
[  197.346458] x5 : 0000000000000000 x4 : fffffc001a4a9090 x3 : fffffc001a4a9000
[  197.346994] x2 : fffffc001a4a9000 x1 : 0000000000000000 x0 : 0000000000000000
[  197.347534] Call trace:
[  197.347729]  deferred_split_scan+0x210/0x260
[  197.348069]  do_shrink_slab+0x184/0x750
[  197.348377]  shrink_slab+0x4d4/0x9c0
[  197.348646]  shrink_node+0x214/0x860
[  197.348923]  do_try_to_free_pages+0xd0/0x560
[  197.349257]  try_to_free_mem_cgroup_pages+0x14c/0x330
[  197.349641]  try_charge_memcg+0x1cc/0x788
[  197.349957]  __mem_cgroup_charge+0x6c/0xd0
[  197.350282]  __handle_mm_fault+0x1000/0x1a28
[  197.350624]  handle_mm_fault+0x7c/0x418
[  197.350933]  do_page_fault+0x100/0x690
[  197.351232]  do_translation_fault+0xb4/0xd0
[  197.351564]  do_mem_abort+0x4c/0xa8
[  197.351841]  el0_da+0x54/0xb8
[  197.352087]  el0t_64_sync_handler+0xe4/0x158
[  197.352432]  el0t_64_sync+0x190/0x198
[  197.352718] Code: 2a0503e6 35fff4a6 a9491446 f90004c5 (f90000a6)
[  197.353204] ---[ end trace 0000000000000000 ]---


deferred_split_scan+0x210/0x260 is the code that I added back:

if (!folio_try_get(folio)) {
	/* We lost race with folio_put() */
	list_del_init(&folio->_deferred_list); <<<< HERE
	ds_queue->split_queue_len--;
	continue;
}

We have the spinlock here, so that really should not be happening. So does that
mean the list is being manipulated outside of the lock somewhere? Or maybe it's
->mapping (actually one of the deferred_list pointers) being cleared by the buddy?
I dunno... I give up. Will resume on Monday. Have a good weekend.


