From: Ryan Roberts <ryan.roberts@arm.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Yang Shi <shy828301@gmail.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Sat, 9 Mar 2024 09:38:42 +0000	[thread overview]
Message-ID: <08c01e9d-beda-435c-93ac-f303a89379df@arm.com> (raw)
In-Reply-To: <95f91a7f-13dd-4212-97ce-cf0b4060828a@arm.com>

[...]
>>>
>>> If so, we need to take the folios off the list (or otherwise mark them)
>>> so that they can't be processed by more than one CPU at a time.  And
>>> that leads me to this patch (yes, folio_prep_large_rmappable() is
>>> now vestigial, but removing it increases the churn a bit much for this
>>> stage of debugging)
>>
>> Looks sensible on first review. I'll do some testing now to see if I can
>> re-trigger the non-NULL mapping issue. Will get back to you in the next couple of
>> hours.
>>
[...]
>>> -		if (folio_try_get(folio)) {
>>> -			list_move(&folio->_deferred_list, &list);
>>> -		} else {
>>> +		list_del_init(&folio->_deferred_list);
>>> +		sc->nr_to_scan--;
>>> +		if (!folio_try_get(folio)) {
>>>  			/* We lost race with folio_put() */
>>> -			list_del_init(&folio->_deferred_list);
>>>  			ds_queue->split_queue_len--;
> 
> I think split_queue_len is getting out of sync with the number of items on the
> queue? We only decrement it if we lost the race with folio_put(). But we are
> unconditionally taking folios off the list here. So we are definitely out of
> sync until we take the lock again below. But we only put folios back on the list
> that failed to split. A successful split used to decrement this variable
> (because the folio was on _a_ list). But now it doesn't. So we are always
> mismatched after the first failed split?

Oops, I meant first *successful* split.
> 
> I'll fix this up before I start testing.
> 

I've run the full test 5 times, and haven't seen any slow down or RCU stall
warning. But on the 5th time, I saw the non-NULL mapping oops (your new check
did not trigger):

[  944.475632] BUG: Bad page state in process usemem  pfn:252932
page:00000000ad4feba6 refcount:0 mapcount:0 mapping:000000003a777cd9 index:0x1 pfn:0x252932
[  944.478575] aops:0x0 ino:dead000000000122
[  944.479130] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff)
[  944.479934] page_type: 0xffffffff()
[  944.480328] raw: 0bfffc0000000000 0000000000000000 fffffc00084a4c90 fffffc00084a4c90
[  944.481734] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[  944.482475] page dumped because: non-NULL mapping
[  944.482938] Modules linked in:
[  944.483238] CPU: 9 PID: 3805 Comm: usemem Not tainted 6.8.0-rc5-00391-g44b0dc848590 #39
[  944.484308] Hardware name: linux,dummy-virt (DT)
[  944.484981] Call trace:
[  944.485315]  dump_backtrace+0x9c/0x100
[  944.485840]  show_stack+0x20/0x38
[  944.486300]  dump_stack_lvl+0x90/0xb0
[  944.486833]  dump_stack+0x18/0x28
[  944.487178]  bad_page+0x88/0x128
[  944.487523]  __rmqueue_pcplist+0x24c/0xa18
[  944.488001]  get_page_from_freelist+0x278/0x1288
[  944.488579]  __alloc_pages+0xec/0x1018
[  944.489097]  alloc_pages_mpol+0x11c/0x278
[  944.489455]  vma_alloc_folio+0x70/0xd0
[  944.489988]  do_huge_pmd_anonymous_page+0xb8/0xda8
[  944.490819]  __handle_mm_fault+0xd00/0x1a28
[  944.491497]  handle_mm_fault+0x7c/0x418
[  944.491978]  do_page_fault+0x100/0x690
[  944.492357]  do_translation_fault+0xb4/0xd0
[  944.492968]  do_mem_abort+0x4c/0xa8
[  944.493353]  el0_da+0x54/0xb8
[  944.493818]  el0t_64_sync_handler+0xe4/0x158
[  944.494581]  el0t_64_sync+0x190/0x198
[  944.495218] Disabling lock debugging due to kernel taint


So what do we know?

 - the above page looks like it was the 3rd page of a large folio
    - words 3 and 4 are the same, meaning they are likely empty _deferred_list
    - pfn alignment is correct for this
 - The _deferred_list for all previously freed large folios was empty
    - but the folio could have been in the new deferred split batch?
 - free_tail_page_prepare() zeroed mapping/_deferred_list during free
 - _deferred_list was subsequently reinitialized to "empty" while on free list

So how about this for a rough hypothesis:


CPU1                                     CPU2
deferred_split_scan
  list_del_init
  folio_batch_add
                                         folio_put -> free
                                           free_tail_page_prepare
                                             is on deferred list? -> no
split_huge_page_to_list_to_order
  list_empty(folio->_deferred_list)
    -> yes
  list_del_init
                                           mapping = NULL
                                             -> (_deferred_list.prev = NULL)
                                           put page on free list
    INIT_LIST_HEAD(entry);
      -> "mapping" no longer NULL


But CPU1 is holding a reference, so that could only happen if a reference was
put one too many times. Ugh.




FYI, this is the fixed-up fix patch I have:


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fd745bcc97ff..fde016451e1a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
 {
        if (!folio || !folio_test_large(folio))
                return;
-       if (folio_order(folio) > 1)
-               INIT_LIST_HEAD(&folio->_deferred_list);
        folio_set_large_rmappable(folio);
 }

@@ -3312,7 +3310,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
        struct pglist_data *pgdata = NODE_DATA(sc->nid);
        struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
        unsigned long flags;
-       LIST_HEAD(list);
+       struct folio_batch batch;
        struct folio *folio, *next;
        int split = 0;

@@ -3321,36 +3319,39 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
                ds_queue = &sc->memcg->deferred_split_queue;
 #endif

+       folio_batch_init(&batch);
        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-       /* Take pin on all head pages to avoid freeing them under us */
+       /* Take ref on all folios to avoid freeing them under us */
        list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
                                                        _deferred_list) {
-               if (folio_try_get(folio)) {
-                       list_move(&folio->_deferred_list, &list);
-               } else {
-                       /* We lost race with folio_put() */
-                       list_del_init(&folio->_deferred_list);
-                       ds_queue->split_queue_len--;
-               }
-               if (!--sc->nr_to_scan)
+               list_del_init(&folio->_deferred_list);
+               ds_queue->split_queue_len--;
+               sc->nr_to_scan--;
+               if (folio_try_get(folio) &&
+                   folio_batch_add(&batch, folio) == 0)
+                       break;
+               if (!sc->nr_to_scan)
                        break;
        }
        spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

-       list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+       while ((folio = folio_batch_next(&batch)) != NULL) {
                if (!folio_trylock(folio))
-                       goto next;
-               /* split_huge_page() removes page from list on success */
+                       continue;
                if (!split_folio(folio))
                        split++;
                folio_unlock(folio);
-next:
-               folio_put(folio);
        }

        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-       list_splice_tail(&list, &ds_queue->split_queue);
+       while ((folio = folio_batch_next(&batch)) != NULL) {
+               if (!folio_test_large(folio))
+                       continue;
+               list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
+               ds_queue->split_queue_len++;
+       }
        spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+       folios_put(&batch);

        /*
         * Stop shrinker if we didn't split any page, but the queue is empty.
diff --git a/mm/internal.h b/mm/internal.h
index 1dfdc3bde1b0..14c21d06f233 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -432,6 +432,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
        atomic_set(&folio->_entire_mapcount, -1);
        atomic_set(&folio->_nr_pages_mapped, 0);
        atomic_set(&folio->_pincount, 0);
+       if (order > 1)
+               INIT_LIST_HEAD(&folio->_deferred_list);
 }

 static inline void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 025ad1a7df7b..fc9c7ca24c4c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1007,9 +1007,12 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
                break;
        case 2:
                /*
-                * the second tail page: ->mapping is
-                * deferred_list.next -- ignore value.
+                * the second tail page: ->mapping is deferred_list.next
                 */
+               if (unlikely(!list_empty(&folio->_deferred_list))) {
+                       bad_page(page, "still on deferred list");
+                       goto out;
+               }
                break;
        default:
                if (page->mapping != TAIL_MAPPING) {




