From: "Pulavarty, Badari" <badari.pulavarty@intel.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "david@fromorbit.com" <david@fromorbit.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"bfoster@redhat.com" <bfoster@redhat.com>,
	"huangzhaoyang@gmail.com" <huangzhaoyang@gmail.com>,
	"ke.wang@unisoc.com" <ke.wang@unisoc.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"zhaoyang.huang@unisoc.com" <zhaoyang.huang@unisoc.com>,
	"Shutemov, Kirill" <kirill.shutemov@intel.com>,
	"Tang, Feng" <feng.tang@intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	"Yin, Fengwei" <fengwei.yin@intel.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"Zanussi, Tom" <tom.zanussi@intel.com>
Subject: RE: [RFC PATCH] mm: move xa forward when run across zombie page
Date: Mon, 31 Oct 2022 19:25:05 +0000
Message-ID: <DM6PR11MB3978F27D63F743CDA577645D9C379@DM6PR11MB3978.namprd11.prod.outlook.com>
In-Reply-To: <Y1Md0hzhkqzik/WA@casper.infradead.org>

Hi,

Just wanted to give an update on the issue, hoping to get more thoughts/suggestions.

I have been adding a lot of debug code to try to root-cause the issue.
When I enabled CONFIG_DEBUG_VM, I ran into the following assertion failure:

[ 1810.282055] entry: 0 folio: ffe6dfc30e428040 
[ 1810.282059] page dumped because: VM_BUG_ON_PAGE(entry != folio)
[ 1810.282062] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1810.282084] #PF: supervisor read access in kernel mode
[ 1810.282095] #PF: error_code(0x0000) - not-present page
[ 1810.282104] PGD 0
[ 1810.282110] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1810.282119] CPU: 86 PID: 15043 Comm: kdamond.1 Kdump: loaded Tainted: G S          E      6.1.0-rc1+ #32
[ 1810.282145] RIP: 0010:dump_page+0x25/0x340
[ 1810.282156] Code: 0b cc cc cc cc 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 55 49 89 f5 41 54 53 48 83 ec 20 48 85 f6 0f 85 7d 72 ab 00 <49> 8b 07 48 83 f8 ff 0f 84 82 71 ab 00 49 8b 5f 08 f6 c3 01 0f 85
[ 1810.282185] RSP: 0018:ff3fae02170637b8 EFLAGS: 00010046
[ 1810.282193] RAX: 0000000000000033 RBX: ffe6dfc30e428040 RCX: 0000000000000002
[ 1810.282204] RDX: 0000000000000000 RSI: ffffffffb85ad649 RDI: 00000000ffffffff
[ 1810.282215] RBP: ff3fae0217063800 R08: 0000000000000000 R09: c0000000fffeffff
[ 1810.282225] R10: 0000000000000001 R11: ff3fae0217063620 R12: 0000000000000001
[ 1810.282234] R13: ffffffffb85c87e0 R14: 0000000000000000 R15: 0000000000000000
[ 1810.282244] FS:  0000000000000000(0000) GS:ff25c2ea7e780000(0000) knlGS:0000000000000000
[ 1810.282255] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1810.282264] CR2: 0000000000000000 CR3: 000000552f40a006 CR4: 0000000000771ee0
[ 1810.282274] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1810.282284] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 1810.282293] PKRU: 55555554
[ 1810.282299] Call Trace:
[ 1810.282304]  <TASK>
[ 1810.282310]  __delete_from_swap_cache.cold.20+0x33/0x35
[ 1810.282321]  delete_from_swap_cache+0x50/0xa0
[ 1810.282330]  folio_free_swap+0xab/0xe0
[ 1810.282339]  free_swap_cache+0x8a/0xa0
[ 1810.282346]  free_page_and_swap_cache+0x12/0xb0
[ 1810.282356]  split_huge_page_to_list+0xf13/0x10d0     <<<<<<<<<<<<<<<<<<
[ 1810.282365]  madvise_cold_or_pageout_pte_range+0x528/0x1390
[ 1810.282374]  walk_pgd_range+0x5fe/0xa10
[ 1810.282383]  __walk_page_range+0x184/0x190
[ 1810.282391]  walk_page_range+0x120/0x190
[ 1810.282398]  madvise_pageout+0x10b/0x2a0
[ 1810.282406]  ? set_track_prepare+0x48/0x70
[ 1810.282415]  madvise_vma_behavior+0x2f2/0xb10
[ 1810.282422]  ? find_vma_prev+0x72/0xc0
[ 1810.282431]  do_madvise+0x21b/0x440
[ 1810.282439]  damon_va_apply_scheme+0x76/0xa0
[ 1810.282448]  kdamond_fn+0xbe9/0xe10
[ 1810.282456]  ? damon_split_region_at+0x70/0x70
[ 1810.282675]  kthread+0xfc/0x130
[ 1810.282837]  ? kthread_complete_and_exit+0x20/0x20

Since I am not using hugepages explicitly, I recompiled the kernel with

CONFIG_TRANSPARENT_HUGEPAGE=n

and my problem went away (including the original issue).

Thanks,
Badari

-----Original Message-----
From: Matthew Wilcox <willy@infradead.org> 
Sent: Friday, October 21, 2022 3:32 PM
To: Pulavarty, Badari <badari.pulavarty@intel.com>
Cc: david@fromorbit.com; akpm@linux-foundation.org; bfoster@redhat.com; huangzhaoyang@gmail.com; ke.wang@unisoc.com; linux-fsdevel@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; zhaoyang.huang@unisoc.com; Shutemov, Kirill <kirill.shutemov@intel.com>; Tang, Feng <feng.tang@intel.com>; Huang, Ying <ying.huang@intel.com>; Yin, Fengwei <fengwei.yin@intel.com>; Hansen, Dave <dave.hansen@intel.com>; Zanussi, Tom <tom.zanussi@intel.com>
Subject: Re: [RFC PATCH] mm: move xa forward when run across zombie page

On Fri, Oct 21, 2022 at 09:37:36PM +0000, Pulavarty, Badari wrote:
> I have been tracking similar issue(s), with soft lockups or panics occurring consistently on my system with my workload.
> I tried multiple kernel versions. The issue seems to happen consistently on
> 6.1-rc1 (while it seems to happen on 5.17, 5.19, 6.0.X)
> 
> PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
> 
>     RIP: 0000000000000001  RSP: ff3d8e7f0d9978ea  RFLAGS: ff3d8e7f0d9978e8
>     RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
>     RDX: 000000006b9c66f1  RSI: ff506ca15ff33c20  RDI: 0000000000000000
>     RBP: ffffffff84bc64cc   R8: ff3d8e412cabdff0   R9: ffffffff84c00e8b
>     R10: ff506ca15ff33b69  R11: 0000000000000000  R12: ff506ca15ff33b58
>     R13: ffffffff84bc79a3  R14: ff506ca15ff33b38  R15: 0000000000000000
>     ORIG_RAX: ff506ca15ff33a80  CS: ff506ca15ff33c78  SS: 0000
> #9 [ff506ca15ff33c18] xas_load at ffffffff84b49a7f
> #10 [ff506ca15ff33c28] __filemap_get_folio at ffffffff840985da
> #11 [ff506ca15ff33ce8] swap_cache_get_folio at ffffffff841119db

Oh, this is interesting.  It's the swapper address_space.
I bet that 0xffffffff85044560 (the value of a_ops) is the address of swap_ops in your kernel?

I don't know if it will help, but it's an interesting data point.

> Looking at the crash dump, mapping->host became NULL. Not sure what exactly is happening.

That's always true for the swapper_spaces, AIUI.

>   a_ops = 0xffffffff85044560,
