Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Qian Cai <cai@lca.pw>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	John Hubbard <jhubbard@nvidia.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	William Kucharski <william.kucharski@oracle.com>,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH v2 2/6] mm: Dump compound page information on a second line
Date: Tue, 4 Aug 2020 17:14:14 +0100
Message-ID: <20200804161414.GG23808@casper.infradead.org> (raw)
In-Reply-To: <20200804153742.GA33283@lca.pw>

On Tue, Aug 04, 2020 at 11:37:43AM -0400, Qian Cai wrote:
> On Thu, Jul 09, 2020 at 09:21:13PM +0100, Matthew Wilcox (Oracle) wrote:
> > Simplify both the implementation and the output by splitting all the
> > compound page information onto a second line.
> > 
> > Reported-by: John Hubbard <jhubbard@nvidia.com>
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> dump_page() starts to crash occasionally. The offset below points to this
> patch. Any thought?

I don't think this patch is the cause.  It just moves around code
that is implicated by the chain below.  You're actually getting _more_
useful output with this patch than you were before, so there's no call
to abandon this patch.  But maybe we can make dump_page() more robust.

> [ 2999.340055][ T5516] page:0000000084997a23 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888bf243bf40 pfn:0xbf2438
> [ 2999.398824][ T5516] page:0000000084997a23 refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888bf243ad40 pfn:0xbf2438

This is fun.  We've got a recursive call to dump_page(), so we dump the
same page twice.  First, we find its head:

        struct page *head = compound_head(page);

(and I believe it is a tail page at this point)

... then the page is split, perhaps?  Anyway, by the time we call this
line, the subpage has a refcount of 1:

        pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
                        page, page_ref_count(head), mapcount, mapping,
                        page_to_pgoff(page), page_to_pfn(page));

by this point, head no longer has PageHead set, and we end up calling
compound_mapcount() here:

                        pr_warn("head:%p order:%u compound_mapcount:%d\n",
                                        head, compound_order(head),
                                        compound_mapcount(head));

but compound_mapcount() checks that the page is compound and it's not,
so we call __dump_page() again, and print the page _again_, this time
with a refcount of 0 (because it's been truncated?).

I don't really know what to do about this.  Should dump_page() try to
take a reference on the page to prevent this from happening?  Or open-
code compound_mapcount() to prevent a crash, even though it will
print incorrect data due to this race?

> [ 2999.455037][ T5516] flags: 0x1bfffc000000000()
> [ 2999.476870][ T5516] raw: 01bfffc000000000 dead000000000100 dead000000000122 0000000000000000
> [ 2999.518459][ T5516] raw: ffff888bf243ad40 00000000002a0000 00000000ffffffff 0000000000000000
> [ 2999.560177][ T5516] page dumped because: VM_BUG_ON_PAGE(!PageCompound(page))
> [ 2999.595087][ T5516] ------------[ cut here ]------------
> [ 2999.625075][ T5516] kernel BUG at include/linux/mm.h:786!
> [ 2999.652433][ T5516] invalid opcode: 0000 [#1] SMP KASAN PTI
> [ 2999.679523][ T5516] CPU: 35 PID: 5516 Comm: test.sh Not tainted 5.8.0-next-20200804 #2
> [ 2999.718652][ T5516] Hardware name: HP ProLiant ML350 Gen9, BIOS P92 07/11/2014
> [ 2999.755698][ T5516] RIP: 0010:__dump_page+0xe8b/0x12f0
> compound_mapcount at include/linux/mm.h:786
> (inlined by) compound_mapcount at include/linux/mm.h:784
> (inlined by) __dump_page at mm/debug.c:108
> [ 2999.784590][ T5516] Code: 06 49 01 c1 e9 72 f4 ff ff 48 89 df e8 5e 12 16 00 e9 09 fe ff ff 48 c7 c6 80 41 cc a8 48 89 df e8 7a f1 ff ff 0f 1f 44 00 00 <0f> 0b 48 c7 c7 80 25 cb ad e8 33 f5 71 00 48 c7 c6 80 41 cc a8 48
> [ 2999.890974][ T5516] RSP: 0018:ffffc9000a3c78d0 EFLAGS: 00010282
> [ 2999.920196][ T5516] RAX: 0000000000000000 RBX: ffffea002fc90e00 RCX: ffffffffa8095282
> [ 2999.958834][ T5516] RDX: 1ffffd4005f921c7 RSI: 0000000000000000 RDI: ffffea002fc90e38
> [ 2999.997374][ T5516] RBP: 1ffff92001478f22 R08: ffffed118cf0de7a R09: ffffed118cf0de7a
> [ 3000.035972][ T5516] R10: ffff888c6786f3cf R11: ffffed118cf0de79 R12: ffffea002fc90e08
> [ 3000.074836][ T5516] R13: ffffea002fc90e00 R14: ffffea002fc90e08 R15: ffffea002fc90e00
> [ 3000.113842][ T5516] FS:  00007fae1ff5e740(0000) GS:ffff888c67840000(0000) knlGS:0000000000000000
> [ 3000.156525][ T5516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3000.189099][ T5516] CR2: 00005578d493a1b8 CR3: 0000000610836005 CR4: 00000000001706e0
> [ 3000.227643][ T5516] Call Trace:
> [ 3000.243118][ T5516]  ? page_init_poison+0x40/0x40
> [ 3000.266242][ T5516]  ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
> [ 3000.294621][ T5516]  ? _raw_spin_unlock_irq+0x1f/0x30
> [ 3000.322775][ T5516]  ? trace_hardirqs_on+0x20/0x1b5
> [ 3000.350565][ T5516]  ? lockdep_hardirqs_on_prepare+0x4e0/0x4e0
> [ 3000.383465][ T5516]  ? start_isolate_page_range+0x301/0x690
> [ 3000.416514][ T5516]  ? drain_all_pages+0x4a/0x7a0
> [ 3000.441585][ T5516]  ? mark_lock+0x147/0x1800
> [ 3000.463184][ T5516]  ? init_pwq+0x340/0x340
> [ 3000.483738][ T5516]  ? find_held_lock+0x33/0x1c0
> [ 3000.506497][ T5516]  ? print_irqtrace_events+0x270/0x270
> [ 3000.532505][ T5516]  ? start_isolate_page_range+0x344/0x690
> [ 3000.560632][ T5516]  ? mark_held_locks+0xb0/0x110
> [ 3000.583751][ T5516]  ? rwlock_bug.part.1+0x90/0x90
> [ 3000.610492][ T5516]  ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
> [ 3000.640146][ T5516]  ? _raw_spin_unlock_irqrestore+0x39/0x40
> [ 3000.671680][ T5516]  dump_page+0x9/0x20
> [ 3000.690807][ T5516]  start_isolate_page_range+0x4ac/0x690
> [ 3000.717529][ T5516]  ? state_show+0xc0/0xc0
> [ 3000.738137][ T5516]  __offline_pages+0x199/0xe30
> [ 3000.761471][ T5516]  ? __mutex_lock+0x4aa/0x1390
> [ 3000.784203][ T5516]  ? rwlock_bug.part.1+0x90/0x90
> [ 3000.807941][ T5516]  ? device_offline+0x75/0x1c0
> [ 3000.833233][ T5516]  ? __add_memory+0x60/0x60
> [ 3000.857530][ T5516]  ? mutex_lock_io_nested+0x1270/0x1270
> [ 3000.886608][ T5516]  ? klist_next+0x253/0x430
> [ 3000.911512][ T5516]  ? device_is_dependent+0x140/0x140
> [ 3000.937751][ T5516]  ? device_for_each_child+0xe1/0x140
> [ 3000.963576][ T5516]  ? device_remove_class_symlinks+0x1a0/0x1a0
> [ 3000.992753][ T5516]  ? state_show+0xc0/0xc0
> [ 3001.013490][ T5516]  memory_subsys_offline+0xcb/0x160
> [ 3001.038346][ T5516]  device_offline+0x131/0x1c0
> [ 3001.060614][ T5516]  ? sysfs_file_ops+0x160/0x160
> [ 3001.083866][ T5516]  state_store+0x78/0xa0
> [ 3001.103882][ T5516]  kernfs_fop_write+0x208/0x410
> [ 3001.127602][ T5516]  ? __sb_start_write+0x1f8/0x2e0
> [ 3001.151596][ T5516]  vfs_write+0x1a3/0x5d0
> [ 3001.171713][ T5516]  ksys_write+0xec/0x1c0
> [ 3001.192625][ T5516]  ? __x64_sys_read+0xa0/0xa0
> [ 3001.215692][ T5516]  ? syscall_enter_from_user_mode+0x20/0x210
> [ 3001.245499][ T5516]  ? trace_hardirqs_on+0x20/0x1b5
> [ 3001.269466][ T5516]  do_syscall_64+0x33/0x40
> [ 3001.290710][ T5516]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 3001.319164][ T5516] RIP: 0033:0x7fae1f63fb28
> [ 3001.340214][ T5516] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 35 4b 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [ 3001.448642][ T5516] RSP: 002b:00007ffdad11d068 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 3001.492000][ T5516] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fae1f63fb28
> [ 3001.531309][ T5516] RDX: 0000000000000008 RSI: 00005649a73b6240 RDI: 0000000000000001
> [ 3001.571310][ T5516] RBP: 00005649a73b6240 R08: 000000000000000a R09: 00007fae1f6d0c80
> [ 3001.610126][ T5516] R10: 000000000000000a R11: 0000000000000246 R12: 00007fae1f9106c0
> [ 3001.648869][ T5516] R13: 0000000000000008 R14: 00007fae1f90b880 R15: 0000000000000008
> [ 3001.687284][ T5516] Modules linked in: loop kvm_intel kvm irqbypass ip_tables x_tables sd_mod ahci libahci tg3 firmware_class libata libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
> [ 3001.776222][ T5516] ---[ end trace 263095a22deddb65 ]---
> [ 3001.802318][ T5516] RIP: 0010:__dump_page+0xe8b/0x12f0
> [ 3001.828306][ T5516] Code: 06 49 01 c1 e9 72 f4 ff ff 48 89 df e8 5e 12 16 00 e9 09 fe ff ff 48 c7 c6 80 41 cc a8 48 89 df e8 7a f1 ff ff 0f 1f 44 00 00 <0f> 0b 48 c7 c7 80 25 cb ad e8 33 f5 71 00 48 c7 c6 80 41 cc a8 48
> [ 3001.931924][ T5516] RSP: 0018:ffffc9000a3c78d0 EFLAGS: 00010282
> [ 3001.965478][ T5516] RAX: 0000000000000000 RBX: ffffea002fc90e00 RCX: ffffffffa8095282
> [ 3002.005570][ T5516] RDX: 1ffffd4005f921c7 RSI: 0000000000000000 RDI: ffffea002fc90e38
> [ 3002.044022][ T5516] RBP: 1ffff92001478f22 R08: ffffed118cf0de7a R09: ffffed118cf0de7a
> [ 3002.082857][ T5516] R10: ffff888c6786f3cf R11: ffffed118cf0de79 R12: ffffea002fc90e08
> [ 3002.121273][ T5516] R13: ffffea002fc90e00 R14: ffffea002fc90e08 R15: ffffea002fc90e00
> [ 3002.160008][ T5516] FS:  00007fae1ff5e740(0000) GS:ffff888c67840000(0000) knlGS:0000000000000000
> [ 3002.204033][ T5516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3002.235743][ T5516] CR2: 00005578d493a1b8 CR3: 0000000610836005 CR4: 00000000001706e0
> [ 3002.274246][ T5516] Kernel panic - not syncing: Fatal exception
> [ 3002.303579][ T5516] Kernel Offset: 0x26400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 3002.360369][ T5516] ---[ end Kernel panic - not syncing: Fatal exception ]---


  reply index

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-09 20:21 [PATCH v2 0/6] Improvements for dump_page() Matthew Wilcox (Oracle)
2020-07-09 20:21 ` [PATCH v2 1/6] mm: Handle page->mapping better in dump_page Matthew Wilcox (Oracle)
2020-07-11  1:30   ` John Hubbard
2020-07-11  1:32     ` John Hubbard
2020-07-14 11:06   ` Vlastimil Babka
2020-07-09 20:21 ` [PATCH v2 2/6] mm: Dump compound page information on a second line Matthew Wilcox (Oracle)
2020-07-11  1:35   ` John Hubbard
2020-07-14 12:03   ` Vlastimil Babka
2020-08-04 15:37   ` Qian Cai
2020-08-04 16:14     ` Matthew Wilcox [this message]
2020-08-04 18:39       ` [PATCH] mm, dump_page: do not crash with bad compound_page() John Hubbard
2020-08-04 18:48         ` Matthew Wilcox
2020-08-04 19:17           ` John Hubbard
2020-08-04 19:42             ` Matthew Wilcox
2020-07-09 20:21 ` [PATCH v2 3/6] mm: Print head flags in dump_page Matthew Wilcox (Oracle)
2020-07-11  1:40   ` John Hubbard
2020-07-14 12:06   ` Vlastimil Babka
2020-07-09 20:21 ` [PATCH v2 4/6] mm: Switch dump_page to get_kernel_nofault Matthew Wilcox (Oracle)
2020-07-14 12:11   ` Vlastimil Babka
2020-07-09 20:21 ` [PATCH v2 5/6] mm: Print the inode number in dump_page Matthew Wilcox (Oracle)
2020-07-11  2:04   ` John Hubbard
2020-07-14 12:18   ` Vlastimil Babka
2020-07-09 20:21 ` [PATCH v2 6/6] mm: Print hashed address of struct page Matthew Wilcox (Oracle)
2020-07-11  1:48   ` John Hubbard
2020-07-09 20:54 ` [PATCH v2 0/6] Improvements for dump_page() Mike Rapoport
2020-07-09 20:54 ` William Kucharski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200804161414.GG23808@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pw \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=linux-mm@kvack.org \
    --cc=rppt@linux.ibm.com \
    --cc=vbabka@suse.cz \
    --cc=william.kucharski@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git