From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, dchinner@redhat.com,
	djwong@kernel.org, guro@fb.com, jack@suse.cz,
	jencce.kernel@gmail.com, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, torvalds@linux-foundation.org,
	willy@infradead.org
Subject: [patch 11/15] writeback, cgroup: do not reparent dax inodes
Date: Fri, 23 Jul 2021 15:50:32 -0700
Message-ID: <20210723225032.m8CFwVQJU%akpm@linux-foundation.org>
In-Reply-To: <20210723154926.c6cda0f262b1990b950a5886@linux-foundation.org>

From: Roman Gushchin <guro@fb.com>
Subject: writeback, cgroup: do not reparent dax inodes

The inode switching code is not suited for dax inodes.  An attempt to
switch a dax inode to a parent writeback structure (as a part of a
writeback cleanup procedure) results in a panic like this:

  [  987.071651] run fstests generic/270 at 2021-07-15 05:54:02
  [  988.704940] XFS (pmem0p2): EXPERIMENTAL big timestamp feature in
  use.  Use at your own risk!
  [  988.746847] XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use
  at your own risk
  [  988.786070] XFS (pmem0p2): EXPERIMENTAL inode btree counters
  feature in use. Use at your own risk!
  [  988.828639] XFS (pmem0p2): Mounting V5 Filesystem
  [  988.854019] XFS (pmem0p2): Ending clean mount
  [  988.874550] XFS (pmem0p2): Quotacheck needed: Please wait.
  [  988.900618] XFS (pmem0p2): Quotacheck: Done.
  [  989.090783] XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
  [  989.092751] XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
  [  989.092962] XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
  [ 1010.105586] BUG: unable to handle page fault for address: 0000000005b0f669
  [ 1010.141817] #PF: supervisor read access in kernel mode
  [ 1010.167824] #PF: error_code(0x0000) - not-present page
  [ 1010.191499] PGD 0 P4D 0
  [ 1010.203346] Oops: 0000 [#1] SMP PTI
  [ 1010.219596] CPU: 13 PID: 10479 Comm: kworker/13:16 Not tainted
  5.14.0-rc1-master-8096acd7442e+ #8
  [ 1010.260441] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360
  Gen9, BIOS P89 09/13/2016
  [ 1010.297792] Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
  [ 1010.324832] RIP: 0010:inode_do_switch_wbs+0xaf/0x470
  [ 1010.347261] Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48
  c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff
  ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08
  0f 85
  [ 1010.434307] RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002
  [ 1010.457795] RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0
  [ 1010.489922] RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff
  [ 1010.522085] RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228
  [ 1010.554234] R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130
  [ 1010.586414] R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0
  [ 1010.619394] FS:  0000000000000000(0000) GS:ffff89ee5fb40000(0000)
  knlGS:0000000000000000
  [ 1010.658874] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1010.688085] CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0
  [ 1010.722129] Call Trace:
  [ 1010.733132]  inode_switch_wbs_work_fn+0xb6/0x2a0
  [ 1010.754121]  process_one_work+0x1e6/0x380
  [ 1010.772512]  worker_thread+0x53/0x3d0
  [ 1010.789221]  ? process_one_work+0x380/0x380
  [ 1010.807964]  kthread+0x10f/0x130
  [ 1010.822043]  ? set_kthread_struct+0x40/0x40
  [ 1010.840818]  ret_from_fork+0x22/0x30
  [ 1010.856851] Modules linked in: xt_CHECKSUM xt_MASQUERADE
  xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat
  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables
  nfnetlink bridge stp llc rfkill sunrpc intel_rapl_msr
  intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp
  coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit iTCO_wdt
  irqbypass drm_kms_helper iTCO_vendor_support acpi_ipmi rapl
  syscopyarea sysfillrect intel_cstate ipmi_si sysimgblt ioatdma
  dax_pmem_compat fb_sys_fops ipmi_devintf device_dax i2c_i801 pcspkr
  intel_uncore hpilo nd_pmem cec dax_pmem_core dca i2c_smbus acpi_tad
  lpc_ich ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sd_mod
  t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel tg3
  ghash_clmulni_intel serio_raw hpsa hpwdt scsi_transport_sas wmi
  dm_mirror dm_region_hash dm_log dm_mod
  [ 1011.200864] CR2: 0000000005b0f669
  [ 1011.215700] ---[ end trace ed2105faff8384f3 ]---
  [ 1011.241727] RIP: 0010:inode_do_switch_wbs+0xaf/0x470
  [ 1011.264306] Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48
  c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff
  ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08
  0f 85
  [ 1011.348821] RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002
  [ 1011.372734] RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0
  [ 1011.405826] RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff
  [ 1011.437852] RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228
  [ 1011.469926] R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130
  [ 1011.502179] R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0
  [ 1011.534233] FS:  0000000000000000(0000) GS:ffff89ee5fb40000(0000)
  knlGS:0000000000000000
  [ 1011.571247] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1011.597063] CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0
  [ 1011.629160] Kernel panic - not syncing: Fatal exception
  [ 1011.653802] Kernel Offset: 0x15200000 from 0xffffffff81000000
  (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
  [ 1011.713723] ---[ end Kernel panic - not syncing: Fatal exception ]---

The crash happens while iterating over the attached pagecache pages to
check the dirty flag: a dax inode's xarray contains pfns instead of
generic struct page pointers.

This happens for DAX and not for other kinds of non-page entries in an
inode's xarray because it's a tagged iteration, and shadow/swap entries
are never tagged; only DAX entries get tagged.
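
The register dump above appears consistent with this.  What follows is a
minimal userspace sketch, not kernel code.  It assumes the xarray
convention that a value entry is stored as (v << 1) | 1, and it relies on
the faulting instruction bytes "<48> 8b 50 08" decoding to a load from
8(%rax).  The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the xarray value-entry encoding (assumption:
 * bit 0 set means the slot holds an integer value, not a pointer).
 */
static bool entry_is_value(uintptr_t entry)
{
	return entry & 1;
}

/* Recover the integer stored in a value entry. */
static uintptr_t entry_to_value(uintptr_t entry)
{
	assert(entry_is_value(entry));
	return entry >> 1;
}

/*
 * Address that gets touched when an entry is wrongly treated as a
 * pointer and a field at 'offset' bytes into the object is loaded.
 */
static uintptr_t bogus_field_address(uintptr_t entry, uintptr_t offset)
{
	return entry + offset;
}
```

With RAX = 0x5b0f661 from the oops, entry_is_value() is true (the low bit
is set, which no aligned struct page pointer would have), and
bogus_field_address(0x5b0f661, 8) yields 0x5b0f669, which matches CR2.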

Fix the problem by bailing out of inode_prepare_wbs_switch() (returning
false) if a dax inode is passed.
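
The effect of the early return can be modelled in userspace as below.
This is only a sketch under stated assumptions: struct mock_inode and
mock_prepare_wbs_switch() are hypothetical stand-ins, and the non-DAX
checks are simplified placeholders for whatever the real
inode_prepare_wbs_switch() tests under i_lock.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few inode properties the sketch needs. */
struct mock_inode {
	bool is_dax;	/* models IS_DAX(inode) */
	bool sb_active;	/* models inode->i_sb->s_flags & SB_ACTIVE */
	bool switching;	/* models a switch already being in progress */
};

/*
 * Return true only if the inode may be switched to another writeback
 * structure; on success, mark it as being switched.
 */
static bool mock_prepare_wbs_switch(struct mock_inode *inode)
{
	/* The fix: never attempt to switch a DAX inode. */
	if (inode->is_dax)
		return false;

	/* Simplified placeholder for the pre-existing checks. */
	if (!inode->sb_active || inode->switching)
		return false;

	inode->switching = true;
	return true;
}
```

Because the DAX check comes first and returns false, a DAX inode is left
unmodified and the caller never reaches the page iteration that crashed.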

[willy@infradead.org: changelog addition]
Link: https://lkml.kernel.org/r/20210719171350.3876830-1-guro@fb.com
Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Reported-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Murphy Zhou <jencce.kernel@gmail.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/fs-writeback.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/fs/fs-writeback.c~writeback-cgroup-do-not-reparent-dax-inodes
+++ a/fs/fs-writeback.c
@@ -521,6 +521,9 @@ static bool inode_prepare_wbs_switch(str
 	 */
 	smp_mb();
 
+	if (IS_DAX(inode))
+		return false;
+
 	/* while holding I_WB_SWITCH, no one else can update the association */
 	spin_lock(&inode->i_lock);
 	if (!(inode->i_sb->s_flags & SB_ACTIVE) ||
_
