mm-commits Archive on lore.kernel.org
 help / color / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, gechangwei@live.cn, ghe@suse.com,
	jlbec@evilplan.org, joseph.qi@linux.alibaba.com,
	junxiao.bi@oracle.com, linux-mm@kvack.org, mark@fasheh.com,
	mm-commits@vger.kernel.org, piaojun@huawei.com,
	stable@vger.kernel.org, torvalds@linux-foundation.org,
	wen.gang.wang@oracle.com
Subject: [patch 14/14] ocfs2: initialize ip_next_orphan
Date: Fri, 13 Nov 2020 22:52:23 -0800
Message-ID: <20201114065223.JN1eernhY%akpm@linux-foundation.org> (raw)
In-Reply-To: <20201113225115.b24faebc85f710d5aff55aa7@linux-foundation.org>

From: Wengang Wang <wen.gang.wang@oracle.com>
Subject: ocfs2: initialize ip_next_orphan

Though problem if found on a lower 4.1.12 kernel, I think upstream has
same issue.

In one node in the cluster, there is the following callback trace:

# cat /proc/21473/stack
[<ffffffffc09a2f06>] __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
[<ffffffffc09a4481>] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
[<ffffffffc09b2ce2>] ocfs2_evict_inode+0x152/0x820 [ocfs2]
[<ffffffff8122b36e>] evict+0xae/0x1a0
[<ffffffff8122bd26>] iput+0x1c6/0x230
[<ffffffffc09b60ed>] ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
[<ffffffffc0992ae0>] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
[<ffffffffc099a1e9>] ocfs2_dir_foreach+0x29/0x30 [ocfs2]
[<ffffffffc09b7716>] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
[<ffffffffc09b9b4e>] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
[<ffffffff810a1399>] process_one_work+0x169/0x4a0
[<ffffffff810a1bcb>] worker_thread+0x5b/0x560
[<ffffffff810a7a2b>] kthread+0xcb/0xf0
[<ffffffff816f5d21>] ret_from_fork+0x61/0x90
[<ffffffffffffffff>] 0xffffffffffffffff

The above stack is not reasonable, the final iput shouldn't happen in
ocfs2_orphan_filldir() function. Looking at the code,

2067         /* Skip inodes which are already added to recover list, since dio may
2068          * happen concurrently with unlink/rename */
2069         if (OCFS2_I(iter)->ip_next_orphan) {
2070                 iput(iter);
2071                 return 0;
2072         }
2073

The logic thinks the inode is already in recover list on seeing
ip_next_orphan is non-NULL, so it skip this inode after dropping a
reference which incremented in ocfs2_iget().

While, if the inode is already in recover list, it should have another
reference and the iput() at line 2070 should not be the final iput
(dropping the last reference).  So I don't think the inode is really in
the recover list (no vmcore to confirm).

Note that ocfs2_queue_orphans(), though not shown up in the call back
trace, is holding cluster lock on the orphan directory when looking up for
unlinked inodes.  The on disk inode eviction could involve a lot of IOs
which may need long time to finish.  That means this node could hold the
cluster lock for very long time, that can lead to the lock requests (from
other nodes) to the orhpan directory hang for long time.

Looking at more on ip_next_orphan, I found it's not initialized when
allocating a new ocfs2_inode_info structure.

This causes te reflink operations from some nodes hang for very long
time waiting for the cluster lock on the orphan directory.

Fix: initialize ip_next_orphan as NULL.

Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/super.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/ocfs2/super.c~ocfs2-initialize-ip_next_orphan
+++ a/fs/ocfs2/super.c
@@ -1713,6 +1713,7 @@ static void ocfs2_inode_init_once(void *
 
 	oi->ip_blkno = 0ULL;
 	oi->ip_clusters = 0;
+	oi->ip_next_orphan = NULL;
 
 	ocfs2_resv_init_once(&oi->ip_la_data_resv);
 
_

      parent reply index

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14  6:51 incoming Andrew Morton
2020-11-14  6:51 ` [patch 01/14] mm/compaction: count pages and stop correctly during page isolation Andrew Morton
2020-11-14  6:51 ` [patch 02/14] mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate Andrew Morton
2020-11-14  6:51 ` [patch 03/14] mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit Andrew Morton
2020-11-14 21:39   ` Linus Torvalds
2020-11-14 22:14     ` Matthew Wilcox
2020-11-14  6:51 ` [patch 04/14] mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov Andrew Morton
2020-11-14  6:51 ` [patch 05/14] mm/slub: fix panic in slab_alloc_node() Andrew Morton
2020-11-14  6:51 ` [patch 06/14] mm/gup: use unpin_user_pages() in __gup_longterm_locked() Andrew Morton
2020-11-14  6:51 ` [patch 07/14] compiler.h: fix barrier_data() on clang Andrew Morton
2020-11-14  6:52 ` [patch 08/14] Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint" Andrew Morton
2020-11-14  6:52 ` [patch 09/14] reboot: fix overflow parsing reboot cpu number Andrew Morton
2020-11-14  6:52 ` [patch 10/14] kernel/watchdog: fix watchdog_allowed_mask not used warning Andrew Morton
2020-11-14  6:52 ` [patch 11/14] mm: memcontrol: fix missing wakeup polling thread Andrew Morton
2020-11-14  6:52 ` [patch 12/14] hugetlbfs: fix anon huge page migration race Andrew Morton
2020-11-14  6:52 ` [patch 13/14] panic: don't dump stack twice on warn Andrew Morton
2020-11-14  6:52 ` Andrew Morton [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201114065223.JN1eernhY%akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=gechangwei@live.cn \
    --cc=ghe@suse.com \
    --cc=jlbec@evilplan.org \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=junxiao.bi@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark@fasheh.com \
    --cc=mm-commits@vger.kernel.org \
    --cc=piaojun@huawei.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=wen.gang.wang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

mm-commits Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/mm-commits/0 mm-commits/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 mm-commits mm-commits/ https://lore.kernel.org/mm-commits \
		mm-commits@vger.kernel.org
	public-inbox-index mm-commits

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.mm-commits


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git