From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Mon, 13 Jan 2020 08:04:19 -0600
Subject: [Cluster-devel] [GFS2 PATCH 0/2 v3] Fix infinite loop in ail1 flush with jdata
Message-ID: <20200113140421.867659-1-rpeterso@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi.

This patch set fixes a problem in which gfs2 can become deadlocked
while doing normal IO on jdata files. The problem is best observed by
repeatedly running xfstests generic/269 with jdata files. The specifics
of the hang are best described in the second patch.

The first patch reverts e955537e3262de8e56f070b13817f525f472fa00.
The defective patch sometimes caused tr->tr_num_revoke to go negative,
since you can remove more revokes than you add. However, since
tr_num_revoke is declared unsigned, the value wrapped around to a very
large number instead, which triggered this assert in gfs2_trans_end:

	if (gfs2_assert_withdraw(sdp, (nbuf <= tr->tr_blocks) &&
			       (tr->tr_num_revoke <= tr->tr_revokes)))

The management of revokes has not been very good since we moved them
from a private list to a global list hung off the superblock pointer,
sdp. So we will probably want to revisit this and rework how revokes
are handled. In the meantime, it is safest to just revert the patch
until we can fix it properly.

The second patch fixes an infinite loop deadlock while flushing the
ail1 list for jdata pages. The patch comments describe the problem and
circumstances fairly well.

Bob Peterson (2):
  Revert "gfs2: eliminate tr_num_revoke_rm"
  gfs2: keep a redirty list for jdata pages that are PageChecked in ail1

 fs/gfs2/incore.h |  2 ++
 fs/gfs2/log.c    | 30 +++++++++++++++++++++++++++++-
 fs/gfs2/trans.c  |  7 ++++---
 3 files changed, 35 insertions(+), 4 deletions(-)

-- 
2.24.1