From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1C3FC433FE for ; Sun, 6 Dec 2020 06:41:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 92A2522DFB for ; Sun, 6 Dec 2020 06:41:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 92A2522DFB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 229176B0068; Sun, 6 Dec 2020 01:41:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1D2886B006C; Sun, 6 Dec 2020 01:41:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 073766B006E; Sun, 6 Dec 2020 01:41:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0042.hostedemail.com [216.40.44.42]) by kanga.kvack.org (Postfix) with ESMTP id DB4716B0068 for ; Sun, 6 Dec 2020 01:41:56 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 9D223180AD807 for ; Sun, 6 Dec 2020 06:41:56 +0000 (UTC) X-FDA: 77561912232.13.coast50_390e197273d3 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 7F52918140B60 for ; Sun, 6 Dec 2020 06:41:56 +0000 (UTC) X-HE-Tag: coast50_390e197273d3 X-Filterd-Recvd-Size: 12411 Received: from mail-oi1-f194.google.com (mail-oi1-f194.google.com [209.85.167.194]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Sun, 6 Dec 2020 06:41:55 +0000 (UTC) Received: by mail-oi1-f194.google.com with SMTP id o25so11625089oie.5 for ; Sat, 05 Dec 2020 22:41:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Yf7zPQOJGOGST1kr7aRDb+hEVvmcjR4NTe2pf0HQRUE=; b=QePB48uCarK3ddowCiXI+o6c6RAi6TQftuyLc76APBu0NfSAmdrsEXIeFLp6tdrJdO pB+m9RjVOLxQ5eznWFcPJaKeZlWFvlU6VNkRADlR8IKnNFX7tEeGrF7DFl5U+8ojK+GS C1xSa2YZsYwi9hpNuVVorGcMNoogrQeJA7D/aYlJ8uwZ91tX431R4VpHcYs7ZqABFOTp tzazXtxOawiufJSNwe4QvFXCp6KQLFoDjE+lvB/6yw2rB8g8234wvFa1IIjn00Vh365q ipwnCMdlm1hHAKQ+dkb35RRq30y7DGJKjlugmAxf8iG13WDvRJ4oFXrG/dTBZ+KxTBhE Aqzw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Yf7zPQOJGOGST1kr7aRDb+hEVvmcjR4NTe2pf0HQRUE=; b=T+C7Jgv0WLjaoa1i7s5nEqQgQjwGGn9DjCkqyOHsUhncCZLRLzklzbqxvKQ0x3oCcW ECYLgCMlu/CleDqv2tqEmoIc324I0fuH+/pDwCqwGViSvfXJKejszE+lsYxfDfQjkxEv UGGN63lmFI+do5QD4BPPiHkuzOZOCECqYusmKbA0seemT35N5wyCyILAZLs2RWX2pTfe r4Ol0gxBPeIBBqhG6EK5w9H83qcojVzoEyKLvbq8eE4ruxia4sY1xQ+Cmquesxgkh3UT JD421ZlzX2/V8F5CEBnE16MSScP1YpsFZPTmhgdAHIG68focXpfYd886sVQq+N4r+hjP QIuA== X-Gm-Message-State: AOAM533kULKgR9WIQnadc/KMnyYTV5K60AM0J7RWw9vH+Sujz1M9Qxx9 A5jbHI4OJJlPRkiyfob+oZw= X-Google-Smtp-Source: ABdhPJwHxlLgoXfodNhX/Rsnt84ieUkqXh0UaPXjPCSJQav7RfU1+W5SjuDerhP+7UyY/90pR8R6zg== X-Received: by 2002:aca:eccb:: with SMTP id k194mr4957090oih.112.1607236915440; Sat, 05 Dec 2020 22:41:55 -0800 (PST) Received: from localhost.localdomain ([122.225.203.131]) by smtp.gmail.com with ESMTPSA id y18sm1817553ooj.20.2020.12.05.22.41.45 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sat, 05 Dec 2020 22:41:54 -0800 (PST) From: Yafang Shao To: darrick.wong@oracle.com, willy@infradead.org, hch@infradead.org, david@fromorbit.com, mhocko@kernel.org, 000akpm@linux-foundation.org, dhowells@redhat.com, jlayton@redhat.com Cc: linux-fsdevel@vger.kernel.org, linux-cachefs@redhat.com, linux-xfs@vger.kernel.org, linux-mm@kvack.org, Yafang Shao , Christoph Hellwig Subject: [PATCH v9 2/2] xfs: avoid transaction reservation recursion Date: Sun, 6 Dec 2020 14:40:46 +0800 Message-Id: <20201206064046.2921-3-laoar.shao@gmail.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20201206064046.2921-1-laoar.shao@gmail.com> References: <20201206064046.2921-1-laoar.shao@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: PF_FSTRANS which is used to avoid transaction reservation recursion, is dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which means to avoid filesystem reclaim recursion. That change is subtle. Let's take the exmple of the check of WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to PF_MEMALLOC_NOFS is not proper. Below comment is quoted from Dave, > It wasn't for memory allocation recursion protection in XFS - it was fo= r > transaction reservation recursion protection by something trying to flu= sh > data pages while holding a transaction reservation. Doing > this could deadlock the journal because the existing reservation > could prevent the nested reservation for being able to reserve space > in the journal and that is a self-deadlock vector. > IOWs, this check is not protecting against memory reclaim recursion > bugs at all (that's the previous check [1]). This check is > protecting against the filesystem calling writepages directly from a > context where it can self-deadlock. > So what we are seeing here is that the PF_FSTRANS -> > PF_MEMALLOC_NOFS abstraction lost all the actual useful information > about what type of error this check was protecting against. As a result, we should reintroduce PF_FSTRANS. As current->journal_info isn't used in XFS, we can reuse it to indicate whehter the task is in fstrans or not, Per Willy. To achieve that, some new helpers are introduc= e in this patch, per Dave: - xfs_trans_context_set() Used in xfs_trans_alloc() - xfs_trans_context_clear() Used in xfs_trans_commit() and xfs_trans_cancel() - xfs_trans_context_active() To check whehter current is in fs transcation or not Darrick helped fix the error occurred in xfs/141.[2] No obvious error occurred when I run xfstests in my test machine. [1]. Below check is to avoid memory reclaim recursion. if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) =3D=3D PF_MEMALLOC)) goto redirty; [2]. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia/ Cc: Darrick J. Wong Cc: Matthew Wilcox (Oracle) Cc: Christoph Hellwig Cc: Dave Chinner Cc: Michal Hocko Cc: David Howells Cc: Jeff Layton Signed-off-by: Yafang Shao --- fs/iomap/buffered-io.c | 7 ------- fs/xfs/xfs_aops.c | 23 +++++++++++++++++++++-- fs/xfs/xfs_linux.h | 4 ---- fs/xfs/xfs_trans.c | 25 +++++++++++++------------ fs/xfs/xfs_trans.h | 23 +++++++++++++++++++++++ 5 files changed, 57 insertions(+), 25 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 10cc7979ce38..3c53fa6ce64d 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1458,13 +1458,6 @@ iomap_do_writepage(struct page *page, struct write= back_control *wbc, void *data) PF_MEMALLOC)) goto redirty; =20 - /* - * Given that we do not allow direct reclaim to call us, we should - * never be called in a recursive filesystem reclaim context. - */ - if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) - goto redirty; - /* * Is this page beyond the end of the file? * diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 4304c6416fbb..1541ea5956fa 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -62,7 +62,8 @@ xfs_setfilesize_trans_alloc( * We hand off the transaction to the completion thread now, so * clear the flag here. */ - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_context_clear(tp); + return 0; } =20 @@ -125,7 +126,7 @@ xfs_setfilesize_ioend( * thus we need to mark ourselves as being in a transaction manually. * Similarly for freeze protection. */ - current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_context_set(tp); __sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS); =20 /* we abort the update if there was an IO error */ @@ -568,6 +569,16 @@ xfs_vm_writepage( { struct xfs_writepage_ctx wpc =3D { }; =20 + /* + * Given that we do not allow direct reclaim to call us, we should + * never be called while in a filesystem transaction. + */ + if (xfs_trans_context_active()) { + redirty_page_for_writepage(wbc, page); + unlock_page(page); + return 0; + } + return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops); } =20 @@ -579,6 +590,14 @@ xfs_vm_writepages( struct xfs_writepage_ctx wpc =3D { }; =20 xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); + + /* + * Given that we do not allow direct reclaim to call us, we should + * never be called while in a filesystem transaction. + */ + if (xfs_trans_context_active()) + return 0; + return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops); } =20 diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index 5b7a1e201559..6ab0f8043c73 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -102,10 +102,6 @@ typedef __u32 xfs_nlink_t; #define xfs_cowb_secs xfs_params.cowb_timer.val =20 #define current_cpu() (raw_smp_processor_id()) -#define current_set_flags_nested(sp, f) \ - (*(sp) =3D current->flags, current->flags |=3D (f)) -#define current_restore_flags_nested(sp, f) \ - (current->flags =3D ((current->flags & ~(f)) | (*(sp) & (f)))) =20 #define NBBY 8 /* number of bits per byte */ =20 diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index c94e71f741b6..09ae5c181299 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -67,6 +67,11 @@ xfs_trans_free( xfs_extent_busy_sort(&tp->t_busy); xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false); =20 + /* Detach the transaction from this thread. */ + ASSERT(current->journal_info !=3D NULL); + if (current->journal_info =3D=3D tp) + xfs_trans_context_clear(tp); + trace_xfs_trans_free(tp, _RET_IP_); if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT)) sb_end_intwrite(tp->t_mountp->m_super); @@ -119,7 +124,11 @@ xfs_trans_dup( =20 ntp->t_rtx_res =3D tp->t_rtx_res - tp->t_rtx_res_used; tp->t_rtx_res =3D tp->t_rtx_res_used; + + /* Associate the new transaction with this thread. */ + ASSERT(current->journal_info =3D=3D tp); ntp->t_pflags =3D tp->t_pflags; + current->journal_info =3D ntp; =20 /* move deferred ops over to the new tp */ xfs_defer_move(ntp, tp); @@ -153,8 +162,6 @@ xfs_trans_reserve( int error =3D 0; bool rsvd =3D (tp->t_flags & XFS_TRANS_RESERVE) !=3D 0; =20 - /* Mark this thread as being in a transaction */ - current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); =20 /* * Attempt to reserve the needed disk blocks by decrementing @@ -163,10 +170,8 @@ xfs_trans_reserve( */ if (blocks > 0) { error =3D xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd); - if (error !=3D 0) { - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + if (error !=3D 0) return -ENOSPC; - } tp->t_blk_res +=3D blocks; } =20 @@ -241,8 +246,6 @@ xfs_trans_reserve( tp->t_blk_res =3D 0; } =20 - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); - return error; } =20 @@ -284,6 +287,8 @@ xfs_trans_alloc( INIT_LIST_HEAD(&tp->t_dfops); tp->t_firstblock =3D NULLFSBLOCK; =20 + /* Mark this thread as being in a transaction */ + xfs_trans_context_set(tp); error =3D xfs_trans_reserve(tp, resp, blocks, rtextents); if (error) { xfs_trans_cancel(tp); @@ -878,7 +883,6 @@ __xfs_trans_commit( =20 xfs_log_commit_cil(mp, tp, &commit_lsn, regrant); =20 - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); xfs_trans_free(tp); =20 /* @@ -910,7 +914,7 @@ __xfs_trans_commit( xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket); tp->t_ticket =3D NULL; } - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_free_items(tp, !!error); xfs_trans_free(tp); =20 @@ -970,9 +974,6 @@ xfs_trans_cancel( tp->t_ticket =3D NULL; } =20 - /* mark this thread as no longer being in a transaction */ - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); - xfs_trans_free_items(tp, dirty); xfs_trans_free(tp); } diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 084658946cc8..ceb530bf5c4b 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -268,4 +268,27 @@ xfs_trans_item_relog( return lip->li_ops->iop_relog(lip, tp); } =20 +static inline void +xfs_trans_context_set(struct xfs_trans *tp) +{ + ASSERT(!current->journal_info); + current->journal_info =3D tp; + tp->t_pflags =3D memalloc_nofs_save(); +} + +static inline void +xfs_trans_context_clear(struct xfs_trans *tp) +{ + ASSERT(current->journal_info =3D=3D tp); + current->journal_info =3D NULL; + memalloc_nofs_restore(tp->t_pflags); +} + +static inline bool +xfs_trans_context_active(void) +{ + /* Use journal_info to indicate current is in a transaction */ + return current->journal_info !=3D NULL; +} + #endif /* __XFS_TRANS_H__ */ --=20 2.18.4