From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.6 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F58DC433DF for ; Mon, 24 Aug 2020 01:43:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1417020774 for ; Mon, 24 Aug 2020 01:43:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="GJfleXBW" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1417020774 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9D9A38D0003; Sun, 23 Aug 2020 21:43:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9B0E66B0006; Sun, 23 Aug 2020 21:43:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 89EE28D0003; Sun, 23 Aug 2020 21:43:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0018.hostedemail.com [216.40.44.18]) by kanga.kvack.org (Postfix) with ESMTP id 754526B0005 for ; Sun, 23 Aug 2020 21:43:23 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 3BDDE1EE6 for ; Mon, 24 Aug 2020 01:43:23 +0000 (UTC) X-FDA: 77183764686.23.quiet61_28142952704f Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id E705337606 for ; Mon, 24 Aug 2020 01:43:22 +0000 (UTC) X-HE-Tag: quiet61_28142952704f X-Filterd-Recvd-Size: 11211 Received: from mail-qt1-f194.google.com (mail-qt1-f194.google.com [209.85.160.194]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Mon, 24 Aug 2020 01:43:22 +0000 (UTC) Received: by mail-qt1-f194.google.com with SMTP id d27so5187704qtg.4 for ; Sun, 23 Aug 2020 18:43:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=p/Aa+4/PAdW4UzIOLAlILdbf31440xpvhPKoHCUn5Lg=; b=GJfleXBWQMZRAOCc58l26UD/2OnznQ2xXFpx5YS7Fs7cjuIeKUo3/hmL3O1/AQ54Gw YO9FZttNqQAeqMIqCjXe5pZFkNRiNsJFkcqjZRxre0JWraYrAI95JtpjQlc5zz/BDmgL vGgJrb6B+cYTsu7c1SXcNV418I5nnFTjRxiE3kaSNsY9uMhczkmsv04tcSvWa/F+gjfS PmMNi8utmjew4Lgte2XbsxM5DDqRs5XjLaNcr+EvQro5mJKf9IDSIRTL8Qlz9uPZwr3W gtxsiOFu10pRLbY2uvQIqfDExU55gkIN5X9auBAIgIj2wa8SsSH4+XfnWdHAjfLWh4FJ VQKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=p/Aa+4/PAdW4UzIOLAlILdbf31440xpvhPKoHCUn5Lg=; b=f4vtnILZg+kZeQJK2welAM80n/0Eblne7Bvq5wegPfdPES32EKwopxwRpsY6cfOuLk m0PcB/f1sUFEtpks129IPW/a3K3exkRywOjyJ019WM0eJNmZ05UKXu7GZVu9mgUtGury bkYbdP7pixXyrLJ1gln3EKMDiwdt3LeA4P+HOiokQbEvgftpV6JrDRKkRftQt93vTdGp yqAREMOjlHHHLtk2CKupKSuZ+rjeC6LqCTPp1+hMfamY8wWNJN67FOcJI0l+U/gOaqHS iGMEpOOw2myjDzPdb3K/Of7OPK+Lv8jnwAO4ruon3tSnN6RXR1biFvC3VE9yGeHDrA8f HMLw== X-Gm-Message-State: AOAM5306NgpDKp76eIpERm4XdXxpYkKT+hu2xQ6fZjgvU3R2crCElUl5 rUoLhSf3io8tI9+zQ6lgXRM= X-Google-Smtp-Source: ABdhPJxhQgveld9b97WUsuzJNGeORAh9h3i0bi1kXDqhtpmrqPFEOAB2upif5n9TfDCs11oA+PTOpA== X-Received: by 2002:ac8:3a84:: with SMTP id x4mr3004272qte.361.1598233401764; Sun, 23 Aug 2020 18:43:21 -0700 (PDT) Received: from localhost.localdomain ([50.236.19.102]) by smtp.gmail.com with ESMTPSA id t8sm10205236qtc.50.2020.08.23.18.43.16 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 23 Aug 2020 18:43:21 -0700 (PDT) From: Yafang Shao To: david@fromorbit.com, hch@infradead.org, darrick.wong@oracle.com, willy@infradead.org, mhocko@kernel.org, akpm@linux-foundation.org Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Yafang Shao Subject: [PATCH v6 2/2] xfs: avoid transaction reservation recursion Date: Mon, 24 Aug 2020 09:42:34 +0800 Message-Id: <20200824014234.7109-3-laoar.shao@gmail.com> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20200824014234.7109-1-laoar.shao@gmail.com> References: <20200824014234.7109-1-laoar.shao@gmail.com> X-Rspamd-Queue-Id: E705337606 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: PF_FSTRANS which is used to avoid transaction reservation recursion, is dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which means to avoid filesystem reclaim recursion. That change is subtle. Let's take the exmple of the check of WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to PF_MEMALLOC_NOFS is not proper. Below comment is quoted from Dave, > It wasn't for memory allocation recursion protection in XFS - it was for > transaction reservation recursion protection by something trying to flush > data pages while holding a transaction reservation. Doing > this could deadlock the journal because the existing reservation > could prevent the nested reservation for being able to reserve space > in the journal and that is a self-deadlock vector. > IOWs, this check is not protecting against memory reclaim recursion > bugs at all (that's the previous check [1]). This check is > protecting against the filesystem calling writepages directly from a > context where it can self-deadlock. > So what we are seeing here is that the PF_FSTRANS -> > PF_MEMALLOC_NOFS abstraction lost all the actual useful information > about what type of error this check was protecting against. As a result, we should reintroduce PF_FSTRANS. As current->journal_info isn't used in XFS, we can reuse it to indicate whehter the task is in fstrans or not, Per Willy. To achieve that, four new helpers are introduce in this patch, per Dave: - xfs_trans_context_set() Used in xfs_trans_alloc() - xfs_trans_context_clear() Used in xfs_trans_commit() and xfs_trans_cancel() - xfs_trans_context_update() Used in xfs_trans_roll() - fstrans_context_active() To check whehter current is in fs transcation or not [1]. Below check is to avoid memory reclaim recursion. if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC)) goto redirty; Cc: Dave Chinner Cc: Christoph Hellwig Cc: Michal Hocko Cc: Darrick J. Wong Cc: Matthew Wilcox Signed-off-by: Yafang Shao --- fs/iomap/buffered-io.c | 4 ++-- fs/xfs/xfs_aops.c | 5 +++-- fs/xfs/xfs_linux.h | 4 ---- fs/xfs/xfs_trans.c | 19 +++++++++---------- fs/xfs/xfs_trans.h | 23 +++++++++++++++++++++++ include/linux/iomap.h | 7 +++++++ 6 files changed, 44 insertions(+), 18 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index bcfc288dba3f..8043224ec079 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1500,9 +1500,9 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data) /* * Given that we do not allow direct reclaim to call us, we should - * never be called in a recursive filesystem reclaim context. + * never be called while in a filesystem transaction. */ - if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) + if (WARN_ON_ONCE(fstrans_context_active())) goto redirty; /* diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index b35611882ff9..83e0a1840221 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -62,7 +62,8 @@ xfs_setfilesize_trans_alloc( * We hand off the transaction to the completion thread now, so * clear the flag here. */ - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_context_clear(tp); + return 0; } @@ -125,7 +126,7 @@ xfs_setfilesize_ioend( * thus we need to mark ourselves as being in a transaction manually. * Similarly for freeze protection. */ - current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_context_set(tp); __sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS); /* we abort the update if there was an IO error */ diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index ab737fed7b12..8a4f6db77e33 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -102,10 +102,6 @@ typedef __u32 xfs_nlink_t; #define xfs_cowb_secs xfs_params.cowb_timer.val #define current_cpu() (raw_smp_processor_id()) -#define current_set_flags_nested(sp, f) \ - (*(sp) = current->flags, current->flags |= (f)) -#define current_restore_flags_nested(sp, f) \ - (current->flags = ((current->flags & ~(f)) | (*(sp) & (f)))) #define NBBY 8 /* number of bits per byte */ diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index ed72867b1a19..5f3a4ff51b3c 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -153,8 +153,6 @@ xfs_trans_reserve( int error = 0; bool rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0; - /* Mark this thread as being in a transaction */ - current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); /* * Attempt to reserve the needed disk blocks by decrementing @@ -163,10 +161,8 @@ xfs_trans_reserve( */ if (blocks > 0) { error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd); - if (error != 0) { - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + if (error != 0) return -ENOSPC; - } tp->t_blk_res += blocks; } @@ -241,8 +237,6 @@ xfs_trans_reserve( tp->t_blk_res = 0; } - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); - return error; } @@ -284,6 +278,8 @@ xfs_trans_alloc( INIT_LIST_HEAD(&tp->t_dfops); tp->t_firstblock = NULLFSBLOCK; + /* Mark this thread as being in a transaction */ + xfs_trans_context_set(tp); error = xfs_trans_reserve(tp, resp, blocks, rtextents); if (error) { xfs_trans_cancel(tp); @@ -878,7 +874,8 @@ __xfs_trans_commit( xfs_log_commit_cil(mp, tp, &commit_lsn, regrant); - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + if (!regrant) + xfs_trans_context_clear(tp); xfs_trans_free(tp); /* @@ -910,7 +907,8 @@ __xfs_trans_commit( xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket); tp->t_ticket = NULL; } - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + + xfs_trans_context_clear(tp); xfs_trans_free_items(tp, !!error); xfs_trans_free(tp); @@ -971,7 +969,7 @@ xfs_trans_cancel( } /* mark this thread as no longer being in a transaction */ - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_context_clear(tp); xfs_trans_free_items(tp, dirty); xfs_trans_free(tp); @@ -1013,6 +1011,7 @@ xfs_trans_roll( if (error) return error; + xfs_trans_context_update(trans, *tpp); /* * Reserve space in the log for the next transaction. * This also pushes items in the "AIL", the list of logged items, diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index b752501818d2..895f560229d6 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -243,4 +243,27 @@ void xfs_trans_buf_copy_type(struct xfs_buf *dst_bp, extern kmem_zone_t *xfs_trans_zone; +static inline void +xfs_trans_context_set(struct xfs_trans *tp) +{ + ASSERT(!current->journal_info); + current->journal_info = tp; + tp->t_pflags = memalloc_nofs_save(); +} + +static inline void +xfs_trans_context_update(struct xfs_trans *old, struct xfs_trans *new) +{ + ASSERT(current->journal_info == old); + current->journal_info = new; +} + +static inline void +xfs_trans_context_clear(struct xfs_trans *tp) +{ + ASSERT(current->journal_info == tp); + current->journal_info = NULL; + memalloc_nofs_restore(tp->t_pflags); +} + #endif /* __XFS_TRANS_H__ */ diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 4d1d3c3469e9..54194dd6009d 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -271,4 +271,11 @@ int iomap_swapfile_activate(struct swap_info_struct *sis, # define iomap_swapfile_activate(sis, swapfile, pagespan, ops) (-EIO) #endif /* CONFIG_SWAP */ +/* Use the journal_info to indicate current is in a transaction */ +static inline bool +fstrans_context_active(void) +{ + return current->journal_info != NULL; +} + #endif /* LINUX_IOMAP_H */ -- 2.18.1