linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>,
	Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>
Subject: [PATCH 4.9 16/74] Btrfs: fix assertion failure during fsync and use of stale transaction
Date: Fri, 20 Sep 2019 00:03:29 +0200	[thread overview]
Message-ID: <20190919214805.890256386@linuxfoundation.org> (raw)
In-Reply-To: <20190919214800.519074117@linuxfoundation.org>

From: Filipe Manana <fdmanana@suse.com>

commit 410f954cb1d1c79ae485dd83a175f21954fd87cd upstream.

Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.

That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.

In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:

1) evict_refill_and_join() will often change the transaction handle's
   block reserve (->block_rsv) and set its ->bytes_reserved field to a
   value greater than 0. If evict_refill_and_join() never commits the
   transaction, the eviction handler ends up decreasing the reference
   count (->use_count) of the transaction handle through the call to
   btrfs_end_transaction(), and after that point we have a transaction
   handle with a NULL ->block_rsv (which is the value prior to the
   transaction join from evict_refill_and_join()) and a ->bytes_reserved
   value greater than 0. If after the eviction/iput completes the inode
   logging path hits an error or it decides that it must fallback to a
   transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
   non-zero value from btrfs_log_dentry_safe(), and because of that
   non-zero value it tries to commit the transaction using a handle with
   a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
   the transaction commit hit an assertion failure at
   btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
   the ->block_rsv is NULL. The produced stack trace for that is like the
   following:

   [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
   [192922.917553] ------------[ cut here ]------------
   [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
   [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
   [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
   [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
   [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
   (...)
   [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
   [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
   [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
   [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
   [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
   [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
   [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
   [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
   [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [192922.925105] Call Trace:
   [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
   [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
   [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
   [192922.926731]  do_fsync+0x38/0x60
   [192922.927138]  __x64_sys_fdatasync+0x13/0x20
   [192922.927543]  do_syscall_64+0x60/0x1c0
   [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
   (...)
   [192922.934077] ---[ end trace f00808b12068168f ]---

2) If evict_refill_and_join() decides to commit the transaction, it will
   be able to do it, since the nested transaction join only increments the
   transaction handle's ->use_count reference counter and it does not
   prevent the transaction from getting committed. This means that after
   eviction completes, the fsync logging path will be using a transaction
   handle that refers to an already committed transaction. What happens
   when using such a stale transaction can be unpredictable, we are at
   least having a use-after-free on the transaction handle itself, since
   the transaction commit will call kmem_cache_free() against the handle
   regardless of its ->use_count value, or we can end up silently losing
   all the updates to the log tree after that iput() in the logging path,
   or using a transaction handle that in the meanwhile was allocated to
   another task for a new transaction, etc, pretty much unpredictable
   what can happen.

In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).

The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 fs/btrfs/tree-log.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4846,7 +4846,7 @@ again:
 				err = btrfs_log_inode(trans, root, other_inode,
 						      LOG_OTHER_INODE,
 						      0, LLONG_MAX, ctx);
-				iput(other_inode);
+				btrfs_add_delayed_iput(other_inode);
 				if (err)
 					goto out_unlock;
 				else
@@ -5264,7 +5264,7 @@ process_leaf:
 			}
 
 			if (btrfs_inode_in_log(di_inode, trans->transid)) {
-				iput(di_inode);
+				btrfs_add_delayed_iput(di_inode);
 				break;
 			}
 
@@ -5276,7 +5276,7 @@ process_leaf:
 			if (!ret &&
 			    btrfs_must_commit_transaction(trans, di_inode))
 				ret = 1;
-			iput(di_inode);
+			btrfs_add_delayed_iput(di_inode);
 			if (ret)
 				goto next_dir_inode;
 			if (ctx->log_new_dentries) {
@@ -5422,7 +5422,7 @@ static int btrfs_log_all_parents(struct
 			if (!ret && ctx && ctx->log_new_dentries)
 				ret = log_new_dir_dentries(trans, root,
 							   dir_inode, ctx);
-			iput(dir_inode);
+			btrfs_add_delayed_iput(dir_inode);
 			if (ret)
 				goto out;
 		}



  parent reply	other threads:[~2019-09-19 22:18 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19 22:03 [PATCH 4.9 00/74] 4.9.194-stable review Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 01/74] bridge/mdb: remove wrong use of NLM_F_MULTI Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 02/74] cdc_ether: fix rndis support for Mediatek based smartphones Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 03/74] ipv6: Fix the link time qualifier of ping_v6_proc_exit_net() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 04/74] isdn/capi: check message length in capi_write() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 05/74] net: Fix null de-reference of device refcount Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 06/74] net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 07/74] sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 08/74] sctp: Fix the link time qualifier of sctp_ctrlsock_exit() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 09/74] sctp: use transport pf_retrans in sctp_do_8_2_transport_strike Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 10/74] tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 11/74] tipc: add NULL pointer check before calling kfree_rcu Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 12/74] tun: fix use-after-free when register netdev failed Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 13/74] Revert "MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur" Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 14/74] gpio: fix line flag validation in linehandle_create Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 15/74] gpio: fix line flag validation in lineevent_create Greg Kroah-Hartman
2019-09-19 22:03 ` Greg Kroah-Hartman [this message]
2019-09-19 22:03 ` [PATCH 4.9 17/74] genirq: Prevent NULL pointer dereference in resend_irqs() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 18/74] KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 19/74] KVM: x86: work around leak of uninitialized stack contents Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 20/74] KVM: nVMX: handle page fault in vmread Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 21/74] MIPS: VDSO: Prevent use of smp_processor_id() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 22/74] MIPS: VDSO: Use same -m%-float cflag as the kernel proper Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 23/74] clk: rockchip: Dont yell about bad mmc phases when getting Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 24/74] mtd: rawnand: mtk: Fix wrongly assigned OOB buffer pointer issue Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 25/74] driver core: Fix use-after-free and double free on glue directory Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 26/74] crypto: talitos - check AES key size Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 27/74] crypto: talitos - fix CTR alg blocksize Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 28/74] crypto: talitos - check data blocksize in ablkcipher Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 29/74] crypto: talitos - fix ECB algs ivsize Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 30/74] crypto: talitos - Do not modify req->cryptlen on decryption Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 31/74] crypto: talitos - HMAC SNOOP NO AFEU mode requires SW icv checking Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 32/74] drm/mediatek: mtk_drm_drv.c: Add of_node_put() before goto Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 33/74] nvmem: Use the same permissions for eeprom as for nvmem Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 34/74] x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 35/74] USB: usbcore: Fix slab-out-of-bounds bug during device reset Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 36/74] media: tm6000: double free if usb disconnect while streaming Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 37/74] powerpc/mm/radix: Use the right page size for vmemmap mapping Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 38/74] x86/boot: Add missing bootparam that breaks boot on some platforms Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 39/74] xen-netfront: do not assume sk_buff_head list is empty in error handling Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 40/74] KVM: coalesced_mmio: add bounds checking Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 41/74] serial: sprd: correct the wrong sequence of arguments Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 42/74] tty/serial: atmel: reschedule TX after RX was started Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 43/74] mwifiex: Fix three heap overflow at parsing element in cfg80211_ap_settings Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 44/74] ARM: OMAP2+: Fix missing SYSC_HAS_RESET_STATUS for dra7 epwmss Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 45/74] s390/bpf: fix lcgr instruction encoding Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 46/74] ARM: OMAP2+: Fix omap4 errata warning on other SoCs Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 47/74] s390/bpf: use 32-bit index for tail calls Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 48/74] NFSv4: Fix return values for nfs4_file_open() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 49/74] NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 50/74] Kconfig: Fix the reference to the IDT77105 Phy driver in the description of ATM_NICSTAR_USE_IDT77105 Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 51/74] qed: Add cleanup in qed_slowpath_start() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 52/74] ARM: 8874/1: mm: only adjust sections of valid mm structures Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 53/74] batman-adv: Only read OGM2 tvlv_len after buffer len check Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 54/74] r8152: Set memory to all 0xFFs on failed reg reads Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 55/74] x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 56/74] netfilter: nf_conntrack_ftp: Fix debug output Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 57/74] NFSv2: Fix eof handling Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 58/74] NFSv2: Fix write regression Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 59/74] cifs: set domainName when a domain-key is used in multiuser Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 60/74] cifs: Use kzfree() to zero out the password Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 61/74] ARM: 8901/1: add a criteria for pfn_valid of arm Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 62/74] sky2: Disable MSI on yet another ASUS boards (P6Xxxx) Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 63/74] perf/x86/intel: Restrict period on Nehalem Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 64/74] perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 65/74] tools/power turbostat: fix buffer overrun Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 66/74] net: seeq: Fix the function used to release some memory in an error handling path Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 67/74] dmaengine: ti: dma-crossbar: Fix a memory leak bug Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 68/74] dmaengine: ti: omap-dma: Add cleanup in omap_dma_probe() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 69/74] x86/uaccess: Dont leak the AC flags into __get_user() argument evaluation Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 70/74] keys: Fix missing null pointer check in request_key_auth_describe() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 71/74] iommu/amd: Fix race in increase_address_space() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 72/74] floppy: fix usercopy direction Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 73/74] media: technisat-usb2: break out of loop at end of buffer Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 74/74] ARC: export "abort" for modules Greg Kroah-Hartman
2019-09-20  3:19 ` [PATCH 4.9 00/74] 4.9.194-stable review kernelci.org bot
2019-09-20  6:48 ` Naresh Kamboju
2019-09-20 13:42 ` Guenter Roeck
2019-09-20 13:47 ` Jon Hunter
2019-09-20 21:28 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190919214805.890256386@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).