All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>,
	Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>
Subject: [PATCH 4.9 16/74] Btrfs: fix assertion failure during fsync and use of stale transaction
Date: Fri, 20 Sep 2019 00:03:29 +0200	[thread overview]
Message-ID: <20190919214805.890256386@linuxfoundation.org> (raw)
In-Reply-To: <20190919214800.519074117@linuxfoundation.org>

From: Filipe Manana <fdmanana@suse.com>

commit 410f954cb1d1c79ae485dd83a175f21954fd87cd upstream.

Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.

That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.

In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:

1) evict_refill_and_join() will often change the transaction handle's
   block reserve (->block_rsv) and set its ->bytes_reserved field to a
   value greater than 0. If evict_refill_and_join() never commits the
   transaction, the eviction handler ends up decreasing the reference
   count (->use_count) of the transaction handle through the call to
   btrfs_end_transaction(), and after that point we have a transaction
   handle with a NULL ->block_rsv (which is the value prior to the
   transaction join from evict_refill_and_join()) and a ->bytes_reserved
   value greater than 0. If after the eviction/iput completes the inode
   logging path hits an error or it decides that it must fallback to a
   transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
   non-zero value from btrfs_log_dentry_safe(), and because of that
   non-zero value it tries to commit the transaction using a handle with
   a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
   the transaction commit hit an assertion failure at
   btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
   the ->block_rsv is NULL. The produced stack trace for that is like the
   following:

   [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
   [192922.917553] ------------[ cut here ]------------
   [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
   [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
   [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
   [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
   [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
   (...)
   [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
   [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
   [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
   [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
   [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
   [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
   [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
   [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
   [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [192922.925105] Call Trace:
   [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
   [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
   [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
   [192922.926731]  do_fsync+0x38/0x60
   [192922.927138]  __x64_sys_fdatasync+0x13/0x20
   [192922.927543]  do_syscall_64+0x60/0x1c0
   [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
   (...)
   [192922.934077] ---[ end trace f00808b12068168f ]---

2) If evict_refill_and_join() decides to commit the transaction, it will
   be able to do it, since the nested transaction join only increments the
   transaction handle's ->use_count reference counter and it does not
   prevent the transaction from getting committed. This means that after
   eviction completes, the fsync logging path will be using a transaction
   handle that refers to an already committed transaction. What happens
   when using such a stale transaction can be unpredictable, we are at
   least having a use-after-free on the transaction handle itself, since
   the transaction commit will call kmem_cache_free() against the handle
   regardless of its ->use_count value, or we can end up silently losing
   all the updates to the log tree after that iput() in the logging path,
   or using a transaction handle that in the meanwhile was allocated to
   another task for a new transaction, etc, pretty much unpredictable
   what can happen.

In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).

The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 fs/btrfs/tree-log.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4846,7 +4846,7 @@ again:
 				err = btrfs_log_inode(trans, root, other_inode,
 						      LOG_OTHER_INODE,
 						      0, LLONG_MAX, ctx);
-				iput(other_inode);
+				btrfs_add_delayed_iput(other_inode);
 				if (err)
 					goto out_unlock;
 				else
@@ -5264,7 +5264,7 @@ process_leaf:
 			}
 
 			if (btrfs_inode_in_log(di_inode, trans->transid)) {
-				iput(di_inode);
+				btrfs_add_delayed_iput(di_inode);
 				break;
 			}
 
@@ -5276,7 +5276,7 @@ process_leaf:
 			if (!ret &&
 			    btrfs_must_commit_transaction(trans, di_inode))
 				ret = 1;
-			iput(di_inode);
+			btrfs_add_delayed_iput(di_inode);
 			if (ret)
 				goto next_dir_inode;
 			if (ctx->log_new_dentries) {
@@ -5422,7 +5422,7 @@ static int btrfs_log_all_parents(struct
 			if (!ret && ctx && ctx->log_new_dentries)
 				ret = log_new_dir_dentries(trans, root,
 							   dir_inode, ctx);
-			iput(dir_inode);
+			btrfs_add_delayed_iput(dir_inode);
 			if (ret)
 				goto out;
 		}



  parent reply	other threads:[~2019-09-19 22:18 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19 22:03 [PATCH 4.9 00/74] 4.9.194-stable review Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 01/74] bridge/mdb: remove wrong use of NLM_F_MULTI Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 02/74] cdc_ether: fix rndis support for Mediatek based smartphones Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 03/74] ipv6: Fix the link time qualifier of ping_v6_proc_exit_net() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 04/74] isdn/capi: check message length in capi_write() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 05/74] net: Fix null de-reference of device refcount Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 06/74] net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 07/74] sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 08/74] sctp: Fix the link time qualifier of sctp_ctrlsock_exit() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 09/74] sctp: use transport pf_retrans in sctp_do_8_2_transport_strike Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 10/74] tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 11/74] tipc: add NULL pointer check before calling kfree_rcu Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 12/74] tun: fix use-after-free when register netdev failed Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 13/74] Revert "MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur" Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 14/74] gpio: fix line flag validation in linehandle_create Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 15/74] gpio: fix line flag validation in lineevent_create Greg Kroah-Hartman
2019-09-19 22:03 ` Greg Kroah-Hartman [this message]
2019-09-19 22:03 ` [PATCH 4.9 17/74] genirq: Prevent NULL pointer dereference in resend_irqs() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 18/74] KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 19/74] KVM: x86: work around leak of uninitialized stack contents Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 20/74] KVM: nVMX: handle page fault in vmread Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 21/74] MIPS: VDSO: Prevent use of smp_processor_id() Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 22/74] MIPS: VDSO: Use same -m%-float cflag as the kernel proper Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 23/74] clk: rockchip: Dont yell about bad mmc phases when getting Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 24/74] mtd: rawnand: mtk: Fix wrongly assigned OOB buffer pointer issue Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 25/74] driver core: Fix use-after-free and double free on glue directory Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 26/74] crypto: talitos - check AES key size Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 27/74] crypto: talitos - fix CTR alg blocksize Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 28/74] crypto: talitos - check data blocksize in ablkcipher Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 29/74] crypto: talitos - fix ECB algs ivsize Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 30/74] crypto: talitos - Do not modify req->cryptlen on decryption Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 31/74] crypto: talitos - HMAC SNOOP NO AFEU mode requires SW icv checking Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 32/74] drm/mediatek: mtk_drm_drv.c: Add of_node_put() before goto Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 33/74] nvmem: Use the same permissions for eeprom as for nvmem Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 34/74] x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 35/74] USB: usbcore: Fix slab-out-of-bounds bug during device reset Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 36/74] media: tm6000: double free if usb disconnect while streaming Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 37/74] powerpc/mm/radix: Use the right page size for vmemmap mapping Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 38/74] x86/boot: Add missing bootparam that breaks boot on some platforms Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 39/74] xen-netfront: do not assume sk_buff_head list is empty in error handling Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 40/74] KVM: coalesced_mmio: add bounds checking Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 41/74] serial: sprd: correct the wrong sequence of arguments Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 42/74] tty/serial: atmel: reschedule TX after RX was started Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 43/74] mwifiex: Fix three heap overflow at parsing element in cfg80211_ap_settings Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 44/74] ARM: OMAP2+: Fix missing SYSC_HAS_RESET_STATUS for dra7 epwmss Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 45/74] s390/bpf: fix lcgr instruction encoding Greg Kroah-Hartman
2019-09-19 22:03 ` [PATCH 4.9 46/74] ARM: OMAP2+: Fix omap4 errata warning on other SoCs Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 47/74] s390/bpf: use 32-bit index for tail calls Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 48/74] NFSv4: Fix return values for nfs4_file_open() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 49/74] NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 50/74] Kconfig: Fix the reference to the IDT77105 Phy driver in the description of ATM_NICSTAR_USE_IDT77105 Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 51/74] qed: Add cleanup in qed_slowpath_start() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 52/74] ARM: 8874/1: mm: only adjust sections of valid mm structures Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 53/74] batman-adv: Only read OGM2 tvlv_len after buffer len check Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 54/74] r8152: Set memory to all 0xFFs on failed reg reads Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 55/74] x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 56/74] netfilter: nf_conntrack_ftp: Fix debug output Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 57/74] NFSv2: Fix eof handling Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 58/74] NFSv2: Fix write regression Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 59/74] cifs: set domainName when a domain-key is used in multiuser Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 60/74] cifs: Use kzfree() to zero out the password Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 61/74] ARM: 8901/1: add a criteria for pfn_valid of arm Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 62/74] sky2: Disable MSI on yet another ASUS boards (P6Xxxx) Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 63/74] perf/x86/intel: Restrict period on Nehalem Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 64/74] perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 65/74] tools/power turbostat: fix buffer overrun Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 66/74] net: seeq: Fix the function used to release some memory in an error handling path Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 67/74] dmaengine: ti: dma-crossbar: Fix a memory leak bug Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 68/74] dmaengine: ti: omap-dma: Add cleanup in omap_dma_probe() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 69/74] x86/uaccess: Dont leak the AC flags into __get_user() argument evaluation Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 70/74] keys: Fix missing null pointer check in request_key_auth_describe() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 71/74] iommu/amd: Fix race in increase_address_space() Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 72/74] floppy: fix usercopy direction Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 73/74] media: technisat-usb2: break out of loop at end of buffer Greg Kroah-Hartman
2019-09-19 22:04 ` [PATCH 4.9 74/74] ARC: export "abort" for modules Greg Kroah-Hartman
2019-09-20  3:19 ` [PATCH 4.9 00/74] 4.9.194-stable review kernelci.org bot
2019-09-20  6:48 ` Naresh Kamboju
2019-09-20 13:42 ` Guenter Roeck
2019-09-20 13:47 ` Jon Hunter
2019-09-20 13:47   ` Jon Hunter
2019-09-20 21:28 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190919214805.890256386@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.