From: Kamal Mostafa <kamal@canonical.com>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
kernel-team@lists.ubuntu.com
Cc: Filipe Manana <fdmanana@suse.com>, Kamal Mostafa <kamal@canonical.com>
Subject: [PATCH 3.13.y-ckt 59/86] Btrfs: fix race leading to incorrect item deletion when dropping extents
Date: Wed, 2 Dec 2015 14:54:20 -0800
Message-ID: <1449096887-23017-60-git-send-email-kamal@canonical.com>
In-Reply-To: <1449096887-23017-1-git-send-email-kamal@canonical.com>
3.13.11-ckt31 -stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana <fdmanana@suse.com>
commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.
While running a stress test I got the following warning triggered:
[191627.672810] ------------[ cut here ]------------
[191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
(...)
[191627.701485] Call Trace:
[191627.702037] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[191627.702992] [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
[191627.704091] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[191627.705380] [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.706637] [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
[191627.707789] [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.709155] [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
[191627.712444] [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
[191627.714162] [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
[191627.715887] [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
[191627.717287] [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
[191627.728865] [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
[191627.730045] [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
[191627.731256] [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[191627.732661] [<ffffffff81061119>] process_one_work+0x24c/0x4ae
[191627.733822] [<ffffffff810615b0>] worker_thread+0x206/0x2c2
[191627.734857] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.736052] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.737349] [<ffffffff810669a6>] kthread+0xef/0xf7
[191627.738267] [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
[191627.739330] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.741976] [<ffffffff81465592>] ret_from_fork+0x42/0x70
[191627.743080] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.744206] ---[ end trace bbfddacb7aaada8d ]---
  $ cat -n fs/btrfs/file.c
  691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
  (...)
  758          btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
  759          if (key.objectid > ino ||
  760              key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
  761                  break;
  762
  763          fi = btrfs_item_ptr(leaf, path->slots[0],
  764                              struct btrfs_file_extent_item);
  765          extent_type = btrfs_file_extent_type(leaf, fi);
  766
  767          if (extent_type == BTRFS_FILE_EXTENT_REG ||
  768              extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
  (...)
  774          } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
  (...)
  778          } else {
  779                  WARN_ON(1);
  780                  extent_end = search_start;
  781          }
  (...)
This happened because the item we were processing was not a file extent
item (its key type != BTRFS_EXTENT_DATA_KEY), yet even in this case we
cast the item to a struct btrfs_file_extent_item pointer and then found
a type field value that does not match any of the expected values
(BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This can happen because of a
race with a tiny time window.

For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbouring leaves:

          Leaf X (has N items)                        Leaf Y

  [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
            slot N - 2         slot N - 1                 slot 0

Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:

        CPU 1                           CPU 2

                                btrfs_finish_ordered_io()
                                  insert_reserved_file_extent()
                                    __btrfs_drop_extents()
                                      Searches for the key
                                      (257 EXTENT_DATA 4096) through
                                      btrfs_lookup_file_extent()

                                      Key not found and we get a path where
                                      path->nodes[0] == leaf X and
                                      path->slots[0] == N

                                      Because path->slots[0] is >=
                                      btrfs_header_nritems(leaf X), we call
                                      btrfs_next_leaf()

                                      btrfs_next_leaf() releases the path

  inserts key
  (257 INODE_REF 4096)
  at the end of leaf X,
  leaf X now has N + 1 keys,
  and the new key is at
  slot N

                                      btrfs_next_leaf() searches for
                                      key (257 INODE_REF 256), with
                                      path->keep_locks set to 1,
                                      because it was the last key it
                                      saw in leaf X

                                        finds it in leaf X again and
                                        notices it's no longer the last
                                        key of the leaf, so it returns 0
                                        with path->nodes[0] == leaf X and
                                        path->slots[0] == N (which is now
                                        < btrfs_header_nritems(leaf X)),
                                        pointing to the new key
                                        (257 INODE_REF 4096)

                                      __btrfs_drop_extents() casts the
                                      item at path->nodes[0], slot
                                      path->slots[0], to a struct
                                      btrfs_file_extent_item - it does
                                      not skip keys for the target
                                      inode with a type less than
                                      BTRFS_EXTENT_DATA_KEY
                                      (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)

                                      sees a bogus value for the type
                                      field triggering the WARN_ON in
                                      the trace shown above, and sets
                                      extent_end = search_start (4096)

                                      does the if-then-else logic to
                                      fixup 0 length extent items created
                                      by a past bug from hole punching:

                                        if (extent_end == key.offset &&
                                            extent_end >= search_start)
                                            goto delete_extent_item;

                                      that evaluates to true and it ends
                                      up deleting the key pointed to by
                                      path->slots[0], (257 INODE_REF 4096),
                                      from leaf X

The same could happen, for example, for an xattr that ends up having a
key with an offset value that matches search_start (very unlikely but
not impossible).
So fix this by ensuring that keys with a type smaller than
BTRFS_EXTENT_DATA_KEY are skipped, never cast to struct
btrfs_file_extent_item and never deleted by accident. Also protect
against the unexpected case of getting a key for a lower inode number by
skipping that key and issuing a warning.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 fs/btrfs/file.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 6651664..9d0f7d4 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -749,8 +749,16 @@ next_slot:
 		}

 		btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
-		if (key.objectid > ino ||
-		    key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+
+		if (key.objectid > ino)
+			break;
+		if (WARN_ON_ONCE(key.objectid < ino) ||
+		    key.type < BTRFS_EXTENT_DATA_KEY) {
+			ASSERT(del_nr == 0);
+			path->slots[0]++;
+			goto next_slot;
+		}
+		if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
 			break;

 		fi = btrfs_item_ptr(leaf, path->slots[0],
@@ -768,8 +776,8 @@ next_slot:
 			extent_end = key.offset +
 				btrfs_file_extent_inline_len(leaf, fi);
 		} else {
-			WARN_ON(1);
-			extent_end = search_start;
+			/* can't happen */
+			BUG();
 		}

 		if (extent_end <= search_start) {
--
1.9.1