linux-kernel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Coly Li <colyli@suse.de>,
	NeilBrown <neilb@suse.com>,
	Jack Wang <jinpu.wang@profitbricks.com>, Shaohua Li <shli@fb.com>
Subject: [PATCH 3.18 072/124] md/raid1/10: fix potential deadlock
Date: Thu, 20 Apr 2017 08:35:47 +0200
Message-ID: <20170420063559.834633954@linuxfoundation.org>
In-Reply-To: <20170420063557.021306233@linuxfoundation.org>

3.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Shaohua Li <shli@fb.com>

commit 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 upstream.

Neil Brown pointed out a potential deadlock in the raid10 code with
bio_split/chain. The raid1 code could have the same issue, but the recent
barrier rework makes it less likely to happen. The deadlock happens in
the following sequence:

1. generic_make_request(bio), which sets current->bio_list
2. raid10_make_request splits the bio into bio1 and bio2
3. __make_request(bio1): wait_barrier, then add the underlying disk's bio
to current->bio_list
4. __make_request(bio2): wait_barrier

If raise_barrier happens between 3 and 4, then, because wait_barrier
already ran in 3, raise_barrier waits for the IO from 3 to complete. And
because raise_barrier sets the barrier, 4 waits for raise_barrier. But the
IO from 3 can't be dispatched because raid10_make_request() hasn't
finished yet.
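
The circular wait above can be sketched as a small userspace model. This
is only an illustration under simplifying assumptions, not kernel code:
struct toy_conf and the toy_* helpers are hypothetical names, and sleeping
is modelled by returning false instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name below is hypothetical. */
struct toy_conf {
	int nr_pending;       /* passed wait_barrier(), IO not yet completed */
	bool barrier;         /* set once raise_barrier() has been called */
	int queued_children;  /* lower-device bios parked on current->bio_list */
};

/* A regular request may pass only while no barrier is raised. */
static bool toy_wait_barrier(struct toy_conf *c)
{
	if (c->barrier)
		return false;         /* the real caller would sleep here */
	c->nr_pending++;
	return true;
}

/* Sets the barrier, then needs all pending IO to drain. */
static bool toy_raise_barrier(struct toy_conf *c)
{
	c->barrier = true;
	return c->nr_pending == 0;    /* false: would sleep waiting for IO */
}

/* Returns true if the old ordering leaves both sides waiting. */
static bool toy_old_order_deadlocks(void)
{
	struct toy_conf c = {0};

	/* Step 3: bio1 passes the barrier; its child bio is only queued. */
	bool bio1_passed = toy_wait_barrier(&c);
	c.queued_children++;
	assert(c.nr_pending == 1);

	/* raise_barrier() arrives between steps 3 and 4. */
	bool resync_can_run = toy_raise_barrier(&c);

	/* Step 4: bio2 tries to pass the barrier in the same call chain. */
	bool bio2_passed = toy_wait_barrier(&c);

	/*
	 * The queued child is dispatched only after make_request returns,
	 * and make_request returns only after bio2 has passed.
	 */
	bool child_dispatched = bio2_passed;

	return bio1_passed && !resync_can_run && !bio2_passed &&
	       !child_dispatched && c.queued_children == 1;
}
```

In this model toy_old_order_deadlocks() returns true: the resync side
waits for nr_pending to drop, while the only bio that could make it drop
is still parked on the queue behind bio2.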

The solution is to adjust the IO ordering. Quoting Neil:
"
It is much safer to:

    if (need to split) {
        split = bio_split(bio, ...)
        bio_chain(...)
        make_request_fn(split);
        generic_make_request(bio);
    } else
        make_request_fn(mddev, bio);

This way we first process the initial section of the bio (in 'split')
which will queue some requests to the underlying devices.  These
requests will be queued in generic_make_request.
Then we queue the remainder of the bio, which will be added to the end
of the generic_make_request queue.
Then we return.
generic_make_request() will pop the lower-level device requests off the
queue and handle them first.  Then it will process the remainder
of the original bio once the first section has been fully processed.
"
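
The queue ordering described in the quote can be illustrated with a toy
FIFO standing in for current->bio_list. This is a rough sketch under
stated assumptions, not the kernel implementation: struct toy_queue, enum
toy_kind and the toy_* functions are hypothetical, and the model stops
after one split instead of re-running make_request on the remainder.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

/* Hypothetical request kinds for this toy model. */
enum toy_kind { TOY_CHILD_OF_SPLIT, TOY_REMAINDER };

/* FIFO standing in for current->bio_list. */
struct toy_queue {
	enum toy_kind items[TOY_QUEUE_MAX];
	size_t head, tail;
};

/* Append a request to the tail of the queue. */
static void toy_submit(struct toy_queue *q, enum toy_kind k)
{
	assert(q->tail < TOY_QUEUE_MAX);
	q->items[q->tail++] = k;
}

/* RAID-level make_request using the ordering from the quote. */
static void toy_raid_make_request(struct toy_queue *q)
{
	/* make_request_fn(split): queues IO for the underlying device. */
	toy_submit(q, TOY_CHILD_OF_SPLIT);
	/* generic_make_request(bio): the remainder goes to the tail. */
	toy_submit(q, TOY_REMAINDER);
	/* Return right away so the caller's loop can drain the queue. */
}

/* Pop requests in FIFO order, recording them; returns the count. */
static size_t toy_drain(struct toy_queue *q, enum toy_kind *order, size_t max)
{
	size_t n = 0;

	while (q->head < q->tail && n < max)
		order[n++] = q->items[q->head++];
	return n;
}
```

Draining the queue after one toy_raid_make_request() call yields the
lower-level request first and the remainder second, so the first section
is dispatched before the remainder has to wait on any barrier.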

Note that this only happens in the read path. In the write path, the bio is
flushed to the underlying disks either by the blk plug flush (from
schedule()) or offloaded to raid1d/raid10d. In the read path, the bio is
queued in current->bio_list.

Cc: Coly Li <colyli@suse.de>
Suggested-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/raid10.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1578,7 +1578,25 @@ static void make_request(struct mddev *m
 			split = bio;
 		}
 
+		/*
+		 * If a bio is split, the first part of bio will pass
+		 * barrier but the bio is queued in current->bio_list (see
+		 * generic_make_request). If there is a raise_barrier() called
+		 * here, the second part of bio can't pass barrier. But since
+		 * the first part bio isn't dispatched to underlying disks
+		 * yet, the barrier is never released, hence raise_barrier will
+		 * always wait. We have a deadlock.
+		 * Note, this only happens in read path. For write path, the
+		 * first part of bio is dispatched in a schedule() call
+		 * (because of blk plug) or offloaded to raid10d.
+		 * Quitting from the function immediately can change the bio
+		 * order queued in bio_list and avoid the deadlock.
+		 */
 		__make_request(mddev, split);
+		if (split != bio && bio_data_dir(bio) == READ) {
+			generic_make_request(bio);
+			break;
+		}
 	} while (split != bio);
 
 	/* In case raid10d snuck in to freeze_array */
