linux-kernel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Alex Chen <alexchen@synology.com>,
	Alex Wu <alexwu@synology.com>,
	Chung-Chiang Cheng <cccheng@synology.com>,
	BingJing Chang <bingjingc@synology.com>, Shaohua Li <shli@fb.com>,
	Sasha Levin <alexander.levin@microsoft.com>
Subject: [PATCH 3.18 041/105] md/raid5: fix data corruption of replacements after originals dropped
Date: Mon, 24 Sep 2018 13:33:27 +0200
Message-ID: <20180924113118.180103471@linuxfoundation.org>
In-Reply-To: <20180924113113.268650190@linuxfoundation.org>

3.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: BingJing Chang <bingjingc@synology.com>

[ Upstream commit d63e2fc804c46e50eee825c5d3a7228e07048b47 ]

During raid5 replacement, stripes can be marked with the R5_NeedReplace
flag. Data can then be read from the device being replaced and written
to the replacing spare without reading all the other devices. (This is
'replace' mode: s.replacing = 1.) If the device being replaced is
dropped, the replacement is interrupted and resumes in pure recovery
mode. However, stripes that existed before the interruption can no
longer read from the dropped device. This triggers many WARN_ON
messages and results in data corruption, because those stripes write
problematic data to the replacement device and update the progress.

# Erase disks (1MB + 2GB)
dd if=/dev/zero of=/dev/sda bs=1MB count=2049
dd if=/dev/zero of=/dev/sdb bs=1MB count=2049
dd if=/dev/zero of=/dev/sdc bs=1MB count=2049
dd if=/dev/zero of=/dev/sdd bs=1MB count=2049
mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152
# Ensure array stores non-zero data
dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB
# Start replacement
mdadm /dev/md0 -a /dev/sdd
mdadm /dev/md0 --replace /dev/sda

Then hot-unplug /dev/sda during recovery and wait for the recovery to finish.
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0.

Soon after /dev/sda is hot-unplugged, many WARN_ON messages appear and
the replacement recovery is interrupted shortly afterwards. Once the
recovery finishes, the result is data corruption.

Actually, it's just an unhandled case of replacement. In commit
<f94c0b6658c7> (md/raid5: fix interaction of 'replace' and 'recovery'.),
a NeedReplace device that is not UPTODATE is treated as an error, but
the commit only prints a WARN_ON and still marks these corrupted
stripes with R5_WantReplace (meaning they are ready for writes).

To fix this case, we can leverage the 'sync and replace' mode mentioned
in commit <9a3e1101b827> (md/raid5: detect and handle replacements
during recovery.). We can add logic to detect these stripes and use
'sync and replace' mode for them.
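
As a rough illustration of the fix described above, the following
standalone C sketch models only the decision made in the patched hunk of
analyse_stripe(). The toy_rdev and toy_disk types and both helper
functions are hypothetical simplifications, not kernel APIs; RCU, the
stripe state, and the R5_* flags are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct md_rdev; only the Faulty bit is modelled. */
struct toy_rdev {
	bool faulty;	/* models test_bit(Faulty, &rdev->flags) */
};

/* Hypothetical stand-in for one slot of conf->disks[]. */
struct toy_disk {
	struct toy_rdev *rdev;		/* original device, NULL once dropped */
	struct toy_rdev *replacement;	/* replacement spare, may be NULL */
};

/* Before the patch: only a present, non-faulty original requests recovery. */
static bool needs_recovery_old(const struct toy_disk *d)
{
	return d->rdev && !d->rdev->faulty;
}

/* After the patch: if the original is gone, fall back to the replacement. */
static bool needs_recovery_new(const struct toy_disk *d)
{
	const struct toy_rdev *rdev = d->rdev;

	if (rdev && !rdev->faulty)
		return true;
	if (!rdev) {
		rdev = d->replacement;
		if (rdev && !rdev->faulty)
			return true;
	}
	return false;
}
```

With the original dropped (rdev == NULL) and a healthy replacement
present, needs_recovery_old() returns false while needs_recovery_new()
returns true, which is the case the patch adds.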

Reported-by: Alex Chen <alexchen@synology.com>
Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/raid5.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3703,6 +3703,12 @@ static void analyse_stripe(struct stripe
 			s->failed++;
 			if (rdev && !test_bit(Faulty, &rdev->flags))
 				do_recovery = 1;
+			else if (!rdev) {
+				rdev = rcu_dereference(
+				    conf->disks[i].replacement);
+				if (rdev && !test_bit(Faulty, &rdev->flags))
+					do_recovery = 1;
+			}
 		}
 	}
 	if (test_bit(STRIPE_SYNCING, &sh->state)) {


