* [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
@ 2015-06-19 18:52 Omar Sandoval
2015-06-19 18:52 ` [PATCH v2 1/5] Btrfs: remove misleading handling of missing device scrub Omar Sandoval
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Omar Sandoval @ 2015-06-19 18:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Miao Xie, Zhao Lei, wangyf, Philip, Omar Sandoval
Hi,
Here's version 2 of the missing device RAID 5/6 fixes. The original
problem was reported by a user on Bugzilla: the kernel crashed when
attempting to replace a missing device in a RAID 6 filesystem. This is
detailed and fixed in patch 4. After the initial posting, Zhao Lei
reported a similar issue when doing a scrub on a RAID 5 filesystem with
a missing device. This is fixed in the added patch 5.
My new-and-improved-and-overengineered reproducer as well as Zhao Lei's
reproducer can be found below.
Thanks!
v1: http://article.gmane.org/gmane.comp.file-systems.btrfs/45045
v1->v2:
- Add missing scrub_wr_submit() in scrub_missing_raid56_worker()
- Add clarifying comment in dev->missing case of scrub_stripe()
(Zhaolei)
- Add fix for scrub with missing device (patch 5)
Omar Sandoval (5):
Btrfs: remove misleading handling of missing device scrub
Btrfs: count devices correctly in readahead during RAID 5/6 replace
Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
Btrfs: fix device replace of a missing RAID 5/6 device
Btrfs: fix parity scrub of RAID 5/6 with missing device
fs/btrfs/raid56.c | 87 ++++++++++++++++++++---
fs/btrfs/raid56.h | 10 ++-
fs/btrfs/reada.c | 4 +-
fs/btrfs/scrub.c | 202 +++++++++++++++++++++++++++++++++++++++++++++---------
4 files changed, 259 insertions(+), 44 deletions(-)
Reproducer 1:
----
#!/bin/bash

usage () {
    USAGE_STRING="Usage: $0 [OPTION]...
Options:
  -m    failure mode; MODE is 'eio', 'missing', or 'corrupt' (defaults to
        'missing')
  -n    number of files to write, each twice as big as the last, the first
        being 1M in size (defaults to 4)
  -o    operation to perform; OP is 'replace' or 'scrub' (defaults to
        'replace')
  -r    RAID profile; RAID is 'raid0', 'raid1', 'raid10', 'raid5', or 'raid6'
        (defaults to 'raid5')

Miscellaneous:
  -h    display this help message and exit"

    case "$1" in
        out)
            echo "$USAGE_STRING"
            exit 0
            ;;
        err)
            echo "$USAGE_STRING" >&2
            exit 1
            ;;
    esac
}

MODE=missing
RAID=raid5
OP=replace
NUM_FILES=4

while getopts "m:n:o:r:h" OPT; do
    case "$OPT" in
        m)
            MODE="$OPTARG"
            ;;
        r)
            RAID="$OPTARG"
            ;;
        o)
            OP="$OPTARG"
            ;;
        n)
            NUM_FILES="$OPTARG"
            if [[ ! "$NUM_FILES" =~ ^[0-9]+$ ]]; then
                usage "err"
            fi
            ;;
        h)
            usage "out"
            ;;
        *)
            usage "err"
            ;;
    esac
done

case "$MODE" in
    eio|missing|corrupt)
        ;;
    *)
        usage err
        ;;
esac

case "$RAID" in
    raid[01])
        NUM_RAID_DISKS=2
        ;;
    raid10)
        NUM_RAID_DISKS=4
        ;;
    raid5)
        NUM_RAID_DISKS=3
        ;;
    raid6)
        NUM_RAID_DISKS=4
        ;;
    *)
        usage err
        ;;
esac

case "$OP" in
    replace)
        NUM_DISKS=$((NUM_RAID_DISKS + 1))
        ;;
    scrub)
        NUM_DISKS=$NUM_RAID_DISKS
        ;;
    *)
        usage err
        ;;
esac

echo "Running $OP on $RAID with $MODE"

SRC_DISK=$((NUM_RAID_DISKS - 1))
TARGET_DISK=$((NUM_DISKS - 1))
NUM_SECTORS=$((1024 * 1024))
LOOP_DEVICES=()
DM_DEVICES=()

cleanup () {
    echo "Done. Press enter to cleanup..."
    read
    if findmnt /mnt; then
        umount /mnt
    fi
    for DM in "${DM_DEVICES[@]}"; do
        dmsetup remove "$DM"
    done
    for LOOP in "${LOOP_DEVICES[@]}"; do
        losetup --detach "$LOOP"
    done
    for ((i = 0; i < NUM_DISKS; i++)); do
        rm -f disk${i}.img
    done
}
trap 'cleanup; exit 1' ERR

echo "Creating disk images and loopback devices..."
for ((i = 0; i < NUM_DISKS; i++)); do
    rm -f disk${i}.img
    dd if=/dev/zero of=disk${i}.img bs=512 seek=$NUM_SECTORS count=0
    LOOP_DEVICES+=("$(losetup --find --show disk${i}.img)")
done

echo "Creating device-mapper devices..."
for LOOP in "${LOOP_DEVICES[@]}"; do
    DM="${LOOP/\/dev\/loop/dm}"
    dmsetup create "$DM" --table "0 $NUM_SECTORS linear $LOOP 0"
    DM_DEVICES+=("$DM")
done

echo "Creating filesystem..."
FS_DEVICES=("${DM_DEVICES[@]:0:$NUM_RAID_DISKS}")
FS_DEVICES=("${FS_DEVICES[@]/#//dev/mapper/}")
echo "${FS_DEVICES[@]}"
MOUNT_DEVICE="${FS_DEVICES[$(((SRC_DISK + 1) % NUM_RAID_DISKS))]}"
mkfs.btrfs -d "$RAID" -m "$RAID" "${FS_DEVICES[@]}"
mount "$MOUNT_DEVICE" /mnt
for ((i = 0; i < NUM_FILES; i++)); do
    dd if=/dev/urandom of=/mnt/file$i bs=1M count=$((1 << $i))
done
sync

case "$MODE" in
    eio)
        echo "Killing disk..."
        dmsetup suspend "${DM_DEVICES[$SRC_DISK]}"
        dmsetup reload "${DM_DEVICES[$SRC_DISK]}" --table "0 $NUM_SECTORS error"
        dmsetup resume "${DM_DEVICES[$SRC_DISK]}"
        ;;
    missing)
        echo "Removing disk and remounting degraded..."
        umount /mnt
        dmsetup remove "${DM_DEVICES[$SRC_DISK]}"
        unset DM_DEVICES[$SRC_DISK]
        mount -o degraded "$MOUNT_DEVICE" /mnt
        ;;
    corrupt)
        echo "Corrupting disk and remounting degraded..."
        umount /mnt
        dd if=/dev/zero of=/dev/mapper/"${DM_DEVICES[$SRC_DISK]}" bs=1M count=1
        mount -o degraded "$MOUNT_DEVICE" /mnt
        ;;
esac

case "$OP" in
    replace)
        echo "Replacing disk..."
        btrfs replace start -B $((SRC_DISK + 1)) /dev/mapper/"${DM_DEVICES[$TARGET_DISK]}" /mnt
        ;;
    scrub)
        echo "Scrubbing filesystem..."
        btrfs scrub start -B /mnt
        ;;
esac

echo "Scrubbing to double-check..."
btrfs scrub start -Br /mnt

cleanup
----
Reproducer 2:
----
#!/bin/bash

FS_DEVS=(/dev/vdb /dev/vdc /dev/vdd)
PRUNE_DEV=/dev/vdc
MNT=/mnt

do_cmd()
{
    echo " $*"
    local output
    local ret
    output=$("$@" 2>&1)
    ret="$?"
    [[ "$ret" != 0 ]] && {
        echo "$output"
    }
    return "$ret"
}

mkdir -p "$MNT"
for ((i = 0; i < 10; i++)); do
    umount "$MNT" &>/dev/null
done
dmesg -c >/dev/null

echo "1: Creating filesystem"
do_cmd mkfs.btrfs -f -d raid5 -m raid5 "${FS_DEVS[@]}" || exit 1
do_cmd mount "$FS_DEVS" "$MNT" || exit 1

echo "2: Write some data"
DATA_CNT=4
for ((i = 0; i < DATA_CNT; i++)); do
    size_m="$((1<<i))"
    do_cmd dd bs=1M if=/dev/urandom of="$MNT"/file_"$i" count="$size_m" || exit 1
done

echo "3: Prune a disk in fs"
do_cmd umount "$MNT" || exit 1
do_cmd dd bs=1M if=/dev/zero of="$PRUNE_DEV" count=1
do_cmd mount -o "degraded" "$FS_DEVS" "$MNT" || exit 1

echo "4: Do scrub"
do_cmd btrfs scrub start -B "$MNT"

echo "5: Checking result"
dmesg --color

exit 0
----
--
1.8.5.6
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/5] Btrfs: remove misleading handling of missing device scrub
From: Omar Sandoval @ 2015-06-19 18:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Miao Xie, Zhao Lei, wangyf, Philip, Omar Sandoval
scrub_submit() claims that it can handle a bio with a NULL block device,
but this is misleading, as calling bio_add_page() on a bio with a NULL
->bi_bdev would've already crashed. Delete this, as we're about to
properly handle a missing block device.
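For reference, the reason the deleted branch was unreachable: bio_add_page() immediately reaches through ->bi_bdev for the request queue, so a NULL bdev oopses long before the bio ever gets back to scrub_submit(). A minimal userspace model of that ordering (all names here are hypothetical stand-ins, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct ex_queue { int max_sectors; };
struct ex_bdev { struct ex_queue *queue; };
struct ex_bio { struct ex_bdev *bi_bdev; int vcnt; };

/* Models bio_add_page(): the first thing it needs is the bdev's
 * request queue, so a NULL ->bi_bdev means a crash at page-add time
 * (modeled here as a -1 return behind an explicit check, where the
 * real function would dereference NULL and oops). */
static int ex_bio_add_page(struct ex_bio *bio)
{
    if (!bio->bi_bdev)
        return -1; /* the real function would crash here */
    (void)bio->bi_bdev->queue->max_sectors;
    bio->vcnt++;
    return 0;
}
```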
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
fs/btrfs/scrub.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab5811545a98..633fa7b19b7d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2087,21 +2087,7 @@ static void scrub_submit(struct scrub_ctx *sctx)
sbio = sctx->bios[sctx->curr];
sctx->curr = -1;
scrub_pending_bio_inc(sctx);
-
- if (!sbio->bio->bi_bdev) {
- /*
- * this case should not happen. If btrfs_map_block() is
- * wrong, it could happen for dev-replace operations on
- * missing devices when no mirrors are available, but in
- * this case it should already fail the mount.
- * This case is handled correctly (but _very_ slowly).
- */
- printk_ratelimited(KERN_WARNING
- "BTRFS: scrub_submit(bio bdev == NULL) is unexpected!\n");
- bio_endio(sbio->bio, -EIO);
- } else {
- btrfsic_submit_bio(READ, sbio->bio);
- }
+ btrfsic_submit_bio(READ, sbio->bio);
}
static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
--
1.8.5.6
* [PATCH v2 2/5] Btrfs: count devices correctly in readahead during RAID 5/6 replace
From: Omar Sandoval @ 2015-06-19 18:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Miao Xie, Zhao Lei, wangyf, Philip, Omar Sandoval
Commit 5fbc7c59fd22 ("Btrfs: fix unfinished readahead thread for raid5/6
degraded mounting") fixed a problem where we would skip a missing device
when we shouldn't have because there are no other mirrors to read from
in RAID 5/6. After commit 2c8cdd6ee4e7 ("Btrfs, replace: write dirty
pages into the replace target device"), the fix doesn't work when we're
doing a missing device replace on RAID 5/6 because the replace device is
counted as a mirror so we're tricked into thinking we can safely skip
the missing device. The fix is to count only the real stripes and decide
based on that.
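The gist of the fix, as a tiny standalone sketch (the helper name is invented; in the kernel the two counts come from struct btrfs_bio):

```c
#include <assert.h>

/* During a replace, the target device is appended to the stripe array
 * and counted in num_stripes; num_tgtdevs says how many trailing
 * entries are replace targets rather than real array members. Only
 * the real stripes can satisfy a read, so only they should be counted
 * when deciding whether a missing device can be skipped. */
static int real_stripes(int num_stripes, int num_tgtdevs)
{
    return num_stripes - num_tgtdevs;
}
```

For example, a 3-device RAID 5 with one replace target reports num_stripes = 4, which before the fix looked like an extra mirror existed.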
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
fs/btrfs/reada.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 0e7beea92b4c..4645cd16d5ba 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -328,6 +328,7 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
struct btrfs_device *prev_dev;
u32 blocksize;
u64 length;
+ int real_stripes;
int nzones = 0;
int i;
unsigned long index = logical >> PAGE_CACHE_SHIFT;
@@ -369,7 +370,8 @@ static struct reada_extent *reada_find_extent(struct btrfs_root *root,
goto error;
}
- for (nzones = 0; nzones < bbio->num_stripes; ++nzones) {
+ real_stripes = bbio->num_stripes - bbio->num_tgtdevs;
+ for (nzones = 0; nzones < real_stripes; ++nzones) {
struct reada_zone *zone;
dev = bbio->stripes[nzones].dev;
--
1.8.5.6
* [PATCH v2 3/5] Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
From: Omar Sandoval @ 2015-06-19 18:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Miao Xie, Zhao Lei, wangyf, Philip, Omar Sandoval
The current RAID 5/6 recovery code isn't quite prepared to handle
missing devices. In particular, it expects a bio that we previously
attempted to use in the read path, meaning that it has valid pages
allocated. However, missing devices have a NULL blkdev, and we can't
call bio_add_page() on a bio with a NULL blkdev. We could do manual
manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add
a separate path that allows us to manually add pages to the rbio.
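The manual path boils down to computing each page's slot in the rbio's page array from its logical address, rather than going through bio_add_page(). A simplified model of that index computation (4 KiB pages assumed; names are illustrative, not the exact kernel code):

```c
#include <assert.h>

#define EX_PAGE_SHIFT 12 /* 4 KiB pages; an assumption for this sketch */

/* raid_map0 is the logical address where the full stripe starts; the
 * page's byte offset within the full stripe selects its array slot,
 * so no block device is needed to place the page. */
static int rbio_page_index(unsigned long long logical,
                           unsigned long long raid_map0)
{
    return (int)((logical - raid_map0) >> EX_PAGE_SHIFT);
}
```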
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
fs/btrfs/raid56.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++-------
fs/btrfs/raid56.h | 10 +++++--
fs/btrfs/scrub.c | 3 +-
3 files changed, 86 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index fa72068bd256..6fe2613ef288 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -61,9 +61,10 @@
#define RBIO_CACHE_SIZE 1024
enum btrfs_rbio_ops {
- BTRFS_RBIO_WRITE = 0,
- BTRFS_RBIO_READ_REBUILD = 1,
- BTRFS_RBIO_PARITY_SCRUB = 2,
+ BTRFS_RBIO_WRITE,
+ BTRFS_RBIO_READ_REBUILD,
+ BTRFS_RBIO_PARITY_SCRUB,
+ BTRFS_RBIO_REBUILD_MISSING,
};
struct btrfs_raid_bio {
@@ -602,6 +603,10 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
cur->operation == BTRFS_RBIO_PARITY_SCRUB)
return 0;
+ if (last->operation == BTRFS_RBIO_REBUILD_MISSING ||
+ cur->operation == BTRFS_RBIO_REBUILD_MISSING)
+ return 0;
+
return 1;
}
@@ -793,7 +798,10 @@ static noinline void unlock_stripe(struct btrfs_raid_bio *rbio)
if (next->operation == BTRFS_RBIO_READ_REBUILD)
async_read_rebuild(next);
- else if (next->operation == BTRFS_RBIO_WRITE) {
+ else if (next->operation == BTRFS_RBIO_REBUILD_MISSING) {
+ steal_rbio(rbio, next);
+ async_read_rebuild(next);
+ } else if (next->operation == BTRFS_RBIO_WRITE) {
steal_rbio(rbio, next);
async_rmw_stripe(next);
} else if (next->operation == BTRFS_RBIO_PARITY_SCRUB) {
@@ -1809,7 +1817,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
faila = rbio->faila;
failb = rbio->failb;
- if (rbio->operation == BTRFS_RBIO_READ_REBUILD) {
+ if (rbio->operation == BTRFS_RBIO_READ_REBUILD ||
+ rbio->operation == BTRFS_RBIO_REBUILD_MISSING) {
spin_lock_irq(&rbio->bio_list_lock);
set_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags);
spin_unlock_irq(&rbio->bio_list_lock);
@@ -1834,7 +1843,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
* if we're rebuilding a read, we have to use
* pages from the bio list
*/
- if (rbio->operation == BTRFS_RBIO_READ_REBUILD &&
+ if ((rbio->operation == BTRFS_RBIO_READ_REBUILD ||
+ rbio->operation == BTRFS_RBIO_REBUILD_MISSING) &&
(stripe == faila || stripe == failb)) {
page = page_in_rbio(rbio, stripe, pagenr, 0);
} else {
@@ -1943,7 +1953,8 @@ pstripe:
* if we're rebuilding a read, we have to use
* pages from the bio list
*/
- if (rbio->operation == BTRFS_RBIO_READ_REBUILD &&
+ if ((rbio->operation == BTRFS_RBIO_READ_REBUILD ||
+ rbio->operation == BTRFS_RBIO_REBUILD_MISSING) &&
(stripe == faila || stripe == failb)) {
page = page_in_rbio(rbio, stripe, pagenr, 0);
} else {
@@ -1965,6 +1976,8 @@ cleanup_io:
clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
rbio_orig_end_io(rbio, err, err == 0);
+ } else if (rbio->operation == BTRFS_RBIO_REBUILD_MISSING) {
+ rbio_orig_end_io(rbio, err, err == 0);
} else if (err == 0) {
rbio->faila = -1;
rbio->failb = -1;
@@ -2101,7 +2114,8 @@ out:
return 0;
cleanup:
- if (rbio->operation == BTRFS_RBIO_READ_REBUILD)
+ if (rbio->operation == BTRFS_RBIO_READ_REBUILD ||
+ rbio->operation == BTRFS_RBIO_REBUILD_MISSING)
rbio_orig_end_io(rbio, -EIO, 0);
return -EIO;
}
@@ -2232,8 +2246,9 @@ raid56_parity_alloc_scrub_rbio(struct btrfs_root *root, struct bio *bio,
return rbio;
}
-void raid56_parity_add_scrub_pages(struct btrfs_raid_bio *rbio,
- struct page *page, u64 logical)
+/* Used for both parity scrub and missing. */
+void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
+ u64 logical)
{
int stripe_offset;
int index;
@@ -2668,3 +2683,55 @@ void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio)
if (!lock_stripe_add(rbio))
async_scrub_parity(rbio);
}
+
+/* The following code is used for dev replace of a missing RAID 5/6 device. */
+
+struct btrfs_raid_bio *
+raid56_alloc_missing_rbio(struct btrfs_root *root, struct bio *bio,
+ struct btrfs_bio *bbio, u64 length)
+{
+ struct btrfs_raid_bio *rbio;
+
+ rbio = alloc_rbio(root, bbio, length);
+ if (IS_ERR(rbio))
+ return NULL;
+
+ rbio->operation = BTRFS_RBIO_REBUILD_MISSING;
+ bio_list_add(&rbio->bio_list, bio);
+ /*
+ * This is a special bio which is used to hold the completion handler
+ * and to make the rbio similar to the other types
+ */
+ ASSERT(!bio->bi_iter.bi_size);
+
+ rbio->faila = find_logical_bio_stripe(rbio, bio);
+ if (rbio->faila == -1) {
+ BUG();
+ kfree(rbio);
+ return NULL;
+ }
+
+ return rbio;
+}
+
+static void missing_raid56_work(struct btrfs_work *work)
+{
+ struct btrfs_raid_bio *rbio;
+
+ rbio = container_of(work, struct btrfs_raid_bio, work);
+ __raid56_parity_recover(rbio);
+}
+
+static void async_missing_raid56(struct btrfs_raid_bio *rbio)
+{
+ btrfs_init_work(&rbio->work, btrfs_rmw_helper,
+ missing_raid56_work, NULL, NULL);
+
+ btrfs_queue_work(rbio->fs_info->rmw_workers, &rbio->work);
+}
+
+void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio)
+{
+ if (!lock_stripe_add(rbio))
+ async_missing_raid56(rbio);
+}
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 2b5d7977d83b..8b694699d502 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -48,15 +48,21 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
int raid56_parity_write(struct btrfs_root *root, struct bio *bio,
struct btrfs_bio *bbio, u64 stripe_len);
+void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
+ u64 logical);
+
struct btrfs_raid_bio *
raid56_parity_alloc_scrub_rbio(struct btrfs_root *root, struct bio *bio,
struct btrfs_bio *bbio, u64 stripe_len,
struct btrfs_device *scrub_dev,
unsigned long *dbitmap, int stripe_nsectors);
-void raid56_parity_add_scrub_pages(struct btrfs_raid_bio *rbio,
- struct page *page, u64 logical);
void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio);
+struct btrfs_raid_bio *
+raid56_alloc_missing_rbio(struct btrfs_root *root, struct bio *bio,
+ struct btrfs_bio *bbio, u64 length);
+void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio);
+
int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info);
void btrfs_free_stripe_hash_table(struct btrfs_fs_info *info);
#endif
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 633fa7b19b7d..b94694d59de5 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2699,8 +2699,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
goto rbio_out;
list_for_each_entry(spage, &sparity->spages, list)
- raid56_parity_add_scrub_pages(rbio, spage->page,
- spage->logical);
+ raid56_add_scrub_pages(rbio, spage->page, spage->logical);
scrub_pending_bio_inc(sctx);
raid56_parity_submit_scrub_rbio(rbio);
--
1.8.5.6
* [PATCH v2 4/5] Btrfs: fix device replace of a missing RAID 5/6 device
From: Omar Sandoval @ 2015-06-19 18:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Miao Xie, Zhao Lei, wangyf, Philip, Omar Sandoval
The original implementation of device replace on RAID 5/6 seems to have
missed support for replacing a missing device. When this is attempted,
we end up calling bio_add_page() on a bio with a NULL ->bi_bdev, which
crashes when we try to dereference it. This happens because
btrfs_map_block() has no choice but to return us the missing device
because RAID 5/6 don't have any alternate mirrors to read from, and a
missing device has a NULL bdev.
The idea implemented here is to handle the missing device case
separately, which should only happen when we're replacing a missing RAID
5/6 device. We use the new BTRFS_RBIO_REBUILD_MISSING operation to
reconstruct the data from parity, check it with
scrub_recheck_block_checksum(), and write it out with
scrub_write_block_to_dev_replace().
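The completion worker's decision tree is straightforward; sketched here as a pure function (a simplification of scrub_missing_raid56_worker() with invented names, omitting the statistics and locking):

```c
#include <assert.h>

enum rebuild_outcome {
    REBUILD_READ_ERROR,    /* reconstruction I/O failed: count a read error */
    REBUILD_UNCORRECTABLE, /* rebuilt data failed header/checksum checks */
    REBUILD_WRITE_TARGET,  /* good data: write it to the replace target */
};

/* Mirrors the order of checks in the worker: an I/O error trumps a
 * verification failure, and only fully verified data is written out. */
static enum rebuild_outcome missing_rebuild_outcome(int no_io_error_seen,
                                                    int header_error,
                                                    int checksum_error)
{
    if (!no_io_error_seen)
        return REBUILD_READ_ERROR;
    if (header_error || checksum_error)
        return REBUILD_UNCORRECTABLE;
    return REBUILD_WRITE_TARGET;
}
```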
Reported-by: Philip <bugzilla@philip-seeger.de>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96141
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
fs/btrfs/scrub.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 147 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b94694d59de5..b75f1e9c6adc 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -125,6 +125,7 @@ struct scrub_block {
/* It is for the data with checksum */
unsigned int data_corrected:1;
};
+ struct btrfs_work work;
};
/* Used for the chunks with parity stripe such RAID5/6 */
@@ -2164,6 +2165,134 @@ again:
return 0;
}
+static void scrub_missing_raid56_end_io(struct bio *bio, int error)
+{
+ struct scrub_block *sblock = bio->bi_private;
+ struct btrfs_fs_info *fs_info = sblock->sctx->dev_root->fs_info;
+
+ if (error)
+ sblock->no_io_error_seen = 0;
+
+ btrfs_queue_work(fs_info->scrub_workers, &sblock->work);
+}
+
+static void scrub_missing_raid56_worker(struct btrfs_work *work)
+{
+ struct scrub_block *sblock = container_of(work, struct scrub_block, work);
+ struct scrub_ctx *sctx = sblock->sctx;
+ struct btrfs_fs_info *fs_info = sctx->dev_root->fs_info;
+ unsigned int is_metadata;
+ unsigned int have_csum;
+ u8 *csum;
+ u64 generation;
+ u64 logical;
+ struct btrfs_device *dev;
+
+ is_metadata = !(sblock->pagev[0]->flags & BTRFS_EXTENT_FLAG_DATA);
+ have_csum = sblock->pagev[0]->have_csum;
+ csum = sblock->pagev[0]->csum;
+ generation = sblock->pagev[0]->generation;
+ logical = sblock->pagev[0]->logical;
+ dev = sblock->pagev[0]->dev;
+
+ if (sblock->no_io_error_seen) {
+ scrub_recheck_block_checksum(fs_info, sblock, is_metadata,
+ have_csum, csum, generation,
+ sctx->csum_size);
+ }
+
+ if (!sblock->no_io_error_seen) {
+ spin_lock(&sctx->stat_lock);
+ sctx->stat.read_errors++;
+ spin_unlock(&sctx->stat_lock);
+ printk_ratelimited_in_rcu(KERN_ERR
+ "BTRFS: I/O error rebuilding logical %llu for dev %s\n",
+ logical, rcu_str_deref(dev->name));
+ } else if (sblock->header_error || sblock->checksum_error) {
+ spin_lock(&sctx->stat_lock);
+ sctx->stat.uncorrectable_errors++;
+ spin_unlock(&sctx->stat_lock);
+ printk_ratelimited_in_rcu(KERN_ERR
+ "BTRFS: failed to rebuild valid logical %llu for dev %s\n",
+ logical, rcu_str_deref(dev->name));
+ } else {
+ scrub_write_block_to_dev_replace(sblock);
+ }
+
+ scrub_block_put(sblock);
+
+ if (sctx->is_dev_replace &&
+ atomic_read(&sctx->wr_ctx.flush_all_writes)) {
+ mutex_lock(&sctx->wr_ctx.wr_lock);
+ scrub_wr_submit(sctx);
+ mutex_unlock(&sctx->wr_ctx.wr_lock);
+ }
+
+ scrub_pending_bio_dec(sctx);
+}
+
+static void scrub_missing_raid56_pages(struct scrub_block *sblock)
+{
+ struct scrub_ctx *sctx = sblock->sctx;
+ struct btrfs_fs_info *fs_info = sctx->dev_root->fs_info;
+ u64 length = sblock->page_count * PAGE_SIZE;
+ u64 logical = sblock->pagev[0]->logical;
+ struct btrfs_bio *bbio;
+ struct bio *bio;
+ struct btrfs_raid_bio *rbio;
+ int ret;
+ int i;
+
+ ret = btrfs_map_sblock(fs_info, REQ_GET_READ_MIRRORS, logical, &length,
+ &bbio, 0, 1);
+ if (ret || !bbio || !bbio->raid_map)
+ goto bbio_out;
+
+ if (WARN_ON(!sctx->is_dev_replace ||
+ !(bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK))) {
+ /*
+ * We shouldn't be scrubbing a missing device. Even for dev
+ * replace, we should only get here for RAID 5/6. We either
+ * managed to mount something with no mirrors remaining or
+ * there's a bug in scrub_remap_extent()/btrfs_map_block().
+ */
+ goto bbio_out;
+ }
+
+ bio = btrfs_io_bio_alloc(GFP_NOFS, 0);
+ if (!bio)
+ goto bbio_out;
+
+ bio->bi_iter.bi_sector = logical >> 9;
+ bio->bi_private = sblock;
+ bio->bi_end_io = scrub_missing_raid56_end_io;
+
+ rbio = raid56_alloc_missing_rbio(sctx->dev_root, bio, bbio, length);
+ if (!rbio)
+ goto rbio_out;
+
+ for (i = 0; i < sblock->page_count; i++) {
+ struct scrub_page *spage = sblock->pagev[i];
+
+ raid56_add_scrub_pages(rbio, spage->page, spage->logical);
+ }
+
+ btrfs_init_work(&sblock->work, btrfs_scrub_helper,
+ scrub_missing_raid56_worker, NULL, NULL);
+ scrub_block_get(sblock);
+ scrub_pending_bio_inc(sctx);
+ raid56_submit_missing_rbio(rbio);
+ return;
+
+rbio_out:
+ bio_put(bio);
+bbio_out:
+ btrfs_put_bbio(bbio);
+ spin_lock(&sctx->stat_lock);
+ sctx->stat.malloc_errors++;
+ spin_unlock(&sctx->stat_lock);
+}
+
static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
u64 physical, struct btrfs_device *dev, u64 flags,
u64 gen, int mirror_num, u8 *csum, int force,
@@ -2227,19 +2356,27 @@ leave_nomem:
}
WARN_ON(sblock->page_count == 0);
- for (index = 0; index < sblock->page_count; index++) {
- struct scrub_page *spage = sblock->pagev[index];
- int ret;
+ if (dev->missing) {
+ /*
+ * This case should only be hit for RAID 5/6 device replace. See
+ * the comment in scrub_missing_raid56_pages() for details.
+ */
+ scrub_missing_raid56_pages(sblock);
+ } else {
+ for (index = 0; index < sblock->page_count; index++) {
+ struct scrub_page *spage = sblock->pagev[index];
+ int ret;
- ret = scrub_add_page_to_rd_bio(sctx, spage);
- if (ret) {
- scrub_block_put(sblock);
- return ret;
+ ret = scrub_add_page_to_rd_bio(sctx, spage);
+ if (ret) {
+ scrub_block_put(sblock);
+ return ret;
+ }
}
- }
- if (force)
- scrub_submit(sctx);
+ if (force)
+ scrub_submit(sctx);
+ }
/* last one frees, either here or in bio completion for last page */
scrub_block_put(sblock);
--
1.8.5.6
* [PATCH v2 5/5] Btrfs: fix parity scrub of RAID 5/6 with missing device
From: Omar Sandoval @ 2015-06-19 18:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Miao Xie, Zhao Lei, wangyf, Philip, Omar Sandoval
When testing the previous patch, Zhao Lei reported a similar bug when
attempting to scrub a degraded RAID 5/6 filesystem with a missing
device, leading to NULL pointer dereferences from the RAID 5/6 parity
scrubbing code.
The first cause was the same as in the previous patch: attempting to
call bio_add_page() on a missing block device. To fix this,
scrub_extent_for_parity() can just mark the sectors on the missing
device as errors instead of attempting to read from it.
Additionally, the code uses scrub_remap_extent() to map the extent of
the corresponding data stripe, but the extent wasn't already mapped. If
scrub_remap_extent() finds a missing block device, it doesn't initialize
extent_dev, so we're left with a NULL struct btrfs_device. The solution
is to use btrfs_map_block() directly.
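The replacement call site reduces to: map the extent, sanity-check the result, and take the first stripe's device and physical offset. A sketch of just the validity check the patch adds (the -EIO value mirrors the kernel's; everything else is illustrative):

```c
#include <assert.h>

#define EX_EIO 5

/* Mirrors the check wrapped around btrfs_map_block() in the patch:
 * a failed call, a missing btrfs_bio, or a mapping shorter than the
 * extent all collapse to -EIO before any stripe is dereferenced. */
static int check_extent_mapping(int ret, int have_bbio,
                                unsigned long long mapped_length,
                                unsigned long long extent_len)
{
    if (ret)
        return ret;
    if (!have_bbio || mapped_length < extent_len)
        return -EX_EIO;
    return 0;
}
```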
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
fs/btrfs/scrub.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b75f1e9c6adc..731bab4c0118 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2687,6 +2687,11 @@ static int scrub_extent_for_parity(struct scrub_parity *sparity,
u8 csum[BTRFS_CSUM_SIZE];
u32 blocksize;
+ if (dev->missing) {
+ scrub_parity_mark_sectors_error(sparity, logical, len);
+ return 0;
+ }
+
if (flags & BTRFS_EXTENT_FLAG_DATA) {
blocksize = sctx->sectorsize;
} else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
@@ -2884,6 +2889,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
struct btrfs_root *root = fs_info->extent_root;
struct btrfs_root *csum_root = fs_info->csum_root;
struct btrfs_extent_item *extent;
+ struct btrfs_bio *bbio = NULL;
u64 flags;
int ret;
int slot;
@@ -2893,6 +2899,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
u64 extent_logical;
u64 extent_physical;
u64 extent_len;
+ u64 mapped_length;
struct btrfs_device *extent_dev;
struct scrub_parity *sparity;
int nsectors;
@@ -3015,10 +3022,21 @@ again:
scrub_parity_mark_sectors_data(sparity, extent_logical,
extent_len);
- scrub_remap_extent(fs_info, extent_logical,
- extent_len, &extent_physical,
- &extent_dev,
- &extent_mirror_num);
+ mapped_length = extent_len;
+ ret = btrfs_map_block(fs_info, READ, extent_logical,
+ &mapped_length, &bbio, 0);
+ if (!ret) {
+ if (!bbio || mapped_length < extent_len)
+ ret = -EIO;
+ }
+ if (ret) {
+ btrfs_put_bbio(bbio);
+ goto out;
+ }
+ extent_physical = bbio->stripes[0].physical;
+ extent_mirror_num = bbio->mirror_num;
+ extent_dev = bbio->stripes[0].dev;
+ btrfs_put_bbio(bbio);
ret = btrfs_lookup_csums_range(csum_root,
extent_logical,
--
1.8.5.6
* Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
From: wangyf @ 2015-06-23 3:07 UTC (permalink / raw)
To: Omar Sandoval, linux-btrfs; +Cc: Miao Xie, Zhao Lei, Philip
Hi,
I have tested your PATCH v2, but something went wrong.
kernel: 4.1.0-rc7+ with your five patches
VirtualBox ubuntu14.10-server + LVM
I built a new btrfs.ko with your patches,
rmmod'ed the original module, and insmod'ed the new one.
With the RAID1/10 profiles, mkfs succeeded,
but when I mounted the fs, dmesg dumped:
trans: 18446612133975020584 running 5
btrfs transid mismatch buffer 29507584, found 18446612133975020584
running 5
btrfs transid mismatch buffer 29507584, found 18446612133975020584
running 5
btrfs transid mismatch buffer 29507584, found 18446612133975020584
running 5
... ...
With RAID5/6, mkfs succeeded, but the
system hung at the 'mount -t btrfs /dev/mapper/server-dev1 /mnt' command.
That's all.
On 2015-06-20 02:52, Omar Sandoval wrote:
> Hi,
>
> Here's version 2 of the missing device RAID 5/6 fixes. The original
> problem was reported by a user on Bugzilla: the kernel crashed when
> attempting to replace a missing device in a RAID 6 filesystem. This is
> detailed and fixed in patch 4. After the initial posting, Zhao Lei
> reported a similar issue when doing a scrub on a RAID 5 filesystem with
> a missing device. This is fixed in the added patch 5.
>
> My new-and-improved-and-overengineered reproducer as well as Zhao Lei's
> reproducer can be found below.
>
> Thanks!
>
> v1: http://article.gmane.org/gmane.comp.file-systems.btrfs/45045
> v1->v2:
> - Add missing scrub_wr_submit() in scrub_missing_raid56_worker()
> - Add clarifying comment in dev->missing case of scrub_stripe()
> (Zhaolei)
> - Add fix for scrub with missing device (patch 5)
>
> Omar Sandoval (5):
> Btrfs: remove misleading handling of missing device scrub
> Btrfs: count devices correctly in readahead during RAID 5/6 replace
> Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
> Btrfs: fix device replace of a missing RAID 5/6 device
> Btrfs: fix parity scrub of RAID 5/6 with missing device
>
> fs/btrfs/raid56.c | 87 ++++++++++++++++++++---
> fs/btrfs/raid56.h | 10 ++-
> fs/btrfs/reada.c | 4 +-
> fs/btrfs/scrub.c | 202 +++++++++++++++++++++++++++++++++++++++++++++---------
> 4 files changed, 259 insertions(+), 44 deletions(-)
>
> Reproducer 1:
>
> ----
> #!/bin/bash
>
> usage () {
> USAGE_STRING="Usage: $0 [OPTION]...
> Options:
> -m failure mode; MODE is 'eio', 'missing', or 'corrupt' (defaults to
> 'missing')
> -n number of files to write, each twice as big as the last, the first
> being 1M in size (defaults to 4)
> -o operation to perform; OP is 'replace' or 'scrub' (defaults to
> 'replace')
> -r RAID profile; RAID is 'raid0', 'raid1', 'raid10', 'raid5', or 'raid6'
> (defaults to 'raid5')
>
> Miscellaneous:
> -h display this help message and exit"
>
> case "$1" in
> out)
> echo "$USAGE_STRING"
> exit 0
> ;;
> err)
> echo "$USAGE_STRING" >&2
> exit 1
> ;;
> esac
> }
>
> MODE=missing
> RAID=raid5
> OP=replace
> NUM_FILES=4
>
> while getopts "m:n:o:r:h" OPT; do
> case "$OPT" in
> m)
> MODE="$OPTARG"
> ;;
> r)
> RAID="$OPTARG"
> ;;
> o)
> OP="$OPTARG"
> ;;
> n)
> NUM_FILES="$OPTARG"
> if [[ ! "$NUM_FILES" =~ ^[0-9]+$ ]]; then
> usage "err"
> fi
> ;;
> h)
> usage "out"
> ;;
> *)
> usage "err"
> ;;
> esac
> done
>
> case "$MODE" in
> eio|missing|corrupt)
> ;;
> *)
> usage err
> ;;
> esac
>
> case "$RAID" in
> raid[01])
> NUM_RAID_DISKS=2
> ;;
> raid10)
> NUM_RAID_DISKS=4
> ;;
> raid5)
> NUM_RAID_DISKS=3
> ;;
> raid6)
> NUM_RAID_DISKS=4
> ;;
> *)
> usage err
> ;;
> esac
>
> case "$OP" in
> replace)
> NUM_DISKS=$((NUM_RAID_DISKS + 1))
> ;;
> scrub)
> NUM_DISKS=$NUM_RAID_DISKS
> ;;
> *)
> usage err
> ;;
> esac
>
> echo "Running $OP on $RAID with $MODE"
>
> SRC_DISK=$((NUM_RAID_DISKS - 1))
> TARGET_DISK=$((NUM_DISKS - 1))
> NUM_SECTORS=$((1024 * 1024))
> LOOP_DEVICES=()
> DM_DEVICES=()
>
> cleanup () {
> echo "Done. Press enter to clean up..."
> read
> if findmnt /mnt; then
> umount /mnt
> fi
> for DM in "${DM_DEVICES[@]}"; do
> dmsetup remove "$DM"
> done
> for LOOP in "${LOOP_DEVICES[@]}"; do
> losetup --detach "$LOOP"
> done
> for ((i = 0; i < NUM_DISKS; i++)); do
> rm -f disk${i}.img
> done
> }
> trap 'cleanup; exit 1' ERR
>
> echo "Creating disk images and loop devices..."
> for ((i = 0; i < NUM_DISKS; i++)); do
> rm -f disk${i}.img
> dd if=/dev/zero of=disk${i}.img bs=512 seek=$NUM_SECTORS count=0
> LOOP_DEVICES+=("$(losetup --find --show disk${i}.img)")
> done
>
> echo "Creating device-mapper devices..."
> for LOOP in "${LOOP_DEVICES[@]}"; do
> DM="${LOOP/\/dev\/loop/dm}"
> dmsetup create "$DM" --table "0 $NUM_SECTORS linear $LOOP 0"
> DM_DEVICES+=("$DM")
> done
>
> echo "Creating filesystem..."
> FS_DEVICES=("${DM_DEVICES[@]:0:$NUM_RAID_DISKS}")
> FS_DEVICES=("${FS_DEVICES[@]/#//dev/mapper/}")
> echo "${FS_DEVICES[@]}"
> MOUNT_DEVICE="${FS_DEVICES[$(((SRC_DISK + 1) % NUM_RAID_DISKS))]}"
> mkfs.btrfs -d "$RAID" -m "$RAID" "${FS_DEVICES[@]}"
> mount "$MOUNT_DEVICE" /mnt
> for ((i = 0; i < NUM_FILES; i++)); do
> dd if=/dev/urandom of=/mnt/file$i bs=1M count=$((1 << $i))
> done
> sync
>
> case "$MODE" in
> eio)
> echo "Killing disk..."
> dmsetup suspend "${DM_DEVICES[$SRC_DISK]}"
> dmsetup reload "${DM_DEVICES[$SRC_DISK]}" --table "0 $NUM_SECTORS error"
> dmsetup resume "${DM_DEVICES[$SRC_DISK]}"
> ;;
> missing)
> echo "Removing disk and remounting degraded..."
> umount /mnt
> dmsetup remove "${DM_DEVICES[$SRC_DISK]}"
> unset DM_DEVICES[$SRC_DISK]
> mount -o degraded "$MOUNT_DEVICE" /mnt
> ;;
> corrupt)
> echo "Corrupting disk and remounting degraded..."
> umount /mnt
> dd if=/dev/zero of=/dev/mapper/"${DM_DEVICES[$SRC_DISK]}" bs=1M count=1
> mount -o degraded "$MOUNT_DEVICE" /mnt
> ;;
> esac
>
> case "$OP" in
> replace)
> echo "Replacing disk..."
> btrfs replace start -B $((SRC_DISK + 1)) /dev/mapper/"${DM_DEVICES[$TARGET_DISK]}" /mnt
> ;;
> scrub)
> echo "Scrubbing filesystem..."
> btrfs scrub start -B /mnt
> ;;
> esac
>
> echo "Scrubbing to double-check..."
> btrfs scrub start -Br /mnt
>
> cleanup
> ----
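The `DM="${LOOP/\/dev\/loop/dm}"` line in the script above derives each device-mapper name from its loop device path by plain string substitution. A minimal standalone sketch of the same transform (the device path here is illustrative; no devices are touched):

```shell
# Same parameter expansion as in the reproducer: replace the literal
# "/dev/loop" prefix with "dm", so e.g. /dev/loop3 becomes dm3.
LOOP=/dev/loop3
DM="${LOOP/\/dev\/loop/dm}"
echo "$DM"   # -> dm3
```

`dmsetup create "$DM" ...` then yields predictable node names like /dev/mapper/dm3, one per backing loop device.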
>
> Reproducer 2:
>
> ----
> #!/bin/bash
>
> FS_DEVS=(/dev/vdb /dev/vdc /dev/vdd)
> PRUNE_DEV=/dev/vdc
> MNT=/mnt
>
> do_cmd()
> {
> echo " $*"
> local output
> local ret
> output=$("$@" 2>&1)
> ret="$?"
> [[ "$ret" != 0 ]] && {
> echo "$output"
> }
> return "$ret"
> }
>
> mkdir -p "$MNT"
> for ((i = 0; i < 10; i++)); do
> umount "$MNT" &>/dev/null
> done
> dmesg -c >/dev/null
>
> echo "1: Creating filesystem"
> do_cmd mkfs.btrfs -f -d raid5 -m raid5 "${FS_DEVS[@]}" || exit 1
> do_cmd mount "${FS_DEVS[0]}" "$MNT" || exit 1
>
> echo "2: Write some data"
> DATA_CNT=4
> for ((i = 0; i < DATA_CNT; i++)); do
> size_m="$((1<<i))"
> do_cmd dd bs=1M if=/dev/urandom of="$MNT"/file_"$i" count="$size_m" || exit 1
> done
>
> echo "3: Prune a disk in fs"
> do_cmd umount "$MNT" || exit 1
> do_cmd dd bs=1M if=/dev/zero of="$PRUNE_DEV" count=1
> do_cmd mount -o degraded "${FS_DEVS[0]}" "$MNT" || exit 1
>
> echo "4: Do scrub"
> do_cmd btrfs scrub start -B "$MNT"
>
> echo "5: Checking result"
> dmesg --color
>
> exit 0
> ----
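The `do_cmd` helper in the script above echoes each command, captures its combined stdout/stderr, and replays the output only when the command fails. Restated standalone (same logic as the reproducer, runnable without root):

```shell
# Print the command being run, swallow its output on success,
# and replay the captured output when it fails.
do_cmd() {
    echo "  $*"
    local output ret
    output=$("$@" 2>&1)
    ret=$?
    [[ "$ret" != 0 ]] && echo "$output"
    return "$ret"
}

do_cmd echo hello                       # prints only "  echo hello"
do_cmd ls /no/such/path || echo "failed as expected"
```

On failure the second call prints the command line, then ls's captured error message, then "failed as expected", which keeps the reproducer's normal output terse while preserving diagnostics.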
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
2015-06-23 3:07 ` [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace wangyf
@ 2015-06-24 4:15 ` Omar Sandoval
2015-06-24 12:00 ` Ed Tomlinson
0 siblings, 1 reply; 13+ messages in thread
From: Omar Sandoval @ 2015-06-24 4:15 UTC (permalink / raw)
To: wangyf; +Cc: linux-btrfs, Miao Xie, Zhao Lei, Philip
On Tue, Jun 23, 2015 at 11:07:00AM +0800, wangyf wrote:
> Hi,
> I have tested your PATCH v2, but something went wrong.
>
> kernel: 4.1.0-rc7+ with your five patches
> VirtualBox ubuntu14.10-server + LVM
>
> I built a new btrfs.ko with your patches,
> rmmod'ed the original module and insmod'ed the new one.
>
> When I use the RAID1/10 profiles, mkfs succeeds,
> but when I mount the fs, dmesg dumps:
> trans: 18446612133975020584 running 5
> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
> ... ...
>
> When I use RAID5/6, mkfs succeeds but the
> system stops at the 'mount -t btrfs /dev/mapper/server-dev1 /mnt' cmd.
>
> That's all.
Hm, that's really weird, I can't reproduce that here at all. I don't see
what would cause that in this series, and the changes from v1 are
minimal. For my sake, could you make sure that there's nothing else
going on?
Thanks!
--
Omar
* Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
2015-06-24 4:15 ` Omar Sandoval
@ 2015-06-24 12:00 ` Ed Tomlinson
2015-06-25 5:03 ` wangyf
0 siblings, 1 reply; 13+ messages in thread
From: Ed Tomlinson @ 2015-06-24 12:00 UTC (permalink / raw)
To: Omar Sandoval; +Cc: wangyf, linux-btrfs, Miao Xie, Zhao Lei, Philip
On Wednesday, June 24, 2015 12:15:29 AM EDT, Omar Sandoval wrote:
> On Tue, Jun 23, 2015 at 11:07:00AM +0800, wangyf wrote:
>> Hi,
>> I have tested your PATCH v2, but something went wrong.
>>
>> kernel: 4.1.0-rc7+ with your five patches
>> VirtualBox ubuntu14.10-server + LVM
>>
>> I built a new btrfs.ko with your patches,
>> rmmod'ed the original module and insmod'ed the new one.
>>
>> When I use the RAID1/10 profiles, mkfs succeeds,
>> but when I mount the fs, dmesg dumps:
>> trans: 18446612133975020584 running 5
>> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
>> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
>> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
>> ... ...
>>
>> When I use RAID5/6, mkfs succeeds but the
>> system stops at the 'mount -t btrfs /dev/mapper/server-dev1 /mnt' cmd.
>>
>> That's all.
>
> Hm, that's really weird, I can't reproduce that here at all. I don't see
> what would cause that in this series, and the changes from v1 are
> minimal. For my sake, could you make sure that there's nothing else
> going on?
Omar,
I've been running v1 of this patch series and now v2, and have done numerous
reboots without issues... Just another data point.
Ed
* Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
2015-06-24 12:00 ` Ed Tomlinson
@ 2015-06-25 5:03 ` wangyf
2015-06-25 16:35 ` Omar Sandoval
0 siblings, 1 reply; 13+ messages in thread
From: wangyf @ 2015-06-25 5:03 UTC (permalink / raw)
To: Ed Tomlinson, Omar Sandoval; +Cc: linux-btrfs, Miao Xie, Zhao Lei, Philip
I looked into this bug report and found the reason: I compiled the
patched module against a dirty kernel tree.
This morning I tested the patches again and didn't see the above error;
the patches are OK.
Sorry for the false bug report. :(
On 2015-06-24 20:00, Ed Tomlinson wrote:
> On Wednesday, June 24, 2015 12:15:29 AM EDT, Omar Sandoval wrote:
>> On Tue, Jun 23, 2015 at 11:07:00AM +0800, wangyf wrote:
>>> Hi,
>>> I have tested your PATCH v2, but something went wrong.
>>>
>>> kernel: 4.1.0-rc7+ with your five patches
>>> VirtualBox ubuntu14.10-server + LVM
>>>
>>> I built a new btrfs.ko with your patches,
>>> rmmod'ed the original module and insmod'ed the new one.
>>>
>>> When I use the RAID1/10 profiles, mkfs succeeds,
>>> but when I mount the fs, dmesg dumps:
>>> trans: 18446612133975020584 running 5
>>> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
>>> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
>>> btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
>>> ... ...
>>>
>>> When I use RAID5/6, mkfs succeeds but the
>>> system stops at the 'mount -t btrfs /dev/mapper/server-dev1 /mnt' cmd.
>>>
>>> That's all.
>>
>> Hm, that's really weird, I can't reproduce that here at all. I don't see
>> what would cause that in this series, and the changes from v1 are
>> minimal. For my sake, could you make sure that there's nothing else
>> going on?
>
> Omar,
>
> I've been running v1 of this patch series and now v2, and have done
> numerous reboots without issues... Just another data point.
>
> Ed
* Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
2015-06-25 5:03 ` wangyf
@ 2015-06-25 16:35 ` Omar Sandoval
2015-06-26 2:07 ` wangyf
2015-06-26 2:46 ` wangyf
0 siblings, 2 replies; 13+ messages in thread
From: Omar Sandoval @ 2015-06-25 16:35 UTC (permalink / raw)
To: wangyf; +Cc: Ed Tomlinson, linux-btrfs, Miao Xie, Zhao Lei, Philip
On Thu, Jun 25, 2015 at 01:03:57PM +0800, wangyf wrote:
> I looked into this bug report and found the reason: I compiled the
> patched module against a dirty kernel tree.
> This morning I tested the patches again and didn't see the above
> error; the patches are OK.
> Sorry for the false bug report. :(
It's no problem! Do either of you feel like providing your Tested-by?
--
Omar
* Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
2015-06-25 16:35 ` Omar Sandoval
@ 2015-06-26 2:07 ` wangyf
2015-06-26 2:46 ` wangyf
1 sibling, 0 replies; 13+ messages in thread
From: wangyf @ 2015-06-26 2:07 UTC (permalink / raw)
To: Omar Sandoval; +Cc: Ed Tomlinson, linux-btrfs, Miao Xie, Zhao Lei, Philip
No problem.
Wang Yanfeng <wangyf-fnst@cn.fujitsu.com>
cheers
wangyf
On 2015-06-26 00:35, Omar Sandoval wrote:
> On Thu, Jun 25, 2015 at 01:03:57PM +0800, wangyf wrote:
>> I looked into this bug report and found the reason: I compiled the
>> patched module against a dirty kernel tree.
>> This morning I tested the patches again and didn't see the above
>> error; the patches are OK.
>> Sorry for the false bug report. :(
> It's no problem! Do either of you feel like providing your Tested-by?
>
* Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
2015-06-25 16:35 ` Omar Sandoval
2015-06-26 2:07 ` wangyf
@ 2015-06-26 2:46 ` wangyf
1 sibling, 0 replies; 13+ messages in thread
From: wangyf @ 2015-06-26 2:46 UTC (permalink / raw)
To: Omar Sandoval; +Cc: Ed Tomlinson, linux-btrfs, Miao Xie, Zhao Lei, Philip
Tested-by: Wang Yanfeng <wangyf-fnst@cn.fujitsu.com>
On 06/26/2015 12:35 AM, Omar Sandoval wrote:
> On Thu, Jun 25, 2015 at 01:03:57PM +0800, wangyf wrote:
>> I looked into this bug report and found the reason: I compiled the
>> patched module against a dirty kernel tree.
>> This morning I tested the patches again and didn't see the above
>> error; the patches are OK.
>> Sorry for the false bug report. :(
> It's no problem! Do either of you feel like providing your Tested-by?
>
2015-06-19 18:52 [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace Omar Sandoval
2015-06-19 18:52 ` [PATCH v2 1/5] Btrfs: remove misleading handling of missing device scrub Omar Sandoval
2015-06-19 18:52 ` [PATCH v2 2/5] Btrfs: count devices correctly in readahead during RAID 5/6 replace Omar Sandoval
2015-06-19 18:52 ` [PATCH v2 3/5] Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation Omar Sandoval
2015-06-19 18:52 ` [PATCH v2 4/5] Btrfs: fix device replace of a missing RAID 5/6 device Omar Sandoval
2015-06-19 18:52 ` [PATCH v2 5/5] Btrfs: fix parity scrub of RAID 5/6 with missing device Omar Sandoval
2015-06-23 3:07 ` [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace wangyf
2015-06-24 4:15 ` Omar Sandoval
2015-06-24 12:00 ` Ed Tomlinson
2015-06-25 5:03 ` wangyf
2015-06-25 16:35 ` Omar Sandoval
2015-06-26 2:07 ` wangyf
2015-06-26 2:46 ` wangyf