All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup
@ 2017-04-06  3:22 Anand Jain
  2017-04-06  3:22 ` [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache Anand Jain
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Anand Jain @ 2017-04-06  3:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

These patches adds cleanup of device barrier, q and flush codes.
This was needed for the following reasons..

  - First of all, Qu approach on the chunk based degradable check
wanted to know what devices failed rather than total count of failed
devices. So the patch 3/7 adds a function check_barrier_error() in
which Qu can add such a logic. Irrespective whether those Qu patches
passes through all the reviews, and if it does not, still this patch
is good to have as it does not change the commit performance since
check_barrier_error() is only called when there is an error in any of
the device. However I believe a volume manager when deciding an action
of the device error, it should rather have a broader view and state of
other devices instead of action based on just individual device, this
just a step towards that. (Earlier the patch [1] was also on the similar
idea, its yet to be separated from the feature patch, as time permits
and priority rises. [1] https://patchwork.kernel.org/patch/9058361/)

 [PATCH v4 3/7] btrfs: cleanup barrier_all_devices() to check dev stat flush error

  - However barrier_all_devices() wasn't capable of supporting 3/7
(above). So clean up of write_dev_flush() is needed, and if it uses
blkdev_issue_flush() then the way we count the error device especially
the err_send++ part can be avoided. Patch 1/7 provides this cleanup.
  
 [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache

  - There are some static error checking in barrier_all_devices(), such
as !dev->bdev && errors_send++, these errors would remain same before
or after the flush. As of now we do not have any code which will bring
a null bdev back, also we are in device_mutex. So consolidated all
those errors as dropouts at the place after the flush. This clean up is
in 2/7.

 [PATCH v4 2/7] btrfs: cleanup barrier_all_devices() unify dev error count

Now with 1/7 and 2/7 in place, 3/7 can be applied. These 3 are
dependent patches.

  - Patch 4/7 is not related to above 3 patches, however it adds up a
cleanup around the REQ_PREFLUSH flag. It is found that we do not have
any bio with the flag  REQ_PREFLUSH set. So an end io checking for that
flag is not required. Concerned what if there is a bio in future which
will set REQ_PREFLUSH ? I am not too sure about that. However the
write_dev_flush code with the patch 3/7 (above) is still unaffected
with or without patch 4/7, thats because 3/7  does not depend on
BTRFS_DEV_STAT_FLUSH_ERRS, instead it uses a new per device
last_flush_err member to track the error. But it would have been better
or the right way if it had checked BTRFS_DEV_STAT_FLUSH_ERRS instead.
But as it seeks much bigger change, it can be optimized to use
BTRFS_DEV_STAT_FLUSH_ERRS at a later point
of time.
  
 [PATCH v4 4/7] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback

And rest of the patches are completely a cleanup patches which are
found good to have when around these parts of the code. They are -

[PATCH v4 5/7] btrfs: use q which is already obtained from bdev_get_queue
[PATCH v4 6/7] btrfs: delete unused member nobarriers
[PATCH v4 7/7] btrfs: check if the device is flush capable

Anand Jain (7):
  btrfs: use blkdev_issue_flush to flush the device cache
  btrfs: cleanup barrier_all_devices() unify dev error count
  btrfs: cleanup barrier_all_devices() to check dev stat flush error
  btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback
  btrfs: use q which is already obtained from bdev_get_queue
  btrfs: delete unused member nobarriers
  btrfs: check if the device is flush capable

 fs/btrfs/disk-io.c | 112 +++++++++++++++++++++++++----------------------------
 fs/btrfs/volumes.c |  10 ++---
 fs/btrfs/volumes.h |   4 +-
 3 files changed, 58 insertions(+), 68 deletions(-)

-- 
2.10.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache
  2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
@ 2017-04-06  3:22 ` Anand Jain
  2017-04-13 18:41   ` Liu Bo
  2017-04-18 13:54   ` David Sterba
  2017-04-06  3:22 ` [PATCH v4 2/7] btrfs: cleanup barrier_all_devices() unify dev error count Anand Jain
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 18+ messages in thread
From: Anand Jain @ 2017-04-06  3:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

As of now we do alloc an empty bio and then use the flag REQ_PREFLUSH
to flush the device cache, instead we can use blkdev_issue_flush()
for this puspose.

Also now no need to check the return when write_dev_flush() is called
with wait = 0

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: Title of this patch is changed from
   btrfs: communicate back ENOMEM when it occurs
  And its entirely a new patch, which now use blkdev_issue_flush()
v3: no change
v4: no change

 fs/btrfs/disk-io.c | 64 +++++++++++++++---------------------------------------
 fs/btrfs/volumes.h |  3 ++-
 2 files changed, 19 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08b74daf35d0..08cbbee228ee 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3498,70 +3498,42 @@ static int write_dev_supers(struct btrfs_device *device,
 	return errors < i ? 0 : -1;
 }
 
-/*
- * endio for the write_dev_flush, this will wake anyone waiting
- * for the barrier when it is done
- */
-static void btrfs_end_empty_barrier(struct bio *bio)
+static void btrfs_dev_issue_flush(struct work_struct *work)
 {
-	if (bio->bi_private)
-		complete(bio->bi_private);
-	bio_put(bio);
+	int ret;
+	struct btrfs_device *device;
+
+	device = container_of(work, struct btrfs_device, flush_work);
+
+	/* we are in the commit thread */
+	ret = blkdev_issue_flush(device->bdev, GFP_NOFS, NULL);
+	device->last_flush_error = ret;
+	complete(&device->flush_wait);
 }
 
 /*
  * trigger flushes for one the devices.  If you pass wait == 0, the flushes are
  * sent down.  With wait == 1, it waits for the previous flush.
- *
- * any device where the flush fails with eopnotsupp are flagged as not-barrier
- * capable
  */
 static int write_dev_flush(struct btrfs_device *device, int wait)
 {
-	struct bio *bio;
-	int ret = 0;
-
 	if (device->nobarriers)
 		return 0;
 
 	if (wait) {
-		bio = device->flush_bio;
-		if (!bio)
-			return 0;
+		int ret;
 
 		wait_for_completion(&device->flush_wait);
-
-		if (bio->bi_error) {
-			ret = bio->bi_error;
+		ret = device->last_flush_error;
+		if (ret)
 			btrfs_dev_stat_inc_and_print(device,
-				BTRFS_DEV_STAT_FLUSH_ERRS);
-		}
-
-		/* drop the reference from the wait == 0 run */
-		bio_put(bio);
-		device->flush_bio = NULL;
-
+					BTRFS_DEV_STAT_FLUSH_ERRS);
 		return ret;
 	}
 
-	/*
-	 * one reference for us, and we leave it for the
-	 * caller
-	 */
-	device->flush_bio = NULL;
-	bio = btrfs_io_bio_alloc(GFP_NOFS, 0);
-	if (!bio)
-		return -ENOMEM;
-
-	bio->bi_end_io = btrfs_end_empty_barrier;
-	bio->bi_bdev = device->bdev;
-	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
 	init_completion(&device->flush_wait);
-	bio->bi_private = &device->flush_wait;
-	device->flush_bio = bio;
-
-	bio_get(bio);
-	btrfsic_submit_bio(bio);
+	INIT_WORK(&device->flush_work, btrfs_dev_issue_flush);
+	schedule_work(&device->flush_work);
 
 	return 0;
 }
@@ -3590,9 +3562,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (!dev->in_fs_metadata || !dev->writeable)
 			continue;
 
-		ret = write_dev_flush(dev, 0);
-		if (ret)
-			errors_send++;
+		write_dev_flush(dev, 0);
 	}
 
 	/* wait for all the barriers */
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 59be81206dd7..0df50bc65578 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -124,8 +124,9 @@ struct btrfs_device {
 
 	/* for sending down flush barriers */
 	int nobarriers;
-	struct bio *flush_bio;
 	struct completion flush_wait;
+	struct work_struct flush_work;
+	int last_flush_error;
 
 	/* per-device scrub information */
 	struct scrub_ctx *scrub_device;
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 2/7] btrfs: cleanup barrier_all_devices() unify dev error count
  2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
  2017-04-06  3:22 ` [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache Anand Jain
@ 2017-04-06  3:22 ` Anand Jain
  2017-04-06  3:22 ` [PATCH v4 3/7] btrfs: cleanup barrier_all_devices() to check dev stat flush error Anand Jain
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2017-04-06  3:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

Now when counting number of error devices we don't need to count
them separately once during send and wait, as because device error
counted during send is more of static check.

Also kindly note that as of now there is no code which would set
dev->bdev = NULL unless device is missing. However I still kept
bdev == NULL counted towards error device in view of future
enhancements. And as the device_list_mutex is held when
barrier_all_devices() is called, I don't expect a new bdev to null
in between send and wait.

Now in this process I also rename error_wait to dropouts.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: As the write_dev_flush with wait=0 is always successful,
from the previous patch the ret is now removed.
v3: no change
v4: no change

 fs/btrfs/disk-io.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08cbbee228ee..420753d37e1a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3546,19 +3546,15 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
-	int errors_send = 0;
-	int errors_wait = 0;
-	int ret;
+	int dropouts = 0;
 
 	/* send down all the barriers */
 	head = &info->fs_devices->devices;
 	list_for_each_entry_rcu(dev, head, dev_list) {
 		if (dev->missing)
 			continue;
-		if (!dev->bdev) {
-			errors_send++;
+		if (!dev->bdev)
 			continue;
-		}
 		if (!dev->in_fs_metadata || !dev->writeable)
 			continue;
 
@@ -3570,18 +3566,16 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_wait++;
+			dropouts++;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
 			continue;
 
-		ret = write_dev_flush(dev, 1);
-		if (ret)
-			errors_wait++;
+		if (write_dev_flush(dev, 1))
+			dropouts++;
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
+	if (dropouts > info->num_tolerated_disk_barrier_failures)
 		return -EIO;
 	return 0;
 }
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 3/7] btrfs: cleanup barrier_all_devices() to check dev stat flush error
  2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
  2017-04-06  3:22 ` [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache Anand Jain
  2017-04-06  3:22 ` [PATCH v4 2/7] btrfs: cleanup barrier_all_devices() unify dev error count Anand Jain
@ 2017-04-06  3:22 ` Anand Jain
  2017-04-06  3:22 ` [PATCH v4 4/7] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback Anand Jain
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2017-04-06  3:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

The objective of this patch is to cleanup barrier_all_devices()
so that the error checking is in a separate loop independent of
of the loop which submits and waits on the device flush requests.

By doing this it helps to further develop patches which would tune
the error-actions as needed.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: Address Qu review comments viz..
     Add meaningful names, like cp_list (for checkpoint_list head).
     (And actually it does not need a new struct type just to hold
      the head pointer, list node is already named as device_checkpoint).
     Check return value of add_device_checkpoint()
     Check if the device is already added at add_device_checkpoint()
     Rename fini_devices_checkpoint() to rel_devices_checkpoint()
v3: (resent with the correct version (that is 3 not 2) of the patch).
   Dropped for idea of using the BTRFS_DEV_STAT_FLUSH_ERRS, though
   its the right way, but it needs a better infracture to handle that.
   Now the flush error return is saved and checked instead of the
   checkpoint of the dev_stat method earlier.
v4: no change

 fs/btrfs/disk-io.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 420753d37e1a..3c476b118440 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3538,6 +3538,23 @@ static int write_dev_flush(struct btrfs_device *device, int wait)
 	return 0;
 }
 
+static int check_barrier_error(struct btrfs_fs_devices *fsdevs)
+{
+	int dropouts = 0;
+	struct btrfs_device *dev;
+
+	list_for_each_entry_rcu(dev, &fsdevs->devices, dev_list) {
+		if (!dev->bdev || dev->last_flush_error)
+			dropouts++;
+	}
+
+	if (dropouts >
+		fsdevs->fs_info->num_tolerated_disk_barrier_failures)
+		return -EIO;
+
+	return 0;
+}
+
 /*
  * send an empty flush down to each device in parallel,
  * then wait for them
@@ -3575,8 +3592,19 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (write_dev_flush(dev, 1))
 			dropouts++;
 	}
-	if (dropouts > info->num_tolerated_disk_barrier_failures)
-		return -EIO;
+
+	/*
+	 * A slight optimization, we check for dropouts here which avoids
+	 * a dev list loop when disks are healthy.
+	 */
+	if (dropouts) {
+		/*
+		 * As we need holistic view of the failed disks, so
+		 * error checking is pushed to a separate loop.
+		 */
+		return check_barrier_error(info->fs_devices);
+	}
+
 	return 0;
 }
 
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 4/7] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback
  2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
                   ` (2 preceding siblings ...)
  2017-04-06  3:22 ` [PATCH v4 3/7] btrfs: cleanup barrier_all_devices() to check dev stat flush error Anand Jain
@ 2017-04-06  3:22 ` Anand Jain
  2017-04-06  3:22 ` [PATCH v4 5/7] btrfs: use q which is already obtained from bdev_get_queue Anand Jain
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2017-04-06  3:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

REQ_PREFLUSH bio to flush dev cache uses btrfs_end_empty_barrier()
completion callback only, as of now, and there it accounts for dev
stat flush errors BTRFS_DEV_STAT_FLUSH_ERRS, so remove it from the
btrfs_end_bio().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2-3: no change
v4: no change, but now the patch number is 4/7 in the set.

 fs/btrfs/volumes.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73d56eef5e60..1563ae03079b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6010,9 +6010,6 @@ static void btrfs_end_bio(struct bio *bio)
 				else
 					btrfs_dev_stat_inc(dev,
 						BTRFS_DEV_STAT_READ_ERRS);
-				if (bio->bi_opf & REQ_PREFLUSH)
-					btrfs_dev_stat_inc(dev,
-						BTRFS_DEV_STAT_FLUSH_ERRS);
 				btrfs_dev_stat_print_on_error(dev);
 			}
 		}
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 5/7] btrfs: use q which is already obtained from bdev_get_queue
  2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
                   ` (3 preceding siblings ...)
  2017-04-06  3:22 ` [PATCH v4 4/7] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback Anand Jain
@ 2017-04-06  3:22 ` Anand Jain
  2017-04-18 14:01   ` David Sterba
  2017-04-06  3:22 ` [PATCH v4 6/7] btrfs: delete unused member nobarriers Anand Jain
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2017-04-06  3:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

We have already assigned q from bdev_get_queue() so use it.
And rearrange the code for better view.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
---
v2-4: no change

 fs/btrfs/volumes.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1563ae03079b..4de5b2d549bd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1008,14 +1008,13 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		q = bdev_get_queue(bdev);
 		if (blk_queue_discard(q))
 			device->can_discard = 1;
+		if (!blk_queue_nonrot(q))
+			fs_devices->rotating = 1;
 
 		device->bdev = bdev;
 		device->in_fs_metadata = 0;
 		device->mode = flags;
 
-		if (!blk_queue_nonrot(bdev_get_queue(bdev)))
-			fs_devices->rotating = 1;
-
 		fs_devices->open_devices++;
 		if (device->writeable &&
 		    device->devid != BTRFS_DEV_REPLACE_DEVID) {
@@ -2417,7 +2416,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	fs_info->free_chunk_space += device->total_bytes;
 	spin_unlock(&fs_info->free_chunk_lock);
 
-	if (!blk_queue_nonrot(bdev_get_queue(bdev)))
+	if (!blk_queue_nonrot(q))
 		fs_info->fs_devices->rotating = 1;
 
 	tmp = btrfs_super_total_bytes(fs_info->super_copy);
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 6/7] btrfs: delete unused member nobarriers
  2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
                   ` (4 preceding siblings ...)
  2017-04-06  3:22 ` [PATCH v4 5/7] btrfs: use q which is already obtained from bdev_get_queue Anand Jain
@ 2017-04-06  3:22 ` Anand Jain
  2017-04-18 14:03   ` David Sterba
  2017-04-06  3:22 ` [PATCH v4 7/7] btrfs: check if the device is flush capable Anand Jain
  2017-04-13  8:42 ` [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
  7 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2017-04-06  3:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

The last consumer of nobarriers is removed by the commit [1] and sync
won't fail with EOPNOTSUPP anymore. Thus, now when write cache is write
through it just return success without actually transpiring such a
request to the block device/lun.

[1]
commit b25de9d6da49b1a8760a89672283128aa8c78345
block: remove BIO_EOPNOTSUPP

And, as the device/lun write cache state may change dynamically saving
such as state won't help either. So deleting the member nobarriers.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: add commit log. ref of last consumer.
v3: nochange
v4: nochange

 fs/btrfs/disk-io.c | 3 ---
 fs/btrfs/volumes.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3c476b118440..b6d047250ce2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3517,9 +3517,6 @@ static void btrfs_dev_issue_flush(struct work_struct *work)
  */
 static int write_dev_flush(struct btrfs_device *device, int wait)
 {
-	if (device->nobarriers)
-		return 0;
-
 	if (wait) {
 		int ret;
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 0df50bc65578..1168b78c5f1d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -123,7 +123,6 @@ struct btrfs_device {
 	struct list_head resized_list;
 
 	/* for sending down flush barriers */
-	int nobarriers;
 	struct completion flush_wait;
 	struct work_struct flush_work;
 	int last_flush_error;
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 7/7] btrfs: check if the device is flush capable
  2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
                   ` (5 preceding siblings ...)
  2017-04-06  3:22 ` [PATCH v4 6/7] btrfs: delete unused member nobarriers Anand Jain
@ 2017-04-06  3:22 ` Anand Jain
  2017-04-18 14:04   ` David Sterba
  2017-04-13  8:42 ` [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
  7 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2017-04-06  3:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

The blkdev_issue_flush() will check if the write cache is enabled
before submitting the flush. This will add a code to fail fast if
its not.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: This patch will now _not_ replace the below patch and this patch
  should be applied on top of it.
    [PATCH] btrfs: delete unused member nobarriers
  That's because, Q write cache flag is not static, so we can't save
  its state into nobarries.
  - commit log is updated.
v3: no change
v4: no change

 fs/btrfs/disk-io.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6d047250ce2..bb5816c7f5c4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3517,6 +3517,11 @@ static void btrfs_dev_issue_flush(struct work_struct *work)
  */
 static int write_dev_flush(struct btrfs_device *device, int wait)
 {
+	struct request_queue *q = bdev_get_queue(device->bdev);
+
+	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
+		return 0;
+
 	if (wait) {
 		int ret;
 
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup
  2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
                   ` (6 preceding siblings ...)
  2017-04-06  3:22 ` [PATCH v4 7/7] btrfs: check if the device is flush capable Anand Jain
@ 2017-04-13  8:42 ` Anand Jain
  2017-04-13 12:13   ` David Sterba
  7 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2017-04-13  8:42 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs


Hi David,

  Just want to bring this patch to your notice. It fixes all
  the previously commented issues.

Thanks, Anand


On 04/06/2017 11:22 AM, Anand Jain wrote:
> These patches adds cleanup of device barrier, q and flush codes.
> This was needed for the following reasons..
>
>   - First of all, Qu approach on the chunk based degradable check
> wanted to know what devices failed rather than total count of failed
> devices. So the patch 3/7 adds a function check_barrier_error() in
> which Qu can add such a logic. Irrespective whether those Qu patches
> passes through all the reviews, and if it does not, still this patch
> is good to have as it does not change the commit performance since
> check_barrier_error() is only called when there is an error in any of
> the device. However I believe a volume manager when deciding an action
> of the device error, it should rather have a broader view and state of
> other devices instead of action based on just individual device, this
> just a step towards that. (Earlier the patch [1] was also on the similar
> idea, its yet to be separated from the feature patch, as time permits
> and priority rises. [1] https://patchwork.kernel.org/patch/9058361/)
>
>  [PATCH v4 3/7] btrfs: cleanup barrier_all_devices() to check dev stat flush error
>
>   - However barrier_all_devices() wasn't capable of supporting 3/7
> (above). So clean up of write_dev_flush() is needed, and if it uses
> blkdev_issue_flush() then the way we count the error device especially
> the err_send++ part can be avoided. Patch 1/7 provides this cleanup.
>
>  [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache
>
>   - There are some static error checking in barrier_all_devices(), such
> as !dev->bdev && errors_send++, these errors would remain same before
> or after the flush. As of now we do not have any code which will bring
> a null bdev back, also we are in device_mutex. So consolidated all
> those errors as dropouts at the place after the flush. This clean up is
> in 2/7.
>
>  [PATCH v4 2/7] btrfs: cleanup barrier_all_devices() unify dev error count
>
> Now with 1/7 and 2/7 in place, 3/7 can be applied. These 3 are
> dependent patches.
>
>   - Patch 4/7 is not related to above 3 patches, however it adds up a
> cleanup around the REQ_PREFLUSH flag. It is found that we do not have
> any bio with the flag  REQ_PREFLUSH set. So an end io checking for that
> flag is not required. Concerned what if there is a bio in future which
> will set REQ_PREFLUSH ? I am not too sure about that. However the
> write_dev_flush code with the patch 3/7 (above) is still unaffected
> with or without patch 4/7, thats because 3/7  does not depend on
> BTRFS_DEV_STAT_FLUSH_ERRS, instead it uses a new per device
> last_flush_err member to track the error. But it would have been better
> or the right way if it had checked BTRFS_DEV_STAT_FLUSH_ERRS instead.
> But as it seeks much bigger change, it can be optimized to use
> BTRFS_DEV_STAT_FLUSH_ERRS at a later point
> of time.
>
>  [PATCH v4 4/7] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback
>
> And rest of the patches are completely a cleanup patches which are
> found good to have when around these parts of the code. They are -
>
> [PATCH v4 5/7] btrfs: use q which is already obtained from bdev_get_queue
> [PATCH v4 6/7] btrfs: delete unused member nobarriers
> [PATCH v4 7/7] btrfs: check if the device is flush capable
>
> Anand Jain (7):
>   btrfs: use blkdev_issue_flush to flush the device cache
>   btrfs: cleanup barrier_all_devices() unify dev error count
>   btrfs: cleanup barrier_all_devices() to check dev stat flush error
>   btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback
>   btrfs: use q which is already obtained from bdev_get_queue
>   btrfs: delete unused member nobarriers
>   btrfs: check if the device is flush capable
>
>  fs/btrfs/disk-io.c | 112 +++++++++++++++++++++++++----------------------------
>  fs/btrfs/volumes.c |  10 ++---
>  fs/btrfs/volumes.h |   4 +-
>  3 files changed, 58 insertions(+), 68 deletions(-)
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup
  2017-04-13  8:42 ` [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
@ 2017-04-13 12:13   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-04-13 12:13 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Thu, Apr 13, 2017 at 04:42:11PM +0800, Anand Jain wrote:
>   Just want to bring this patch to your notice. It fixes all
>   the previously commented issues.

I got only half way through the patches so far, stay tuned.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache
  2017-04-06  3:22 ` [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache Anand Jain
@ 2017-04-13 18:41   ` Liu Bo
  2017-04-19  4:29     ` Anand Jain
  2017-04-18 13:54   ` David Sterba
  1 sibling, 1 reply; 18+ messages in thread
From: Liu Bo @ 2017-04-13 18:41 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba

On Thu, Apr 06, 2017 at 11:22:47AM +0800, Anand Jain wrote:
> As of now we do alloc an empty bio and then use the flag REQ_PREFLUSH
> to flush the device cache, instead we can use blkdev_issue_flush()
> for this puspose.
> 
> Also now no need to check the return when write_dev_flush() is called
> with wait = 0
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> v2: Title of this patch is changed from
>    btrfs: communicate back ENOMEM when it occurs
>   And its entirely a new patch, which now use blkdev_issue_flush()
> v3: no change
> v4: no change
> 
>  fs/btrfs/disk-io.c | 64 +++++++++++++++---------------------------------------
>  fs/btrfs/volumes.h |  3 ++-
>  2 files changed, 19 insertions(+), 48 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 08b74daf35d0..08cbbee228ee 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3498,70 +3498,42 @@ static int write_dev_supers(struct btrfs_device *device,
>  	return errors < i ? 0 : -1;
>  }
>  
> -/*
> - * endio for the write_dev_flush, this will wake anyone waiting
> - * for the barrier when it is done
> - */
> -static void btrfs_end_empty_barrier(struct bio *bio)
> +static void btrfs_dev_issue_flush(struct work_struct *work)
>  {
> -	if (bio->bi_private)
> -		complete(bio->bi_private);
> -	bio_put(bio);
> +	int ret;
> +	struct btrfs_device *device;
> +
> +	device = container_of(work, struct btrfs_device, flush_work);
> +
> +	/* we are in the commit thread */

What is the above comment trying to explain?

Others look good.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

Thanks,

-liubo
> +	ret = blkdev_issue_flush(device->bdev, GFP_NOFS, NULL);
> +	device->last_flush_error = ret;
> +	complete(&device->flush_wait);
>  }
>  
>  /*
>   * trigger flushes for one the devices.  If you pass wait == 0, the flushes are
>   * sent down.  With wait == 1, it waits for the previous flush.
> - *
> - * any device where the flush fails with eopnotsupp are flagged as not-barrier
> - * capable
>   */
>  static int write_dev_flush(struct btrfs_device *device, int wait)
>  {
> -	struct bio *bio;
> -	int ret = 0;
> -
>  	if (device->nobarriers)
>  		return 0;
>  
>  	if (wait) {
> -		bio = device->flush_bio;
> -		if (!bio)
> -			return 0;
> +		int ret;
>  
>  		wait_for_completion(&device->flush_wait);
> -
> -		if (bio->bi_error) {
> -			ret = bio->bi_error;
> +		ret = device->last_flush_error;
> +		if (ret)
>  			btrfs_dev_stat_inc_and_print(device,
> -				BTRFS_DEV_STAT_FLUSH_ERRS);
> -		}
> -
> -		/* drop the reference from the wait == 0 run */
> -		bio_put(bio);
> -		device->flush_bio = NULL;
> -
> +					BTRFS_DEV_STAT_FLUSH_ERRS);
>  		return ret;
>  	}
>  
> -	/*
> -	 * one reference for us, and we leave it for the
> -	 * caller
> -	 */
> -	device->flush_bio = NULL;
> -	bio = btrfs_io_bio_alloc(GFP_NOFS, 0);
> -	if (!bio)
> -		return -ENOMEM;
> -
> -	bio->bi_end_io = btrfs_end_empty_barrier;
> -	bio->bi_bdev = device->bdev;
> -	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
>  	init_completion(&device->flush_wait);
> -	bio->bi_private = &device->flush_wait;
> -	device->flush_bio = bio;
> -
> -	bio_get(bio);
> -	btrfsic_submit_bio(bio);
> +	INIT_WORK(&device->flush_work, btrfs_dev_issue_flush);
> +	schedule_work(&device->flush_work);
>  
>  	return 0;
>  }
> @@ -3590,9 +3562,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  		if (!dev->in_fs_metadata || !dev->writeable)
>  			continue;
>  
> -		ret = write_dev_flush(dev, 0);
> -		if (ret)
> -			errors_send++;
> +		write_dev_flush(dev, 0);
>  	}
>  
>  	/* wait for all the barriers */
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 59be81206dd7..0df50bc65578 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -124,8 +124,9 @@ struct btrfs_device {
>  
>  	/* for sending down flush barriers */
>  	int nobarriers;
> -	struct bio *flush_bio;
>  	struct completion flush_wait;
> +	struct work_struct flush_work;
> +	int last_flush_error;
>  
>  	/* per-device scrub information */
>  	struct scrub_ctx *scrub_device;
> -- 
> 2.10.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache
  2017-04-06  3:22 ` [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache Anand Jain
  2017-04-13 18:41   ` Liu Bo
@ 2017-04-18 13:54   ` David Sterba
  2017-04-19  4:29     ` Anand Jain
  1 sibling, 1 reply; 18+ messages in thread
From: David Sterba @ 2017-04-18 13:54 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba

On Thu, Apr 06, 2017 at 11:22:47AM +0800, Anand Jain wrote:
> As of now we do alloc an empty bio and then use the flag REQ_PREFLUSH
> to flush the device cache, instead we can use blkdev_issue_flush()
> for this puspose.

This would change the scheduling characteristics. Right now, the caller
thread submits all bios from one thread, lets block layer do it's work,
and then in the same thread wait for each of the submitted bios.

In your code, the btrfs thread prepares tasks for each bio, shifts the
work to the global workqueue (schedule_work) and then it's same as
before.

I'm concerned about using the global queue. As the bio submission jobs
could get blocked by some other unrelated task queued there, the
guarantees depend on the forward progress of the tasks scheduled
(plus block layer processing).

In the current behaviour, the guarantees stand on the block layer only.

We could introduce yet another work queue and submit the bios there,
with possible fine tuning of the flags, like priority or emergency etc.
But that sounds like unnecessary work as we can simply keep the code
as-is and get the same end result.

Regarding other patches, some of them are independent so I'll see what
can be merged now regardless of the above comments.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 5/7] btrfs: use q which is already obtained from bdev_get_queue
  2017-04-06  3:22 ` [PATCH v4 5/7] btrfs: use q which is already obtained from bdev_get_queue Anand Jain
@ 2017-04-18 14:01   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-04-18 14:01 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba

On Thu, Apr 06, 2017 at 11:22:51AM +0800, Anand Jain wrote:
> We have already assigned q from bdev_get_queue() so use it.
> And rearrange the code for better view.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reviewed-by: David Sterba <dsterba@suse.com>

JFYI, this patch has been added to 4.12

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 6/7] btrfs: delete unused member nobarriers
  2017-04-06  3:22 ` [PATCH v4 6/7] btrfs: delete unused member nobarriers Anand Jain
@ 2017-04-18 14:03   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-04-18 14:03 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba

On Thu, Apr 06, 2017 at 11:22:52AM +0800, Anand Jain wrote:
> The last consumer of nobarriers is removed by the commit [1] and sync
> won't fail with EOPNOTSUPP anymore. Thus, now when write cache is write
> through it just return success without actually transpiring such a
> request to the block device/lun.
> 
> [1]
> commit b25de9d6da49b1a8760a89672283128aa8c78345
> block: remove BIO_EOPNOTSUPP
> 
> And, as the device/lun write cache state may change dynamically saving
> such as state won't help either. So deleting the member nobarriers.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/7] btrfs: check if the device is flush capable
  2017-04-06  3:22 ` [PATCH v4 7/7] btrfs: check if the device is flush capable Anand Jain
@ 2017-04-18 14:04   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-04-18 14:04 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba

On Thu, Apr 06, 2017 at 11:22:53AM +0800, Anand Jain wrote:
> The blkdev_issue_flush() will check if the write cache is enabled
> before submitting the flush. This will add a code to fail fast if
> its not.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache
  2017-04-13 18:41   ` Liu Bo
@ 2017-04-19  4:29     ` Anand Jain
  0 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2017-04-19  4:29 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs, dsterba



On 04/14/2017 02:41 AM, Liu Bo wrote:
> On Thu, Apr 06, 2017 at 11:22:47AM +0800, Anand Jain wrote:
>> As of now we do alloc an empty bio and then use the flag REQ_PREFLUSH
>> to flush the device cache, instead we can use blkdev_issue_flush()
>> for this puspose.
>>
>> Also now no need to check the return when write_dev_flush() is called
>> with wait = 0
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> v2: Title of this patch is changed from
>>    btrfs: communicate back ENOMEM when it occurs
>>   And its entirely a new patch, which now use blkdev_issue_flush()
>> v3: no change
>> v4: no change
>>
>>  fs/btrfs/disk-io.c | 64 +++++++++++++++---------------------------------------
>>  fs/btrfs/volumes.h |  3 ++-
>>  2 files changed, 19 insertions(+), 48 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 08b74daf35d0..08cbbee228ee 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3498,70 +3498,42 @@ static int write_dev_supers(struct btrfs_device *device,
>>  	return errors < i ? 0 : -1;
>>  }
>>
>> -/*
>> - * endio for the write_dev_flush, this will wake anyone waiting
>> - * for the barrier when it is done
>> - */
>> -static void btrfs_end_empty_barrier(struct bio *bio)
>> +static void btrfs_dev_issue_flush(struct work_struct *work)
>>  {
>> -	if (bio->bi_private)
>> -		complete(bio->bi_private);
>> -	bio_put(bio);
>> +	int ret;
>> +	struct btrfs_device *device;
>> +
>> +	device = container_of(work, struct btrfs_device, flush_work);
>> +
>> +	/* we are in the commit thread */
>
> What is the above comment trying to explain?

  I had GFP_NOFS to justify when I wrote that.

Thanks, Anand


> Others look good.
>
> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
>
> Thanks,
>
> -liubo
>> +	ret = blkdev_issue_flush(device->bdev, GFP_NOFS, NULL);
>> +	device->last_flush_error = ret;
>> +	complete(&device->flush_wait);
>>  }
>>
>>  /*
>>   * trigger flushes for one the devices.  If you pass wait == 0, the flushes are
>>   * sent down.  With wait == 1, it waits for the previous flush.
>> - *
>> - * any device where the flush fails with eopnotsupp are flagged as not-barrier
>> - * capable
>>   */
>>  static int write_dev_flush(struct btrfs_device *device, int wait)
>>  {
>> -	struct bio *bio;
>> -	int ret = 0;
>> -
>>  	if (device->nobarriers)
>>  		return 0;
>>
>>  	if (wait) {
>> -		bio = device->flush_bio;
>> -		if (!bio)
>> -			return 0;
>> +		int ret;
>>
>>  		wait_for_completion(&device->flush_wait);
>> -
>> -		if (bio->bi_error) {
>> -			ret = bio->bi_error;
>> +		ret = device->last_flush_error;
>> +		if (ret)
>>  			btrfs_dev_stat_inc_and_print(device,
>> -				BTRFS_DEV_STAT_FLUSH_ERRS);
>> -		}
>> -
>> -		/* drop the reference from the wait == 0 run */
>> -		bio_put(bio);
>> -		device->flush_bio = NULL;
>> -
>> +					BTRFS_DEV_STAT_FLUSH_ERRS);
>>  		return ret;
>>  	}
>>
>> -	/*
>> -	 * one reference for us, and we leave it for the
>> -	 * caller
>> -	 */
>> -	device->flush_bio = NULL;
>> -	bio = btrfs_io_bio_alloc(GFP_NOFS, 0);
>> -	if (!bio)
>> -		return -ENOMEM;
>> -
>> -	bio->bi_end_io = btrfs_end_empty_barrier;
>> -	bio->bi_bdev = device->bdev;
>> -	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
>>  	init_completion(&device->flush_wait);
>> -	bio->bi_private = &device->flush_wait;
>> -	device->flush_bio = bio;
>> -
>> -	bio_get(bio);
>> -	btrfsic_submit_bio(bio);
>> +	INIT_WORK(&device->flush_work, btrfs_dev_issue_flush);
>> +	schedule_work(&device->flush_work);
>>
>>  	return 0;
>>  }
>> @@ -3590,9 +3562,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>  		if (!dev->in_fs_metadata || !dev->writeable)
>>  			continue;
>>
>> -		ret = write_dev_flush(dev, 0);
>> -		if (ret)
>> -			errors_send++;
>> +		write_dev_flush(dev, 0);
>>  	}
>>
>>  	/* wait for all the barriers */
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 59be81206dd7..0df50bc65578 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -124,8 +124,9 @@ struct btrfs_device {
>>
>>  	/* for sending down flush barriers */
>>  	int nobarriers;
>> -	struct bio *flush_bio;
>>  	struct completion flush_wait;
>> +	struct work_struct flush_work;
>> +	int last_flush_error;
>>
>>  	/* per-device scrub information */
>>  	struct scrub_ctx *scrub_device;
>> --
>> 2.10.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache
  2017-04-18 13:54   ` David Sterba
@ 2017-04-19  4:29     ` Anand Jain
  2017-04-25  9:25       ` Anand Jain
  0 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2017-04-19  4:29 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 04/18/2017 09:54 PM, David Sterba wrote:
> On Thu, Apr 06, 2017 at 11:22:47AM +0800, Anand Jain wrote:
>> As of now we do alloc an empty bio and then use the flag REQ_PREFLUSH
>> to flush the device cache, instead we can use blkdev_issue_flush()
>> for this puspose.
>
> This would change the scheduling characteristics. Right now, the caller
> thread submits all bios from one thread, lets block layer do it's work,
> and then in the same thread wait for each of the submitted bios.

> In your code, the btrfs thread prepares tasks for each bio, shifts the
> work to the global workqueue (schedule_work) and then it's same as
> before.

> I'm concerned about using the global queue. As the bio submission jobs
> could get blocked by some other unrelated task queued there, the
> guarantees depend on the forward progress of the tasks scheduled
> (plus block layer processing).

> In the current behaviour, the guarantees stand on the block layer only.
>
> We could introduce yet another work queue and submit the bios there,
> with possible fine tuning of the flags, like priority or emergency etc.
> But that sounds like unnecessary work as we can simply keep the code
> as-is and get the same end result.

  I had to schedule the flush to request flush for all the devices in
  parallel. For the obvious reason that flush may take a lot of time
  for the devices which aren't missing. blkdev_issue_flush() uses
  submit_bio_wait() which makes sense for the non-volume-based-FS.
  And there isn't something like blkdev_issue_flush_no_wait(). Also
  now I see that there isn't the btrfsic part in the patch.
  To fix this I think pre-alloc-ed bio for the flush is a good idea
  (as you suggested earlier). Further as the commit thread doesn't
  overlap, and barrier device happens only at the commit so there
  won't be concurrent demand for the pre-alloc-ed bio.
  Also pre-alloc-ed bio can be alloc-ed only when barrier is enabled
  as a mem optimization. Will send a RFC code for the comments with
  these changes.

> Regarding other patches, some of them are independent so I'll see what
> can be merged now regardless of the above comments.

Thanks, Anand


> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache
  2017-04-19  4:29     ` Anand Jain
@ 2017-04-25  9:25       ` Anand Jain
  0 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2017-04-25  9:25 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs



David,

  Based on the comments received, I have to pull back patch 1/7 to 3/7 
and instead have replaced them with a patch as below.

     [patch] btrfs: add framework to handle device flush error as a volume


Thanks, Anand


On 04/19/2017 12:29 PM, Anand Jain wrote:
>
>
> On 04/18/2017 09:54 PM, David Sterba wrote:
>> On Thu, Apr 06, 2017 at 11:22:47AM +0800, Anand Jain wrote:
>>> As of now we do alloc an empty bio and then use the flag REQ_PREFLUSH
>>> to flush the device cache, instead we can use blkdev_issue_flush()
>>> for this puspose.
>>
>> This would change the scheduling characteristics. Right now, the caller
>> thread submits all bios from one thread, lets block layer do it's work,
>> and then in the same thread wait for each of the submitted bios.
>
>> In your code, the btrfs thread prepares tasks for each bio, shifts the
>> work to the global workqueue (schedule_work) and then it's same as
>> before.
>
>> I'm concerned about using the global queue. As the bio submission jobs
>> could get blocked by some other unrelated task queued there, the
>> guarantees depend on the forward progress of the tasks scheduled
>> (plus block layer processing).
>
>> In the current behaviour, the guarantees stand on the block layer only.
>>
>> We could introduce yet another work queue and submit the bios there,
>> with possible fine tuning of the flags, like priority or emergency etc.
>> But that sounds like unnecessary work as we can simply keep the code
>> as-is and get the same end result.
>
>  I had to schedule the flush to request flush for all the devices in
>  parallel. For the obvious reason that flush may take a lot of time
>  for the devices which aren't missing. blkdev_issue_flush() uses
>  submit_bio_wait() which makes sense for the non-volume-based-FS.
>  And there isn't something like blkdev_issue_flush_no_wait(). Also
>  now I see that there isn't the btrfsic part in the patch.
>  To fix this I think pre-alloc-ed bio for the flush is a good idea
>  (as you suggested earlier). Further as the commit thread doesn't
>  overlap, and barrier device happens only at the commit so there
>  won't be concurrent demand for the pre-alloc-ed bio.
>  Also pre-alloc-ed bio can be alloc-ed only when barrier is enabled
>  as a mem optimization. Will send a RFC code for the comments with
>  these changes.


>> Regarding other patches, some of them are independent so I'll see what
>> can be merged now regardless of the above comments.
>
> Thanks, Anand
>
>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-04-25  9:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06  3:22 [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
2017-04-06  3:22 ` [PATCH v4 1/7] btrfs: use blkdev_issue_flush to flush the device cache Anand Jain
2017-04-13 18:41   ` Liu Bo
2017-04-19  4:29     ` Anand Jain
2017-04-18 13:54   ` David Sterba
2017-04-19  4:29     ` Anand Jain
2017-04-25  9:25       ` Anand Jain
2017-04-06  3:22 ` [PATCH v4 2/7] btrfs: cleanup barrier_all_devices() unify dev error count Anand Jain
2017-04-06  3:22 ` [PATCH v4 3/7] btrfs: cleanup barrier_all_devices() to check dev stat flush error Anand Jain
2017-04-06  3:22 ` [PATCH v4 4/7] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback Anand Jain
2017-04-06  3:22 ` [PATCH v4 5/7] btrfs: use q which is already obtained from bdev_get_queue Anand Jain
2017-04-18 14:01   ` David Sterba
2017-04-06  3:22 ` [PATCH v4 6/7] btrfs: delete unused member nobarriers Anand Jain
2017-04-18 14:03   ` David Sterba
2017-04-06  3:22 ` [PATCH v4 7/7] btrfs: check if the device is flush capable Anand Jain
2017-04-18 14:04   ` David Sterba
2017-04-13  8:42 ` [PATCH v4 0/7] Holistic view of device error at commit flush and related cleanup Anand Jain
2017-04-13 12:13   ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.