linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	tytso@mit.edu, adilger.kernel@dilger.ca, ming.lei@redhat.com,
	osandov@fb.com, jthumshirn@suse.de, minwoo.im.dev@gmail.com,
	damien.lemoal@wdc.com, andrea.parri@amarulasolutions.com,
	hare@suse.com, tj@kernel.org, ajay.joshi@wdc.com,
	sagi@grimberg.me, dsterba@suse.com, chaitanya.kulkarni@wdc.com,
	bvanassche@acm.org, dhowells@redhat.com, asml.silence@gmail.com
Subject: Re: [PATCH RFC 1/3] block: Add support for REQ_OP_ASSIGN_RANGE operation
Date: Mon, 23 Dec 2019 11:51:34 +0300	[thread overview]
Message-ID: <405b9106-0a97-0821-c41d-58ab8d0e2d09@virtuozzo.com> (raw)
In-Reply-To: <yq1pngh7blx.fsf@oracle.com>

On 21.12.2019 21:54, Martin K. Petersen wrote:
> 
> Kirill,
> 
>> One more thing to discuss. The new REQ_NOZERO flag won't be supported
>> by many block devices (their number will be even less, than number of
>> REQ_OP_WRITE_ZEROES supporters). Will this be a good thing, in case of
>> we will be completing BLKDEV_ZERO_ALLOCATE bios in
>> __blkdev_issue_write_zeroes() before splitting? I mean introduction of
>> some flag in struct request_queue::limits.  Completion of them with
>> -EOPNOTSUPP in block devices drivers looks suboptimal for me.
> 
> We already have the NOFALLBACK flag to let the user make that decision.
> 
> If that flag is not specified, and I receive an allocate request for a
> SCSI device that does not support ANCHOR, my expectation would be that I
> would do a regular write same.
>
> If it's a filesystem that is the recipient of the operation and not a
> SCSI device, how to react would depend on how the filesystem handles
> unwritten extents, etc.

Ok, this case is clear for me, thanks.

But I also worry about NOFALLBACK case. There are possible block devices,
which support write zeroes, but they can't allocate blocks (block allocation
are just not appliable for them, say, these are all ordinary hdd).

Let's say, a user called fallocate(), and filesystem allocated range of blocks.
Then filesystem propagates the range to block device, and calls zeroout:

blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
		     GFP_NOIO, BLKDEV_ZERO_ALLOCATE|BLKDEV_ZERO_NOFALLBACK);

This case filesystem does not want zeroing blocks, it just wants to send a hint
to block device. So, in case of block device supports allocation, everything is OK.

But won't it be a good thing to return EOPNOTSUPP right from __blkdev_issue_write_zeroes()
in case of block device can't allocate blocks (q->limits.write_zeroes_can_allocate
in the patch below)? Here is just a way to underline block devices, which support
write zeroes, but allocation of blocks is meant nothing for them (wasting of time).

What do you think about the below?

Thanks
---
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 5f2c429d4378..524b47905fd5 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		struct bio **biop, unsigned flags)
 {
 	struct bio *bio = *biop;
-	unsigned int max_write_zeroes_sectors;
+	unsigned int max_write_zeroes_sectors, req_flags = 0;
 	struct request_queue *q = bdev_get_queue(bdev);
 
 	if (!q)
@@ -229,13 +229,19 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 	if (max_write_zeroes_sectors == 0)
 		return -EOPNOTSUPP;
 
+	if (flags & BLKDEV_ZERO_NOUNMAP)
+		req_flags |= REQ_NOUNMAP;
+	if (flags & BLKDEV_ZERO_ALLOCATE) {
+		if (!q->limits.write_zeroes_can_allocate)
+			return -EOPNOTSUPP;
+		req_flags |= REQ_NOZERO|REQ_NOUNMAP;
+	}
+
 	while (nr_sects) {
 		bio = blk_next_bio(bio, 0, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
-		bio->bi_opf = REQ_OP_WRITE_ZEROES;
-		if (flags & BLKDEV_ZERO_NOUNMAP)
-			bio->bi_opf |= REQ_NOUNMAP;
+		bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
 
 		if (nr_sects > max_write_zeroes_sectors) {
 			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 69bf2fb6f7cd..1ffef894b3bd 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
 					     GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
 		break;
+	case FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+			GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
+		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
 		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
 					     GFP_KERNEL, 0);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 70254ae11769..9ed166860099 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -335,7 +335,9 @@ enum req_flag_bits {
 
 	/* command specific flags for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
-
+	__REQ_NOZERO,		/* only notify about allocated blocks,
+				 * and do not actual zero them
+				 */
 	__REQ_HIPRI,
 
 	/* for driver use */
@@ -362,6 +364,7 @@ enum req_flag_bits {
 #define REQ_CGROUP_PUNT		(1ULL << __REQ_CGROUP_PUNT)
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
+#define REQ_NOZERO		(1ULL << __REQ_NOZERO)
 #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c45779f00cbd..9e3cd3394dd6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -347,6 +347,7 @@ struct queue_limits {
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
 	unsigned char		raid_partial_stripes_expensive;
+	bool			write_zeroes_can_allocate;
 	enum blk_zoned_model	zoned;
 };
 
@@ -1219,6 +1220,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
+#define BLKDEV_ZERO_ALLOCATE	(1 << 2)  /* allocate range of blocks */
 
 extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,

  reply	other threads:[~2019-12-23  8:52 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-10 16:56 [PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE to reflect extents allocation in block device internals Kirill Tkhai
2019-12-10 16:56 ` [PATCH RFC 1/3] block: Add support for REQ_OP_ASSIGN_RANGE operation Kirill Tkhai
2019-12-19  3:03   ` Martin K. Petersen
2019-12-19 11:07     ` Kirill Tkhai
2019-12-19 22:03       ` Chaitanya Kulkarni
2019-12-19 22:37       ` Martin K. Petersen
2019-12-20  1:53         ` Darrick J. Wong
2019-12-20  2:22           ` Martin K. Petersen
2019-12-20 11:55         ` Kirill Tkhai
2019-12-21 18:54           ` Martin K. Petersen
2019-12-23  8:51             ` Kirill Tkhai [this message]
2020-01-07  3:24               ` Martin K. Petersen
2020-01-07 13:59                 ` Kirill Tkhai
2020-01-08  2:49                   ` Martin K. Petersen
2020-01-09  9:43                     ` Kirill Tkhai
2019-12-10 16:56 ` [PATCH RFC 2/3] loop: Forward REQ_OP_ASSIGN_RANGE into fallocate(0) Kirill Tkhai
2019-12-10 16:56 ` [PATCH RFC 3/3] ext4: Notify block device about fallocate(0)-assigned blocks Kirill Tkhai
2019-12-11 12:55   ` [PATCH RFC v2 " Kirill Tkhai
2019-12-11  7:42 ` [PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE to reflect extents allocation in block device internals Chaitanya Kulkarni
2019-12-11  8:50   ` Kirill Tkhai
2019-12-17 14:16 ` Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=405b9106-0a97-0821-c41d-58ab8d0e2d09@virtuozzo.com \
    --to=ktkhai@virtuozzo.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=ajay.joshi@wdc.com \
    --cc=andrea.parri@amarulasolutions.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=chaitanya.kulkarni@wdc.com \
    --cc=damien.lemoal@wdc.com \
    --cc=dhowells@redhat.com \
    --cc=dsterba@suse.com \
    --cc=hare@suse.com \
    --cc=jthumshirn@suse.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    --cc=minwoo.im.dev@gmail.com \
    --cc=osandov@fb.com \
    --cc=sagi@grimberg.me \
    --cc=tj@kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).