All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] block: implement compatible DISCARD support
@ 2010-02-11 10:53 Dmitry Monakhov
  2010-02-11 10:57 ` [PATCH 2/4] block: support compat discard mode by default Dmitry Monakhov
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 10:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, Dmitry Monakhov

Currently there are no many discs has native TRIM (aka) discard
feature support. But in fact this is good feature. We can easily
simlulate it for devices which has not native support.
In compat mode discard dequest transforms in to simple zerofiled
write request.
In fact currently blkdev_issue_discard function implemented
incorrectly.
1) Whait flags not optimal we dont have to wait for each bio in flight.
2) Not wait by default. Which makes it fairly useless.
3) Send each bio with barrier flag(if requested). Which result in
   bad performance. In fact caller just want to make shure that full
   request is completed and ordered against other requests.
5) It use allocated_page instead of ZERO_PAGE.

This patch introduce generic blkdev_issue_zeroout() function which also
may be used for native discard request support, in this case zero payload
simply ignored.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/blk-barrier.c    |  190 +++++++++++++++++++++++++++++-------------------
 block/ioctl.c          |    3 +-
 include/linux/blkdev.h |    6 +-
 3 files changed, 121 insertions(+), 78 deletions(-)

diff --git a/block/blk-barrier.c b/block/blk-barrier.c
index 8618d89..5894a38 100644
--- a/block/blk-barrier.c
+++ b/block/blk-barrier.c
@@ -340,106 +340,148 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
 }
 EXPORT_SYMBOL(blkdev_issue_flush);
 
-static void blkdev_discard_end_io(struct bio *bio, int err)
+struct bio_batch
 {
+	atomic_t done;
+	unsigned long flags;
+	struct completion *wait;
+};
+
+static void bio_batch_end_io(struct bio *bio, int err)
+{
+	struct bio_batch *bb = bio->bi_private;
 	if (err) {
 		if (err == -EOPNOTSUPP)
-			set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
-		clear_bit(BIO_UPTODATE, &bio->bi_flags);
+			set_bit(BIO_EOPNOTSUPP, &bb->flags);
+		else
+			clear_bit(BIO_UPTODATE, &bb->flags);
 	}
-
-	if (bio->bi_private)
-		complete(bio->bi_private);
-	__free_page(bio_page(bio));
-
+	atomic_inc(&bb->done);
+	complete(bb->wait);
 	bio_put(bio);
 }
 
 /**
- * blkdev_issue_discard - queue a discard
- * @bdev:	blockdev to issue discard for
+ * blkdev_issue_zeroout generate number of zero filed write bios
+ * @bdev:	blockdev to issue
  * @sector:	start sector
  * @nr_sects:	number of sectors to discard
  * @gfp_mask:	memory allocation flags (for bio_alloc)
- * @flags:	DISCARD_FL_* flags to control behaviour
+ * @rw:		RW flags
  *
  * Description:
- *    Issue a discard request for the sectors in question.
+ *  Generate and issue number of bios with zerofiled pages.
+ *  Send barrier at the end if requested. This guarantie that. All bios
+ *  submitted before the barrier will be completed before the barrier.
+ *  Empty barrier allow us to avoid post queue flush.
  */
-int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp_mask, int flags)
+
+int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, sector_t max_sects, gfp_t mask, int rw)
 {
-	DECLARE_COMPLETION_ONSTACK(wait);
-	struct request_queue *q = bdev_get_queue(bdev);
-	int type = flags & DISCARD_FL_BARRIER ?
-		DISCARD_BARRIER : DISCARD_NOBARRIER;
-	struct bio *bio;
-	struct page *page;
 	int ret = 0;
+	struct bio *bio;
+	struct bio_batch bb;
+	unsigned int sz, issued = 0;
+	DECLARE_COMPLETION_ONSTACK(wait);
+	unsigned int do_barrier = rw | (1 << BIO_RW_BARRIER);
+	rw &= ~(1 << BIO_RW_BARRIER);
+	BUG_ON(!(rw | (1 << BIO_RW)));
 
-	if (!q)
-		return -ENXIO;
-
-	if (!blk_queue_discard(q))
-		return -EOPNOTSUPP;
-
-	while (nr_sects && !ret) {
-		unsigned int sector_size = q->limits.logical_block_size;
-		unsigned int max_discard_sectors =
-			min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+	atomic_set(&bb.done, 0);
+	bb.flags = 1 << BIO_UPTODATE;
+	bb.wait = &wait;
 
-		bio = bio_alloc(gfp_mask, 1);
+submit:
+	while (nr_sects != 0) {
+		bio = bio_alloc(mask, min(nr_sects, (sector_t)BIO_MAX_PAGES));
 		if (!bio)
-			goto out;
-		bio->bi_sector = sector;
-		bio->bi_end_io = blkdev_discard_end_io;
-		bio->bi_bdev = bdev;
-		if (flags & DISCARD_FL_WAIT)
-			bio->bi_private = &wait;
-
-		/*
-		 * Add a zeroed one-sector payload as that's what
-		 * our current implementations need.  If we'll ever need
-		 * more the interface will need revisiting.
-		 */
-		page = alloc_page(gfp_mask | __GFP_ZERO);
-		if (!page)
-			goto out_free_bio;
-		if (bio_add_pc_page(q, bio, page, sector_size, 0) < sector_size)
-			goto out_free_page;
+			break;
 
+		bio->bi_sector = sector;
+		bio->bi_bdev   = bdev;
+		bio->bi_private = &bb;
+		bio->bi_end_io = bio_batch_end_io;
+
+		while(nr_sects != 0) {
+			sz = min(PAGE_SIZE >> 9 , nr_sects);
+			if (max_sects - bio_sectors(bio) < sz)
+				sz = max_sects - bio_sectors(bio);
+			if (sz == 0)
+				/* bio has maximum size possible */
+				break;
+			ret = bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0);
+			nr_sects -= ret >> 9;
+			sector += ret >> 9;
+			if (ret != sz)
+				break;
+		}
+		issued++;
+		submit_bio(rw, bio);
+	}
+	if (nr_sects == 0) {
 		/*
-		 * And override the bio size - the way discard works we
-		 * touch many more blocks on disk than the actual payload
-		 * length.
+		 * We have issued all data. Send final barrier if necessery.
 		 */
-		if (nr_sects > max_discard_sectors) {
-			bio->bi_size = max_discard_sectors << 9;
-			nr_sects -= max_discard_sectors;
-			sector += max_discard_sectors;
-		} else {
-			bio->bi_size = nr_sects << 9;
-			nr_sects = 0;
-		}
+		if (do_barrier)
+			ret = blkdev_issue_flush(bdev, NULL);
+	}
+	/* Wait for submitted bios and then continue */
+	while (issued != atomic_read(&bb.done))
+		wait_for_completion(&wait);
 
-		bio_get(bio);
-		submit_bio(type, bio);
+	if (!test_bit(BIO_UPTODATE, &bb.flags))
+		/* One of bios in the batch was completed with error.*/
+		ret = -EIO;
 
-		if (flags & DISCARD_FL_WAIT)
-			wait_for_completion(&wait);
+	if (ret)
+		goto out;
 
-		if (bio_flagged(bio, BIO_EOPNOTSUPP))
-			ret = -EOPNOTSUPP;
-		else if (!bio_flagged(bio, BIO_UPTODATE))
-			ret = -EIO;
-		bio_put(bio);
+	if (test_bit(BIO_EOPNOTSUPP, &bb.flags)) {
+		ret = -EOPNOTSUPP;
+		goto out;
 	}
-	return ret;
-out_free_page:
-	__free_page(page);
-out_free_bio:
-	bio_put(bio);
+	if (nr_sects != 0)
+		goto submit;
 out:
-	return -ENOMEM;
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_discard - queue a discard
+ * @bdev:	blockdev to issue discard for
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to discard
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ * @flags:	DISCARD_FL_* flags to control behaviour
+ *
+ * Description:
+ *    Issue a discard request for the sectors in question.
+ */
+int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask, int flags)
+{
+	int type = flags & DISCARD_FL_BARRIER ?
+		DISCARD_BARRIER : DISCARD_NOBARRIER;
+	struct request_queue *q = bdev_get_queue(bdev);
+	unsigned int max_size;
+
+	if (!blk_queue_discard(q) && !(flags & DISCARD_FL_BARRIER))
+		return -EOPNOTSUPP;
+	/*
+	 * Generate request with zeroed payload.
+	 * If device has native discard support it simply ignore this payload.
+	 * In case of compat mode this request will be sent as a simple
+	 * write request.
+	 */
+	if (!blk_queue_discard(q)) {
+		type &= ~(1 << BIO_RW_DISCARD);
+		max_size = UINT_MAX >> 9;
+	} else {
+		max_size = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+	}
+	return blkdev_issue_zeroout(bdev, sector, nr_sects, max_size,
+				gfp_mask, type);
 }
 EXPORT_SYMBOL(blkdev_issue_discard);
diff --git a/block/ioctl.c b/block/ioctl.c
index be48ea5..384b71c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -124,8 +124,7 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
 
 	if (start + len > (bdev->bd_inode->i_size >> 9))
 		return -EINVAL;
-	return blkdev_issue_discard(bdev, start, len, GFP_KERNEL,
-				    DISCARD_FL_WAIT);
+	return blkdev_issue_discard(bdev, start, len, GFP_KERNEL, 0);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ffb13ad..c762c9f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -997,8 +997,10 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 }
 
 extern int blkdev_issue_flush(struct block_device *, sector_t *);
-#define DISCARD_FL_WAIT		0x01	/* wait for completion */
-#define DISCARD_FL_BARRIER	0x02	/* issue DISCARD_BARRIER request */
+#define DISCARD_FL_BARRIER	0x01	/* issue DISCARD_BARRIER request */
+#define DISCARD_FL_COMPAT	0x02	/* allow discard compat request mode */
+extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, sector_t max_sects, gfp_t mask, int rw);
 extern int blkdev_issue_discard(struct block_device *, sector_t sector,
 		sector_t nr_sects, gfp_t, int flags);
 
-- 
1.6.6


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/4] block: support compat discard mode by default.
  2010-02-11 10:53 [PATCH 1/4] block: implement compatible DISCARD support Dmitry Monakhov
@ 2010-02-11 10:57 ` Dmitry Monakhov
  2010-02-11 11:25   ` Dmitry Monakhov
  2010-02-11 12:22   ` Christoph Hellwig
  2010-02-11 10:59   ` Dmitry Monakhov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 10:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, Dmitry Monakhov

Currently there are many filesystems which has implemented
discard support, but ssd discs not widely used yet.
Let's allow user to use compat discard mode by default.
After this feature is enabled by default for all devices which has
no native discard support it will be possible to use this feature
simply by passing appropriate mount option to fs (-odiscard in ext4)

This default option has many advantages:
- Hope that this helps in real filesystem testing.
- People who are crazy about data security whould be really happy.
- Virtual machine developers also would like this feature.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 include/linux/blkdev.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c762c9f..d7d028c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1010,7 +1010,7 @@ static inline int sb_issue_discard(struct super_block *sb,
 	block <<= (sb->s_blocksize_bits - 9);
 	nr_blocks <<= (sb->s_blocksize_bits - 9);
 	return blkdev_issue_discard(sb->s_bdev, block, nr_blocks, GFP_KERNEL,
-				    DISCARD_FL_BARRIER);
+				    DISCARD_FL_BARRIER|DISCARD_FL_COMPAT);
 }
 
 extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
-- 
1.6.6


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/4] ext4: convert extent zeroout to generic function
  2010-02-11 10:53 [PATCH 1/4] block: implement compatible DISCARD support Dmitry Monakhov
@ 2010-02-11 10:59   ` Dmitry Monakhov
  2010-02-11 10:59   ` Dmitry Monakhov
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 10:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, linux-ext4, Dmitry Monakhov


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |   70 +++-------------------------------------------------
 1 files changed, 4 insertions(+), 66 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 765a482..ff4473a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2432,76 +2432,14 @@ void ext4_ext_release(struct super_block *sb)
 #endif
 }
 
-static void bi_complete(struct bio *bio, int error)
-{
-	complete((struct completion *)bio->bi_private);
-}
-
 /* FIXME!! we need to try to merge to left or right after zero-out  */
 static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
 {
-	int ret = -EIO;
-	struct bio *bio;
-	int blkbits, blocksize;
-	sector_t ee_pblock;
-	struct completion event;
-	unsigned int ee_len, len, done, offset;
-
-
-	blkbits   = inode->i_blkbits;
-	blocksize = inode->i_sb->s_blocksize;
-	ee_len    = ext4_ext_get_actual_len(ex);
-	ee_pblock = ext_pblock(ex);
+	sector_t nr_sects = ((sector_t)ext4_ext_get_actual_len(ex)) <<
+		inode->i_blkbits;
 
-	/* convert ee_pblock to 512 byte sectors */
-	ee_pblock = ee_pblock << (blkbits - 9);
-
-	while (ee_len > 0) {
-
-		if (ee_len > BIO_MAX_PAGES)
-			len = BIO_MAX_PAGES;
-		else
-			len = ee_len;
-
-		bio = bio_alloc(GFP_NOIO, len);
-		bio->bi_sector = ee_pblock;
-		bio->bi_bdev   = inode->i_sb->s_bdev;
-
-		done = 0;
-		offset = 0;
-		while (done < len) {
-			ret = bio_add_page(bio, ZERO_PAGE(0),
-							blocksize, offset);
-			if (ret != blocksize) {
-				/*
-				 * We can't add any more pages because of
-				 * hardware limitations.  Start a new bio.
-				 */
-				break;
-			}
-			done++;
-			offset += blocksize;
-			if (offset >= PAGE_CACHE_SIZE)
-				offset = 0;
-		}
-
-		init_completion(&event);
-		bio->bi_private = &event;
-		bio->bi_end_io = bi_complete;
-		submit_bio(WRITE, bio);
-		wait_for_completion(&event);
-
-		if (test_bit(BIO_UPTODATE, &bio->bi_flags))
-			ret = 0;
-		else {
-			ret = -EIO;
-			break;
-		}
-		bio_put(bio);
-		ee_len    -= done;
-		ee_pblock += done  << (blkbits - 9);
-	}
-	return ret;
+	return blkdev_issue_zeroout(inode->i_sb->s_bdev, ext_pblock(ex),
+				nr_sects, UINT_MAX >> 9, GFP_NOIO, WRITE);
 }
 
 #define EXT4_EXT_ZERO_LEN 7
-- 
1.6.6


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/4] ext4: convert extent zeroout to generic function
@ 2010-02-11 10:59   ` Dmitry Monakhov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 10:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, linux-ext4, Dmitry Monakhov


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |   70 +++-------------------------------------------------
 1 files changed, 4 insertions(+), 66 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 765a482..ff4473a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2432,76 +2432,14 @@ void ext4_ext_release(struct super_block *sb)
 #endif
 }
 
-static void bi_complete(struct bio *bio, int error)
-{
-	complete((struct completion *)bio->bi_private);
-}
-
 /* FIXME!! we need to try to merge to left or right after zero-out  */
 static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
 {
-	int ret = -EIO;
-	struct bio *bio;
-	int blkbits, blocksize;
-	sector_t ee_pblock;
-	struct completion event;
-	unsigned int ee_len, len, done, offset;
-
-
-	blkbits   = inode->i_blkbits;
-	blocksize = inode->i_sb->s_blocksize;
-	ee_len    = ext4_ext_get_actual_len(ex);
-	ee_pblock = ext_pblock(ex);
+	sector_t nr_sects = ((sector_t)ext4_ext_get_actual_len(ex)) <<
+		inode->i_blkbits;
 
-	/* convert ee_pblock to 512 byte sectors */
-	ee_pblock = ee_pblock << (blkbits - 9);
-
-	while (ee_len > 0) {
-
-		if (ee_len > BIO_MAX_PAGES)
-			len = BIO_MAX_PAGES;
-		else
-			len = ee_len;
-
-		bio = bio_alloc(GFP_NOIO, len);
-		bio->bi_sector = ee_pblock;
-		bio->bi_bdev   = inode->i_sb->s_bdev;
-
-		done = 0;
-		offset = 0;
-		while (done < len) {
-			ret = bio_add_page(bio, ZERO_PAGE(0),
-							blocksize, offset);
-			if (ret != blocksize) {
-				/*
-				 * We can't add any more pages because of
-				 * hardware limitations.  Start a new bio.
-				 */
-				break;
-			}
-			done++;
-			offset += blocksize;
-			if (offset >= PAGE_CACHE_SIZE)
-				offset = 0;
-		}
-
-		init_completion(&event);
-		bio->bi_private = &event;
-		bio->bi_end_io = bi_complete;
-		submit_bio(WRITE, bio);
-		wait_for_completion(&event);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/4] btrfs: add discard_compat support
  2010-02-11 10:53 [PATCH 1/4] block: implement compatible DISCARD support Dmitry Monakhov
  2010-02-11 10:57 ` [PATCH 2/4] block: support compat discard mode by default Dmitry Monakhov
  2010-02-11 10:59   ` Dmitry Monakhov
@ 2010-02-11 11:01 ` Dmitry Monakhov
  2010-02-11 11:15   ` Dmitry Monakhov
  2010-02-11 12:21 ` [PATCH 1/4] block: implement compatible DISCARD support Christoph Hellwig
  3 siblings, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 11:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, linux-btrfs, Dmitry Monakhov

If any device in the list has no native discard support we add
discard compat support. Devices with native discard support still
getting real discard requests.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/btrfs/ctree.h       |    1 +
 fs/btrfs/disk-io.c     |    8 ++++++++
 fs/btrfs/extent-tree.c |   14 ++++++++------
 fs/btrfs/super.c       |    8 +++++++-
 fs/btrfs/volumes.c     |    6 ++++++
 fs/btrfs/volumes.h     |    6 +++++-
 6 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9f806dd..54854d1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1161,6 +1161,7 @@ struct btrfs_root {
 #define BTRFS_MOUNT_SSD_SPREAD		(1 << 8)
 #define BTRFS_MOUNT_NOSSD		(1 << 9)
 #define BTRFS_MOUNT_DISCARD		(1 << 10)
+#define BTRFS_MOUNT_DISCARD_COMPAT	(1 << 11)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 009e3bd..c7d4812 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1945,6 +1945,14 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 		       "mode\n");
 		btrfs_set_opt(fs_info->mount_opt, SSD);
 	}
+	if (btrfs_test_opt(tree_root, DISCARD) &&
+	    fs_info->fs_devices->discard_compat) {
+		printk(KERN_INFO "Btrfs detected devices without native discard"
+			" support, enabling discard_compat mode mode\n");
+		btrfs_clear_opt(fs_info->mount_opt, DISCARD);
+		btrfs_set_opt(fs_info->mount_opt, DISCARD_COMPAT);
+	}
+
 
 	if (btrfs_super_log_root(disk_super) != 0) {
 		u64 bytenr = btrfs_super_log_root(disk_super);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 432a2da..b4c3124 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1585,20 +1585,21 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
 }
 
 static void btrfs_issue_discard(struct block_device *bdev,
-				u64 start, u64 len)
+				u64 start, u64 len, int do_compat)
 {
 	blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL,
-			     DISCARD_FL_BARRIER);
+			     DISCARD_FL_BARRIER |
+				(do_compat ? DISCARD_FL_COMPAT : 0));
 }
 
 static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
 				u64 num_bytes)
 {
-	int ret;
+	int ret, do_compat = btrfs_test_opt(root, DISCARD_COMPAT);
 	u64 map_length = num_bytes;
 	struct btrfs_multi_bio *multi = NULL;
 
-	if (!btrfs_test_opt(root, DISCARD))
+	if (!btrfs_test_opt(root, DISCARD) && !do_compat)
 		return 0;
 
 	/* Tell the block device(s) that the sectors can be discarded */
@@ -1614,7 +1615,8 @@ static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
 		for (i = 0; i < multi->num_stripes; i++, stripe++) {
 			btrfs_issue_discard(stripe->dev->bdev,
 					    stripe->physical,
-					    map_length);
+					    map_length,
+					    do_compat);
 		}
 		kfree(multi);
 	}
@@ -3700,7 +3702,7 @@ static int pin_down_bytes(struct btrfs_trans_handle *trans,
 	 * individual btree blocks isn't a good plan.  Just
 	 * pin everything in discard mode.
 	 */
-	if (btrfs_test_opt(root, DISCARD))
+	if (btrfs_test_opt(root, DISCARD) || btrfs_test_opt(root, DISCARD_COMPAT))
 		goto pinit;
 
 	buf = btrfs_find_tree_block(root, bytenr, num_bytes);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3f9b457..fe7ccf4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -67,7 +67,7 @@ enum {
 	Opt_max_extent, Opt_max_inline, Opt_alloc_start, Opt_nobarrier,
 	Opt_ssd, Opt_nossd, Opt_ssd_spread, Opt_thread_pool, Opt_noacl,
 	Opt_compress, Opt_notreelog, Opt_ratio, Opt_flushoncommit,
-	Opt_discard, Opt_err,
+	Opt_discard, Opt_discard_compat, Opt_err,
 };
 
 static match_table_t tokens = {
@@ -90,6 +90,7 @@ static match_table_t tokens = {
 	{Opt_flushoncommit, "flushoncommit"},
 	{Opt_ratio, "metadata_ratio=%d"},
 	{Opt_discard, "discard"},
+	{Opt_discard_compat, "discard_compat"},
 	{Opt_err, NULL},
 };
 
@@ -263,6 +264,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_discard:
 			btrfs_set_opt(info->mount_opt, DISCARD);
 			break;
+		case Opt_discard_compat:
+			btrfs_set_opt(info->mount_opt, DISCARD_COMPAT);
+			break;
 		case Opt_err:
 			printk(KERN_INFO "btrfs: unrecognized mount option "
 			       "'%s'\n", p);
@@ -459,6 +463,8 @@ static int btrfs_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_puts(seq, ",flushoncommit");
 	if (btrfs_test_opt(root, DISCARD))
 		seq_puts(seq, ",discard");
+	if (btrfs_test_opt(root, DISCARD_COMPAT))
+		seq_puts(seq, ",discard_compat");
 	if (!(root->fs_info->sb->s_flags & MS_POSIXACL))
 		seq_puts(seq, ",noacl");
 	return 0;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 220dad5..02e18c8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -621,6 +621,9 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		if (!blk_queue_nonrot(bdev_get_queue(bdev)))
 			fs_devices->rotating = 1;
 
+		if (!blk_queue_discard(bdev_get_queue(bdev)))
+			fs_devices->discard_compat = 1;
+
 		fs_devices->open_devices++;
 		if (device->writeable) {
 			fs_devices->rw_devices++;
@@ -1522,6 +1525,9 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 	if (!blk_queue_nonrot(bdev_get_queue(bdev)))
 		root->fs_info->fs_devices->rotating = 1;
 
+	if (!blk_queue_discard(bdev_get_queue(bdev)))
+		root->fs_info->fs_devices->discard_compat = 1;
+
 	total_bytes = btrfs_super_total_bytes(&root->fs_info->super_copy);
 	btrfs_set_super_total_bytes(&root->fs_info->super_copy,
 				    total_bytes + device->total_bytes);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 31b0fab..ceb66f0 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -116,7 +116,11 @@ struct btrfs_fs_devices {
 	/* set when we find or add a device that doesn't have the
 	 * nonrot flag set
 	 */
-	int rotating;
+	unsigned int rotating:1;
+	/* set then we find or add a device that doesn't have the
+	 * DISCARD flags set
+	 */
+	unsigned int discard_compat:1;
 };
 
 struct btrfs_bio_stripe {
-- 
1.6.6

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] btrfs: add discard_compat support
  2010-02-11 11:01 ` [PATCH 4/4] btrfs: add discard_compat support Dmitry Monakhov
@ 2010-02-11 11:15   ` Dmitry Monakhov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 11:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-btrfs

Dmitry Monakhov <dmonakhov@openvz.org> writes:

> If any device in the list has no native discard support we add
> discard compat support. Devices with native discard support still
> getting real discard requests.
Note: Seems what enabling discard requests result in triggering
hidden bug. Orphan list corruption.

  ------------[ cut here ]------------
 WARNING: at lib/list_debug.c:26 __list_add+0x3f/0x81()
 Hardware name:         
 list_add corruption. next->prev should be prev (ffff88012029eb20), but was ffff88012239c730. (next=ffff8801223bc348).
 Modules linked in: btrfs zlib_deflate libcrc32c cpufreq_ondemand acpi_cpufreq freq_table ppdev parport_pc parport pcspkr [last unloaded: btrfs]
 Pid: 1254, comm: fsstress Not tainted 2.6.33-rc5 #3
 Call Trace:
 [<ffffffff81038f79>] warn_slowpath_common+0x7c/0x94
 [<ffffffff81038fe8>] warn_slowpath_fmt+0x41/0x43
 [<ffffffffa00df87a>] ? btrfs_unlink_inode+0x243/0x258 [btrfs]
 [<ffffffff811c0e22>] __list_add+0x3f/0x81
 [<ffffffffa00dfac7>] btrfs_orphan_add+0x61/0x80 [btrfs]
 [<ffffffffa00e015a>] btrfs_unlink+0xb2/0xf2 [btrfs]
 [<ffffffff810ec5a3>] vfs_unlink+0x67/0xa2
 [<ffffffff810eafec>] ? lookup_hash+0x36/0x3a
 [<ffffffff810edd97>] do_unlinkat+0xcd/0x15b
 [<ffffffff810eabc4>] ? path_put+0x22/0x27
 [<ffffffff8107edac>] ? audit_syscall_entry+0x11e/0x14a
 [<ffffffff810ede3b>] sys_unlink+0x16/0x18
 [<ffffffff81002adb>] system_call_fastpath+0x16/0x1b
 ---[ end trace 18d84c057a649cf7 ]---
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/inode.c:3320!
 invalid opcode: 0000 [#1] SMP 
 last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
 CPU 3 
 Pid: 1250, comm: fsstress Tainted: G        W  2.6.33-rc5 #3 DH55TC                /        
 RIP: 0010:[<ffffffffa00e071b>]  [<ffffffffa00e071b>] btrfs_setattr+0x101/0x244 [btrfs]
 RSP: 0018:ffff88012095de08  EFLAGS: 00010292
 RAX: 0000000000000021 RBX: ffff8801223b4bd8 RCX: 000000000000711a
 RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffffffff81730984
 RBP: ffff88012095de48 R08: ffff88012095ddb4 R09: 0000000000000000
 R10: 0000000000000001 R11: ffff88012095dd00 R12: ffff88012095ded8
 R13: ffff88012029e800 R14: ffff88012233d140 R15: 0000000000000000
 FS:  00007f313c3c6700(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fffa0ac3f4c CR3: 00000001212c7000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process fsstress (pid: 1250, threadinfo ffff88012095c000, task ffff880122c68000)
 Stack:
 ffff8801223b4cb0 ffff88011c452180 ffff88012095de28 ffff88012095ded8
 <0> ffff88011c452180 ffff8801223b4bd8 0000000000000008 0000000000000008
 <0> ffff88012095deb8 ffffffff810f5e52 ffff88012095df38 ffffffff810ee418
 Call Trace:
 [<ffffffff810f5e52>] notify_change+0x17b/0x2c9
 [<ffffffff810ee418>] ? user_path_at+0x64/0x93
 [<ffffffff810e1e0d>] do_truncate+0x6a/0x87
 [<ffffffff810ea962>] ? generic_permission+0x1c/0x9f
 [<ffffffff810eaa5e>] ? get_write_access+0x1d/0x48
 [<ffffffff810e1f46>] sys_truncate+0x11c/0x160
 [<ffffffff81002adb>] system_call_fastpath+0x16/0x1b
 Code: e0 48 89 de 4c 89 f7 49 89 46 20 e8 66 f3 ff ff 85 c0 74 1b 89 c2 48 c7 c6 e0 b4 10 a0 48 c7 c7 9c d1 10 a0 31 c0 e8 7c 5c 25 e1 <0f> 0b eb 3d 49 8b 46 10 4c 89 ee 4c 89 f7 48 89 45 c8 e8 47 7d 
 RIP  [<ffffffffa00e071b>] btrfs_setattr+0x101/0x244 [btrfs]
 RSP <ffff88012095de08>
 ---[ end trace 18d84c057a649cf8 ]---

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] block: support compat discard mode by default.
  2010-02-11 10:57 ` [PATCH 2/4] block: support compat discard mode by default Dmitry Monakhov
@ 2010-02-11 11:25   ` Dmitry Monakhov
  2010-02-11 12:22   ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 11:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, linux-ext4

Dmitry Monakhov <dmonakhov@openvz.org> writes:

> Currently there are many filesystems which has implemented
> discard support, but ssd discs not widely used yet.
> Let's allow user to use compat discard mode by default.
> After this feature is enabled by default for all devices which has
> no native discard support it will be possible to use this feature
> simply by passing appropriate mount option to fs (-odiscard in ext4)
BTW i've run tested ext4 with -odiscard option, and it survived more
24hour of stress tests test which consists of fsstress, compilation,
and etc
>
> This default option has many advantages:
> - Hope that this helps in real filesystem testing.
> - People who are crazy about data security whould be really happy.
> - Virtual machine developers also would like this feature.
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  include/linux/blkdev.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c762c9f..d7d028c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1010,7 +1010,7 @@ static inline int sb_issue_discard(struct super_block *sb,
>  	block <<= (sb->s_blocksize_bits - 9);
>  	nr_blocks <<= (sb->s_blocksize_bits - 9);
>  	return blkdev_issue_discard(sb->s_bdev, block, nr_blocks, GFP_KERNEL,
> -				    DISCARD_FL_BARRIER);
> +				    DISCARD_FL_BARRIER|DISCARD_FL_COMPAT);
>  }
>  
>  extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: implement compatible DISCARD support
  2010-02-11 10:53 [PATCH 1/4] block: implement compatible DISCARD support Dmitry Monakhov
                   ` (2 preceding siblings ...)
  2010-02-11 11:01 ` [PATCH 4/4] btrfs: add discard_compat support Dmitry Monakhov
@ 2010-02-11 12:21 ` Christoph Hellwig
  2010-02-11 12:59   ` Dmitry Monakhov
  2010-02-11 14:40   ` Martin K. Petersen
  3 siblings, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-02-11 12:21 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-kernel, jens.axboe

On Thu, Feb 11, 2010 at 01:53:45PM +0300, Dmitry Monakhov wrote:
> Currently there are no many discs has native TRIM (aka) discard
> feature support. But in fact this is good feature. We can easily
> simlulate it for devices which has not native support.
> In compat mode discard dequest transforms in to simple zerofiled
> write request.
> In fact currently blkdev_issue_discard function implemented
> incorrectly.
> 1) Whait flags not optimal we dont have to wait for each bio in flight.

Indeed.

> 2) Not wait by default. Which makes it fairly useless.

Not sure what you mean with that.  We do not need to wait for the
discard request for the "online" type of use - the barrier flag
means we can't re-order I/O before and after the request so there
is no reason to wait - it stays in the places where it needs to be
in the I/O stream.

> 3) Send each bio with barrier flag(if requested). Which result in
>    bad performance. In fact caller just want to make shure that full
>    request is completed and ordered against other requests.

Yes, we need to ensure ordering only against reordering with other
requests.  Your patch only issues a cache flush, which means that
we miss the queue drain before submitting the discard bios

> 5) It use allocated_page instead of ZERO_PAGE.

That's incorrect - both the scsi WRITE SAME and ATA UNMAP
implementations write to the payload.  I have some plans to change that
an get rid of the payload entirely, but I need to get back to the
discard work and look at it in more detail.

> This patch introduce generic blkdev_issue_zeroout() function which also
> may be used for native discard request support, in this case zero payload
> simply ignored.

Which is a bit different from fixing efficiency issues in discard, I'd
prefer that to be split into a separate patch, especially as there might
be quite a bit of discussion on the zeroout behaviour.

Note that a discard is not actually required to zero out data it
has discarded, it's an optional feature in the command sets. 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] block: support compat discard mode by default.
  2010-02-11 10:57 ` [PATCH 2/4] block: support compat discard mode by default Dmitry Monakhov
  2010-02-11 11:25   ` Dmitry Monakhov
@ 2010-02-11 12:22   ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-02-11 12:22 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-kernel, jens.axboe

On Thu, Feb 11, 2010 at 01:57:50PM +0300, Dmitry Monakhov wrote:
> Currently there are many filesystems which has implemented
> discard support, but ssd discs not widely used yet.
> Let's allow user to use compat discard mode by default.
> After this feature is enabled by default for all devices which has
> no native discard support it will be possible to use this feature
> simply by passing appropriate mount option to fs (-odiscard in ext4)

So why do you even bother adding the DISCARD_FL_COMPAT option?
This is a debug option that should probably be controlled at the
block level, not anywhere near the filesystem.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: implement compatible DISCARD support
  2010-02-11 12:21 ` [PATCH 1/4] block: implement compatible DISCARD support Christoph Hellwig
@ 2010-02-11 12:59   ` Dmitry Monakhov
  2010-02-11 13:09     ` Christoph Hellwig
  2010-02-11 14:40   ` Martin K. Petersen
  1 sibling, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 12:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, jens.axboe

Christoph Hellwig <hch@infradead.org> writes:

> On Thu, Feb 11, 2010 at 01:53:45PM +0300, Dmitry Monakhov wrote:
>> Currently there are no many discs has native TRIM (aka) discard
>> feature support. But in fact this is good feature. We can easily
>> simlulate it for devices which has not native support.
>> In compat mode discard dequest transforms in to simple zerofiled
>> write request.
>> In fact currently blkdev_issue_discard function implemented
>> incorrectly.
>> 1) Whait flags not optimal we dont have to wait for each bio in flight.
>
> Indeed.
>
>> 2) Not wait by default. Which makes it fairly useless.
>
> Not sure what you mean with that.  We do not need to wait for the
> discard request for the "online" type of use - the barrier flag
> means we can't re-order I/O before and after the request so there
> is no reason to wait - it stays in the places where it needs to be
> in the I/O stream.
I mean that it is impossible to know was it really successful or not.
We may just replace this function with this following function.
const char* make_me_hapy()
{
   return "you are happy already.";
}

AFAIK Currently there is no any generic block interface which send
io requests without ability to check the result. 
The question is should it be sync or async it is not easy to design
simple async interface so let's use sync by default
BTW: That's why blkdev_issue_barrier has to wait by default. 
>
>> 3) Send each bio with barrier flag(if requested). Which result in
>>    bad performance. In fact caller just want to make shure that full
>>    request is completed and ordered against other requests.
>
> Yes, we need to ensure ordering only against reordering with other
> requests.  Your patch only issues a cache flush, which means that
> we miss the queue drain before submitting the discard bios
Yes you right from theoretical point of view. But practically
It is hard to imagine that some one send something like this:
read_bio(sectcor,..)
discard_bio(sector,..BARRIER)
But off course you right, I'll add ability to pass pre_flush
barrier in next patch version.
>
>> 5) It use allocated_page instead of ZERO_PAGE.
>
> That's incorrect - both the scsi WRITE SAME and ATA UNMAP
> implementations write to the payload. 
WOW. What for?
> I have some plans to change that
> an get rid of the payload entirely, but I need to get back to the
> discard work and look at it in more detail.

>
>> This patch introduce generic blkdev_issue_zeroout() function which also
>> may be used for native discard request support, in this case zero payload
>> simply ignored.
>
> Which is a bit different from fixing efficiency issues in discard, I'd
> prefer that to be split into a separate patch, especially as there might
> be quite a bit of discussion on the zeroout behaviour.
Seems that you also right here. At list it is not obvious how we should
send compat_discard bios WRITE,WRITE_SYNC or WRITE_SYNC_PLUG?
But blkdev_issue_zeroout() interface allow all this flags.
let's wait a bit and i'll redesign the patchset correspondingly
>
> Note that a discard is not actually required to zero out data it
> has discarded, it's an optional feature in the command sets. 
Yes, i know. But zero payload will be used only by compat mode.
We assumes that default compat mode zero data on descard.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: implement compatible DISCARD support
  2010-02-11 12:59   ` Dmitry Monakhov
@ 2010-02-11 13:09     ` Christoph Hellwig
  2010-02-11 13:45       ` Dmitry Monakhov
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2010-02-11 13:09 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Christoph Hellwig, linux-kernel, jens.axboe

On Thu, Feb 11, 2010 at 03:59:31PM +0300, Dmitry Monakhov wrote:
> I mean that it is impossible to know was it really successful or not.
> We may just replace this function with this following function.
> const char* make_me_hapy()
> {
>    return "you are happy already.";
> }

Which is an entirely valid, although suboptimal implementation.

> AFAIK Currently there is no any generic block interface which send
> io requests without ability to check the result. 
> The question is should it be sync or async it is not easy to design
> simple async interface so let's use sync by default
> BTW: That's why blkdev_issue_barrier has to wait by default. 

This is going to kill performance.

> > That's incorrect - both the scsi WRITE SAME and ATA UNMAP
> > implementations write to the payload. 
> WOW. What for?

Becuase these commands contain ranges of to be flushed blocks
in their payload.

> > Which is a bit different from fixing efficiency issues in discard, I'd
> > prefer that to be split into a separate patch, especially as there might
> > be quite a bit of discussion on the zeroout behaviour.
> Seems that you also right here. At list it is not obvious how we should
> send compat_discard bios WRITE,WRITE_SYNC or WRITE_SYNC_PLUG?
> But blkdev_issue_zeroout() interface allow all this flags.
> let's wait a bit and i'll redesign the patchset correspondingly

The !wait case is ansynchronous, so WRITE seems fine.  The wait and
!barrier case is more interesting, as this one is very close to
synchrous write semantics.  But I'm not sure it's really worth
optimization for that, this case is not used for online discarding
but things like mkfs that have the filesystem for itself, or that
not yet merged background discard for xfs which doesn't care too
much about I/O priority.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: implement compatible DISCARD support
  2010-02-11 13:09     ` Christoph Hellwig
@ 2010-02-11 13:45       ` Dmitry Monakhov
  2010-02-11 14:06         ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2010-02-11 13:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, jens.axboe

Christoph Hellwig <hch@infradead.org> writes:

> On Thu, Feb 11, 2010 at 03:59:31PM +0300, Dmitry Monakhov wrote:
>> I mean that it is impossible to know was it really successful or not.
>> We may just replace this function with this following function.
>> const char* make_me_hapy()
>> {
>>    return "you are happy already.";
>> }
>
> Which is an entirely valid, although suboptimal implementation.
>
>> AFAIK Currently there is no any generic block interface which send
>> io requests without ability to check the result. 
>> The question is should it be sync or async it is not easy to design
>> simple async interface so let's use sync by default
>> BTW: That's why blkdev_issue_barrier has to wait by default. 
In fact wait is the only interface for issue_barrier.
> This is going to kill performance.
But it may be reasonable to allow caller to choose would it
wait and work fair, or to cheat in a name of performance.
>
>> > That's incorrect - both the scsi WRITE SAME and ATA UNMAP
>> > implementations write to the payload. 
>> WOW. What for?
>
> Becuase these commands contain ranges of to be flushed blocks
> in their payload.
Ok i've found. 
libata-scsi.c: ata_scsi_write_same_xlat
                  ata_set_lba_range_entries
It's was not obvious from the first glance. But it is the way how it
works for now. But seems what we still optimize things a bit
1) alloc page with GFP_HIGHUSER (because x86 arch still used)
2) Share page between eight bios.
>
>> > Which is a bit different from fixing efficiency issues in discard, I'd
>> > prefer that to be split into a separate patch, especially as there might
>> > be quite a bit of discussion on the zeroout behaviour.
>> Seems that you also right here. At list it is not obvious how we should
>> send compat_discard bios WRITE,WRITE_SYNC or WRITE_SYNC_PLUG?
>> But blkdev_issue_zeroout() interface allow all this flags.
>> let's wait a bit and i'll redesign the patchset correspondingly
>
> The !wait case is ansynchronous, so WRITE seems fine.  The wait and
> !barrier case is more interesting, as this one is very close to
> synchrous write semantics.  But I'm not sure it's really worth
> optimization for that, this case is not used for online discarding
> but things like mkfs that have the filesystem for itself, or that
> not yet merged background discard for xfs which doesn't care too
> much about I/O priority.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: implement compatible DISCARD support
  2010-02-11 13:45       ` Dmitry Monakhov
@ 2010-02-11 14:06         ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-02-11 14:06 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Christoph Hellwig, linux-kernel, jens.axboe

On Thu, Feb 11, 2010 at 04:45:13PM +0300, Dmitry Monakhov wrote:
> >> The question is should it be sync or async it is not easy to design
> >> simple async interface so let's use sync by default
> >> BTW: That's why blkdev_issue_barrier has to wait by default. 
> In fact wait is the only interface for issue_barrier.
> > This is going to kill performance.
> But it may be reasonable to allow caller to choose would it
> wait and work fair, or to cheat in a name of performance.

It's not a cheat.  The discard is a _hint_ to the hardware that it
can reclaim space.  Think become a bit different when we start relying
on the zeroing behaviour for hardware that supports it, but so far
we don't.  And given that out of two TRIM capable devices I have one
does not reliably zero the trimmed regions I'm not sure it's a good
idea to rely on that yet, either.

> libata-scsi.c: ata_scsi_write_same_xlat
>                   ata_set_lba_range_entries
> It's was not obvious from the first glance. But it is the way how it
> works for now. But seems what we still optimize things a bit
> 1) alloc page with GFP_HIGHUSER (because x86 arch still used)

I'm not sure it's worth over-optimization this.

> 2) Share page between eight bios.

If we introduce the common completion handler for a batch of bios as
your patch does we can do that.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: implement compatible DISCARD support
  2010-02-11 12:21 ` [PATCH 1/4] block: implement compatible DISCARD support Christoph Hellwig
  2010-02-11 12:59   ` Dmitry Monakhov
@ 2010-02-11 14:40   ` Martin K. Petersen
  1 sibling, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2010-02-11 14:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dmitry Monakhov, linux-kernel, jens.axboe

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
>> 5) It use allocated_page instead of ZERO_PAGE.

Christoph> That's incorrect - both the scsi WRITE SAME and ATA UNMAP
Christoph> implementations write to the payload.  I have some plans to
Christoph> change that an get rid of the payload entirely, but I need to
Christoph> get back to the discard work and look at it in more detail.

I've been away for a couple of weeks (got back yesterday).  Just a heads
up that I've been working on block layer support for WRITE SAME as well
as a discard revamp the last little while.  I'll try to post those
patches today or tomorrow.

Just to give you an idea: I distinguish between discarding, clearing,
and filling a block range.

blkdev_issue_clear() is used for explicitly zeroing a region.  If the
target supports RZAT/TPRZ, discard will be used instead of zero filling.

blkdev_issue_discard() is for trimming without caring about what's left
in the blocks.  And blkdev_issue_fill() can be used to do a write same
pass on a range (for bulk initializing RAID blocks, for instance).

-- 
Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-02-11 15:10 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-11 10:53 [PATCH 1/4] block: implement compatible DISCARD support Dmitry Monakhov
2010-02-11 10:57 ` [PATCH 2/4] block: support compat discard mode by default Dmitry Monakhov
2010-02-11 11:25   ` Dmitry Monakhov
2010-02-11 12:22   ` Christoph Hellwig
2010-02-11 10:59 ` [PATCH 3/4] ext4: convert extent zeroout to generic function Dmitry Monakhov
2010-02-11 10:59   ` Dmitry Monakhov
2010-02-11 11:01 ` [PATCH 4/4] btrfs: add discard_compat support Dmitry Monakhov
2010-02-11 11:15   ` Dmitry Monakhov
2010-02-11 12:21 ` [PATCH 1/4] block: implement compatible DISCARD support Christoph Hellwig
2010-02-11 12:59   ` Dmitry Monakhov
2010-02-11 13:09     ` Christoph Hellwig
2010-02-11 13:45       ` Dmitry Monakhov
2010-02-11 14:06         ` Christoph Hellwig
2010-02-11 14:40   ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.