All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v2] Support for write stream IDs
@ 2015-03-25 15:07 Jens Axboe
  2015-03-25 15:07 ` [PATCH 1/7] block: add support for carrying a stream ID in a bio Jens Axboe
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 15:07 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: ming.l, david

One of the things that exacerbates write amplification on flash
based devices is that fact that data with different lifetimes get
grouped together on media. Currently we have no interface that
applications can use to separate different types of writes. This
patch set adds support for that.

The kernel has no knowledge of what stream ID is what. The idea is
that writes with identical stream IDs have similar life times, not
that stream ID 'X' has a shorter lifetime than stream ID 'X+1'.

There are basically two interfaces that could be used for this. One
is fcntl, the other is fadvise. This patchset uses fadvise, with a
new POSIX_FADV_STREAMID hint. The 'offset' field is used to pass
the relevant stream ID. Switching to fcntl (with a SET/GET_STREAMID)
would be trivial.

The patchset wires up the block parts, adds buffered and O_DIRECT
support, and modifies btrfs/xfs too. It should be trivial to extend
this to all other file systems, I just used xfs and btrfs for testing.

No block drivers are wired up yet. Patches are against current -git.

Changes since v1:

- Don't add streamid to struct writeback_control, always use the one
  in the inode for buffered writeback
- Add file_streamid() helper. Returns file streamid, if set, otherwise
  it returns the inode streamid.
- Update btrfs/xfs for wbc->streamid change.
- Add streamid to ext4.
- Bump stream bits from 3 to 8, 255 streams are now supported.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/7] block: add support for carrying a stream ID in a bio
  2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
@ 2015-03-25 15:07 ` Jens Axboe
  2015-04-09 22:46   ` Andreas Dilger
  2015-03-25 15:07 ` [PATCH 2/7] Add support for per-file stream ID Jens Axboe
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 15:07 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: ming.l, david, Jens Axboe

The top bits of bio->bi_flags are reserved for keeping the
allocation pool, set aside the next eight bits for carrying
a stream ID. That leaves us with support for 255 streams,
0 is reserved as a "stream not set" value.

Add helpers for setting/getting stream ID of a bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/bio.c               |  2 ++
 block/blk-core.c          |  3 +++
 include/linux/blk_types.h | 28 +++++++++++++++++++++++++++-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index f66a4eae16ee..1cd3d745047c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -567,6 +567,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio->bi_rw = bio_src->bi_rw;
 	bio->bi_iter = bio_src->bi_iter;
 	bio->bi_io_vec = bio_src->bi_io_vec;
+	bio_set_streamid(bio, bio_get_streamid(bio_src));
 }
 EXPORT_SYMBOL(__bio_clone_fast);
 
@@ -672,6 +673,7 @@ integrity_clone:
 		}
 	}
 
+	bio_set_streamid(bio, bio_get_streamid(bio_src));
 	return bio;
 }
 EXPORT_SYMBOL(bio_clone_bioset);
diff --git a/block/blk-core.c b/block/blk-core.c
index 794c3e7f01cf..6b7b8c95c4c3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,6 +1928,9 @@ void generic_make_request(struct bio *bio)
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
+		if (bio_streamid_valid(bio))
+			blk_add_trace_msg(q, "StreamID=%u", bio_get_streamid(bio));
+
 		q->make_request_fn(q, bio);
 
 		bio = bio_list_pop(current->bio_list);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index c294e3e25e37..d6909007a3fe 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -138,9 +138,35 @@ struct bio {
 #define BIO_POOL_BITS		(4)
 #define BIO_POOL_NONE		((1UL << BIO_POOL_BITS) - 1)
 #define BIO_POOL_OFFSET		(BITS_PER_LONG - BIO_POOL_BITS)
-#define BIO_POOL_MASK		(1UL << BIO_POOL_OFFSET)
 #define BIO_POOL_IDX(bio)	((bio)->bi_flags >> BIO_POOL_OFFSET)
 
+/*
+ * after the pool bits, next 8 bits are for the stream id
+ */
+#define BIO_STREAM_BITS		(8)
+#define BIO_STREAM_OFFSET	(BITS_PER_LONG - 12)
+#define BIO_STREAM_MASK		((1 << BIO_STREAM_BITS) - 1)
+
+static inline unsigned long streamid_to_flags(unsigned int id)
+{
+	return (unsigned long) (id & BIO_STREAM_MASK) << BIO_STREAM_OFFSET;
+}
+
+static inline void bio_set_streamid(struct bio *bio, unsigned int id)
+{
+	bio->bi_flags |= streamid_to_flags(id);
+}
+
+static inline unsigned int bio_get_streamid(struct bio *bio)
+{
+	return (bio->bi_flags >> BIO_STREAM_OFFSET) & BIO_STREAM_MASK;
+}
+
+static inline bool bio_streamid_valid(struct bio *bio)
+{
+	return bio_get_streamid(bio) != 0;
+}
+
 #endif /* CONFIG_BLOCK */
 
 /*
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/7] Add support for per-file stream ID
  2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
  2015-03-25 15:07 ` [PATCH 1/7] block: add support for carrying a stream ID in a bio Jens Axboe
@ 2015-03-25 15:07 ` Jens Axboe
  2015-04-09  9:30   ` Dmitry Monakhov
  2015-04-09 23:22   ` Andreas Dilger
  2015-03-25 15:07 ` [PATCH 3/7] direct-io: add support for write stream IDs Jens Axboe
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 15:07 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: ming.l, david, Jens Axboe

Writing on flash devices can be much more efficient, if we can
inform the device what kind of data can be grouped together. If
the device is able to group data together with similar lifetimes,
then it can be more efficient in garbage collection. This, in turn,
leads to lower write amplification, which is a win on both device
wear and performance.

Add a new fadvise hint, POSIX_FADV_STREAMID, which sets the file
and inode streamid. The file streamid is used if we have the file
available at the time of the write (O_DIRECT), we use the inode
streamid if not (buffered writeback). The fadvise hint uses the
'offset' field to specify a stream ID.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/inode.c                   |  1 +
 fs/open.c                    |  1 +
 include/linux/fs.h           | 23 +++++++++++++++++++++++
 include/uapi/linux/fadvise.h |  2 ++
 mm/fadvise.c                 | 17 +++++++++++++++++
 5 files changed, 44 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index f00b16f45507..41885322ba64 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -149,6 +149,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	inode->i_blocks = 0;
 	inode->i_bytes = 0;
 	inode->i_generation = 0;
+	inode->i_streamid = 0;
 	inode->i_pipe = NULL;
 	inode->i_bdev = NULL;
 	inode->i_cdev = NULL;
diff --git a/fs/open.c b/fs/open.c
index 33f9cbf2610b..4a9b2be1a674 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -743,6 +743,7 @@ static int do_dentry_open(struct file *f,
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
+	f->f_streamid = 0;
 
 	return 0;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b4d71b5e1ff2..43dde70c1d0d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -631,6 +631,7 @@ struct inode {
 	};
 
 	__u32			i_generation;
+	unsigned int		i_streamid;
 
 #ifdef CONFIG_FSNOTIFY
 	__u32			i_fsnotify_mask; /* all events this inode cares about */
@@ -640,6 +641,14 @@ struct inode {
 	void			*i_private; /* fs or device private pointer */
 };
 
+static inline unsigned int inode_streamid(struct inode *inode)
+{
+	if (inode)
+		return inode->i_streamid;
+
+	return 0;
+}
+
 static inline int inode_unhashed(struct inode *inode)
 {
 	return hlist_unhashed(&inode->i_hash);
@@ -820,6 +829,8 @@ struct file {
 	const struct cred	*f_cred;
 	struct file_ra_state	f_ra;
 
+	unsigned int		f_streamid;
+
 	u64			f_version;
 #ifdef CONFIG_SECURITY
 	void			*f_security;
@@ -842,6 +853,18 @@ struct file_handle {
 	unsigned char f_handle[0];
 };
 
+/*
+ * If the file doesn't have a stream ID set, return the inode stream ID
+ * in case that has been set.
+ */
+static inline unsigned int file_streamid(struct file *f)
+{
+	if (f->f_streamid)
+		return f->f_streamid;
+
+	return inode_streamid(f->f_inode);
+}
+
 static inline struct file *get_file(struct file *f)
 {
 	atomic_long_inc(&f->f_count);
diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h
index e8e747139b9a..3dc8a1ff1422 100644
--- a/include/uapi/linux/fadvise.h
+++ b/include/uapi/linux/fadvise.h
@@ -18,4 +18,6 @@
 #define POSIX_FADV_NOREUSE	5 /* Data will be accessed once.  */
 #endif
 
+#define POSIX_FADV_STREAMID	8 /* associate stream ID with file */
+
 #endif	/* FADVISE_H_INCLUDED */
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 4a3907cf79f8..b111a8899fb7 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -60,6 +60,7 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
 		case POSIX_FADV_WILLNEED:
 		case POSIX_FADV_NOREUSE:
 		case POSIX_FADV_DONTNEED:
+		case POSIX_FADV_STREAMID:
 			/* no bad return value, but ignore advice */
 			break;
 		default:
@@ -144,6 +145,22 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
 			}
 		}
 		break;
+	case POSIX_FADV_STREAMID:
+		/*
+		 * streamid is stored in offset... we don't limit or check
+		 * if the device supports streams, or if it does, if the
+		 * stream nr is within the limits. 1 is the lowest valid
+		 * stream id, 0 is "don't care/know".
+		 */
+		if (offset != (unsigned int) offset)
+			ret = EINVAL;
+		else {
+			f.file->f_streamid = offset;
+			spin_lock(&inode->i_lock);
+			inode->i_streamid = offset;
+			spin_unlock(&inode->i_lock);
+		}
+		break;
 	default:
 		ret = -EINVAL;
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/7] direct-io: add support for write stream IDs
  2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
  2015-03-25 15:07 ` [PATCH 1/7] block: add support for carrying a stream ID in a bio Jens Axboe
  2015-03-25 15:07 ` [PATCH 2/7] Add support for per-file stream ID Jens Axboe
@ 2015-03-25 15:07 ` Jens Axboe
  2015-03-25 15:07 ` [PATCH 4/7] Add stream ID support for buffered mpage/__block_write_full_page() Jens Axboe
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 15:07 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: ming.l, david, Jens Axboe

Get the streamid from the file, if any, and set it on the bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/direct-io.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b2e297..3ec57abb524f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -77,6 +77,7 @@ struct dio_submit {
 	int reap_counter;		/* rate limit reaping */
 	sector_t final_block_in_request;/* doesn't change */
 	int boundary;			/* prev block is at a boundary */
+	int streamid;			/* Write stream ID */
 	get_block_t *get_block;		/* block mapping function */
 	dio_submit_t *submit_io;	/* IO submition function */
 
@@ -372,6 +373,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
+
+	bio_set_streamid(bio, sdio->streamid);
 }
 
 /*
@@ -1205,6 +1208,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	sdio.blkbits = blkbits;
 	sdio.blkfactor = i_blkbits - blkbits;
 	sdio.block_in_file = offset >> blkbits;
+	sdio.streamid = file_streamid(iocb->ki_filp);
 
 	sdio.get_block = get_block;
 	dio->end_io = end_io;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/7] Add stream ID support for buffered mpage/__block_write_full_page()
  2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
                   ` (2 preceding siblings ...)
  2015-03-25 15:07 ` [PATCH 3/7] direct-io: add support for write stream IDs Jens Axboe
@ 2015-03-25 15:07 ` Jens Axboe
  2015-03-25 22:42   ` Ming Lin-SSI
  2015-03-25 15:07 ` [PATCH 5/7] btrfs: add support for write stream IDs Jens Axboe
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 15:07 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: ming.l, david, Jens Axboe

Pass on the inode stream ID to the bio allocation.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/buffer.c | 4 ++--
 fs/mpage.c  | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 20805db2c987..0220925ff26d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1774,7 +1774,7 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
-			submit_bh(write_op, bh);
+			_submit_bh(write_op, bh, streamid_to_flags(inode_streamid(inode)));
 			nr_underway++;
 		}
 		bh = next;
@@ -1828,7 +1828,7 @@ recover:
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
 			clear_buffer_dirty(bh);
-			submit_bh(write_op, bh);
+			_submit_bh(write_op, bh, streamid_to_flags(inode_streamid(inode)));
 			nr_underway++;
 		}
 		bh = next;
diff --git a/fs/mpage.c b/fs/mpage.c
index 3e79220babac..fba13f4b981d 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -605,6 +605,7 @@ alloc_new:
 				bio_get_nr_vecs(bdev), GFP_NOFS|__GFP_HIGH);
 		if (bio == NULL)
 			goto confused;
+		bio_set_streamid(bio, inode_streamid(inode));
 	}
 
 	/*
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/7] btrfs: add support for write stream IDs
  2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
                   ` (3 preceding siblings ...)
  2015-03-25 15:07 ` [PATCH 4/7] Add stream ID support for buffered mpage/__block_write_full_page() Jens Axboe
@ 2015-03-25 15:07 ` Jens Axboe
  2015-03-25 16:00   ` Chris Mason
  2015-03-25 15:07 ` [PATCH 6/7] xfs: add support for buffered writeback stream ID Jens Axboe
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 15:07 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: ming.l, david, Jens Axboe

Both buffered and O_DIRECT.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/btrfs/extent_io.c | 1 +
 fs/btrfs/inode.c     | 1 +
 2 files changed, 2 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d688cfe5d496..2845fae054b6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2838,6 +2838,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
 	bio_add_page(bio, page, page_size, offset);
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = tree;
+	bio_set_streamid(bio, inode_streamid(page->mapping->host));
 
 	if (bio_ret)
 		*bio_ret = bio;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d2e732d7af52..804fd6768109 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8046,6 +8046,7 @@ static void btrfs_submit_direct(int rw, struct bio *dio_bio,
 	atomic_set(&dip->pending_bios, 0);
 	btrfs_bio = btrfs_io_bio(io_bio);
 	btrfs_bio->logical = file_offset;
+	bio_set_streamid(io_bio, inode_streamid(inode));
 
 	if (write) {
 		io_bio->bi_end_io = btrfs_endio_direct_write;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/7] xfs: add support for buffered writeback stream ID
  2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
                   ` (4 preceding siblings ...)
  2015-03-25 15:07 ` [PATCH 5/7] btrfs: add support for write stream IDs Jens Axboe
@ 2015-03-25 15:07 ` Jens Axboe
  2015-03-25 15:07 ` [PATCH 7/7] ext4: add support for write stream IDs Jens Axboe
  2015-03-25 16:05 ` [PATCH RFC v2] Support " Jeff Moyer
  7 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 15:07 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: ming.l, david, Jens Axboe

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/xfs/xfs_aops.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3a9b7a1b8704..ee49e57bfd53 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -377,6 +377,7 @@ xfs_submit_ioend_bio(
 	atomic_inc(&ioend->io_remaining);
 	bio->bi_private = ioend;
 	bio->bi_end_io = xfs_end_bio;
+	bio_set_streamid(bio, ioend->io_inode->i_streamid);
 	submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE, bio);
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/7] ext4: add support for write stream IDs
  2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
                   ` (5 preceding siblings ...)
  2015-03-25 15:07 ` [PATCH 6/7] xfs: add support for buffered writeback stream ID Jens Axboe
@ 2015-03-25 15:07 ` Jens Axboe
  2015-03-26 20:34   ` Ming Lin-SSI
  2015-03-25 16:05 ` [PATCH RFC v2] Support " Jeff Moyer
  7 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 15:07 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: ming.l, david, Jens Axboe

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/ext4/page-io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index b24a2541a9ba..f642d0b87a32 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -390,6 +390,7 @@ submit_and_retry:
 		ret = io_submit_init_bio(io, bh);
 		if (ret)
 			return ret;
+		bio_set_streamid(io->io_bio, inode_streamid(inode));
 	}
 	ret = bio_add_page(io->io_bio, bh->b_page, bh->b_size, bh_offset(bh));
 	if (ret != bh->b_size)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/7] btrfs: add support for write stream IDs
  2015-03-25 15:07 ` [PATCH 5/7] btrfs: add support for write stream IDs Jens Axboe
@ 2015-03-25 16:00   ` Chris Mason
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Mason @ 2015-03-25 16:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-fsdevel, ming.l, david

On Wed, Mar 25, 2015 at 11:07 AM, Jens Axboe <axboe@fb.com> wrote:
> Both buffered and O_DIRECT.

Looks good, thanks Jens.

Acked-by: Chris Mason <clm@fb.com>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC v2] Support for write stream IDs
  2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
                   ` (6 preceding siblings ...)
  2015-03-25 15:07 ` [PATCH 7/7] ext4: add support for write stream IDs Jens Axboe
@ 2015-03-25 16:05 ` Jeff Moyer
  2015-03-25 16:46   ` Jens Axboe
  7 siblings, 1 reply; 23+ messages in thread
From: Jeff Moyer @ 2015-03-25 16:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-fsdevel, ming.l, david

Jens Axboe <axboe@fb.com> writes:

> One of the things that exacerbates write amplification on flash
> based devices is that fact that data with different lifetimes get
> grouped together on media. Currently we have no interface that
> applications can use to separate different types of writes. This
> patch set adds support for that.
>
> The kernel has no knowledge of what stream ID is what. The idea is
> that writes with identical stream IDs have similar life times, not
> that stream ID 'X' has a shorter lifetime than stream ID 'X+1'.

And presumably the device also has no knowledge of what stream ID is
what, right?

> There are basically two interfaces that could be used for this. One
> is fcntl, the other is fadvise. This patchset uses fadvise, with a
> new POSIX_FADV_STREAMID hint. The 'offset' field is used to pass
> the relevant stream ID. Switching to fcntl (with a SET/GET_STREAMID)
> would be trivial.
>
> The patchset wires up the block parts, adds buffered and O_DIRECT
> support, and modifies btrfs/xfs too. It should be trivial to extend
> this to all other file systems, I just used xfs and btrfs for testing.
>
> No block drivers are wired up yet. Patches are against current -git.

This proposal leaves lot to the reviewer's imagination.  Is there any
research in this area you can point to?

At a high level, are you sure you've got the right interface?  I would
think data lifetime would be tied to the file.  If that's the case, it
might be possible to not export this to userspace at all, and simply
make it work under the covers.  After all, what prevents multiple
applications from using the same stream id at the same time?

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC v2] Support for write stream IDs
  2015-03-25 16:05 ` [PATCH RFC v2] Support " Jeff Moyer
@ 2015-03-25 16:46   ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 16:46 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel, linux-fsdevel, ming.l, david

On 03/25/2015 10:05 AM, Jeff Moyer wrote:
> Jens Axboe <axboe@fb.com> writes:
>
>> One of the things that exacerbates write amplification on flash
>> based devices is that fact that data with different lifetimes get
>> grouped together on media. Currently we have no interface that
>> applications can use to separate different types of writes. This
>> patch set adds support for that.
>>
>> The kernel has no knowledge of what stream ID is what. The idea is
>> that writes with identical stream IDs have similar life times, not
>> that stream ID 'X' has a shorter lifetime than stream ID 'X+1'.
>
> And presumably the device also has no knowledge of what stream ID is
> what, right?

Right, the point is that the device need not now. As long as it knows 
that lifetime of objects in stream ID X is similar, that's enough.

>> There are basically two interfaces that could be used for this. One
>> is fcntl, the other is fadvise. This patchset uses fadvise, with a
>> new POSIX_FADV_STREAMID hint. The 'offset' field is used to pass
>> the relevant stream ID. Switching to fcntl (with a SET/GET_STREAMID)
>> would be trivial.
>>
>> The patchset wires up the block parts, adds buffered and O_DIRECT
>> support, and modifies btrfs/xfs too. It should be trivial to extend
>> this to all other file systems, I just used xfs and btrfs for testing.
>>
>> No block drivers are wired up yet. Patches are against current -git.
>
> This proposal leaves lot to the reviewer's imagination.  Is there any
> research in this area you can point to?

Samsung had a paper for HotStorage 14 here:

https://www.usenix.org/system/files/conference/hotstorage14/hotstorage14-paper-kang.pdf

Additionally, we're internally at FB know doing our own analysis of how 
this will impact write amplification for certain workloads. Hopefully I 
should have some info there in a few weeks.

> At a high level, are you sure you've got the right interface?  I would

Not at all :-)

> think data lifetime would be tied to the file.  If that's the case, it
> might be possible to not export this to userspace at all, and simply
> make it work under the covers.  After all, what prevents multiple
> applications from using the same stream id at the same time?

Yes, it'll be tied to a file, that's also what the interface works on. 
But you need something to tell the kernel what stream a given file 
belongs to. And of course multiple files can belong to the same stream ID.

More than one application is definitely more tricky, because you'd have 
to coordinate handing out stream IDs. Given that I believe we'll have a 
fairly limited number of streams available, some applications might 
share stream IDs with others. And that's perfectly fine, assuming that 
the objects stored under the same stream ID does have similar lifetimes. 
Basically it's punting that configuration to the admin.

The current interface also doesn't have any knowledge of what streams a 
device supports. It's done that way on purpose. The stream ID is a hint. 
It'll never be worse off than writing everything under the same stream. 
And I don't want to make this topology aware so that dm etc will have to 
stack and handle these limits. I just want a simple hint that we pass down.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 4/7] Add stream ID support for buffered mpage/__block_write_full_page()
  2015-03-25 15:07 ` [PATCH 4/7] Add stream ID support for buffered mpage/__block_write_full_page() Jens Axboe
@ 2015-03-25 22:42   ` Ming Lin-SSI
  2015-03-25 23:08     ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lin-SSI @ 2015-03-25 22:42 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, linux-fsdevel; +Cc: david

> -----Original Message-----
> From: Jens Axboe [mailto:axboe@fb.com]
> Sent: Wednesday, March 25, 2015 8:08 AM
> To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org
> Cc: Ming Lin-SSI; david@fromorbit.com; Jens Axboe
> Subject: [PATCH 4/7] Add stream ID support for buffered
> mpage/__block_write_full_page()
> 
> Pass on the inode stream ID to the bio allocation.
> 
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  fs/buffer.c | 4 ++--
>  fs/mpage.c  | 1 +
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 20805db2c987..0220925ff26d 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1774,7 +1774,7 @@ static int __block_write_full_page(struct inode
> *inode, struct page *page,
>  	do {
>  		struct buffer_head *next = bh->b_this_page;
>  		if (buffer_async_write(bh)) {
> -			submit_bh(write_op, bh);
> +			_submit_bh(write_op, bh,
> streamid_to_flags(inode_streamid(inode)));
>  			nr_underway++;
>  		}
>  		bh = next;
> @@ -1828,7 +1828,7 @@ recover:
>  		struct buffer_head *next = bh->b_this_page;
>  		if (buffer_async_write(bh)) {
>  			clear_buffer_dirty(bh);
> -			submit_bh(write_op, bh);
> +			_submit_bh(write_op, bh,
> streamid_to_flags(inode_streamid(inode)));
>  			nr_underway++;
>  		}
>  		bh = next;
> diff --git a/fs/mpage.c b/fs/mpage.c
> index 3e79220babac..fba13f4b981d 100644
> --- a/fs/mpage.c
> +++ b/fs/mpage.c
> @@ -605,6 +605,7 @@ alloc_new:
>  				bio_get_nr_vecs(bdev),
> GFP_NOFS|__GFP_HIGH);
>  		if (bio == NULL)
>  			goto confused;
> +		bio_set_streamid(bio, inode_streamid(inode));

This will not work when multiple processes write to the same raw disk.
Let's say 2 process concurrently pwrite to /dev/nvme0n1 with different stream_id.

Process 1:
fd = open("/dev/nvme0n1", ...);
posix_fadvise(fd, stream_id_1, 0, POSIX_FADV_STREAMID);
pwrite( fd, buf1, count1, offset1);

Process 2:
fd = open("/dev/nvme0n1", ...);
posix_fadvise(fd, stream_id_2, 0, POSIX_FADV_STREAMID);
pwrite(fd, buf2, count2, offset2);

One stream_id will overwrite the other one because "inode" is same.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] Add stream ID support for buffered mpage/__block_write_full_page()
  2015-03-25 22:42   ` Ming Lin-SSI
@ 2015-03-25 23:08     ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-03-25 23:08 UTC (permalink / raw)
  To: Ming Lin-SSI, linux-kernel, linux-fsdevel; +Cc: david

On 03/25/2015 04:42 PM, Ming Lin-SSI wrote:
>> -----Original Message-----
>> From: Jens Axboe [mailto:axboe@fb.com]
>> Sent: Wednesday, March 25, 2015 8:08 AM
>> To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org
>> Cc: Ming Lin-SSI; david@fromorbit.com; Jens Axboe
>> Subject: [PATCH 4/7] Add stream ID support for buffered
>> mpage/__block_write_full_page()
>>
>> Pass on the inode stream ID to the bio allocation.
>>
>> Signed-off-by: Jens Axboe <axboe@fb.com>
>> ---
>>   fs/buffer.c | 4 ++--
>>   fs/mpage.c  | 1 +
>>   2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index 20805db2c987..0220925ff26d 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -1774,7 +1774,7 @@ static int __block_write_full_page(struct inode
>> *inode, struct page *page,
>>   	do {
>>   		struct buffer_head *next = bh->b_this_page;
>>   		if (buffer_async_write(bh)) {
>> -			submit_bh(write_op, bh);
>> +			_submit_bh(write_op, bh,
>> streamid_to_flags(inode_streamid(inode)));
>>   			nr_underway++;
>>   		}
>>   		bh = next;
>> @@ -1828,7 +1828,7 @@ recover:
>>   		struct buffer_head *next = bh->b_this_page;
>>   		if (buffer_async_write(bh)) {
>>   			clear_buffer_dirty(bh);
>> -			submit_bh(write_op, bh);
>> +			_submit_bh(write_op, bh,
>> streamid_to_flags(inode_streamid(inode)));
>>   			nr_underway++;
>>   		}
>>   		bh = next;
>> diff --git a/fs/mpage.c b/fs/mpage.c
>> index 3e79220babac..fba13f4b981d 100644
>> --- a/fs/mpage.c
>> +++ b/fs/mpage.c
>> @@ -605,6 +605,7 @@ alloc_new:
>>   				bio_get_nr_vecs(bdev),
>> GFP_NOFS|__GFP_HIGH);
>>   		if (bio == NULL)
>>   			goto confused;
>> +		bio_set_streamid(bio, inode_streamid(inode));
>
> This will not work when multiple processes write to the same raw disk.
> Let's say 2 process concurrently pwrite to /dev/nvme0n1 with different stream_id.
>
> Process 1:
> fd = open("/dev/nvme0n1", ...);
> posix_fadvise(fd, stream_id_1, 0, POSIX_FADV_STREAMID);
> pwrite( fd, buf1, count1, offset1);
>
> Process 2:
> fd = open("/dev/nvme0n1", ...);
> posix_fadvise(fd, stream_id_2, 0, POSIX_FADV_STREAMID);
> pwrite(fd, buf2, count2, offset2);
>
> One stream_id will overwrite the other one because "inode" is same.

Well, that's how buffered writeback works... There's no file available 
at that point in time, in fact it could be long gone. So the only 
reliable part we have here is the inode.

If you want the above scenario to work, you have to use O_DIRECT. Then 
it will work.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 7/7] ext4: add support for write stream IDs
  2015-03-25 15:07 ` [PATCH 7/7] ext4: add support for write stream IDs Jens Axboe
@ 2015-03-26 20:34   ` Ming Lin-SSI
  2015-03-26 20:39     ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lin-SSI @ 2015-03-26 20:34 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, linux-fsdevel; +Cc: david

> -----Original Message-----
> From: Jens Axboe [mailto:axboe@fb.com]
> Sent: Wednesday, March 25, 2015 8:08 AM
> To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org
> Cc: Ming Lin-SSI; david@fromorbit.com; Jens Axboe
> Subject: [PATCH 7/7] ext4: add support for write stream IDs
> 
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  fs/ext4/page-io.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index
> b24a2541a9ba..f642d0b87a32 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -390,6 +390,7 @@ submit_and_retry:
>  		ret = io_submit_init_bio(io, bh);
>  		if (ret)
>  			return ret;
> +		bio_set_streamid(io->io_bio, inode_streamid(inode));

For "writeback" and "ordered" journal mode, it should be OK.

But for "journal" journal mode, 
Step1: data written to journal
Step2: data written from journal to fs

Do we need other code to carry stream_id in step2?

Thanks,
Ming

>  	}
>  	ret = bio_add_page(io->io_bio, bh->b_page, bh->b_size,
> bh_offset(bh));
>  	if (ret != bh->b_size)
> --
> 1.9.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] ext4: add support for write stream IDs
  2015-03-26 20:34   ` Ming Lin-SSI
@ 2015-03-26 20:39     ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-03-26 20:39 UTC (permalink / raw)
  To: Ming Lin-SSI, linux-kernel, linux-fsdevel; +Cc: david

On 03/26/2015 02:34 PM, Ming Lin-SSI wrote:
>> -----Original Message-----
>> From: Jens Axboe [mailto:axboe@fb.com]
>> Sent: Wednesday, March 25, 2015 8:08 AM
>> To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org
>> Cc: Ming Lin-SSI; david@fromorbit.com; Jens Axboe
>> Subject: [PATCH 7/7] ext4: add support for write stream IDs
>>
>> Signed-off-by: Jens Axboe <axboe@fb.com>
>> ---
>>  fs/ext4/page-io.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index
>> b24a2541a9ba..f642d0b87a32 100644
>> --- a/fs/ext4/page-io.c
>> +++ b/fs/ext4/page-io.c
>> @@ -390,6 +390,7 @@ submit_and_retry:
>>  		ret = io_submit_init_bio(io, bh);
>>  		if (ret)
>>  			return ret;
>> +		bio_set_streamid(io->io_bio, inode_streamid(inode));
> 
> For "writeback" and "ordered" journal mode, it should be OK.
> 
> But for "journal" journal mode, 
> Step1: data written to journal
> Step2: data written from journal to fs
> 
> Do we need other code to carry stream_id in step2?

For either mode, arguably the journal should be written with its own
stream ID. But that's a subject that the filesystem guys need to tackle,
once the support is one.

For the case of maintaining stream ID when in data=journal mode and
copying the data out, I doubt that a use case that anyone would care
about. If they do, the ext4/3 guys would need to sort that out.

I'll update the commit message to reflect that, it is a bit under
explained at the moment.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] Add support for per-file stream ID
  2015-03-25 15:07 ` [PATCH 2/7] Add support for per-file stream ID Jens Axboe
@ 2015-04-09  9:30   ` Dmitry Monakhov
  2015-04-09 16:28     ` Jens Axboe
  2015-04-09 23:22   ` Andreas Dilger
  1 sibling, 1 reply; 23+ messages in thread
From: Dmitry Monakhov @ 2015-04-09  9:30 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, linux-fsdevel; +Cc: ming.l, david, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 5306 bytes --]

Jens Axboe <axboe@fb.com> writes:

One small question.
You states that all IDs are equals but can we reserve some IDs
for internal kernel purposes. For example very short lived data (files
opened with O_TEMP) and so on.

Also small nitpicking see below.
> Writing on flash devices can be much more efficient, if we can
> inform the device what kind of data can be grouped together. If
> the device is able to group data together with similar lifetimes,
> then it can be more efficient in garbage collection. This, in turn,
> leads to lower write amplification, which is a win on both device
> wear and performance.
>
> Add a new fadvise hint, POSIX_FADV_STREAMID, which sets the file
> and inode streamid. The file streamid is used if we have the file
> available at the time of the write (O_DIRECT), we use the inode
> streamid if not (buffered writeback). The fadvise hint uses the
> 'offset' field to specify a stream ID.
>
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  fs/inode.c                   |  1 +
>  fs/open.c                    |  1 +
>  include/linux/fs.h           | 23 +++++++++++++++++++++++
>  include/uapi/linux/fadvise.h |  2 ++
>  mm/fadvise.c                 | 17 +++++++++++++++++
>  5 files changed, 44 insertions(+)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index f00b16f45507..41885322ba64 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -149,6 +149,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	inode->i_blocks = 0;
>  	inode->i_bytes = 0;
>  	inode->i_generation = 0;
> +	inode->i_streamid = 0;
>  	inode->i_pipe = NULL;
>  	inode->i_bdev = NULL;
>  	inode->i_cdev = NULL;
> diff --git a/fs/open.c b/fs/open.c
> index 33f9cbf2610b..4a9b2be1a674 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -743,6 +743,7 @@ static int do_dentry_open(struct file *f,
>  	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
>  
>  	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
> +	f->f_streamid = 0;
>  
>  	return 0;
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b4d71b5e1ff2..43dde70c1d0d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -631,6 +631,7 @@ struct inode {
>  	};
>  
>  	__u32			i_generation;
> +	unsigned int		i_streamid;
>  
>  #ifdef CONFIG_FSNOTIFY
>  	__u32			i_fsnotify_mask; /* all events this inode cares about */
> @@ -640,6 +641,14 @@ struct inode {
>  	void			*i_private; /* fs or device private pointer */
>  };
>  
> +static inline unsigned int inode_streamid(struct inode *inode)
> +{
> +	if (inode)
> +		return inode->i_streamid;
> +
> +	return 0;
> +}
> +
>  static inline int inode_unhashed(struct inode *inode)
>  {
>  	return hlist_unhashed(&inode->i_hash);
> @@ -820,6 +829,8 @@ struct file {
>  	const struct cred	*f_cred;
>  	struct file_ra_state	f_ra;
>  
> +	unsigned int		f_streamid;
> +
>  	u64			f_version;
>  #ifdef CONFIG_SECURITY
>  	void			*f_security;
> @@ -842,6 +853,18 @@ struct file_handle {
>  	unsigned char f_handle[0];
>  };
>  
> +/*
> + * If the file doesn't have a stream ID set, return the inode stream ID
> + * in case that has been set.
> + */
> +static inline unsigned int file_streamid(struct file *f)
> +{
> +	if (f->f_streamid)
> +		return f->f_streamid;
> +
> +	return inode_streamid(f->f_inode);
> +}
> +
>  static inline struct file *get_file(struct file *f)
>  {
>  	atomic_long_inc(&f->f_count);
> diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h
> index e8e747139b9a..3dc8a1ff1422 100644
> --- a/include/uapi/linux/fadvise.h
> +++ b/include/uapi/linux/fadvise.h
> @@ -18,4 +18,6 @@
>  #define POSIX_FADV_NOREUSE	5 /* Data will be accessed once.  */
>  #endif
>  
> +#define POSIX_FADV_STREAMID	8 /* associate stream ID with file */
> +
>  #endif	/* FADVISE_H_INCLUDED */
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index 4a3907cf79f8..b111a8899fb7 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -60,6 +60,7 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
>  		case POSIX_FADV_WILLNEED:
>  		case POSIX_FADV_NOREUSE:
>  		case POSIX_FADV_DONTNEED:
> +		case POSIX_FADV_STREAMID:
>  			/* no bad return value, but ignore advice */
>  			break;
>  		default:
> @@ -144,6 +145,22 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
>  			}
>  		}
>  		break;
> +	case POSIX_FADV_STREAMID:
> +		/*
> +		 * streamid is stored in offset... we don't limit or check
> +		 * if the device supports streams, or if it does, if the
> +		 * stream nr is within the limits. 1 is the lowest valid
> +		 * stream id, 0 is "don't care/know".
> +		 */
> +		if (offset != (unsigned int) offset)
> +			ret = EINVAL;
Shuld be negative       ret = -EINVAL;
> +		else {
> +			f.file->f_streamid = offset;
> +			spin_lock(&inode->i_lock);
> +			inode->i_streamid = offset;
> +			spin_unlock(&inode->i_lock);
> +		}
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}
> -- 
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] Add support for per-file stream ID
  2015-04-09  9:30   ` Dmitry Monakhov
@ 2015-04-09 16:28     ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-04-09 16:28 UTC (permalink / raw)
  To: Dmitry Monakhov, linux-kernel, linux-fsdevel; +Cc: ming.l, david

On 04/09/2015 03:30 AM, Dmitry Monakhov wrote:
> Jens Axboe <axboe@fb.com> writes:
>
> One small question.
> You states that all IDs are equals but can we reserve some IDs
> for internal kernel purposes. For example very short lived data (files
> opened with O_TEMP) and so on.

Yes, we probably should end up reserving some IDs for specific internal 
things. O_TEMP to a specific stream would make sense. Journal writes 
too, for instance.

I just preferred not wiring any of that up, as it then gets closer to 
being a policy decision. I'm mainly interested in getting this exposed 
to userspace, and seeing how the hw side develops since that will 
influence how we actually use this (number of streams, any actions 
required to manage streams, etc).

>> +	case POSIX_FADV_STREAMID:
>> +		/*
>> +		 * streamid is stored in offset... we don't limit or check
>> +		 * if the device supports streams, or if it does, if the
>> +		 * stream nr is within the limits. 1 is the lowest valid
>> +		 * stream id, 0 is "don't care/know".
>> +		 */
>> +		if (offset != (unsigned int) offset)
>> +			ret = EINVAL;
> Shuld be negative       ret = -EINVAL;

Indeed it should, thanks!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] block: add support for carrying a stream ID in a bio
  2015-03-25 15:07 ` [PATCH 1/7] block: add support for carrying a stream ID in a bio Jens Axboe
@ 2015-04-09 22:46   ` Andreas Dilger
  2015-04-18 19:53     ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Andreas Dilger @ 2015-04-09 22:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-fsdevel, ming.l, david

On Mar 25, 2015, at 9:07 AM, Jens Axboe <axboe@fb.com> wrote:
> 
> The top bits of bio->bi_flags are reserved for keeping the
> allocation pool, set aside the next eight bits for carrying
> a stream ID. That leaves us with support for 255 streams,
> 0 is reserved as a "stream not set" value.

I understand that the stream ID is not related to specific priority
of an IO request.  However, I'm wondering how this patch series
interacts with some of the other patch series that add cache priority
hints?  Is there a danger of running out of space in the IO pipeline
for the additional cache hints if this is using 8 bits?

Some more comments inline.

> Add helpers for setting/getting stream ID of a bio.
> 
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
> block/bio.c               |  2 ++
> block/blk-core.c          |  3 +++
> include/linux/blk_types.h | 28 +++++++++++++++++++++++++++-
> 3 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index f66a4eae16ee..1cd3d745047c 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -567,6 +567,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
> 	bio->bi_rw = bio_src->bi_rw;
> 	bio->bi_iter = bio_src->bi_iter;
> 	bio->bi_io_vec = bio_src->bi_io_vec;
> +	bio_set_streamid(bio, bio_get_streamid(bio_src));
> }
> EXPORT_SYMBOL(__bio_clone_fast);
> 
> @@ -672,6 +673,7 @@ integrity_clone:
> 		}
> 	}
> 
> +	bio_set_streamid(bio, bio_get_streamid(bio_src));
> 	return bio;
> }
> EXPORT_SYMBOL(bio_clone_bioset);
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 794c3e7f01cf..6b7b8c95c4c3 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1928,6 +1928,9 @@ void generic_make_request(struct bio *bio)
> 	do {
> 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> 
> +		if (bio_streamid_valid(bio))
> +			blk_add_trace_msg(q, "StreamID=%u", bio_get_streamid(bio));
> +
> 		q->make_request_fn(q, bio);
> 
> 		bio = bio_list_pop(current->bio_list);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index c294e3e25e37..d6909007a3fe 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -138,9 +138,35 @@ struct bio {
> #define BIO_POOL_BITS		(4)
> #define BIO_POOL_NONE		((1UL << BIO_POOL_BITS) - 1)
> #define BIO_POOL_OFFSET		(BITS_PER_LONG - BIO_POOL_BITS)
> -#define BIO_POOL_MASK		(1UL << BIO_POOL_OFFSET)
> #define BIO_POOL_IDX(bio)	((bio)->bi_flags >> BIO_POOL_OFFSET)
> 
> +/*
> + * after the pool bits, next 8 bits are for the stream id
> + */
> +#define BIO_STREAM_BITS		(8)
> +#define BIO_STREAM_OFFSET	(BITS_PER_LONG - 12)

Should this really be:

#define BIO_STREAM_OFFSET	(BIO_POOL_OFFSET - BIO_STREAM_BITS)

Otherwise there is a risk of conflict if someone changes BIO_POOL_BITS.

Cheers, Andreas

> +#define BIO_STREAM_MASK		((1 << BIO_STREAM_BITS) - 1)
> +
> +static inline unsigned long streamid_to_flags(unsigned int id)
> +{
> +	return (unsigned long) (id & BIO_STREAM_MASK) << BIO_STREAM_OFFSET;
> +}
> +
> +static inline void bio_set_streamid(struct bio *bio, unsigned int id)
> +{
> +	bio->bi_flags |= streamid_to_flags(id);
> +}
> +
> +static inline unsigned int bio_get_streamid(struct bio *bio)
> +{
> +	return (bio->bi_flags >> BIO_STREAM_OFFSET) & BIO_STREAM_MASK;
> +}
> +
> +static inline bool bio_streamid_valid(struct bio *bio)
> +{
> +	return bio_get_streamid(bio) != 0;
> +}
> +
> #endif /* CONFIG_BLOCK */
> 
> /*
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] Add support for per-file stream ID
  2015-03-25 15:07 ` [PATCH 2/7] Add support for per-file stream ID Jens Axboe
  2015-04-09  9:30   ` Dmitry Monakhov
@ 2015-04-09 23:22   ` Andreas Dilger
  2015-04-18 19:51     ` Jens Axboe
  1 sibling, 1 reply; 23+ messages in thread
From: Andreas Dilger @ 2015-04-09 23:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-fsdevel, ming.l, david

On Mar 25, 2015, at 9:07 AM, Jens Axboe <axboe@fb.com> wrote:
> 
> Writing on flash devices can be much more efficient, if we can
> inform the device what kind of data can be grouped together. If
> the device is able to group data together with similar lifetimes,
> then it can be more efficient in garbage collection. This, in turn,
> leads to lower write amplification, which is a win on both device
> wear and performance.
> 
> Add a new fadvise hint, POSIX_FADV_STREAMID, which sets the file
> and inode streamid. The file streamid is used if we have the file
> available at the time of the write (O_DIRECT), we use the inode
> streamid if not (buffered writeback). The fadvise hint uses the
> 'offset' field to specify a stream ID.
> 
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
> fs/inode.c                   |  1 +
> fs/open.c                    |  1 +
> include/linux/fs.h           | 23 +++++++++++++++++++++++
> include/uapi/linux/fadvise.h |  2 ++
> mm/fadvise.c                 | 17 +++++++++++++++++
> 5 files changed, 44 insertions(+)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index f00b16f45507..41885322ba64 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -149,6 +149,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
> 	inode->i_blocks = 0;
> 	inode->i_bytes = 0;
> 	inode->i_generation = 0;
> +	inode->i_streamid = 0;
> 	inode->i_pipe = NULL;
> 	inode->i_bdev = NULL;
> 	inode->i_cdev = NULL;
> diff --git a/fs/open.c b/fs/open.c
> index 33f9cbf2610b..4a9b2be1a674 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -743,6 +743,7 @@ static int do_dentry_open(struct file *f,
> 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
> 
> 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
> +	f->f_streamid = 0;
> 
> 	return 0;
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b4d71b5e1ff2..43dde70c1d0d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -631,6 +631,7 @@ struct inode {
> 	};
> 
> 	__u32			i_generation;
> +	unsigned int		i_streamid;

Since there are only 8 bits of streamid being passed from userspace,
is it possible to declare this as a char and pack it into a hole so
that it doesn't increase the inode size for a functionality that most
people won't be using?  Maybe after i_bytes?  That could be increased
to unsigned short if needed without increasing the size of the inode.

Cheers, Andreas

> #ifdef CONFIG_FSNOTIFY
> 	__u32			i_fsnotify_mask; /* all events this inode cares about */
> @@ -640,6 +641,14 @@ struct inode {
> 	void			*i_private; /* fs or device private pointer */
> };
> 
> +static inline unsigned int inode_streamid(struct inode *inode)
> +{
> +	if (inode)
> +		return inode->i_streamid;
> +
> +	return 0;
> +}
> +
> static inline int inode_unhashed(struct inode *inode)
> {
> 	return hlist_unhashed(&inode->i_hash);
> @@ -820,6 +829,8 @@ struct file {
> 	const struct cred	*f_cred;
> 	struct file_ra_state	f_ra;
> 
> +	unsigned int		f_streamid;
> 	u64			f_version;
> #ifdef CONFIG_SECURITY
> 	void			*f_security;
> @@ -842,6 +853,18 @@ struct file_handle {
> 	unsigned char f_handle[0];
> };
> 
> +/*
> + * If the file doesn't have a stream ID set, return the inode stream ID
> + * in case that has been set.
> + */
> +static inline unsigned int file_streamid(struct file *f)
> +{
> +	if (f->f_streamid)
> +		return f->f_streamid;
> +
> +	return inode_streamid(f->f_inode);
> +}
> +
> static inline struct file *get_file(struct file *f)
> {
> 	atomic_long_inc(&f->f_count);
> diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h
> index e8e747139b9a..3dc8a1ff1422 100644
> --- a/include/uapi/linux/fadvise.h
> +++ b/include/uapi/linux/fadvise.h
> @@ -18,4 +18,6 @@
> #define POSIX_FADV_NOREUSE	5 /* Data will be accessed once.  */
> #endif
> 
> +#define POSIX_FADV_STREAMID	8 /* associate stream ID with file */
> +
> #endif	/* FADVISE_H_INCLUDED */
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index 4a3907cf79f8..b111a8899fb7 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -60,6 +60,7 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
> 		case POSIX_FADV_WILLNEED:
> 		case POSIX_FADV_NOREUSE:
> 		case POSIX_FADV_DONTNEED:
> +		case POSIX_FADV_STREAMID:
> 			/* no bad return value, but ignore advice */
> 			break;
> 		default:
> @@ -144,6 +145,22 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
> 			}
> 		}
> 		break;
> +	case POSIX_FADV_STREAMID:
> +		/*
> +		 * streamid is stored in offset... we don't limit or check
> +		 * if the device supports streams, or if it does, if the
> +		 * stream nr is within the limits. 1 is the lowest valid
> +		 * stream id, 0 is "don't care/know".
> +		 */
> +		if (offset != (unsigned int) offset)
> +			ret = EINVAL;
> +		else {
> +			f.file->f_streamid = offset;
> +			spin_lock(&inode->i_lock);
> +			inode->i_streamid = offset;
> +			spin_unlock(&inode->i_lock);
> +		}
> +		break;
> 	default:
> 		ret = -EINVAL;
> 	}
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] Add support for per-file stream ID
  2015-04-09 23:22   ` Andreas Dilger
@ 2015-04-18 19:51     ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-04-18 19:51 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-kernel, linux-fsdevel, ming.l, david

On 04/09/2015 05:22 PM, Andreas Dilger wrote:
> On Mar 25, 2015, at 9:07 AM, Jens Axboe <axboe@fb.com> wrote:
>>
>> Writing on flash devices can be much more efficient, if we can
>> inform the device what kind of data can be grouped together. If
>> the device is able to group data together with similar lifetimes,
>> then it can be more efficient in garbage collection. This, in turn,
>> leads to lower write amplification, which is a win on both device
>> wear and performance.
>>
>> Add a new fadvise hint, POSIX_FADV_STREAMID, which sets the file
>> and inode streamid. The file streamid is used if we have the file
>> available at the time of the write (O_DIRECT), we use the inode
>> streamid if not (buffered writeback). The fadvise hint uses the
>> 'offset' field to specify a stream ID.
>>
>> Signed-off-by: Jens Axboe <axboe@fb.com>
>> ---
>> fs/inode.c                   |  1 +
>> fs/open.c                    |  1 +
>> include/linux/fs.h           | 23 +++++++++++++++++++++++
>> include/uapi/linux/fadvise.h |  2 ++
>> mm/fadvise.c                 | 17 +++++++++++++++++
>> 5 files changed, 44 insertions(+)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index f00b16f45507..41885322ba64 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -149,6 +149,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>> 	inode->i_blocks = 0;
>> 	inode->i_bytes = 0;
>> 	inode->i_generation = 0;
>> +	inode->i_streamid = 0;
>> 	inode->i_pipe = NULL;
>> 	inode->i_bdev = NULL;
>> 	inode->i_cdev = NULL;
>> diff --git a/fs/open.c b/fs/open.c
>> index 33f9cbf2610b..4a9b2be1a674 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -743,6 +743,7 @@ static int do_dentry_open(struct file *f,
>> 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
>>
>> 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
>> +	f->f_streamid = 0;
>>
>> 	return 0;
>>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index b4d71b5e1ff2..43dde70c1d0d 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -631,6 +631,7 @@ struct inode {
>> 	};
>>
>> 	__u32			i_generation;
>> +	unsigned int		i_streamid;
>
> Since there are only 8 bits of streamid being passed from userspace,
> is it possible to declare this as a char and pack it into a hole so
> that it doesn't increase the inode size for a functionality that most
> people won't be using?  Maybe after i_bytes?  That could be increased
> to unsigned short if needed without increasing the size of the inode.

In the next version, I've retained the int, but ensured that they pack 
better in both struct file and struct inode. Basically fill some 
existing holes.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] block: add support for carrying a stream ID in a bio
  2015-04-09 22:46   ` Andreas Dilger
@ 2015-04-18 19:53     ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-04-18 19:53 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-kernel, linux-fsdevel, ming.l, david

On 04/09/2015 04:46 PM, Andreas Dilger wrote:
> On Mar 25, 2015, at 9:07 AM, Jens Axboe <axboe@fb.com> wrote:
>>
>> The top bits of bio->bi_flags are reserved for keeping the
>> allocation pool, set aside the next eight bits for carrying
>> a stream ID. That leaves us with support for 255 streams,
>> 0 is reserved as a "stream not set" value.
>
> I understand that the stream ID is not related to specific priority
> of an IO request.  However, I'm wondering how this patch series
> interacts with some of the other patch series that add cache priority
> hints?  Is there a danger of running out of space in the IO pipeline
> for the additional cache hints if this is using 8 bits?

That's always a risk, of course, but that goes for most features that 
need to carry more data in struct bio (or elsewhere). Otherwise we'll 
have to bite the bullet and add a new field.

>> +/*
>> + * after the pool bits, next 8 bits are for the stream id
>> + */
>> +#define BIO_STREAM_BITS		(8)
>> +#define BIO_STREAM_OFFSET	(BITS_PER_LONG - 12)
>
> Should this really be:
>
> #define BIO_STREAM_OFFSET	(BIO_POOL_OFFSET - BIO_STREAM_BITS)
>
> Otherwise there is a risk of conflict if someone changes BIO_POOL_BITS.

Good point, that would be cleaner. I'll make that change.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 7/7] ext4: add support for write stream IDs
  2015-05-05 20:02 [PATCH v2] Support " Jens Axboe
@ 2015-05-05 20:03 ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-05-05 20:03 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: adilger, david, Jens Axboe, Theodore Ts'o

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/ext4/page-io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 5765f88b3904..980c5a937812 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -406,6 +406,7 @@ submit_and_retry:
 		ret = io_submit_init_bio(io, bh);
 		if (ret)
 			return ret;
+		bio_set_streamid(io->io_bio, inode_streamid(inode));
 	}
 	ret = bio_add_page(io->io_bio, page, bh->b_size, bh_offset(bh));
 	if (ret != bh->b_size)
-- 
2.4.0.rc2.1.g3d6bc9a


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/7] ext4: add support for write stream IDs
  2015-04-18 20:03 [PATCH " Jens Axboe
@ 2015-04-18 20:03 ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2015-04-18 20:03 UTC (permalink / raw)
  To: axboe, linux-kernel, linux-fsdevel; +Cc: ming.l, adilger, david, Jens Axboe

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/ext4/page-io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 464984261e69..392a82925d5f 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -389,6 +389,7 @@ submit_and_retry:
 		ret = io_submit_init_bio(io, bh);
 		if (ret)
 			return ret;
+		bio_set_streamid(io->io_bio, inode_streamid(inode));
 	}
 	ret = bio_add_page(io->io_bio, bh->b_page, bh->b_size, bh_offset(bh));
 	if (ret != bh->b_size)
-- 
2.4.0.rc2.1.g3d6bc9a


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-05-05 20:05 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-25 15:07 [PATCH RFC v2] Support for write stream IDs Jens Axboe
2015-03-25 15:07 ` [PATCH 1/7] block: add support for carrying a stream ID in a bio Jens Axboe
2015-04-09 22:46   ` Andreas Dilger
2015-04-18 19:53     ` Jens Axboe
2015-03-25 15:07 ` [PATCH 2/7] Add support for per-file stream ID Jens Axboe
2015-04-09  9:30   ` Dmitry Monakhov
2015-04-09 16:28     ` Jens Axboe
2015-04-09 23:22   ` Andreas Dilger
2015-04-18 19:51     ` Jens Axboe
2015-03-25 15:07 ` [PATCH 3/7] direct-io: add support for write stream IDs Jens Axboe
2015-03-25 15:07 ` [PATCH 4/7] Add stream ID support for buffered mpage/__block_write_full_page() Jens Axboe
2015-03-25 22:42   ` Ming Lin-SSI
2015-03-25 23:08     ` Jens Axboe
2015-03-25 15:07 ` [PATCH 5/7] btrfs: add support for write stream IDs Jens Axboe
2015-03-25 16:00   ` Chris Mason
2015-03-25 15:07 ` [PATCH 6/7] xfs: add support for buffered writeback stream ID Jens Axboe
2015-03-25 15:07 ` [PATCH 7/7] ext4: add support for write stream IDs Jens Axboe
2015-03-26 20:34   ` Ming Lin-SSI
2015-03-26 20:39     ` Jens Axboe
2015-03-25 16:05 ` [PATCH RFC v2] Support " Jeff Moyer
2015-03-25 16:46   ` Jens Axboe
2015-04-18 20:03 [PATCH " Jens Axboe
2015-04-18 20:03 ` [PATCH 7/7] ext4: add support " Jens Axboe
2015-05-05 20:02 [PATCH v2] Support " Jens Axboe
2015-05-05 20:03 ` [PATCH 7/7] ext4: add support " Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.