* [PATCH 0/2] blk-stat: Add ability to not bucket IO, add this to IO polling
@ 2017-03-26  2:18 ` sbates
  0 siblings, 0 replies; 25+ messages in thread
From: sbates @ 2017-03-26  2:18 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-nvme, Damien.LeMoal, osandov, sbates

From: Stephen Bates <sbates@raithlin.com>

Omar recently developed some patches for block layer stats that use
callbacks to determine which bucket an IO should be considered for. At
the same time there was discussion at LSF/MM that we might not want to
consider all IO when generating stats for certain algorithms (e.g. IO
completion polling).

This set does two things. It makes the bucket callback for stats
signed, so we can now ignore any IO that causes the bucket function to
return a negative value. It then uses this new functionality to filter
the IO considered by the IO completion polling algorithms.
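
To make the convention concrete, here is a minimal sketch of a bucket
callback under the new signed prototype (hypothetical code, not one of
the callbacks added by this series):

/* Hypothetical bucket callback: account only small IO, by direction. */
static int example_bucket_fn(const struct request *rq)
{
	if (blk_rq_bytes(rq) > 4096)
		return -1;	/* negative return: skip this IO in the stats */

	return rq_data_dir(rq);	/* 0 == READ bucket, 1 == WRITE bucket */
}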

This patchset applies cleanly on a83b576c9c25cf (block: fix stacked
driver stats init and free) in Jens' for-next tree.

I've lightly tested this using QEMU and a real NVMe low-latency
device. I do not have performance numbers yet. Feedback would be
appreciated!

[BTW this is my first submission from an all-new setup, so apologies
if this does not come through correctly!]

Cc: Damien.LeMoal@wdc.com
Cc: osandov@osandov.com

Stephen Bates (2):
  blk-stat: convert blk-stat bucket callback to signed
  blk-stat: add a poll_size value to the request_queue struct

 block/blk-mq.c         | 17 +++++++++++++++--
 block/blk-stat.c       |  8 +++++---
 block/blk-stat.h       |  9 +++++----
 block/blk-sysfs.c      | 30 ++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  1 +
 5 files changed, 56 insertions(+), 9 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/2] blk-stat: convert blk-stat bucket callback to signed
  2017-03-26  2:18 ` sbates
@ 2017-03-26  2:18   ` sbates
  -1 siblings, 0 replies; 25+ messages in thread
From: sbates @ 2017-03-26  2:18 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-nvme, Damien.LeMoal, osandov, sbates

From: Stephen Bates <sbates@raithlin.com>

In order to allow filtering of IO based on properties of the request
other than its direction, we allow the bucket function to return
an int.

If the bucket callback returns a negative value, do not count the
request in the stats accumulation.

Signed-off-by: Stephen Bates <sbates@raithlin.com>
---
 block/blk-stat.c | 8 +++++---
 block/blk-stat.h | 9 +++++----
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-stat.c b/block/blk-stat.c
index 188b535..936adfb 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -17,9 +17,9 @@ struct blk_queue_stats {
 	spinlock_t lock;
 };
 
-unsigned int blk_stat_rq_ddir(const struct request *rq)
+int blk_stat_rq_ddir(const struct request *rq)
 {
-	return rq_data_dir(rq);
+	return (int)rq_data_dir(rq);
 }
 EXPORT_SYMBOL_GPL(blk_stat_rq_ddir);
 
@@ -100,6 +100,8 @@ void blk_stat_add(struct request *rq)
 	list_for_each_entry_rcu(cb, &q->stats->callbacks, list) {
 		if (blk_stat_is_active(cb)) {
 			bucket = cb->bucket_fn(rq);
+			if (bucket < 0)
+				continue;
 			stat = &this_cpu_ptr(cb->cpu_stat)[bucket];
 			__blk_stat_add(stat, value);
 		}
@@ -131,7 +133,7 @@ static void blk_stat_timer_fn(unsigned long data)
 
 struct blk_stat_callback *
 blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
-			unsigned int (*bucket_fn)(const struct request *),
+			int (*bucket_fn)(const struct request *),
 			unsigned int buckets, void *data)
 {
 	struct blk_stat_callback *cb;
diff --git a/block/blk-stat.h b/block/blk-stat.h
index 6ad5b8c..7417805 100644
--- a/block/blk-stat.h
+++ b/block/blk-stat.h
@@ -41,9 +41,10 @@ struct blk_stat_callback {
 
 	/**
 	 * @bucket_fn: Given a request, returns which statistics bucket it
-	 * should be accounted under.
+	 * should be accounted under. Return -1 for no bucket for this
+	 * request.
 	 */
-	unsigned int (*bucket_fn)(const struct request *);
+	int (*bucket_fn)(const struct request *);
 
 	/**
 	 * @buckets: Number of statistics buckets.
@@ -98,7 +99,7 @@ static inline u64 blk_stat_time(struct blk_issue_stat *stat)
  *
  * Return: Data direction of the request, either READ or WRITE.
  */
-unsigned int blk_stat_rq_ddir(const struct request *rq);
+int blk_stat_rq_ddir(const struct request *rq);
 
 /**
  * blk_stat_alloc_callback() - Allocate a block statistics callback.
@@ -113,7 +114,7 @@ unsigned int blk_stat_rq_ddir(const struct request *rq);
  */
 struct blk_stat_callback *
 blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
-			unsigned int (*bucket_fn)(const struct request *),
+			int (*bucket_fn)(const struct request *),
 			unsigned int buckets, void *data);
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct
  2017-03-26  2:18 ` sbates
@ 2017-03-26  2:18   ` sbates
  -1 siblings, 0 replies; 25+ messages in thread
From: sbates @ 2017-03-26  2:18 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-nvme, Damien.LeMoal, osandov, sbates

From: Stephen Bates <sbates@raithlin.com>

In order to bucket IO for the polling algorithm we use a sysfs entry
to set the filter value. It is signed and we use it as follows:

 0   : No filtering. All IO are considered in stat generation.
 > 0 : Filtering based on IO of exactly this size only.
 < 0 : Filtering based on IO less than or equal to -1 times this value.

Use this member to implement a new bucket function for the io polling
stats.
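
For example (device name hypothetical), the filter can be driven from
userspace through the new attribute:

  # consider only 4KB IO in the poll stats
  echo 4096 > /sys/block/nvme0n1/queue/io_poll_size
  # consider IO of 8KB or less
  echo -8192 > /sys/block/nvme0n1/queue/io_poll_size
  # default: consider all IO
  echo 0 > /sys/block/nvme0n1/queue/io_poll_size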

Signed-off-by: Stephen Bates <sbates@raithlin.com>
---
 block/blk-mq.c         | 17 +++++++++++++++--
 block/blk-sysfs.c      | 30 ++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  1 +
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ff66f2..5ea9481 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2327,6 +2327,18 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 	blk_mq_sysfs_register(q);
 }
 
+int blk_mq_poll_stats_bucket(const struct request *rq)
+{
+	int val = rq->q->poll_size;
+
+	if ((val == 0) ||
+	    (val > 0 && blk_rq_bytes(rq) == val) ||
+	    (val < 0 && blk_rq_bytes(rq) <= -val))
+		return (int)rq_data_dir(rq);
+
+	return -1;
+}
+
 struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 						  struct request_queue *q)
 {
@@ -2338,7 +2350,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		goto err_exit;
 
 	q->poll_cb = blk_stat_alloc_callback(blk_mq_poll_stats_fn,
-					     blk_stat_rq_ddir, 2, q);
+					     blk_mq_poll_stats_bucket, 2, q);
 	if (!q->poll_cb)
 		goto err_exit;
 
@@ -2387,9 +2399,10 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	q->nr_requests = set->queue_depth;
 
 	/*
-	 * Default to classic polling
+	 * Default to classic polling. Default to considering all IO.
 	 */
 	q->poll_nsec = -1;
+	q->poll_size = 0;
 
 	if (set->ops->complete)
 		blk_queue_softirq_done(q, set->ops->complete);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fa831cb..000d5db 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -394,6 +394,29 @@ static ssize_t queue_poll_delay_store(struct request_queue *q, const char *page,
 	return count;
 }
 
+static ssize_t queue_poll_size_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%d\n", q->poll_size);
+}
+
+static ssize_t queue_poll_size_store(struct request_queue *q, const char *page,
+				     size_t count)
+{
+	int err, val;
+
+	if (!q->mq_ops || !q->mq_ops->poll)
+		return -EINVAL;
+
+	err = kstrtoint(page, 10, &val);
+	if (err < 0)
+		return err;
+
+	q->poll_size = val;
+
+	return count;
+}
+
+
 static ssize_t queue_poll_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(test_bit(QUEUE_FLAG_POLL, &q->queue_flags), page);
@@ -654,6 +677,12 @@ static struct queue_sysfs_entry queue_poll_entry = {
 	.store = queue_poll_store,
 };
 
+static struct queue_sysfs_entry queue_poll_size_entry = {
+	.attr = {.name = "io_poll_size", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_poll_size_show,
+	.store = queue_poll_size_store,
+};
+
 static struct queue_sysfs_entry queue_poll_delay_entry = {
 	.attr = {.name = "io_poll_delay", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_poll_delay_show,
@@ -710,6 +739,7 @@ static struct attribute *default_attrs[] = {
 	&queue_dax_entry.attr,
 	&queue_wb_lat_entry.attr,
 	&queue_poll_delay_entry.attr,
+	&queue_poll_size_entry.attr,
 	NULL,
 };
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1a7dc42..7ff5388 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -517,6 +517,7 @@ struct request_queue {
 
 	unsigned int		rq_timeout;
 	int			poll_nsec;
+	int			poll_size;
 
 	struct blk_stat_callback	*poll_cb;
 	struct blk_rq_stat	poll_stat[2];
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] blk-stat: convert blk-stat bucket callback to signed
  2017-03-26  2:18   ` sbates
@ 2017-03-26  2:49     ` Omar Sandoval
  -1 siblings, 0 replies; 25+ messages in thread
From: Omar Sandoval @ 2017-03-26  2:49 UTC (permalink / raw)
  To: sbates; +Cc: axboe, linux-block, linux-nvme, Damien.LeMoal

Hey, Stephen,

On Sat, Mar 25, 2017 at 08:18:11PM -0600, sbates@raithlin.com wrote:
> From: Stephen Bates <sbates@raithlin.com>
> 
> In order to allow filtering of IO based on properties of the request
> other than its direction, we allow the bucket function to return
> an int.
> 
> If the bucket callback returns a negative value, do not count the
> request in the stats accumulation.

This is great; I had it this way at first but didn't know if anyone
would want to omit stats in this way. A couple of comments below.

> Signed-off-by: Stephen Bates <sbates@raithlin.com>
> ---
>  block/blk-stat.c | 8 +++++---
>  block/blk-stat.h | 9 +++++----
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-stat.c b/block/blk-stat.c
> index 188b535..936adfb 100644
> --- a/block/blk-stat.c
> +++ b/block/blk-stat.c
> @@ -17,9 +17,9 @@ struct blk_queue_stats {
>  	spinlock_t lock;
>  };
>  
> -unsigned int blk_stat_rq_ddir(const struct request *rq)
> +int blk_stat_rq_ddir(const struct request *rq)
>  {
> -	return rq_data_dir(rq);
> +	return (int)rq_data_dir(rq);

The cast here isn't necessary, let's leave it off.

>  }
>  EXPORT_SYMBOL_GPL(blk_stat_rq_ddir);
>  
> @@ -100,6 +100,8 @@ void blk_stat_add(struct request *rq)
>  	list_for_each_entry_rcu(cb, &q->stats->callbacks, list) {
>  		if (blk_stat_is_active(cb)) {
>  			bucket = cb->bucket_fn(rq);
> +			if (bucket < 0)
> +				continue;

You also need to change the declaration of bucket to int above.

>  			stat = &this_cpu_ptr(cb->cpu_stat)[bucket];
>  			__blk_stat_add(stat, value);
>  		}
> @@ -131,7 +133,7 @@ static void blk_stat_timer_fn(unsigned long data)
>  
>  struct blk_stat_callback *
>  blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
> -			unsigned int (*bucket_fn)(const struct request *),
> +			int (*bucket_fn)(const struct request *),
>  			unsigned int buckets, void *data)
>  {
>  	struct blk_stat_callback *cb;
> diff --git a/block/blk-stat.h b/block/blk-stat.h
> index 6ad5b8c..7417805 100644
> --- a/block/blk-stat.h
> +++ b/block/blk-stat.h
> @@ -41,9 +41,10 @@ struct blk_stat_callback {
>  
>  	/**
>  	 * @bucket_fn: Given a request, returns which statistics bucket it
> -	 * should be accounted under.
> +	 * should be accounted under. Return -1 for no bucket for this
> +	 * request.
>  	 */
> -	unsigned int (*bucket_fn)(const struct request *);
> +	int (*bucket_fn)(const struct request *);
>  
>  	/**
>  	 * @buckets: Number of statistics buckets.
> @@ -98,7 +99,7 @@ static inline u64 blk_stat_time(struct blk_issue_stat *stat)
>   *
>   * Return: Data direction of the request, either READ or WRITE.
>   */
> -unsigned int blk_stat_rq_ddir(const struct request *rq);
> +int blk_stat_rq_ddir(const struct request *rq);
>  
>  /**
>   * blk_stat_alloc_callback() - Allocate a block statistics callback.
> @@ -113,7 +114,7 @@ unsigned int blk_stat_rq_ddir(const struct request *rq);
>   */
>  struct blk_stat_callback *
>  blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
> -			unsigned int (*bucket_fn)(const struct request *),
> +			int (*bucket_fn)(const struct request *),
>  			unsigned int buckets, void *data);
>  
>  /**
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] blk-stat: convert blk-stat bucket callback to signed
  2017-03-26  2:49     ` Omar Sandoval
@ 2017-03-27 16:00       ` Stephen  Bates
  -1 siblings, 0 replies; 25+ messages in thread
From: Stephen  Bates @ 2017-03-27 16:00 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: axboe, linux-block, linux-nvme, Damien.LeMoal

Thanks for the review Omar!

>>  
>> -unsigned int blk_stat_rq_ddir(const struct request *rq)
>> +int blk_stat_rq_ddir(const struct request *rq)
>>  {
>> -	return rq_data_dir(rq);
>> +	return (int)rq_data_dir(rq);
>
>The cast here isn't necessary, let's leave it off.
>

OK, I will add that in the respin!

>>  }
>>  EXPORT_SYMBOL_GPL(blk_stat_rq_ddir);
>>  
>> @@ -100,6 +100,8 @@ void blk_stat_add(struct request *rq)
>>  	list_for_each_entry_rcu(cb, &q->stats->callbacks, list) {
>>  		if (blk_stat_is_active(cb)) {
>>  			bucket = cb->bucket_fn(rq);
>> +			if (bucket < 0)
>> +				continue;
>
>You also need to change the declaration of bucket to int above.
>

Ummmm, it already is an int…

Stephen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] blk-stat: convert blk-stat bucket callback to signed
  2017-03-27 16:00       ` Stephen  Bates
@ 2017-03-27 16:01         ` Omar Sandoval
  -1 siblings, 0 replies; 25+ messages in thread
From: Omar Sandoval @ 2017-03-27 16:01 UTC (permalink / raw)
  To: Stephen Bates; +Cc: axboe, linux-block, linux-nvme, Damien.LeMoal

On Mon, Mar 27, 2017 at 04:00:08PM +0000, Stephen  Bates wrote:
> Thanks for the review Omar!
> 
> >>  
> >> -unsigned int blk_stat_rq_ddir(const struct request *rq)
> >> +int blk_stat_rq_ddir(const struct request *rq)
> >>  {
> >> -	return rq_data_dir(rq);
> >> +	return (int)rq_data_dir(rq);
> >
> >The cast here isn't necessary, let's leave it off.
> >
> 
> OK, I will add that in the respin!
> 
> >>  }
> >>  EXPORT_SYMBOL_GPL(blk_stat_rq_ddir);
> >>  
> >> @@ -100,6 +100,8 @@ void blk_stat_add(struct request *rq)
> >>  	list_for_each_entry_rcu(cb, &q->stats->callbacks, list) {
> >>  		if (blk_stat_is_active(cb)) {
> >>  			bucket = cb->bucket_fn(rq);
> >> +			if (bucket < 0)
> >> +				continue;
> >
> >You also need to change the declaration of bucket to int above.
> >
> 
> Ummmm, it already is an int…

Yup, I was looking at the wrong place, sorry.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct
  2017-03-26  2:18   ` sbates
@ 2017-03-28 10:50     ` Sagi Grimberg
  -1 siblings, 0 replies; 25+ messages in thread
From: Sagi Grimberg @ 2017-03-28 10:50 UTC (permalink / raw)
  To: sbates, axboe; +Cc: linux-block, Damien.LeMoal, osandov, linux-nvme



On 26/03/17 05:18, sbates@raithlin.com wrote:
> From: Stephen Bates <sbates@raithlin.com>
>
> In order to bucket IO for the polling algorithm we use a sysfs entry
> to set the filter value. It is signed and we use it as follows:
>
>  0   : No filtering. All IO are considered in stat generation.
>  > 0 : Filtering based on IO of exactly this size only.
>  < 0 : Filtering based on IO less than or equal to -1 times this value.

I'd say that this is a fairly non-trivial semantic meaning to attach
to this...

Is there any use for the size exact match filter? If not then
I suggest we always make it (<=) in its semantics...

> Use this member to implement a new bucket function for the io polling
> stats.
>
> Signed-off-by: Stephen Bates <sbates@raithlin.com>
> ---
>  block/blk-mq.c         | 17 +++++++++++++++--
>  block/blk-sysfs.c      | 30 ++++++++++++++++++++++++++++++
>  include/linux/blkdev.h |  1 +
>  3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5ff66f2..5ea9481 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2327,6 +2327,18 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>  	blk_mq_sysfs_register(q);
>  }
>
> +int blk_mq_poll_stats_bucket(const struct request *rq)
> +{
> +	int val = rq->q->poll_size;
> +
> +	if ((val == 0) ||
> +	    (val > 0 && blk_rq_bytes(rq) == val) ||
> +	    (val < 0 && blk_rq_bytes(rq) <= -val))
> +		return (int)rq_data_dir(rq);
> +
> +	return -1;
> +}
> +
>  struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  						  struct request_queue *q)
>  {
> @@ -2338,7 +2350,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  		goto err_exit;
>
>  	q->poll_cb = blk_stat_alloc_callback(blk_mq_poll_stats_fn,
> -					     blk_stat_rq_ddir, 2, q);
> +					     blk_mq_poll_stats_bucket, 2, q);
>  	if (!q->poll_cb)
>  		goto err_exit;
>
> @@ -2387,9 +2399,10 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  	q->nr_requests = set->queue_depth;
>
>  	/*
> -	 * Default to classic polling
> +	 * Default to classic polling. Default to considering all IO.
>  	 */
>  	q->poll_nsec = -1;
> +	q->poll_size = 0;
>
>  	if (set->ops->complete)
>  		blk_queue_softirq_done(q, set->ops->complete);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index fa831cb..000d5db 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -394,6 +394,29 @@ static ssize_t queue_poll_delay_store(struct request_queue *q, const char *page,
>  	return count;
>  }
>
> +static ssize_t queue_poll_size_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%d\n", q->poll_size);
> +}
> +
> +static ssize_t queue_poll_size_store(struct request_queue *q, const char *page,
> +				     size_t count)
> +{
> +	int err, val;
> +
> +	if (!q->mq_ops || !q->mq_ops->poll)
> +		return -EINVAL;
> +
> +	err = kstrtoint(page, 10, &val);
> +	if (err < 0)
> +		return err;
> +
> +	q->poll_size = val;
> +
> +	return count;
> +}
> +
> +
>  static ssize_t queue_poll_show(struct request_queue *q, char *page)
>  {
>  	return queue_var_show(test_bit(QUEUE_FLAG_POLL, &q->queue_flags), page);
> @@ -654,6 +677,12 @@ static struct queue_sysfs_entry queue_poll_entry = {
>  	.store = queue_poll_store,
>  };
>
> +static struct queue_sysfs_entry queue_poll_size_entry = {
> +	.attr = {.name = "io_poll_size", .mode = S_IRUGO | S_IWUSR },
> +	.show = queue_poll_size_show,
> +	.store = queue_poll_size_store,
> +};
> +
>  static struct queue_sysfs_entry queue_poll_delay_entry = {
>  	.attr = {.name = "io_poll_delay", .mode = S_IRUGO | S_IWUSR },
>  	.show = queue_poll_delay_show,
> @@ -710,6 +739,7 @@ static struct attribute *default_attrs[] = {
>  	&queue_dax_entry.attr,
>  	&queue_wb_lat_entry.attr,
>  	&queue_poll_delay_entry.attr,
> +	&queue_poll_size_entry.attr,
>  	NULL,
>  };
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1a7dc42..7ff5388 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -517,6 +517,7 @@ struct request_queue {
>
>  	unsigned int		rq_timeout;
>  	int			poll_nsec;
> +	int			poll_size;
>
>  	struct blk_stat_callback	*poll_cb;
>  	struct blk_rq_stat	poll_stat[2];
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct
  2017-03-28 10:50     ` Sagi Grimberg
@ 2017-03-28 19:38       ` Stephen  Bates
  -1 siblings, 0 replies; 25+ messages in thread
From: Stephen  Bates @ 2017-03-28 19:38 UTC (permalink / raw)
  To: Sagi Grimberg, axboe; +Cc: linux-block, Damien.LeMoal, osandov, linux-nvme

>>
>>  In order to bucket IO for the polling algorithm we use a sysfs entry
>>  to set the filter value. It is signed and we use it as follows:
>>
>>   0   : No filtering. All IO are considered in stat generation.
>>   > 0 : Filtering based on IO of exactly this size only.
>>   < 0 : Filtering based on IO less than or equal to -1 times this value.
>
> I'd say that this is a fairly non-trivial semantic meaning to attach
> to this...
>
> Is there any use for the size exact match filter? If not then
> I suggest we always make it (<=) in its semantics...

Thanks for the review Sagi. I’d be OK going with <=0 as the exact match would normally be for minimal IO sizes (where <= and = are the same thing). I will see what other feedback I get and aim to do a respin soon…

Stephen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct
  2017-03-28 19:38       ` Stephen  Bates
@ 2017-03-28 19:46         ` Jens Axboe
  -1 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-03-28 19:46 UTC (permalink / raw)
  To: Stephen Bates, Sagi Grimberg
  Cc: linux-block, Damien.LeMoal, osandov, linux-nvme

On 03/28/2017 01:38 PM, Stephen  Bates wrote:
>>>
>>>  In order to bucket IO for the polling algorithm we use a sysfs entry
>>>  to set the filter value. It is signed and we will use that as follows:
>>>
>>>   0   : No filtering. All IO are considered in stat generation
>>>   > 0 : Filtering based on IO of exactly this size only.
>>>   < 0 : Filtering based on IO less than or equal to -1 time this value.>
>>
>> I'd say that this is a fairly non-trivial semantic meanning to this...
>>
>> Is there any use for the size exact match filter? If not then
>> I suggest we always make it (<=) in its semantics...
> 
> Thanks for the review Sagi. I’d be OK going with <=0 as the exact
> match would normally be for minimal IO sizes (where <= and = are the
> same thing). I will see what other feedback I get and aim to do a
> respin soon…

No tunables for this, please. There's absolutely no reason why we should
need it.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct
  2017-03-28 19:46         ` Jens Axboe
@ 2017-03-28 19:58           ` Stephen  Bates
  -1 siblings, 0 replies; 25+ messages in thread
From: Stephen  Bates @ 2017-03-28 19:58 UTC (permalink / raw)
  To: Jens Axboe, Sagi Grimberg; +Cc: linux-block, Damien.LeMoal, osandov, linux-nvme

>> 
>> Thanks for the review Sagi. I’d be OK going with <=0 as the exact
>> match would normally be for minimal IO sizes (where <= and = are the
>> same thing). I will see what other feedback I get and aim to do a
>> respin soon…
>
> No tunables for this, please. There's absolutely no reason why we should
> need it.

Jens – by this you mean you want to only bucket IO that are exactly the minimum block size supported by the underlying block device? I was envisioning we might want to relax that in certain cases (e.g. bucket 4KB and below going to a 512B device).

Stephen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct
  2017-03-28 19:58           ` Stephen  Bates
@ 2017-03-28 20:38             ` Jens Axboe
  -1 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-03-28 20:38 UTC (permalink / raw)
  To: Stephen Bates, Sagi Grimberg
  Cc: linux-block, Damien.LeMoal, osandov, linux-nvme

On 03/28/2017 01:58 PM, Stephen  Bates wrote:
>>>
>>> Thanks for the review Sagi. I’d be OK going with <=0 as the exact
>>> match would normally be for minimal IO sizes (where <= and = are the
>>> same thing). I will see what other feedback I get and aim to do a
>>> respin soon…
>>
>> No tunables for this, please. There's absolutely no reason why we should
>> need it.
> 
> Jens – by this you mean you want to only bucket IO that are exactly
> the minimum block size supported by the underlying block device? I was
> envisioning we might want to relax that in certain cases (e.g. bucket
> 4KB and below going to a 512B device).

Sorry, the above was a bit terse. I think a much better solution would
be to create a number of buckets (per data direction) and do stats on
all of them. The buckets would cover a reasonable range of request
sizes. Then when you poll for a given request, we can base our timed
number on the data direction AND size of it. You can get pretty far with
a few buckets:

512b
4k
8k
16k
32k
64k
128k

and you could even have your time estimation function turn these into
something sane. Or just use a composite of buckets, if you wish.
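
A rough sketch of what that bucket function could look like
(hypothetical code, building on the signed callback from patch 1):

/* Two buckets (read/write) per size class, with size classes at
 * powers of two from 512b up to 128k; anything larger shares the
 * top class. */
static int blk_mq_poll_stats_bkt(const struct request *rq)
{
	int size_class;

	if (!blk_rq_bytes(rq))
		return -1;			/* nothing to account */

	size_class = ilog2(blk_rq_bytes(rq)) - 9;	/* 512b -> 0, 4k -> 3 */
	if (size_class < 0)
		return -1;			/* sub-512b, skip it */
	if (size_class > 8)
		size_class = 8;			/* clamp >128k */

	/* the callback would be allocated with 2 * 9 = 18 buckets */
	return 2 * size_class + rq_data_dir(rq);
}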

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct
  2017-03-28 20:38             ` Jens Axboe
  (?)
@ 2017-03-28 21:49             ` Stephen  Bates
  -1 siblings, 0 replies; 25+ messages in thread
From: Stephen  Bates @ 2017-03-28 21:49 UTC (permalink / raw)
  To: Jens Axboe, Sagi Grimberg; +Cc: linux-block, Damien.LeMoal, osandov, linux-nvme

>>>
>>> Thanks for the review Sagi. I’d be OK going with <=0 as the exact
>>> match would normally be for minimal IO sizes (where <= and = are the
>>> same thing). I will see what other feedback I get and aim to do a
>>> respin soon…
>>
>> No tunables for this, please. There's absolutely no reason why we should
>> need it.
>
> Jens – by this you mean you want to only bucket IO that are exactly
> the minimum block size supported by the underlying block device? I was
> envisioning we might want to relax that in certain cases (e.g. bucket
> 4KB and below going to a 512B device).

> Sorry, the above was a bit terse. I think a much better solution would
> be to create a number of buckets (per data direction) and do stats on
> all of them. The buckets would cover a reasonable range of request
> sizes. Then when you poll for a given request, we can base our timed
> number on the data direction AND size of it. You can get pretty far with
> a few buckets:
>
> 512b
> 4k
> 8k
> 16k
> 32k
> 64k
> 128k
>
> and you could even have your time estimation function turn these into
> something sane. Or just use a composite of buckets, if you wish.

I did go down this path initially but then moved away from it since we were focusing only on the smaller IO size. However I can definitely take a look at this again as I agree that it could be more useful in the long run.

I would like to keep my first patch in this series alive since I do think having the option to not bucket an IO is a useful thing to have.

I’ll take all the feedback to date and work on a v2. Thanks!

Stephen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct
  2017-03-28 20:38             ` Jens Axboe
@ 2017-03-28 21:52               ` Stephen  Bates
  -1 siblings, 0 replies; 25+ messages in thread
From: Stephen  Bates @ 2017-03-28 21:52 UTC (permalink / raw)
  To: Jens Axboe, Sagi Grimberg; +Cc: linux-block, Damien.LeMoal, osandov, linux-nvme

[Retrying as my new setup secretly converted to html format without telling me. Apologies for the resend.]

>>> 
>>> Thanks for the review Sagi. I’d be OK going with <=0 as the exact
>>> match would normally be for minimal IO sizes (where <= and = are the
>>> same thing). I will see what other feedback I get and aim to do a
>>> respin soon…
>> 
>> No tunables for this, please. There's absolutely no reason why we should
>> need it.
>
> Jens – by this you mean you want to only bucket IO that are exactly
> the minimum block size supported by the underlying block device? I was
> envisioning we might want to relax that in certain cases (e.g. bucket
> 4KB and below going to a 512B device).

> Sorry, the above was a bit terse. I think a much better solution would
> be to create a number of buckets (per data direction) and do stats on
> all of them. The buckets would cover a reasonable range of request
> sizes. Then when you poll for a given request, we can base our timed
> number on the data direction AND size of it. You can get pretty far with
> a few buckets:
> 
> 512b
> 4k
> 8k
> 16k
> 32k
> 64k
> 128k
> 
> and you could even have your time estimation function turn these into
> something sane. Or just use a composite of buckets, if you wish.

I did go down this path initially but then moved away from it since we were focusing only on the smaller IO size. However I can definitely take a look at this again as I agree that it could be more useful in the long run.

I would like to keep my first patch in this series alive since I do think having the option to not bucket an IO is a useful thing to have.

I’ll take all the feedback to date and work on a v2. Thanks!

Stephen

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-03-28 21:52 UTC | newest]

Thread overview: 13 messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-26  2:18 [PATCH 0/2] blk-stat: Add ability to not bucket IO, add this to IO polling sbates
2017-03-26  2:18 ` [PATCH 1/2] blk-stat: convert blk-stat bucket callback to signed sbates
2017-03-26  2:49   ` Omar Sandoval
2017-03-27 16:00     ` Stephen Bates
2017-03-27 16:01       ` Omar Sandoval
2017-03-26  2:18 ` [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct sbates
2017-03-28 10:50   ` Sagi Grimberg
2017-03-28 19:38     ` Stephen Bates
2017-03-28 19:46       ` Jens Axboe
2017-03-28 19:58         ` Stephen Bates
2017-03-28 20:38           ` Jens Axboe
2017-03-28 21:49             ` Stephen Bates
2017-03-28 21:52             ` Stephen Bates
