All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] block: Fix IO priority mess
@ 2022-06-01 14:51 Jan Kara
  2022-06-01 14:51 ` [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2) Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jan Kara @ 2022-06-01 14:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Damien Le Moal, Jan Kara

Hello,

recently I've been looking into 10% regression reported by our performance
measurement infrastructure in reaim benchmark that was bisected down to
5a9d041ba2f6 ("block: move io_context creation into where it's needed"). This
didn't really make much sense and it took me a while to understand this but the
culprit is actually in even older commit e70344c05995 ("block: fix default IO
priority handling") and 5a9d041ba2f6 just made the breakage visible.
Essentially the problem was that after these commits some IO was queued with IO
priority class IOPRIO_CLASS_BE while other IO was queued with IOPRIO_CLASS_NONE
and as a result they could not be merged together resulting in performance
regression. I think what commit e70344c05995 ("block: fix default IO
priority handling") did is actually broken not only because of this performance
regression but because of other reasons as well (see changelog of patch 3/3
for details) so this patch set aims at fixing it.

								Honza

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2)
  2022-06-01 14:51 [PATCH 0/3] block: Fix IO priority mess Jan Kara
@ 2022-06-01 14:51 ` Jan Kara
  2022-06-01 15:11   ` Damien Le Moal
  2022-06-01 14:51 ` [PATCH 2/3] block: Make ioprio_best() static Jan Kara
  2022-06-01 14:51 ` [PATCH 3/3] block: fix default IO priority handling again Jan Kara
  2 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2022-06-01 14:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Damien Le Moal, Jan Kara

ioprio_get(2) can be asked to return the best IO priority from several
tasks (IOPRIO_WHO_PGRP, IOPRIO_WHO_USER). Currently the call treats
tasks without set IO priority as having priority
IOPRIO_CLASS_BE/IOPRIO_BE_NORM however this does not really reflect the
IO priority the task will get (which depends on task's nice value) and
with the following fix it will not even match returned IO priority for a
single task. So fix IO priority comparison to treat unset IO priority as
the lowest possible one. This way we will return IOPRIO_CLASS_NONE
priority only if none of the considered tasks has explicitely set IO
priority, otherwise we return the highest set IO priority. This changes
userspace visible behavior but hopefully the results are clearer and
nothing breaks.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/ioprio.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 2fe068fcaad5..62890391fc80 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -157,10 +157,9 @@ static int get_task_ioprio(struct task_struct *p)
 int ioprio_best(unsigned short aprio, unsigned short bprio)
 {
 	if (!ioprio_valid(aprio))
-		aprio = IOPRIO_DEFAULT;
+		return bprio;
 	if (!ioprio_valid(bprio))
-		bprio = IOPRIO_DEFAULT;
-
+		return aprio;
 	return min(aprio, bprio);
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] block: Make ioprio_best() static
  2022-06-01 14:51 [PATCH 0/3] block: Fix IO priority mess Jan Kara
  2022-06-01 14:51 ` [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2) Jan Kara
@ 2022-06-01 14:51 ` Jan Kara
  2022-06-01 14:51 ` [PATCH 3/3] block: fix default IO priority handling again Jan Kara
  2 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2022-06-01 14:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Damien Le Moal, Jan Kara

Nobody outside of block/ioprio.c uses it.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/ioprio.c         | 2 +-
 include/linux/ioprio.h | 5 -----
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/block/ioprio.c b/block/ioprio.c
index 62890391fc80..18f7e16882fe 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -154,7 +154,7 @@ static int get_task_ioprio(struct task_struct *p)
 	return ret;
 }
 
-int ioprio_best(unsigned short aprio, unsigned short bprio)
+static int ioprio_best(unsigned short aprio, unsigned short bprio)
 {
 	if (!ioprio_valid(aprio))
 		return bprio;
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 3f53bc27a19b..774bb90ad668 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -59,11 +59,6 @@ static inline int get_current_ioprio(void)
 	return IOPRIO_DEFAULT;
 }
 
-/*
- * For inheritance, return the highest of the two given priorities
- */
-extern int ioprio_best(unsigned short aprio, unsigned short bprio);
-
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
 #ifdef CONFIG_BLOCK
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] block: fix default IO priority handling again
  2022-06-01 14:51 [PATCH 0/3] block: Fix IO priority mess Jan Kara
  2022-06-01 14:51 ` [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2) Jan Kara
  2022-06-01 14:51 ` [PATCH 2/3] block: Make ioprio_best() static Jan Kara
@ 2022-06-01 14:51 ` Jan Kara
  2022-06-01 15:08   ` Damien Le Moal
  2 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2022-06-01 14:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Damien Le Moal, Jan Kara

Commit e70344c05995 ("block: fix default IO priority handling")
introduced an inconsistency in get_current_ioprio() that tasks without
IO context return IOPRIO_DEFAULT priority while tasks with freshly
allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority.
Tasks without IO context used to be rare before 5a9d041ba2f6 ("block:
move io_context creation into where it's needed") but after this commit
they became common because now only BFQ IO scheduler setups task's IO
context. Similar inconsistency is there for get_task_ioprio() so this
inconsistency is now exposed to userspace and userspace will see
different IO priority for tasks operating on devices with BFQ compared
to devices without BFQ. Furthemore the changes done by commit
e70344c05995 change the behavior when no IO priority is set for BFQ IO
scheduler which is also documented in ioprio_set(2) manpage - namely
that tasks without set IO priority will use IO priority based on their
nice value.

So make sure we default to IOPRIO_CLASS_NONE as used to be the case
before commit e70344c05995. Also cleanup alloc_io_context() to
explicitely set this IO priority for the allocated IO context.

Fixes: e70344c05995 ("block: fix default IO priority handling")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-ioc.c        | 2 ++
 include/linux/ioprio.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index df9cfe4ca532..63fc02042408 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -247,6 +247,8 @@ static struct io_context *alloc_io_context(gfp_t gfp_flags, int node)
 	INIT_HLIST_HEAD(&ioc->icq_list);
 	INIT_WORK(&ioc->release_work, ioc_release_fn);
 #endif
+	ioc->ioprio = IOPRIO_DEFAULT;
+
 	return ioc;
 }
 
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 774bb90ad668..d9dc78a15301 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -11,7 +11,7 @@
 /*
  * Default IO priority.
  */
-#define IOPRIO_DEFAULT	IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM)
+#define IOPRIO_DEFAULT	IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0)
 
 /*
  * Check that a priority value has a valid class.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] block: fix default IO priority handling again
  2022-06-01 14:51 ` [PATCH 3/3] block: fix default IO priority handling again Jan Kara
@ 2022-06-01 15:08   ` Damien Le Moal
  2022-06-01 16:04     ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2022-06-01 15:08 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe; +Cc: linux-block, Christoph Hellwig

On 2022/06/01 23:51, Jan Kara wrote:
> Commit e70344c05995 ("block: fix default IO priority handling")
> introduced an inconsistency in get_current_ioprio() that tasks without
> IO context return IOPRIO_DEFAULT priority while tasks with freshly
> allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority.
> Tasks without IO context used to be rare before 5a9d041ba2f6 ("block:
> move io_context creation into where it's needed") but after this commit
> they became common because now only BFQ IO scheduler setups task's IO
> context. Similar inconsistency is there for get_task_ioprio() so this
> inconsistency is now exposed to userspace and userspace will see
> different IO priority for tasks operating on devices with BFQ compared
> to devices without BFQ. Furthemore the changes done by commit
> e70344c05995 change the behavior when no IO priority is set for BFQ IO
> scheduler which is also documented in ioprio_set(2) manpage - namely
> that tasks without set IO priority will use IO priority based on their
> nice value.
> 
> So make sure we default to IOPRIO_CLASS_NONE as used to be the case
> before commit e70344c05995. Also cleanup alloc_io_context() to
> explicitely set this IO priority for the allocated IO context.
> 
> Fixes: e70344c05995 ("block: fix default IO priority handling")
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/blk-ioc.c        | 2 ++
>  include/linux/ioprio.h | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index df9cfe4ca532..63fc02042408 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -247,6 +247,8 @@ static struct io_context *alloc_io_context(gfp_t gfp_flags, int node)
>  	INIT_HLIST_HEAD(&ioc->icq_list);
>  	INIT_WORK(&ioc->release_work, ioc_release_fn);
>  #endif
> +	ioc->ioprio = IOPRIO_DEFAULT;
> +
>  	return ioc;
>  }
>  
> diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
> index 774bb90ad668..d9dc78a15301 100644
> --- a/include/linux/ioprio.h
> +++ b/include/linux/ioprio.h
> @@ -11,7 +11,7 @@
>  /*
>   * Default IO priority.
>   */
> -#define IOPRIO_DEFAULT	IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM)
> +#define IOPRIO_DEFAULT	IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0)

"man ioprio_set" says:

IOPRIO_CLASS_BE (2)
This is the best-effort scheduling class, which is the default for any process
that hasn't set  a  specific I/O  priority.

Which is why patch e70344c05995 introduced IOPRIO_DEFAULT definition using the
BE class, to have the kernel in sync with the manual.

The different ioprio leading to no BIO merging is definitely a problem but this
patch is not really fixing anything in my opinion. It simply gets back to the
previous "all 0s" ioprio initialization, which do not show the places where we
actually have missing ioprio initialization to IOPRIO_DEFAULT.

Isn't it simply that IOPRIO_DEFAULT should be set as the default for any bio
being allocated (in bio_alloc ?) before it is setup and inherits the user io
priority ? Otherwise, the bio io prio is indeed IOPRIO_CLASS_NONE/0 and changing
IOPRIO_DEFAULT to that value removes the differences you observed.

>  
>  /*
>   * Check that a priority value has a valid class.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2)
  2022-06-01 14:51 ` [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2) Jan Kara
@ 2022-06-01 15:11   ` Damien Le Moal
  2022-06-01 15:23     ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2022-06-01 15:11 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe; +Cc: linux-block, Christoph Hellwig

On 2022/06/01 23:51, Jan Kara wrote:
> ioprio_get(2) can be asked to return the best IO priority from several
> tasks (IOPRIO_WHO_PGRP, IOPRIO_WHO_USER). Currently the call treats
> tasks without set IO priority as having priority
> IOPRIO_CLASS_BE/IOPRIO_BE_NORM however this does not really reflect the
> IO priority the task will get (which depends on task's nice value) and
> with the following fix it will not even match returned IO priority for a
> single task. So fix IO priority comparison to treat unset IO priority as
> the lowest possible one. This way we will return IOPRIO_CLASS_NONE
> priority only if none of the considered tasks has explicitely set IO
> priority, otherwise we return the highest set IO priority. This changes
> userspace visible behavior but hopefully the results are clearer and
> nothing breaks.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/ioprio.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/block/ioprio.c b/block/ioprio.c
> index 2fe068fcaad5..62890391fc80 100644
> --- a/block/ioprio.c
> +++ b/block/ioprio.c
> @@ -157,10 +157,9 @@ static int get_task_ioprio(struct task_struct *p)
>  int ioprio_best(unsigned short aprio, unsigned short bprio)
>  {
>  	if (!ioprio_valid(aprio))
> -		aprio = IOPRIO_DEFAULT;
> +		return bprio;

bprio may not be valid...

>  	if (!ioprio_valid(bprio))
> -		bprio = IOPRIO_DEFAULT;
> -
> +		return aprio;
>  	return min(aprio, bprio);
>  }
>  


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2)
  2022-06-01 15:11   ` Damien Le Moal
@ 2022-06-01 15:23     ` Jan Kara
  2022-06-02  0:56       ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2022-06-01 15:23 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig

On Thu 02-06-22 00:11:29, Damien Le Moal wrote:
> On 2022/06/01 23:51, Jan Kara wrote:
> > ioprio_get(2) can be asked to return the best IO priority from several
> > tasks (IOPRIO_WHO_PGRP, IOPRIO_WHO_USER). Currently the call treats
> > tasks without set IO priority as having priority
> > IOPRIO_CLASS_BE/IOPRIO_BE_NORM however this does not really reflect the
> > IO priority the task will get (which depends on task's nice value) and
> > with the following fix it will not even match returned IO priority for a
> > single task. So fix IO priority comparison to treat unset IO priority as
> > the lowest possible one. This way we will return IOPRIO_CLASS_NONE
> > priority only if none of the considered tasks has explicitely set IO
> > priority, otherwise we return the highest set IO priority. This changes
> > userspace visible behavior but hopefully the results are clearer and
> > nothing breaks.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  block/ioprio.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/ioprio.c b/block/ioprio.c
> > index 2fe068fcaad5..62890391fc80 100644
> > --- a/block/ioprio.c
> > +++ b/block/ioprio.c
> > @@ -157,10 +157,9 @@ static int get_task_ioprio(struct task_struct *p)
> >  int ioprio_best(unsigned short aprio, unsigned short bprio)
> >  {
> >  	if (!ioprio_valid(aprio))
> > -		aprio = IOPRIO_DEFAULT;
> > +		return bprio;
> 
> bprio may not be valid...

Yes, bprio may be from IOPRIO_CLASS_NONE as well and IMHO that is a
desirable return in that case. ioprio_valid() is IMHO a bit of a misnomer
because IOPRIO_CLASS_NONE is a valid class and can be even set by
userspace. It actually means, set IO priority based on task's CPU priority.
But lets first settle on the desired meaning of ioprio in the discussion
over patch 3/3. How we should behave in this case is a detail we can sort
out after the general meaning is clear.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] block: fix default IO priority handling again
  2022-06-01 15:08   ` Damien Le Moal
@ 2022-06-01 16:04     ` Jan Kara
  2022-06-02  1:53       ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2022-06-01 16:04 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig

On Thu 02-06-22 00:08:28, Damien Le Moal wrote:
> On 2022/06/01 23:51, Jan Kara wrote:
> > Commit e70344c05995 ("block: fix default IO priority handling")
> > introduced an inconsistency in get_current_ioprio() that tasks without
> > IO context return IOPRIO_DEFAULT priority while tasks with freshly
> > allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority.
> > Tasks without IO context used to be rare before 5a9d041ba2f6 ("block:
> > move io_context creation into where it's needed") but after this commit
> > they became common because now only BFQ IO scheduler setups task's IO
> > context. Similar inconsistency is there for get_task_ioprio() so this
> > inconsistency is now exposed to userspace and userspace will see
> > different IO priority for tasks operating on devices with BFQ compared
> > to devices without BFQ. Furthemore the changes done by commit
> > e70344c05995 change the behavior when no IO priority is set for BFQ IO
> > scheduler which is also documented in ioprio_set(2) manpage - namely
> > that tasks without set IO priority will use IO priority based on their
> > nice value.
> > 
> > So make sure we default to IOPRIO_CLASS_NONE as used to be the case
> > before commit e70344c05995. Also cleanup alloc_io_context() to
> > explicitely set this IO priority for the allocated IO context.
> > 
> > Fixes: e70344c05995 ("block: fix default IO priority handling")
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  block/blk-ioc.c        | 2 ++
> >  include/linux/ioprio.h | 2 +-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> > index df9cfe4ca532..63fc02042408 100644
> > --- a/block/blk-ioc.c
> > +++ b/block/blk-ioc.c
> > @@ -247,6 +247,8 @@ static struct io_context *alloc_io_context(gfp_t gfp_flags, int node)
> >  	INIT_HLIST_HEAD(&ioc->icq_list);
> >  	INIT_WORK(&ioc->release_work, ioc_release_fn);
> >  #endif
> > +	ioc->ioprio = IOPRIO_DEFAULT;
> > +
> >  	return ioc;
> >  }
> >  
> > diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
> > index 774bb90ad668..d9dc78a15301 100644
> > --- a/include/linux/ioprio.h
> > +++ b/include/linux/ioprio.h
> > @@ -11,7 +11,7 @@
> >  /*
> >   * Default IO priority.
> >   */
> > -#define IOPRIO_DEFAULT	IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM)
> > +#define IOPRIO_DEFAULT	IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0)
> 
> "man ioprio_set" says:
> 
> IOPRIO_CLASS_BE (2)
> This is the best-effort scheduling class, which is the default for any process
> that hasn't set  a  specific I/O  priority.
> 
> Which is why patch e70344c05995 introduced IOPRIO_DEFAULT definition using the
> BE class, to have the kernel in sync with the manual.

Yes, but it also has:

       If no I/O scheduler has been set for a thread, then by default the  I/O
       priority  will  follow  the  CPU nice value (setpriority(2)).  In Linux
       kernels before version 2.6.24, once an I/O priority had been set  using
       ioprio_set(),  there was no way to reset the I/O scheduling behavior to
       the default.  Since Linux 2.6.24, specifying ioprio as 0 can be used to
       reset to the default I/O scheduling behavior.

So apparently even the manpage itself is inconsistent ;) (and I will
neglect the mistake that the text says "no I/O scheduler" instead of "no
I/O priority"). And there is actually code in BFQ (and there used to be
code in CFQ) that does:
        switch (ioprio_class) {
	...
        case IOPRIO_CLASS_NONE:
                /*
                 * No prio set, inherit CPU scheduling settings.
                 */
                bfqq->new_ioprio = task_nice_ioprio(tsk);
                bfqq->new_ioprio_class = task_nice_ioclass(tsk);
                break;
	
So IOPRIO_CLASS_NONE indeed has a meaning and it used to be the default
one until Christoph's 5a9d041ba2f6 in most cases. Your change e70344c05995
happening before that actually didn't change much practically because IO
contexts were initialized with 0 priority anyway and that was what
get_current_ioprio() was returning.

> The different ioprio leading to no BIO merging is definitely a problem
> but this patch is not really fixing anything in my opinion. It simply
> gets back to the previous "all 0s" ioprio initialization, which do not
> show the places where we actually have missing ioprio initialization to
> IOPRIO_DEFAULT.

So I agree we should settle on how to treat IOs with unset IO priority. The
behavior to use task's CPU priority when IO priority is unset is there for
a *long* time and so I think we should preserve that. The question is where
in the stack should the switch from "unset" value to "effective ioprio"
happen. Switching in IO context is IMO too early since userspace needs to
be able to differentiate "unset" from "set to
IOPRIO_CLASS_BE,IOPRIO_BE_NORM". But we could have a helper like
current_effective_ioprio() that will do the magic with mangling unset IO
priority based on task's CPU priority.

The fact is that bio->bi_ioprio gets set to anything only for direct IO in
iomap_dio_rw(). The rest of the IO has priority unset (BFQ fetches the
priority from task's IO context and ignores priority on bios BTW). And the
only place where req->ioprio (inherited from bio->bi_ioprio) gets used is
in a few drivers to set HIGHPRI flag for IOPRIO_CLASS_RT IO and then the
relatively new code in mq-deadline.

So it all is very inconsistent mess :-|

> Isn't it simply that IOPRIO_DEFAULT should be set as the default for any bio
> being allocated (in bio_alloc ?) before it is setup and inherits the user io
> priority ? Otherwise, the bio io prio is indeed IOPRIO_CLASS_NONE/0 and changing
> IOPRIO_DEFAULT to that value removes the differences you observed.

Yes, I think that would make sence although as I explain above this is
somewhat independent to what the default IO priority behavior should be.

									Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2)
  2022-06-01 15:23     ` Jan Kara
@ 2022-06-02  0:56       ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2022-06-02  0:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On 2022/06/02 0:23, Jan Kara wrote:
> On Thu 02-06-22 00:11:29, Damien Le Moal wrote:
>> On 2022/06/01 23:51, Jan Kara wrote:
>>> ioprio_get(2) can be asked to return the best IO priority from several
>>> tasks (IOPRIO_WHO_PGRP, IOPRIO_WHO_USER). Currently the call treats
>>> tasks without set IO priority as having priority
>>> IOPRIO_CLASS_BE/IOPRIO_BE_NORM however this does not really reflect the
>>> IO priority the task will get (which depends on task's nice value) and
>>> with the following fix it will not even match returned IO priority for a
>>> single task. So fix IO priority comparison to treat unset IO priority as
>>> the lowest possible one. This way we will return IOPRIO_CLASS_NONE
>>> priority only if none of the considered tasks has explicitely set IO
>>> priority, otherwise we return the highest set IO priority. This changes
>>> userspace visible behavior but hopefully the results are clearer and
>>> nothing breaks.
>>>
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>> ---
>>>  block/ioprio.c | 5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/ioprio.c b/block/ioprio.c
>>> index 2fe068fcaad5..62890391fc80 100644
>>> --- a/block/ioprio.c
>>> +++ b/block/ioprio.c
>>> @@ -157,10 +157,9 @@ static int get_task_ioprio(struct task_struct *p)
>>>  int ioprio_best(unsigned short aprio, unsigned short bprio)
>>>  {
>>>  	if (!ioprio_valid(aprio))
>>> -		aprio = IOPRIO_DEFAULT;
>>> +		return bprio;
>>
>> bprio may not be valid...
> 
> Yes, bprio may be from IOPRIO_CLASS_NONE as well and IMHO that is a
> desirable return in that case. ioprio_valid() is IMHO a bit of a misnomer
> because IOPRIO_CLASS_NONE is a valid class and can be even set by
> userspace. It actually means, set IO priority based on task's CPU priority.
> But lets first settle on the desired meaning of ioprio in the discussion
> over patch 3/3. How we should behave in this case is a detail we can sort
> out after the general meaning is clear.

Sounds all good to me.

> 
> 								Honza


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] block: fix default IO priority handling again
  2022-06-01 16:04     ` Jan Kara
@ 2022-06-02  1:53       ` Damien Le Moal
  2022-06-06 10:42         ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2022-06-02  1:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On 2022/06/02 1:04, Jan Kara wrote:
> On Thu 02-06-22 00:08:28, Damien Le Moal wrote:
>> On 2022/06/01 23:51, Jan Kara wrote:
>>> Commit e70344c05995 ("block: fix default IO priority handling")
>>> introduced an inconsistency in get_current_ioprio() that tasks without
>>> IO context return IOPRIO_DEFAULT priority while tasks with freshly
>>> allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority.
>>> Tasks without IO context used to be rare before 5a9d041ba2f6 ("block:
>>> move io_context creation into where it's needed") but after this commit
>>> they became common because now only BFQ IO scheduler setups task's IO
>>> context. Similar inconsistency is there for get_task_ioprio() so this
>>> inconsistency is now exposed to userspace and userspace will see
>>> different IO priority for tasks operating on devices with BFQ compared
>>> to devices without BFQ. Furthemore the changes done by commit
>>> e70344c05995 change the behavior when no IO priority is set for BFQ IO
>>> scheduler which is also documented in ioprio_set(2) manpage - namely
>>> that tasks without set IO priority will use IO priority based on their
>>> nice value.
>>>
>>> So make sure we default to IOPRIO_CLASS_NONE as used to be the case
>>> before commit e70344c05995. Also cleanup alloc_io_context() to
>>> explicitely set this IO priority for the allocated IO context.
>>>
>>> Fixes: e70344c05995 ("block: fix default IO priority handling")
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>> ---
>>>  block/blk-ioc.c        | 2 ++
>>>  include/linux/ioprio.h | 2 +-
>>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
>>> index df9cfe4ca532..63fc02042408 100644
>>> --- a/block/blk-ioc.c
>>> +++ b/block/blk-ioc.c
>>> @@ -247,6 +247,8 @@ static struct io_context *alloc_io_context(gfp_t gfp_flags, int node)
>>>  	INIT_HLIST_HEAD(&ioc->icq_list);
>>>  	INIT_WORK(&ioc->release_work, ioc_release_fn);
>>>  #endif
>>> +	ioc->ioprio = IOPRIO_DEFAULT;
>>> +
>>>  	return ioc;
>>>  }
>>>  
>>> diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
>>> index 774bb90ad668..d9dc78a15301 100644
>>> --- a/include/linux/ioprio.h
>>> +++ b/include/linux/ioprio.h
>>> @@ -11,7 +11,7 @@
>>>  /*
>>>   * Default IO priority.
>>>   */
>>> -#define IOPRIO_DEFAULT	IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM)
>>> +#define IOPRIO_DEFAULT	IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0)
>>
>> "man ioprio_set" says:
>>
>> IOPRIO_CLASS_BE (2)
>> This is the best-effort scheduling class, which is the default for any process
>> that hasn't set  a  specific I/O  priority.
>>
>> Which is why patch e70344c05995 introduced IOPRIO_DEFAULT definition using the
>> BE class, to have the kernel in sync with the manual.
> 
> Yes, but it also has:
> 
>        If no I/O scheduler has been set for a thread, then by default the  I/O
>        priority  will  follow  the  CPU nice value (setpriority(2)).  In Linux
>        kernels before version 2.6.24, once an I/O priority had been set  using
>        ioprio_set(),  there was no way to reset the I/O scheduling behavior to
>        the default.  Since Linux 2.6.24, specifying ioprio as 0 can be used to
>        reset to the default I/O scheduling behavior.
> 
> So apparently even the manpage itself is inconsistent ;) (and I will
> neglect the mistake that the text says "no I/O scheduler" instead of "no
> I/O priority"). And there is actually code in BFQ (and there used to be
> code in CFQ) that does:

Arg, indeed, it is a mess. We need to fix this manpage too.


>         switch (ioprio_class) {
> 	...
>         case IOPRIO_CLASS_NONE:
>                 /*
>                  * No prio set, inherit CPU scheduling settings.
>                  */
>                 bfqq->new_ioprio = task_nice_ioprio(tsk);
>                 bfqq->new_ioprio_class = task_nice_ioclass(tsk);
>                 break;
> 	
> So IOPRIO_CLASS_NONE indeed has a meaning and it used to be the default
> one until Christoph's 5a9d041ba2f6 in most cases. Your change e70344c05995
> happening before that actually didn't change much practically because IO
> contexts were initialized with 0 priority anyway and that was what
> get_current_ioprio() was returning.

5a9d041ba2f6 was from Jens, but agreed, io priority initialization never has
been correct.

> 
>> The different ioprio leading to no BIO merging is definitely a problem
>> but this patch is not really fixing anything in my opinion. It simply
>> gets back to the previous "all 0s" ioprio initialization, which do not
>> show the places where we actually have missing ioprio initialization to
>> IOPRIO_DEFAULT.
> 
> So I agree we should settle on how to treat IOs with unset IO priority. The
> behavior to use task's CPU priority when IO priority is unset is there for
> a *long* time and so I think we should preserve that. The question is where
> in the stack should the switch from "unset" value to "effective ioprio"
> happen. Switching in IO context is IMO too early since userspace needs to
> be able to differentiate "unset" from "set to
> IOPRIO_CLASS_BE,IOPRIO_BE_NORM". But we could have a helper like
> current_effective_ioprio() that will do the magic with mangling unset IO
> priority based on task's CPU priority.

I agree that the task's CPU priority is a more sensible default. However, I do
not understand your point about the IO context being too early to set the
effective priority. If we do not do that, getting the issuer CPU priority will
not be easily possible, right ?

> 
> The fact is that bio->bi_ioprio gets set to anything only for direct IO in
> iomap_dio_rw(). The rest of the IO has priority unset (BFQ fetches the
> priority from task's IO context and ignores priority on bios BTW). And the
> only place where req->ioprio (inherited from bio->bi_ioprio) gets used is
> in a few drivers to set HIGHPRI flag for IOPRIO_CLASS_RT IO and then the
> relatively new code in mq-deadline.

Yes, only IOPRIO_CLASS_RT has an effect at the hardware level right now.
Everything else is only for the IO scheduler to play with. I am preparing a
patch series to support scsi/ata command duration limits though. That adds a new
IOPRIO_CLASS_DL and that one will also have an effect on the hardware.

Note that if a process/thread use ioprio_set(), we should also honor the ioprio
set for buffered reads, at least those page IOs that are issued directly from
the IO issuer context (which may include readahead... hmmm).

> 
> So it all is very inconsistent mess :-|

Indeed it is.

> 
>> Isn't it simply that IOPRIO_DEFAULT should be set as the default for any bio
>> being allocated (in bio_alloc ?) before it is setup and inherits the user io
>> priority ? Otherwise, the bio io prio is indeed IOPRIO_CLASS_NONE/0 and changing
>> IOPRIO_DEFAULT to that value removes the differences you observed.
> 
> Yes, I think that would make sence although as I explain above this is
> somewhat independent to what the default IO priority behavior should be.
I am OK with the use of the task CPU priority/ionice value as the default when
no other ioprio is set for a bio using the user aio_reqprio or ioprio_set(). If
this relies on task_nice_ioclass() as it is today (I see no reason to change
that), then the default class for regular tasks remains IOPRIO_CLASS_BE as is
defined by IOPRIO_DEFAULT.

But to avoid the performance regression you observed, we really need to be 100%
sure that all bios have their ->bi_ioprio field correctly initialized. Something
like:

void bio_set_effective_ioprio(struct *bio)
{
	switch (IOPRIO_PRIO_CLASS(bio->bi_ioprio)) {
	case IOPRIO_CLASS_RT:
	case IOPRIO_CLASS_BE:
	case IOPRIO_CLASS_IDLE:
		/*
		 * the bio ioprio was already set from an aio kiocb ioprio
		 * (aio->aio_reqprio) or from the issuer context ioprio if that
		 * context used ioprio_set().
		 */;
		return;
	case IOPRIO_CLASS_NONE:
	default:
		/* Use the current task CPU priority */
		bio->ioprio =
			IOPRIO_PRIO_VALUE(task_nice_ioclass(current),
					  task_nice_ioprio(current));
		return;
	}
}

being called before a bio is inserted in a scheduler or bypass inserted in the
dispatch queues should result in all BIOs having an ioprio that is set to
something other than IOPRIO_CLASS_NONE. And the obvious place may be simply at
the beginning of submit_bio(), before submit_bio_noacct() is called.

I am tempted to argue that block device drivers should never see any req with an
ioprio set to IOPRIO_CLASS_NONE, which means that no bio should ever enter the
block stack with that ioprio either. With the above solution, bios from DM
targets submitted with submit_bio_noacct() could still have IOPRIO_CLASS_NONE...
So would submit_bio_noacct() be the better place to call the effective ioprio
helper ?


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] block: fix default IO priority handling again
  2022-06-02  1:53       ` Damien Le Moal
@ 2022-06-06 10:42         ` Jan Kara
  2022-06-06 14:21           ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2022-06-06 10:42 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig

On Thu 02-06-22 10:53:29, Damien Le Moal wrote:
> On 2022/06/02 1:04, Jan Kara wrote:
> > On Thu 02-06-22 00:08:28, Damien Le Moal wrote:
> >> The different ioprio leading to no BIO merging is definitely a problem
> >> but this patch is not really fixing anything in my opinion. It simply
> >> gets back to the previous "all 0s" ioprio initialization, which do not
> >> show the places where we actually have missing ioprio initialization to
> >> IOPRIO_DEFAULT.
> > 
> > So I agree we should settle on how to treat IOs with unset IO priority. The
> > behavior to use task's CPU priority when IO priority is unset is there for
> > a *long* time and so I think we should preserve that. The question is where
> > in the stack should the switch from "unset" value to "effective ioprio"
> > happen. Switching in IO context is IMO too early since userspace needs to
> > be able to differentiate "unset" from "set to
> > IOPRIO_CLASS_BE,IOPRIO_BE_NORM". But we could have a helper like
> > current_effective_ioprio() that will do the magic with mangling unset IO
> > priority based on task's CPU priority.
> 
> I agree that the task's CPU priority is a more sensible default. However,
> I do not understand your point about the IO context being too early to
> set the effective priority. If we do not do that, getting the issuer CPU
> priority will not be easily possible, right ?

I just meant that in the IO context we need to keep the information whether
IO priority is set to a particular value, or whether it is set to 0 meaning
"inherit from CPU priority". So we cannot just store the effective IO
priority immediately in the IO context instead of 0.

> > The fact is that bio->bi_ioprio gets set to anything only for direct IO in
> > iomap_dio_rw(). The rest of the IO has priority unset (BFQ fetches the
> > priority from task's IO context and ignores priority on bios BTW). And the
> > only place where req->ioprio (inherited from bio->bi_ioprio) gets used is
> > in a few drivers to set HIGHPRI flag for IOPRIO_CLASS_RT IO and then the
> > relatively new code in mq-deadline.
> 
> Yes, only IOPRIO_CLASS_RT has an effect at the hardware level right now.
> Everything else is only for the IO scheduler to play with. I am preparing a
> patch series to support scsi/ata command duration limits though. That adds a new
> IOPRIO_CLASS_DL and that one will also have an effect on the hardware.
> 
> Note that if a process/thread use ioprio_set(), we should also honor the ioprio
> set for buffered reads, at least those page IOs that are issued directly from
> the IO issuer context (which may include readahead... hmmm).

Yes, that would make sense as well.

> >> Isn't it simply that IOPRIO_DEFAULT should be set as the default for any bio
> >> being allocated (in bio_alloc ?) before it is setup and inherits the user io
> >> priority ? Otherwise, the bio io prio is indeed IOPRIO_CLASS_NONE/0 and changing
> >> IOPRIO_DEFAULT to that value removes the differences you observed.
> > 
> > Yes, I think that would make sence although as I explain above this is
> > somewhat independent to what the default IO priority behavior should be.
> I am OK with the use of the task CPU priority/ionice value as the default when
> no other ioprio is set for a bio using the user aio_reqprio or ioprio_set(). If
> this relies on task_nice_ioclass() as it is today (I see no reason to change
> that), then the default class for regular tasks remains IOPRIO_CLASS_BE as is
> defined by IOPRIO_DEFAULT.

Yes, good.

> But to avoid the performance regression you observed, we really need to be 100%
> sure that all bios have their ->bi_ioprio field correctly initialized. Something
> like:
> 
> void bio_set_effective_ioprio(struct *bio)
> {
> 	switch (IOPRIO_PRIO_CLASS(bio->bi_ioprio)) {
> 	case IOPRIO_CLASS_RT:
> 	case IOPRIO_CLASS_BE:
> 	case IOPRIO_CLASS_IDLE:
> 		/*
> 		 * the bio ioprio was already set from an aio kiocb ioprio
> 		 * (aio->aio_reqprio) or from the issuer context ioprio if that
> 		 * context used ioprio_set().
> 		 */;
> 		return;
> 	case IOPRIO_CLASS_NONE:
> 	default:
> 		/* Use the current task CPU priority */
> 		bio->ioprio =
> 			IOPRIO_PRIO_VALUE(task_nice_ioclass(current),
> 					  task_nice_ioprio(current));
> 		return;
> 	}
> }
> 
> being called before a bio is inserted in a scheduler or bypass inserted in the
> dispatch queues should result in all BIOs having an ioprio that is set to
> something other than IOPRIO_CLASS_NONE. And the obvious place may be simply at
> the beginning of submit_bio(), before submit_bio_noacct() is called.
> 
> I am tempted to argue that block device drivers should never see any req
> with an ioprio set to IOPRIO_CLASS_NONE, which means that no bio should
> ever enter the block stack with that ioprio either. With the above
> solution, bios from DM targets submitted with submit_bio_noacct() could
> still have IOPRIO_CLASS_NONE...  So would submit_bio_noacct() be the
> better place to call the effective ioprio helper ?

Yes, I also think it would be the cleanest if we made sure bio->ioprio is
always set to some value other than IOPRIO_CLASS_NONE. I'll see how we can
make that happen in the least painful way :). Thanks for your input!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] block: fix default IO priority handling again
  2022-06-06 10:42         ` Jan Kara
@ 2022-06-06 14:21           ` Jan Kara
  2022-06-07 12:13             ` Niklas Cassel
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2022-06-06 14:21 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche

On Mon 06-06-22 12:42:02, Jan Kara wrote:
> On Thu 02-06-22 10:53:29, Damien Le Moal wrote:
> > But to avoid the performance regression you observed, we really need to be 100%
> > sure that all bios have their ->bi_ioprio field correctly initialized. Something
> > like:
> > 
> > void bio_set_effective_ioprio(struct *bio)
> > {
> > 	switch (IOPRIO_PRIO_CLASS(bio->bi_ioprio)) {
> > 	case IOPRIO_CLASS_RT:
> > 	case IOPRIO_CLASS_BE:
> > 	case IOPRIO_CLASS_IDLE:
> > 		/*
> > 		 * the bio ioprio was already set from an aio kiocb ioprio
> > 		 * (aio->aio_reqprio) or from the issuer context ioprio if that
> > 		 * context used ioprio_set().
> > 		 */;
> > 		return;
> > 	case IOPRIO_CLASS_NONE:
> > 	default:
> > 		/* Use the current task CPU priority */
> > 		bio->ioprio =
> > 			IOPRIO_PRIO_VALUE(task_nice_ioclass(current),
> > 					  task_nice_ioprio(current));
> > 		return;
> > 	}
> > }
> > 
> > being called before a bio is inserted in a scheduler or bypass inserted in the
> > dispatch queues should result in all BIOs having an ioprio that is set to
> > something other than IOPRIO_CLASS_NONE. And the obvious place may be simply at
> > the beginning of submit_bio(), before submit_bio_noacct() is called.
> > 
> > I am tempted to argue that block device drivers should never see any req
> > with an ioprio set to IOPRIO_CLASS_NONE, which means that no bio should
> > ever enter the block stack with that ioprio either. With the above
> > solution, bios from DM targets submitted with submit_bio_noacct() could
> > still have IOPRIO_CLASS_NONE...  So would submit_bio_noacct() be the
> > better place to call the effective ioprio helper ?
> 
> Yes, I also think it would be the cleanest if we made sure bio->ioprio is
> always set to some value other than IOPRIO_CLASS_NONE. I'll see how we can
> make that happen in the least painful way :). Thanks for your input!

When looking into this I've hit a snag: ioprio rq_qos policy relies on the
fact that bio->bi_ioprio remains at 0 (unless explicitely set to some other
value by userspace) until we call rq_qos_track() in blk_mq_submit_bio().
BTW this happens after we have attempted to merge the bio to existing
requests so ioprio rq_qos policy is going to show strange behavior wrt
merging - most of the bios will not be able to merge to existing queued
requests due to ioprio mismatch.

I'd say .track hook gets called too late to properly set bio->bi_ioprio.
Shouldn't we set the io priority much earlier - I'd be tempted to use
bio_associate_blkg_from_css() for this... What do people think?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] block: fix default IO priority handling again
  2022-06-06 14:21           ` Jan Kara
@ 2022-06-07 12:13             ` Niklas Cassel
  2022-06-07 12:59               ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Niklas Cassel @ 2022-06-07 12:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: Damien Le Moal, Jens Axboe, linux-block, hch, Bart Van Assche

On Mon, Jun 06, 2022 at 04:21:36PM +0200, Jan Kara wrote:
> On Mon 06-06-22 12:42:02, Jan Kara wrote:
> > On Thu 02-06-22 10:53:29, Damien Le Moal wrote:
> > > But to avoid the performance regression you observed, we really need to be 100%
> > > sure that all bios have their ->bi_ioprio field correctly initialized. Something
> > > like:
> > > 
> > > void bio_set_effective_ioprio(struct *bio)
> > > {
> > > 	switch (IOPRIO_PRIO_CLASS(bio->bi_ioprio)) {
> > > 	case IOPRIO_CLASS_RT:
> > > 	case IOPRIO_CLASS_BE:
> > > 	case IOPRIO_CLASS_IDLE:
> > > 		/*
> > > 		 * the bio ioprio was already set from an aio kiocb ioprio
> > > 		 * (aio->aio_reqprio) or from the issuer context ioprio if that
> > > 		 * context used ioprio_set().
> > > 		 */;
> > > 		return;
> > > 	case IOPRIO_CLASS_NONE:
> > > 	default:
> > > 		/* Use the current task CPU priority */
> > > 		bio->ioprio =
> > > 			IOPRIO_PRIO_VALUE(task_nice_ioclass(current),
> > > 					  task_nice_ioprio(current));
> > > 		return;
> > > 	}
> > > }
> > > 
> > > being called before a bio is inserted in a scheduler or bypass inserted in the
> > > dispatch queues should result in all BIOs having an ioprio that is set to
> > > something other than IOPRIO_CLASS_NONE. And the obvious place may be simply at
> > > the beginning of submit_bio(), before submit_bio_noacct() is called.
> > > 
> > > I am tempted to argue that block device drivers should never see any req
> > > with an ioprio set to IOPRIO_CLASS_NONE, which means that no bio should
> > > ever enter the block stack with that ioprio either. With the above
> > > solution, bios from DM targets submitted with submit_bio_noacct() could
> > > still have IOPRIO_CLASS_NONE...  So would submit_bio_noacct() be the
> > > better place to call the effective ioprio helper ?
> > 
> > Yes, I also think it would be the cleanest if we made sure bio->ioprio is
> > always set to some value other than IOPRIO_CLASS_NONE. I'll see how we can
> > make that happen in the least painful way :). Thanks for your input!
> 
> When looking into this I've hit a snag: ioprio rq_qos policy relies on the
> fact that bio->bi_ioprio remains at 0 (unless explicitely set to some other
> value by userspace) until we call rq_qos_track() in blk_mq_submit_bio().
> BTW this happens after we have attempted to merge the bio to existing
> requests so ioprio rq_qos policy is going to show strange behavior wrt
> merging - most of the bios will not be able to merge to existing queued
> requests due to ioprio mismatch.
> 
> I'd say .track hook gets called too late to properly set bio->bi_ioprio.
> Shouldn't we set the io priority much earlier - I'd be tempted to use
> bio_associate_blkg_from_css() for this... What do people think?

Hello Jan,

bio_associate_blkg_from_css() is just an empty stub if CONFIG_BLK_CGROUP
is not set.

Having the effective ioprio set should correctly shouldn't depend on if
CONFIG_BLK_CGROUP is set or not, no?

The function name bio_associate_blkg_from_css() (css - cgroup_subsys_state)
also seems to imply that it should only perform cgroup related things, no?

AFAICT, both bfq and mq-deadline can currently prioritize requests without
CONFIG_BLK_CGROUP enabled.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] block: fix default IO priority handling again
  2022-06-07 12:13             ` Niklas Cassel
@ 2022-06-07 12:59               ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2022-06-07 12:59 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jan Kara, Damien Le Moal, Jens Axboe, linux-block, hch, Bart Van Assche

On Tue 07-06-22 12:13:48, Niklas Cassel wrote:
> On Mon, Jun 06, 2022 at 04:21:36PM +0200, Jan Kara wrote:
> > On Mon 06-06-22 12:42:02, Jan Kara wrote:
> > > On Thu 02-06-22 10:53:29, Damien Le Moal wrote:
> > > > But to avoid the performance regression you observed, we really need to be 100%
> > > > sure that all bios have their ->bi_ioprio field correctly initialized. Something
> > > > like:
> > > > 
> > > > void bio_set_effective_ioprio(struct *bio)
> > > > {
> > > > 	switch (IOPRIO_PRIO_CLASS(bio->bi_ioprio)) {
> > > > 	case IOPRIO_CLASS_RT:
> > > > 	case IOPRIO_CLASS_BE:
> > > > 	case IOPRIO_CLASS_IDLE:
> > > > 		/*
> > > > 		 * the bio ioprio was already set from an aio kiocb ioprio
> > > > 		 * (aio->aio_reqprio) or from the issuer context ioprio if that
> > > > 		 * context used ioprio_set().
> > > > 		 */;
> > > > 		return;
> > > > 	case IOPRIO_CLASS_NONE:
> > > > 	default:
> > > > 		/* Use the current task CPU priority */
> > > > 		bio->ioprio =
> > > > 			IOPRIO_PRIO_VALUE(task_nice_ioclass(current),
> > > > 					  task_nice_ioprio(current));
> > > > 		return;
> > > > 	}
> > > > }
> > > > 
> > > > being called before a bio is inserted in a scheduler or bypass inserted in the
> > > > dispatch queues should result in all BIOs having an ioprio that is set to
> > > > something other than IOPRIO_CLASS_NONE. And the obvious place may be simply at
> > > > the beginning of submit_bio(), before submit_bio_noacct() is called.
> > > > 
> > > > I am tempted to argue that block device drivers should never see any req
> > > > with an ioprio set to IOPRIO_CLASS_NONE, which means that no bio should
> > > > ever enter the block stack with that ioprio either. With the above
> > > > solution, bios from DM targets submitted with submit_bio_noacct() could
> > > > still have IOPRIO_CLASS_NONE...  So would submit_bio_noacct() be the
> > > > better place to call the effective ioprio helper ?
> > > 
> > > Yes, I also think it would be the cleanest if we made sure bio->ioprio is
> > > always set to some value other than IOPRIO_CLASS_NONE. I'll see how we can
> > > make that happen in the least painful way :). Thanks for your input!
> > 
> > When looking into this I've hit a snag: ioprio rq_qos policy relies on the
> > fact that bio->bi_ioprio remains at 0 (unless explicitely set to some other
> > value by userspace) until we call rq_qos_track() in blk_mq_submit_bio().
> > BTW this happens after we have attempted to merge the bio to existing
> > requests so ioprio rq_qos policy is going to show strange behavior wrt
> > merging - most of the bios will not be able to merge to existing queued
> > requests due to ioprio mismatch.
> > 
> > I'd say .track hook gets called too late to properly set bio->bi_ioprio.
> > Shouldn't we set the io priority much earlier - I'd be tempted to use
> > bio_associate_blkg_from_css() for this... What do people think?
> 
> Hello Jan,
> 
> bio_associate_blkg_from_css() is just an empty stub if CONFIG_BLK_CGROUP
> is not set.
> 
> Having the effective ioprio set should correctly shouldn't depend on if
> CONFIG_BLK_CGROUP is set or not, no?
> 
> The function name bio_associate_blkg_from_css() (css - cgroup_subsys_state)
> also seems to imply that it should only perform cgroup related things, no?
> 
> AFAICT, both bfq and mq-deadline can currently prioritize requests without
> CONFIG_BLK_CGROUP enabled.

Correct on all points. However the ioprio rq_qos policy very much depends
on the cgroup support. So at least the update of bio->bi_ioprio based on
that policy would make sense in bio_associate_blkg_from_css(). OTOH
thinking about it now, we would have a problem that if bio blkcg
association changes, we don't have enough information to update the
bi_ioprio accordingly (since we don't know whether the resulting ioprio is
set by userspace or by ioprio rq_qos policy). So probably we need make
ioprio rq_qos policy to set the priority later (likely at bio submission
time when bio blkcg association is stable) but we need to do it earlier
than after we try to merge the bio...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-06-07 12:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-01 14:51 [PATCH 0/3] block: Fix IO priority mess Jan Kara
2022-06-01 14:51 ` [PATCH 1/3] block: Fix handling of tasks without ioprio in ioprio_get(2) Jan Kara
2022-06-01 15:11   ` Damien Le Moal
2022-06-01 15:23     ` Jan Kara
2022-06-02  0:56       ` Damien Le Moal
2022-06-01 14:51 ` [PATCH 2/3] block: Make ioprio_best() static Jan Kara
2022-06-01 14:51 ` [PATCH 3/3] block: fix default IO priority handling again Jan Kara
2022-06-01 15:08   ` Damien Le Moal
2022-06-01 16:04     ` Jan Kara
2022-06-02  1:53       ` Damien Le Moal
2022-06-06 10:42         ` Jan Kara
2022-06-06 14:21           ` Jan Kara
2022-06-07 12:13             ` Niklas Cassel
2022-06-07 12:59               ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.