linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	Fabio Checconi <fchecconi@gmail.com>,
	Arianna Avanzini <avanzini.arianna@gmail.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	ulf.hansson@linaro.org, linus.walleij@linaro.org,
	broonie@kernel.org
Subject: Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler
Date: Thu, 11 Feb 2016 17:22:10 -0500	[thread overview]
Message-ID: <20160211222210.GC3741@mtj.duckdns.org> (raw)
In-Reply-To: <1454364778-25179-10-git-send-email-paolo.valente@linaro.org>

Hello,

Here's the biggest question I have.  If I'm reading the paper and code
correctly, bfq is using bandwidth as the virtual time used for
scheduling.  This doesn't make sense to me because for most IO devices
and especially for rotating disks bandwidth is a number which
fluctuates easily and widely.  Trying to schedule a process with
sequential IO pattern and another with random IO on bandwidth basis
would be disastrous.

bfq adjusts for the fact by "punishing" seeky workloads, which, I
guess, is because doing naive bandwidth based scheduling is simply
unworkable.  The punishment is naturally charging it more than it
consumed but that looks like a twisted way of doing time based
scheduling.  It uses some heuristics to determine how much a seeky
queue should get over-charged so that the overcharged amount equals
the actual IO resource consumed, which is IO time on the device.

cfq may have a lot of shortcomings but abstracting IO resource in
terms of device IO time isn't one of them and bfq follows all the
necessary pieces of scheduling mechanics to be able to do so too -
single issuer at a time with opportunistic idling.

So, I'm scratching my head wondering why this wasn't time based to
begin with.  My limited understanding of the scheduling algorithm
doesn't show anything why this couldn't have been time based.  What am
I missing?

I haven't scrutinized the code closely and what follows are all
trivial stylistic comments.

>  block/Kconfig.iosched |    8 +-
>  block/bfq.h           |  502 ++++++

Why does this need to be in a separate file?  It doesn't seem to get
used anywhere other than cfq-iosched.c.

> +/**
> + * struct bfq_data - per device data structure.
> + * @queue: request queue for the managed device.
> + * @sched_data: root @bfq_sched_data for the device.
> + * @busy_queues: number of bfq_queues containing requests (including the
> + *		 queue in service, even if it is idling).
...

I'm personally not a big fan of documenting struct fields this way.
It's too easy to get them out of sync.

> + * All the fields are protected by the @queue lock.
> + */
> +struct bfq_data {
> +	struct request_queue *queue;
> +
> +	struct bfq_sched_data sched_data;

and also I think it's easier on the eyes to align field names

	struct request_queue		*queue;
	struct bfq_sched_data		sched_data;

That said, this is fundamentally bike-shedding, so please feel free
to ignore.

> +enum bfqq_state_flags {
> +	BFQ_BFQQ_FLAG_busy = 0,		/* has requests or is in service */
> +	BFQ_BFQQ_FLAG_wait_request,	/* waiting for a request */
> +	BFQ_BFQQ_FLAG_must_alloc,	/* must be allowed rq alloc */
...
> +#define BFQ_BFQQ_FNS(name)						\
...
> +
> +BFQ_BFQQ_FNS(busy);
> +BFQ_BFQQ_FNS(wait_request);
> +BFQ_BFQQ_FNS(must_alloc);

Heh... can we not do this?  What's wrong with just doing the following?

enum {
	BFQQ_BUSY		= 1U << 0,
	BFQQ_WAIT_REQUEST	= 1U << 1,
	BFQQ_MUST_ALLOC		= 1U << 2,
	...
};

> +/* Logging facilities. */
> +#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
> +	blk_add_trace_msg((bfqd)->queue, "bfq%d " fmt, (bfqq)->pid, ##args)

I don't think it's too productive to repeat bfq's.  bfqq_log()?

> +#define bfq_log(bfqd, fmt, args...) \
> +	blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
> +
> +/* Expiration reasons. */
> +enum bfqq_expiration {
> +	BFQ_BFQQ_TOO_IDLE = 0,		/*
> +					 * queue has been idling for
> +					 * too long
> +					 */
> +	BFQ_BFQQ_BUDGET_TIMEOUT,	/* budget took too long to be used */
> +	BFQ_BFQQ_BUDGET_EXHAUSTED,	/* budget consumed */
> +	BFQ_BFQQ_NO_MORE_REQUESTS,	/* the queue has no more requests */
> +};

Given that these are less common than the flags, maybe use something
like BFQQ_EXP_TOO_IDLE?

> +static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity);

bfqe_to_bfqq()?  And so on.  I don't think being verbose with heavily
repeated keywords helps much.

> +static struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
> +{
> +	return bic->bfqq[is_sync];
> +}
> +
> +static void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq,
> +			 bool is_sync)
> +{
> +	bic->bfqq[is_sync] = bfqq;
> +}

Do helpers like the above add anything?  Don't these obfuscate more
than help?

> +static struct bfq_data *bfq_get_bfqd_locked(void **ptr, unsigned long *flags)

bfqd_get_and_lock_irqsave()?

> +{
> +	struct bfq_data *bfqd;
> +
> +	rcu_read_lock();
> +	bfqd = rcu_dereference(*(struct bfq_data **)ptr);
> +
> +	if (bfqd != NULL) {
> +		spin_lock_irqsave(bfqd->queue->queue_lock, *flags);
> +		if (ptr == NULL)
> +			pr_crit("get_bfqd_locked pointer NULL\n");

I don't get it.  How can @ptr can be NULL at this point?  The kernel
would have crashed on the above rcu_deref.

> +		else if (*ptr == bfqd)
> +			goto out;
> +		spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
> +	}
> +
> +	bfqd = NULL;
> +out:
> +	rcu_read_unlock();
> +	return bfqd;
> +}
> +
> +static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
> +{
> +	spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
> +}

Ditto, I don't think these single liners help anything.  Just make
sure that the counterpart's name clearly indicates that it's a locking
function.

...
> +/* hw_tag detection: parallel requests threshold and min samples needed. */
> +#define BFQ_HW_QUEUE_THRESHOLD	4
> +#define BFQ_HW_QUEUE_SAMPLES	32
> +
> +#define BFQQ_SEEK_THR	 (sector_t)(8 * 1024)
> +#define BFQQ_SEEKY(bfqq) ((bfqq)->seek_mean > BFQQ_SEEK_THR)

Inconsistent indenting?

...
> + * Shift for timestamp calculations.  This actually limits the maximum
> + * service allowed in one timestamp delta (small shift values increase it),
> + * the maximum total weight that can be used for the queues in the system
> + * (big shift values increase it), and the period of virtual time
> + * wraparounds.
>   */
> +#define WFQ_SERVICE_SHIFT	22

Collect constants in one place?

...
> +static struct bfq_entity *bfq_entity_of(struct rb_node *node)
> +{
> +	struct bfq_entity *entity = NULL;
>  
> +	if (node)
> +		entity = rb_entry(node, struct bfq_entity, rb_node);
>  
> +	return entity;
> +}

Maybe

	if (!node)
		return NULL;
	return rb_entry(node, ...);

> +static unsigned short bfq_weight_to_ioprio(int weight)
>  {
> +	return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight < 0 ?
> +		0 : IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight;

Heh, how about?

	return max(IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight, 0);

> +static struct rb_node *bfq_find_deepest(struct rb_node *node)
>  {
> +	struct rb_node *deepest;
> +
> +	if (!node->rb_right && !node->rb_left)
> +		deepest = rb_parent(node);
> +	else if (!node->rb_right)
> +		deepest = node->rb_left;
> +	else if (!node->rb_left)
> +		deepest = node->rb_right;
> +	else {
> +		deepest = rb_next(node);
> +		if (deepest->rb_right)
> +			deepest = deepest->rb_right;
> +		else if (rb_parent(deepest) != node)
> +			deepest = rb_parent(deepest);
> +	}

According to CodyingStyle, if any clause gets braced, all clauses get
braced.

And there were some lines which were unnecessarily broken when joining
them would still fit under 80col limit.

Thanks.

-- 
tejun

  reply	other threads:[~2016-02-11 22:22 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-01 22:12 [PATCH RFC 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 01/22] block, cfq: remove queue merging for close cooperators Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 02/22] block, cfq: remove close-based preemption Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 03/22] block, cfq: remove deep seek queues logic Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 04/22] block, cfq: remove SSD-related logic Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 05/22] block, cfq: get rid of hierarchical support Paolo Valente
2016-02-10 23:04   ` Tejun Heo
2016-02-01 22:12 ` [PATCH RFC 06/22] block, cfq: get rid of queue preemption Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 07/22] block, cfq: get rid of workload type Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 08/22] block, cfq: get rid of latency tunables Paolo Valente
2016-02-10 23:05   ` Tejun Heo
2016-02-01 22:12 ` [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler Paolo Valente
2016-02-11 22:22   ` Tejun Heo [this message]
2016-02-12  0:35     ` Mark Brown
2016-02-17 15:57       ` Tejun Heo
2016-02-17 16:02         ` Mark Brown
2016-02-17 17:04           ` Tejun Heo
2016-02-17 18:13             ` Jonathan Corbet
2016-02-17 19:45               ` Tejun Heo
2016-02-17 19:56                 ` Jonathan Corbet
2016-02-17 20:14                   ` Tejun Heo
2016-02-17  9:02     ` Paolo Valente
2016-02-17 17:02       ` Tejun Heo
2016-02-20 10:23         ` Paolo Valente
2016-02-20 11:02           ` Paolo Valente
2016-03-01 18:46           ` Tejun Heo
2016-03-04 17:29             ` Linus Walleij
2016-03-04 17:39               ` Christoph Hellwig
2016-03-04 18:10                 ` Austin S. Hemmelgarn
2016-03-11 11:16                   ` Christoph Hellwig
2016-03-11 13:38                     ` Austin S. Hemmelgarn
2016-03-05 12:18                 ` Linus Walleij
2016-03-11 11:17                   ` Christoph Hellwig
2016-03-11 11:24                     ` Nikolay Borisov
2016-03-11 11:49                       ` Christoph Hellwig
2016-03-11 14:53                     ` Linus Walleij
2016-03-09  6:55                 ` Paolo Valente
2016-04-13 19:54                 ` Tejun Heo
2016-04-14  5:03                   ` Mark Brown
2016-03-09  6:34             ` Paolo Valente
2016-04-13 20:41               ` Tejun Heo
2016-04-14 10:23                 ` Paolo Valente
2016-04-14 16:29                   ` Tejun Heo
2016-04-15 14:20                     ` Paolo Valente
2016-04-15 15:08                       ` Tejun Heo
2016-04-15 16:17                         ` Paolo Valente
2016-04-15 19:29                           ` Tejun Heo
2016-04-15 22:08                             ` Paolo Valente
2016-04-15 22:45                               ` Tejun Heo
2016-04-16  6:03                                 ` Paolo Valente
2016-04-15 14:49                     ` Linus Walleij
2016-02-01 22:12 ` [PATCH RFC 10/22] block, bfq: add full hierarchical scheduling and cgroups support Paolo Valente
2016-02-11 22:28   ` Tejun Heo
2016-02-17  9:07     ` Paolo Valente
2016-02-17 17:14       ` Tejun Heo
2016-02-17 17:45         ` Tejun Heo
2016-04-20  9:32     ` Paolo
2016-04-22 18:13       ` Tejun Heo
2016-04-22 18:19         ` Paolo Valente
2016-04-22 18:41           ` Tejun Heo
2016-04-22 19:05             ` Paolo Valente
2016-04-22 19:32               ` Tejun Heo
2016-04-23  7:07                 ` Paolo Valente
2016-04-25 19:24                   ` Tejun Heo
2016-04-25 20:30                     ` Paolo
2016-05-06 20:20                       ` Paolo Valente
2016-05-12 13:11                         ` Paolo
2016-07-27 16:13                         ` [PATCH RFC V8 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 01/22] block, cfq: remove queue merging for close cooperators Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 02/22] block, cfq: remove close-based preemption Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 03/22] block, cfq: remove deep seek queues logic Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 04/22] block, cfq: remove SSD-related logic Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 05/22] block, cfq: get rid of hierarchical support Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 06/22] block, cfq: get rid of queue preemption Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 07/22] block, cfq: get rid of workload type Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 08/22] block, cfq: get rid of latency tunables Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 10/22] block, bfq: add full hierarchical scheduling and cgroups support Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 11/22] block, bfq: improve throughput boosting Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 12/22] block, bfq: modify the peak-rate estimator Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 13/22] block, bfq: add more fairness with writes and slow processes Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 14/22] block, bfq: improve responsiveness Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 15/22] block, bfq: reduce I/O latency for soft real-time applications Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 16/22] block, bfq: preserve a low latency also with NCQ-capable drives Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 17/22] block, bfq: reduce latency during request-pool saturation Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 18/22] block, bfq: add Early Queue Merge (EQM) Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 19/22] block, bfq: reduce idling only in symmetric scenarios Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 20/22] block, bfq: boost the throughput on NCQ-capable flash-based devices Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 21/22] block, bfq: boost the throughput with random I/O on NCQ-capable HDDs Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 22/22] block, bfq: handle bursts of queue activations Paolo Valente
2016-07-28 16:50                           ` [PATCH RFC V8 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo
2016-02-01 22:12 ` [PATCH RFC 11/22] block, bfq: improve throughput boosting Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 12/22] block, bfq: modify the peak-rate estimator Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 13/22] block, bfq: add more fairness to boost throughput and reduce latency Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 14/22] block, bfq: improve responsiveness Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 15/22] block, bfq: reduce I/O latency for soft real-time applications Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 16/22] block, bfq: preserve a low latency also with NCQ-capable drives Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 17/22] block, bfq: reduce latency during request-pool saturation Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 18/22] block, bfq: add Early Queue Merge (EQM) Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 19/22] block, bfq: reduce idling only in symmetric scenarios Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 20/22] block, bfq: boost the throughput on NCQ-capable flash-based devices Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 21/22] block, bfq: boost the throughput with random I/O on NCQ-capable HDDs Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 22/22] block, bfq: handle bursts of queue activations Paolo Valente

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160211222210.GC3741@mtj.duckdns.org \
    --to=tj@kernel.org \
    --cc=avanzini.arianna@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=broonie@kernel.org \
    --cc=fchecconi@gmail.com \
    --cc=linus.walleij@linaro.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paolo.valente@linaro.org \
    --cc=ulf.hansson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).