linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Waiman Long <longman@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Ming Lei <ming.lei@redhat.com>
Subject: Re: [PATCH v2 2/2] blk-cgroup: Optimize blkcg_rstat_flush()
Date: Wed, 1 Jun 2022 07:48:06 -1000	[thread overview]
Message-ID: <YpemVpvaPomwH7mt@slm.duckdns.org> (raw)
In-Reply-To: <20220601165324.60892-2-longman@redhat.com>

Hello,

On Wed, Jun 01, 2022 at 12:53:24PM -0400, Waiman Long wrote:
> +static struct llist_node llist_last;	/* Last sentinel node of llist */

Can you please add comment explaining why we need the special sentinel and
empty helper?

> +static inline bool blkcg_llist_empty(struct llist_head *lhead)
> +{
> +	return lhead->first == &llist_last;
> +}
> +
> +static inline void init_blkcg_llists(struct blkcg *blkcg)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		per_cpu_ptr(blkcg->lhead, cpu)->first = &llist_last;
> +}
> +
> +static inline struct llist_node *
> +fetch_delete_blkcg_llist(struct llist_head *lhead)
> +{
> +	return xchg(&lhead->first, &llist_last);
> +}
> +
> +/*
> + * The retrieved blkg_iostat_set is immediately marked as not in the
> + * lockless list by clearing its node->next pointer. It could be put
> + * back into the list by a parallel update before the iostat's are
> + * finally flushed. So being in the list doesn't always mean it has new
> + * iostat's to be flushed.
> + */

Isn't the above true for any sort of mechanism which tracking pending state?
You gotta clear the pending state before consuming so that you don't miss
the events which happen while data is being consumed.

> +#define blkcg_llist_for_each_entry_safe(pos, node, nxt)			\
> +	for (; (node != &llist_last) &&					\
> +	       (pos = llist_entry(node, struct blkg_iostat_set, lnode),	\
> +		nxt = node->next, node->next = NULL, true);		\
> +		node = nxt)
> +
>  /**
>   * blkcg_css - find the current css
>   *
...
> @@ -852,17 +888,26 @@ static void blkg_iostat_sub(struct blkg_iostat *dst, struct blkg_iostat *src)
>  static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
>  {
>  	struct blkcg *blkcg = css_to_blkcg(css);
> -	struct blkcg_gq *blkg;
> +	struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
> +	struct llist_node *lnode, *lnext;
> +	struct blkg_iostat_set *bisc;
>  
>  	/* Root-level stats are sourced from system-wide IO stats */
>  	if (!cgroup_parent(css->cgroup))
>  		return;
>  
> -	rcu_read_lock();
> +	if (blkcg_llist_empty(lhead))
> +		return;
>  
> -	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
> +	lnode = fetch_delete_blkcg_llist(lhead);
> +
> +	/*
> +	 * No RCU protection is needed as it is assumed that blkg_iostat_set's
> +	 * in the percpu lockless list won't go away until the flush is done.
> +	 */

Can you please elaborate on why this is safe?

> +	blkcg_llist_for_each_entry_safe(bisc, lnode, lnext) {
> +		struct blkcg_gq *blkg = bisc->blkg;
>  		struct blkcg_gq *parent = blkg->parent;
> -		struct blkg_iostat_set *bisc = per_cpu_ptr(blkg->iostat_cpu, cpu);
>  		struct blkg_iostat cur, delta;
>  		unsigned long flags;
>  		unsigned int seq;

Overall, looks fantastic to me. Thanks a lot for working on it.

-- 
tejun

  reply	other threads:[~2022-06-01 17:48 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-01 16:53 [PATCH v2 1/2] blk-cgroup: Correctly free percpu iostat_cpu in blkg on error exit Waiman Long
2022-06-01 16:53 ` [PATCH v2 2/2] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
2022-06-01 17:48   ` Tejun Heo [this message]
2022-06-01 18:15     ` Waiman Long
2022-06-01 18:35       ` Tejun Heo
2022-06-01 18:52         ` Waiman Long
2022-06-01 21:25           ` Waiman Long
2022-06-01 21:28             ` Tejun Heo
2022-06-01 21:32               ` Waiman Long
2022-06-02  1:52   ` kernel test robot
2022-06-01 17:48 ` [PATCH v2 1/2] blk-cgroup: Correctly free percpu iostat_cpu in blkg on error exit Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YpemVpvaPomwH7mt@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).