From: Waiman Long <longman@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, Ming Lei <ming.lei@redhat.com>
Subject: Re: [PATCH v2 2/2] blk-cgroup: Optimize blkcg_rstat_flush()
Date: Wed, 1 Jun 2022 14:15:46 -0400
Message-ID: <ca091a5c-4ae1-e973-403e-4086d4527102@redhat.com>
In-Reply-To: <YpemVpvaPomwH7mt@slm.duckdns.org>
On 6/1/22 13:48, Tejun Heo wrote:
> Hello,
>
> On Wed, Jun 01, 2022 at 12:53:24PM -0400, Waiman Long wrote:
>> +static struct llist_node llist_last; /* Last sentinel node of llist */
> Can you please add comment explaining why we need the special sentinel and
> empty helper?
It was mentioned in the commit log, but I will add a comment to repeat
that. It is needed because lnode.next is used as a flag to indicate a
node's presence in the lockless list. By default, the first node that goes
into the lockless list will have a NULL value in its next pointer. So I
have to use a sentinel node to make sure that the next pointer is always
non-NULL while a node is on the list.
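To illustrate the sentinel trick, here is a minimal userspace sketch, assuming C11 atomics as a stand-in for the kernel's llist_add()/xchg(). struct node, struct head, list_add_once() and flush() are hypothetical names, not the patch's API. A node whose next pointer is NULL is off-list; because every list ends at the sentinel, even the only queued node has a non-NULL next.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: a NULL @next means "not on any list". */
struct node {
	_Atomic(struct node *) next;
	long pending;			/* stand-in for a per-CPU iostat delta */
};

/* Hypothetical list head, modelled loosely on struct llist_head. */
struct head {
	_Atomic(struct node *) first;
};

/* Every list ends here, so a queued node never has next == NULL. */
static struct node sentinel;

static void head_init(struct head *h)
{
	atomic_store(&h->first, &sentinel);
}

static bool list_empty(struct head *h)
{
	return atomic_load(&h->first) == &sentinel;
}

/*
 * Push @n unless it is already queued. Assumes a single producer per
 * node, so the check-then-push cannot race with another adder.
 */
static bool list_add_once(struct node *n, struct head *h)
{
	struct node *first;

	if (atomic_load(&n->next) != NULL)
		return false;		/* already on the list */
	first = atomic_load(&h->first);
	do {
		atomic_store(&n->next, first);
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
	return true;
}

/*
 * Detach the whole list with one exchange (leaving just the sentinel),
 * then clear each node's flag before consuming it.
 */
static long flush(struct head *h)
{
	struct node *n = atomic_exchange(&h->first, &sentinel);
	long total = 0;

	while (n != &sentinel) {
		struct node *nxt = atomic_exchange(&n->next, NULL);

		total += n->pending;
		n->pending = 0;
		n = nxt;
	}
	return total;
}
```

The sketch assumes each node has a single producer (as with per-CPU iostat data), so only the flusher competes with the adder, and it only ever detaches the whole list at once.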
>
>> +static inline bool blkcg_llist_empty(struct llist_head *lhead)
>> +{
>> + return lhead->first == &llist_last;
>> +}
>> +
>> +static inline void init_blkcg_llists(struct blkcg *blkcg)
>> +{
>> + int cpu;
>> +
>> + for_each_possible_cpu(cpu)
>> + per_cpu_ptr(blkcg->lhead, cpu)->first = &llist_last;
>> +}
>> +
>> +static inline struct llist_node *
>> +fetch_delete_blkcg_llist(struct llist_head *lhead)
>> +{
>> + return xchg(&lhead->first, &llist_last);
>> +}
>> +
>> +/*
>> + * The retrieved blkg_iostat_set is immediately marked as not in the
>> + * lockless list by clearing its node->next pointer. It could be put
>> + * back into the list by a parallel update before the iostat's are
>> + * finally flushed. So being in the list doesn't always mean it has new
>> + * iostat's to be flushed.
>> + */
> Isn't the above true for any sort of mechanism which tracking pending state?
> You gotta clear the pending state before consuming so that you don't miss
> the events which happen while data is being consumed.
That is true. I was thinking about what race conditions can happen with
these changes. The above comment describes one such race, which is
benign. I can remove it if you think it is unnecessary.
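The clear-before-consume ordering, and the benign race the patch comment documents, can be shown with a small single-threaded model. struct stat_slot, stat_update(), stat_flush() and add_three() are hypothetical; the @mid callback stands in for an update that another CPU performs right after the pending flag is cleared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stat slot: producers bump @total and raise @pending. */
struct stat_slot {
	atomic_long total;
	atomic_bool pending;
	long last;		/* @total as of the previous flush (consumer only) */
};

static void stat_update(struct stat_slot *s, long n)
{
	atomic_fetch_add(&s->total, n);		/* publish the data first ... */
	atomic_store(&s->pending, true);	/* ... then raise the flag */
}

/*
 * Clear-then-consume: the flag is dropped before the data is read, so an
 * update landing mid-flush re-raises it and is never lost. @mid injects
 * such an update at exactly that point (NULL for none).
 */
static long stat_flush(struct stat_slot *s, void (*mid)(struct stat_slot *))
{
	long now, delta;

	if (!atomic_exchange(&s->pending, false))
		return 0;			/* nothing pending */
	if (mid)
		mid(s);				/* simulated concurrent update */
	now = atomic_load(&s->total);
	delta = now - s->last;
	s->last = now;
	return delta;
}

/* Hypothetical concurrent update used to exercise the race. */
static void add_three(struct stat_slot *s)
{
	stat_update(s, 3);
}
```

With stat_update(&s, 5) followed by stat_flush(&s, add_three), the flush picks up both updates (8), yet the flag is set again, so the next flush finds it flagged with nothing new (0). Clearing the flag after reading the data instead would open a window where a new update is flagged, then un-flagged, and never flushed; this ordering trades that lost update for an occasional empty flush.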
>
>> +#define blkcg_llist_for_each_entry_safe(pos, node, nxt) \
>> + for (; (node != &llist_last) && \
>> + (pos = llist_entry(node, struct blkg_iostat_set, lnode), \
>> + nxt = node->next, node->next = NULL, true); \
>> + node = nxt)
>> +
>> /**
>> * blkcg_css - find the current css
>> *
> ...
>> @@ -852,17 +888,26 @@ static void blkg_iostat_sub(struct blkg_iostat *dst, struct blkg_iostat *src)
>> static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
>> {
>> struct blkcg *blkcg = css_to_blkcg(css);
>> - struct blkcg_gq *blkg;
>> + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
>> + struct llist_node *lnode, *lnext;
>> + struct blkg_iostat_set *bisc;
>>
>> /* Root-level stats are sourced from system-wide IO stats */
>> if (!cgroup_parent(css->cgroup))
>> return;
>>
>> - rcu_read_lock();
>> + if (blkcg_llist_empty(lhead))
>> + return;
>>
>> - hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
>> + lnode = fetch_delete_blkcg_llist(lhead);
>> +
>> + /*
>> + * No RCU protection is needed as it is assumed that blkg_iostat_set's
>> + * in the percpu lockless list won't go away until the flush is done.
>> + */
> Can you please elaborate on why this is safe?
You are right that the comment is probably not quite right. I will put
the rcu_read_lock/unlock() back in. However, we don't have an RCU
iterator for the lockless list. On the other hand, blkcg_rstat_flush()
is now called with IRQs disabled, so rcu_read_lock() is not technically
needed.
Will send out a v3 soon.
Thanks,
Longman
>
>> + blkcg_llist_for_each_entry_safe(bisc, lnode, lnext) {
>> + struct blkcg_gq *blkg = bisc->blkg;
>> struct blkcg_gq *parent = blkg->parent;
>> - struct blkg_iostat_set *bisc = per_cpu_ptr(blkg->iostat_cpu, cpu);
>> struct blkg_iostat cur, delta;
>> unsigned long flags;
>> unsigned int seq;
> Overall, looks fantastic to me. Thanks a lot for working on it.
>