Re: [PATCH 3/3] cgroup: Implement cgroup2 basic CPU usage accounting

From: Waiman Long <longman@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: lizefan@huawei.com, hannes@cmpxchg.org, peterz@infradead.org,
	mingo@redhat.com, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
	luto@amacapital.net, efault@gmx.de,
	torvalds@linux-foundation.org, guro@fb.com
Subject: Re: [PATCH 3/3] cgroup: Implement cgroup2 basic CPU usage accounting
Date: Sun, 13 Aug 2017 19:20:35 -0400	[thread overview]
Message-ID: <a5ff145e-e2ca-d852-bd40-bc94ccc98167@redhat.com> (raw)
In-Reply-To: <20170813194421.GB1438922@devbig577.frc2.facebook.com>

On 08/13/2017 03:44 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Fri, Aug 11, 2017 at 04:19:07PM -0400, Waiman Long wrote:
>>> * CPU usage stats are collected and shown in "cgroup.stat" with "cpu."
>>>   prefix.  Total usage is collected from scheduling events.  User/sys
>>>   breakdown is sourced from tick sampling and adjusted to the usage
>>>   using cputime_adjuts().
>> Typo: cputime_adjust()
> Oops, will fix.
>
>>> +static DEFINE_MUTEX(cgroup_stat_mutex);
>>> +static DEFINE_PER_CPU(raw_spinlock_t, cgroup_cpu_stat_lock);
>> If the hierarchy is large enough and the stat data hasn't been read
>> recently, it may take a while to accumulate all the stat data even for
>> one cpu in cgroup_stat_flush_locked(). So I think it will make more
>> sense to use regular spinlock_t instead of raw_spinlock_t.
> They need to be raw spinlocks because the accounting side may grab
> them while scheduling.  If the accumulating latency becomes
> problematic, we can test for need_resched and spin_needbreak and break
> out as necessary.  The iteration logic is built to allow that and an
> earlier revision actually did that but I wasn't sure whether it's
> actually necessary and removed it for simplicity.
>
> If the latency ever becomes a problem, resurrecting that isn't
> difficult at all.

Right, I forgot they will be used by the scheduler.

I think it is prudent to limit the latency, but it can be another patch
when the need arises.

>>> +static struct cgroup *cgroup_cpu_stat_next_updated(struct cgroup *pos,
>>> +						   struct cgroup *root, int cpu)
>> This function is trying to unwind one cgrp from the updated_children and
>> updated_next linkage. It is somewhat like the opposite of
>> cgroup_cpu_stat_updated(). I just feel like its name isn't intuitive
>> enough to convey what it is doing. Maybe use name like
>> cgroup_cpu_stat_unlink_one() to match cgroup_cpu_stat_flush_one().
> Hmm... the name comes from it being an iterator - most interators are
> named _next.  But, yeah, the name doesn't signifiy that it unlinks as
> it goes along.  I'll rename it to cgroup_cpu_stat_pop_updated().

I am fine with that.

Thanks,
Longman