From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757920Ab3APAdP (ORCPT ); Tue, 15 Jan 2013 19:33:15 -0500
Received: from mail-vb0-f48.google.com ([209.85.212.48]:59492 "EHLO mail-vb0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757809Ab3APAdL (ORCPT ); Tue, 15 Jan 2013 19:33:11 -0500
MIME-Version: 1.0
In-Reply-To: <1357731938-8417-1-git-send-email-glommer@parallels.com>
References: <1357731938-8417-1-git-send-email-glommer@parallels.com>
Date: Tue, 15 Jan 2013 16:33:10 -0800
Message-ID:
Subject: Re: [PATCH v5 00/11] per-cgroup cpu-stat
From: Colin Cross
To: Glauber Costa
Cc: cgroups@vger.kernel.org, lkml, Andrew Morton, Tejun Heo, Peter Zijlstra, Paul Turner
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 9, 2013 at 3:45 AM, Glauber Costa wrote:
> [ update: I thought I posted this already before leaving for holidays. However,
> now that I am checking for replies, I can't find nor replies nor the original
> mail in my boxes or archives. I am posting again for safety sake, but sorry
> you are getting this twice by any chance ]
>
> Hi all,
>
> This is an attempt to provide userspace with enough information to reconstruct
> per-container version of files like "/proc/stat". In particular, we are
> interested in knowing the per-cgroup slices of user time, system time, wait
> time, number of processes, and a variety of statistics.
>
> This task is made more complicated by the fact that multiple controllers are
> involved in collecting those statistics: cpu and cpuacct. So the first thing I
> am doing here, is ressurecting Tejun's patches that aim at deprecating cpuacct.
>
> This is one of the major differences from earlier attempts: all data is provided
> by the cpu controller, resulting in greater simplicity.

Android userspace is currently using both cpu and cpuacct, and not
co-mounting them. They serve fundamentally different purposes, such
that creating a single hierarchy for both of them while maintaining
the existing behavior is not possible.

We use the cpu cgroup primarily as a priority container. A simple
view is that each thread is assigned to a foreground cgroup when it is
user-visible, and a background cgroup when it is not. The foreground
cgroup is assigned a significantly higher cpu.shares value such that
when each group is fully loaded the background group will get 5% and
the foreground group will get 95%.

We use the cpuacct cgroup to measure cpu usage per uid, primarily to
estimate one cause of battery usage. Each uid gets a cgroup, and when
spawning a task for a new uid we put it in the appropriate cgroup.

We could create a new uid cgroup for cpuacct inside the foreground and
background cgroups used for scheduling, but that would drastically
change the way scheduling works when multiple uids have active
threads. With separate cpu and cpuacct mounts, every active
foreground thread will get equal cpu time. With co-mounted cpu and
cpuacct cgroups, cpu time will first be divided among the accounting
groups, and then sub-divided among the threads inside each group.

A concrete example: Two uids, 1 and 2. Uid 1 has one thread A, uid 2
has two threads B and C. All threads are foreground and running
continuously.

With separate cpu and cpuacct mounts, we have:
/cpu/foreground/tasks: A B C
/cpuacct/uid/1/tasks: A
/cpuacct/uid/2/tasks: B C
A, B, and C will each get 33% of the cpu time.

With co-mounted cpu and cpuacct mounts:
/cpu/foreground/1/tasks: A
/cpu/foreground/2/tasks: B C
A will get 50% of the cpu time, and B and C will each get 25% of the
cpu time.

I don't see any way to add new subgroups for accounting without
partitioning the cpu time for each subgroup.
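
To make the arithmetic in the example concrete, here is a rough sketch
of an idealized proportional-share model. It is not how the scheduler
is implemented; it only assumes that every sibling (thread or cgroup)
is always runnable and that a parent's time is split in proportion to
sibling weights. The function name cpu_fractions, the dictionary
layout, and the equal weights of 1024 are illustrative assumptions,
as is the 19:1 weight ratio used to model the 95%/5% split.

```python
from fractions import Fraction

def cpu_fractions(group, budget=Fraction(1)):
    """Split `budget` among the children of `group` by relative weight.

    `group` maps a name to (weight, child), where child is None for a
    thread or a nested dict for a sub-cgroup. Returns {thread: fraction}.
    Idealized model: everything is runnable all the time.
    """
    total = sum(weight for weight, _ in group.values())
    result = {}
    for name, (weight, child) in group.items():
        share = budget * weight / total
        if child is None:
            result[name] = share          # a thread keeps its whole share
        else:
            result.update(cpu_fractions(child, share))  # sub-divide inside the cgroup
    return result

W = 1024  # assumed equal weight for every sibling

# Separate mounts: cpuacct does not affect scheduling, so the cpu
# hierarchy sees A, B and C as siblings in the foreground group.
flat = {"A": (W, None), "B": (W, None), "C": (W, None)}

# Co-mounted: per-uid cgroups sit inside foreground and are scheduled
# as siblings before their threads are.
nested = {
    "uid1": (W, {"A": (W, None)}),
    "uid2": (W, {"B": (W, None), "C": (W, None)}),
}

# Hypothetical 19:1 weights reproducing the 95% / 5% split.
priority = {"foreground": (19, {"fg": (W, None)}),
            "background": (1, {"bg": (W, None)})}

for label, tree in (("separate", flat), ("co-mounted", nested), ("priority", priority)):
    print(label, {name: str(frac) for name, frac in cpu_fractions(tree).items()})
```

In this model the flat layout gives A, B and C one third each, while
the nested layout gives A one half and B and C one quarter each, which
is the partitioning effect described above.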