linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paul Jackson <pj@sgi.com>
To: KUROSAWA Takahiro <kurosawa@valinux.co.jp>
Cc: taka@valinux.co.jp, magnus.damm@gmail.com, dino@in.ibm.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/3] CPUMETER (Re: [PATCH 0/5] SUBCPUSETS: a resource control functionality using CPUSETS)
Date: Sat, 1 Oct 2005 21:20:26 -0700	[thread overview]
Message-ID: <20051001212026.1d39222a.pj@sgi.com> (raw)
In-Reply-To: <20050926093432.626D07003D@sv1.valinux.co.jp>

Takahiro-san,

I spent a little more reading the cpuset side of your cpumeter patches.

I am hopeful that some substantial restructuring of the code would
integrate it better with the existing cpuset structure, reducing the
size of new code substantially.

I noticed not one, but two, nested CONFIG options added for CPUMETER:
CONFIG_CPUMETER and CONFIG_CPU_RC.  If this CPUMETER patch only
affected the cpuset code, I'd be tempted to have no additional CONFIG_*
options, however since the affect on the scheduler might be more
interesting (I am not a scheduler guru), it might make sense to have
one new such option - say the CONFIG_CPUMETER one, that covers it all.
Two seems at least one too many.

I am surprised to see fixed length arrays of size CPUMETER_CTRLS_MAX,
rather than having the cpumeter struct be dynamically allocated,
reference counted and locked.   I hope we do not have to have a small
fixed upper limit on the number of distinct cpu controllers.

I suspect that we should have a single additional pointer to cpu
resource controller (rc) domain structure, referenced from both
cpusets and tasks, where that structure is reference counted and
has its own lock.  All tasks and all cpuses in a given cpu rc domain
would point to the same structure.  This would remove any need in the
critical scheduler code to reference a tasks cpuset or the parent
of that cpuset, and it would allow the cpusets in such a domain to
have child cpusets, which would inherit another reference to the same
cpumeter rc domain, as I have been hoping would be possible.

So the code now added to struct cpuset:

	#ifdef CONFIG_CPUMETER
		/*
		 * rcdomains: used for the recource control domains
		 *            to keep track of total ammount of resources.
		 * meters:    used for metering resources assigned for
		 *            the cpuset.
		 */
		void *rcdomains[CPUMETER_CTLRS_MAX];
		struct cpumeter meters[CPUMETER_CTLRS_MAX];
	#endif /* CONFIG_CPUMETER */

might collapse to simply:

	struct cpumeter *cpumeter;

The use of the last CPUMETER_CTLRS_MAX (16) bits, starting at offset
CS_METER_OFFSET, of the typedef enum cpuset_flagbigs_t for one bit per
controller instance is creative, but hopefully unnecessary.  Rather,
if the struct cpumeters are dynamically allocated, then whether the
cpuset->cpumeter pointer is non-NULL will determine if that cpuset
is metered, and the pointer will reference the information about that
metered domain, shared by all cpusets in that domain, and hence by all
tasks in those cpusets.  Be sure to update which cpumeter rc domain a
task is in, if that task is moved to another cpuset, by attach_task().

The cpumeter_destroy_meters() routine should not be called from
cpuset_diput().  Rather the reference count on the meter structure
should be incremented and decremented each time a cpuset or task
attaches to or detached from it, and it should be free'd when the
count goes to zero.

There are 736 lines of code added to the end of kernel/cpuset.c,
with many snippets closely resembling existing code from the rest
of kernel/cpuset.c.   This is too much duplicated, parallel code,
and too much duplicate, parallel data structures.  I am hoping
that this can be collapsed into the existing cpuset.c code, adding
perhaps 100 or 200 lines instead of 736 lines.  But I have not
thought about it too hard - this hope might be wishful thinking.
Seeing routines such as "cpumeter_file_read_common()", along side
the existing "cpuset_common_file_read()", worries me.  Way too much
duplication of code.  And I hope we don't need a whole different
set of file_operations for the guar and lim files.

This code provides a basis for other forms of metering besides just
cpu metering.  I wonder if perhaps it would be better to just do
cpu metering here, and remove a bit of the generality that supports
other types of controllers.  If the addition of this cpuset based
controller were sufficiently integrated with the existing cpuset
code, and kept simple enough, then adding another controller type,
such as for memory, could just do the same sort of thing easy enough.
It would be easier to read the code if lines such as the following:

		sprintf(name, "%s%s%s", cpumeter_meter_prefix, c->name,
			cpumeter_guar_suffix);

were instead:

		name = "meter_cpu_guar";

You have been most gracious in your consideration of my suggestions
so far.  I hope the above thoughts will benefit your work.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

  reply	other threads:[~2005-10-02  4:20 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-08  5:39 [PATCH 0/5] SUBCPUSETS: a resource control functionality using CPUSETS KUROSAWA Takahiro
2005-09-08  7:23 ` Paul Jackson
2005-09-08  8:18   ` KUROSAWA Takahiro
2005-09-08 12:02     ` Paul Jackson
2005-09-09  1:38       ` KUROSAWA Takahiro
2005-09-09  4:12         ` Magnus Damm
2005-09-09  5:55           ` Paul Jackson
2005-09-09  7:52             ` Magnus Damm
2005-09-09  8:39               ` Paul Jackson
2005-09-09 11:38             ` Hirokazu Takahashi
2005-09-09 13:31               ` Paul Jackson
2005-09-10  7:11                 ` Hirokazu Takahashi
2005-09-10  8:52                   ` Paul Jackson
2005-09-11 16:02                     ` Hirokazu Takahashi
2005-09-26  9:33                     ` [PATCH 0/3] CPUMETER (Re: [PATCH 0/5] SUBCPUSETS: a resource control functionality using CPUSETS) KUROSAWA Takahiro
2005-10-02  4:20                       ` Paul Jackson [this message]
2005-10-04  2:49                         ` KUROSAWA Takahiro
2005-09-26  9:34                     ` [PATCH 1/3] CPUMETER: add cpumeter framework to the CPUSETS KUROSAWA Takahiro
2005-09-27  8:37                       ` Paul Jackson
2005-09-27  9:22                         ` Nick Piggin
2005-09-27 15:53                           ` [ckrm-tech] " Paul Jackson
2005-09-27 21:45                           ` Chandra Seetharaman
2005-09-28  6:35                           ` KUROSAWA Takahiro
2005-09-28 10:08                             ` Hirokazu Takahashi
2005-09-28 10:32                               ` KUROSAWA Takahiro
2005-09-27 11:39                         ` KUROSAWA Takahiro
2005-09-27 15:49                           ` [ckrm-tech] " Paul Jackson
2005-09-28  6:21                             ` KUROSAWA Takahiro
2005-09-28  6:43                               ` Paul Jackson
2005-09-28  7:08                               ` Paul Jackson
2005-09-28  7:53                                 ` KUROSAWA Takahiro
2005-09-28 16:49                                   ` Paul Jackson
2005-09-29  2:53                                     ` KUROSAWA Takahiro
2005-09-29  2:58                                       ` Paul Jackson
2005-09-30  9:39                                       ` Simon Derr
2005-09-30 14:21                                         ` Paul Jackson
2005-10-02  7:01                             ` Ok to change cpuset flags for sched domains? (was [PATCH 1/3] CPUMETER ...) Paul Jackson
2005-10-03 14:00                               ` Dinakar Guniguntala
2005-10-03 23:36                                 ` [ckrm-tech] " Paul Jackson
2005-09-28  9:25                           ` [PATCH][BUG] fix memory leak on reading cpuset files after seeking beyond eof KUROSAWA Takahiro
2005-09-28 13:42                             ` Paul Jackson
2005-09-28 13:42                             ` [PATCH] cpuset read past eof memory leak fix Paul Jackson
2005-09-28 15:01                               ` Linus Torvalds
2005-09-28 17:53                                 ` Paul Jackson
2005-09-28 18:03                                   ` Linus Torvalds
2005-09-28 18:03                                   ` Randy.Dunlap
2005-09-28 19:04                                     ` [ckrm-tech] " Paul Jackson
2005-09-28 14:29                           ` [PATCH 1/3] CPUMETER: add cpumeter framework to the CPUSETS Paul Jackson
2005-09-26  9:34                     ` [PATCH 2/3] CPUMETER: CPU resource controller KUROSAWA Takahiro
2005-09-26  9:34                     ` [PATCH 3/3] CPUMETER: connect the CPU resource controller to CPUMETER KUROSAWA Takahiro
2005-09-09 22:26           ` [PATCH 0/5] SUBCPUSETS: a resource control functionality using CPUSETS Matthew Helsley
2005-09-08 13:14   ` Dinakar Guniguntala
2005-09-08 14:11     ` Dipankar Sarma
2005-09-08 14:55       ` Paul Jackson
2005-09-08 14:59     ` Paul Jackson
2005-09-08 22:51     ` [ckrm-tech] " Chandra Seetharaman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20051001212026.1d39222a.pj@sgi.com \
    --to=pj@sgi.com \
    --cc=dino@in.ibm.com \
    --cc=kurosawa@valinux.co.jp \
    --cc=linux-kernel@vger.kernel.org \
    --cc=magnus.damm@gmail.com \
    --cc=taka@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).