linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
@ 2004-12-14  3:55 Roland McGrath
  2004-12-14 18:36 ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Roland McGrath @ 2004-12-14  3:55 UTC (permalink / raw)
  To: akpm, torvalds; +Cc: linux-kernel, Christoph Lameter, Ulrich Drepper


This patch provides support for thread and process CPU time clocks in
the POSIX clock interface.  Both the existing utime and utime+stime
information (already available via getrusage et al) can be used, as well
as a new (potentially) more precise and accurate clock (which cannot
distinguish user from system time).  The clock used is that provided by
the `sched_clock' function already used internally by the scheduler.
This gives a way for platforms to provide the highest-resolution CPU
time tracking that is available cheaply, and some already do so (such as
x86 using TSC).  Because this clock is already sampled internally by the
scheduler, this new tracking adds only the tiniest new overhead to
accomplish the bookkeeping.

This replaces the support contributed by Christoph Lameter, with the same
goals.  It improves on that support in these ways:

1. The ABI encoding of clockid_t for CPU time clocks is changed.
   The current code encodes PID_MAX_LIMIT as part of the ABI; this is a
   poor ABI, since that has heretofore been an internal implementation
   limit never exposed directly to userland as an ABI constraint.
   The new ABI is expressed in the macros defined in <linux/posix-timers.h>.
   Note that the small-integer constants are not supported by the kernel ABI.
   Userland never wants to pass these anyway; they will always be
   translated by compatibility code in glibc.  The caller's own thread
   or process clock is expressed by using the encoding with a PID of zero.

2. Three clocks available, not one.  The PROF and VIRT clocks are the
   familiar clocks that drive the ITIMER_PROF and ITIMER_VIRTUAL itimers
   of old.  The new SCHED clock is the high-resolution one, usually
   based on CPU cycle counters (but sampled at context switches, so it
   doesn't matter whether multiple CPUs synchronize their counters or
   not).  Moreover, it counts time spent on the CPU directly, rather
   than charging a thread for the whole 1/HZ second period during which
   it was the thread running when the timer interrupt hit.  With the
   existing clocks, a thread can do most of a tick worth of work and
   then yield in time to have someone else get the interrupt, and manage
   to steadily use CPU cycles without registering any utime/stime.

3. The code is cleanly separated into the new file posix-cpu-timers.c;
   the main code punts to posix_cpu_* functions whenever the clockid_t
   has the high bit set.

Some notes:

This allows per-thread clocks to be accessed only by other threads in
the same process.  The only POSIX calls that access these are defined
only for in-process use, and having this check is necessary for the
userland implementations of the POSIX clock functions to robustly refuse
stale clockid_t's in the face of potential PID reuse.

This makes no constraint on who can see whose per-process clocks.  This
information is already available for the VIRT and PROF (i.e. utime and
stime) information via /proc.  I am open to suggestions on if/how
security constraints on who can see whose clocks should be imposed.

The SCHED clock information is now available only via clock_* syscalls.
This means that per-thread information is not available outside the process.
Perhaps /proc should show sched_time as well?  This would let ps et al
show this more-accurate information.

When this code is merged, it will be supported in glibc.
I have written the support and added some test programs for glibc, 
which are what I mainly used to test the new kernel code.
You can get those here:
	http://people.redhat.com/roland/glibc/kernel-cpuclocks.patch


Signed-off-by: Roland McGrath <roland@redhat.com>

--- linux-2.6/include/linux/posix-timers.h
+++ linux-2.6/include/linux/posix-timers.h
@@ -4,6 +4,23 @@
 #include <linux/spinlock.h>
 #include <linux/list.h>
 
+#define CPUCLOCK_PID(clock)	((pid_t) ~((clock) >> 3))
+#define CPUCLOCK_PERTHREAD(clock) \
+	(((clock) & (clockid_t) CPUCLOCK_PERTHREAD_MASK) != 0)
+#define CPUCLOCK_PID_MASK	7
+#define CPUCLOCK_PERTHREAD_MASK	4
+#define CPUCLOCK_WHICH(clock)	((clock) & (clockid_t) CPUCLOCK_CLOCK_MASK)
+#define CPUCLOCK_CLOCK_MASK	3
+#define CPUCLOCK_PROF		0
+#define CPUCLOCK_VIRT		1
+#define CPUCLOCK_SCHED		2
+#define CPUCLOCK_MAX		3
+
+#define MAKE_PROCESS_CPUCLOCK(pid, clock) \
+	((~(clockid_t) (pid) << 3) | (clockid_t) (clock))
+#define MAKE_THREAD_CPUCLOCK(tid, clock) \
+	MAKE_PROCESS_CPUCLOCK((tid), (clock) | CPUCLOCK_PERTHREAD_MASK)
+
 /* POSIX.1b interval timer structure. */
 struct k_itimer {
 	struct list_head list;		/* free/ allocate list */
@@ -34,8 +51,7 @@ struct k_clock {
 	int (*clock_set) (struct timespec * tp);
 	int (*clock_get) (struct timespec * tp);
 	int (*timer_create) (struct k_itimer *timer);
-	int (*nsleep) (int which_clock, int flags,
-		       struct timespec * t);
+	int (*nsleep) (clockid_t which_clock, int flags, struct timespec *);
 	int (*timer_set) (struct k_itimer * timr, int flags,
 			  struct itimerspec * new_setting,
 			  struct itimerspec * old_setting);
@@ -48,7 +64,7 @@ void register_posix_clock(int clock_id, 
 
 /* Error handlers for timer_create, nanosleep and settime */
 int do_posix_clock_notimer_create(struct k_itimer *timer);
-int do_posix_clock_nonanosleep(int which_clock, int flags, struct timespec * t);
+int do_posix_clock_nonanosleep(clockid_t, int, struct timespec *);
 int do_posix_clock_nosettime(struct timespec *tp);
 
 /* function to call to trigger timer event */
@@ -72,5 +88,9 @@ struct now_struct {
                   (timr)->it_overrun += orun;				\
               }								\
             }while (0)
-#endif
 
+int posix_cpu_clock_getres(clockid_t, struct timespec __user *);
+int posix_cpu_clock_gettime(clockid_t, struct timespec __user *);
+int posix_cpu_clock_settime(clockid_t, const struct timespec __user *);
+
+#endif
--- linux-2.6/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -314,6 +314,14 @@ struct signal_struct {
 	unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
 
 	/*
+	 * Cumulative ns of scheduled CPU time for dead threads in the
+	 * group, not including a zombie group leader.  (This only differs
+	 * from jiffies_to_ns(utime + stime) if sched_clock uses something
+	 * other than jiffies.)
+	 */
+	unsigned long long sched_time;
+
+	/*
 	 * We don't bother to synchronize most readers of this at all,
 	 * because there is no reader checking a limit that actually needs
 	 * to get both rlim_cur and rlim_max atomically, and either one
@@ -526,6 +534,7 @@ struct task_struct {
 	unsigned long sleep_avg;
 	long interactive_credit;
 	unsigned long long timestamp, last_ran;
+	unsigned long long sched_time; /* sched_clock time spent running */
 	int activated;
 
 	unsigned long policy;
@@ -713,6 +722,7 @@ static inline int set_cpus_allowed(task_
 #endif
 
 extern unsigned long long sched_clock(void);
+extern unsigned long long current_sched_time(const task_t *current_task);
 
 /* sched_exec is called by processes performing an exec */
 #ifdef CONFIG_SMP
--- linux-2.6/kernel/Makefile
+++ linux-2.6/kernel/Makefile
@@ -7,7 +7,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
-	    kthread.o wait.o kfifo.o sys_ni.o
+	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
--- linux-2.6/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -746,6 +746,7 @@ static inline int copy_signal(unsigned l
 	sig->utime = sig->stime = sig->cutime = sig->cstime = 0;
 	sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
 	sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
+	sig->sched_time = 0;
 
 	task_lock(current->group_leader);
 	memcpy(sig->rlim, current->signal->rlim, sizeof sig->rlim);
@@ -870,6 +871,7 @@ static task_t *copy_process(unsigned lon
 	p->real_timer.data = (unsigned long) p;
 
 	p->utime = p->stime = 0;
+	p->sched_time = 0;
 	p->lock_depth = -1;		/* -1 = no lock */
 	do_posix_clock_monotonic_gettime(&p->start_time);
 	p->security = NULL;
--- linux-2.6/kernel/posix-timers.c
+++ linux-2.6/kernel/posix-timers.c
@@ -138,6 +138,10 @@ static spinlock_t idr_lock = SPIN_LOCK_U
  *	    resolution.	 Here we define the standard CLOCK_REALTIME as a
  *	    1/HZ resolution clock.
  *
+ * CPUTIME: These clocks have clockid_t values < -1.  All the clock
+ *	    and timer system calls here just punt to posix_cpu_*
+ *	    functions defined in posix-cpu-timers.c, which see.
+ *
  * RESOLUTION: Clock resolution is used to round up timer and interval
  *	    times, NOT to report clock times, which are reported with as
  *	    much resolution as the system can muster.  In some cases this
@@ -193,8 +197,6 @@ static int do_posix_gettime(struct k_clo
 static u64 do_posix_clock_monotonic_gettime_parts(
 	struct timespec *tp, struct timespec *mo);
 int do_posix_clock_monotonic_gettime(struct timespec *tp);
-static int do_posix_clock_process_gettime(struct timespec *tp);
-static int do_posix_clock_thread_gettime(struct timespec *tp);
 static struct k_itimer *lock_timer(timer_t timer_id, unsigned long *flags);
 
 static inline void unlock_timer(struct k_itimer *timr, unsigned long flags)
@@ -215,25 +217,9 @@ static __init int init_posix_timers(void
 		.clock_get = do_posix_clock_monotonic_gettime,
 		.clock_set = do_posix_clock_nosettime
 	};
-	struct k_clock clock_thread = {.res = CLOCK_REALTIME_RES,
-		.abs_struct = NULL,
-		.clock_get = do_posix_clock_thread_gettime,
-		.clock_set = do_posix_clock_nosettime,
-		.timer_create = do_posix_clock_notimer_create,
-		.nsleep = do_posix_clock_nonanosleep
-	};
-	struct k_clock clock_process = {.res = CLOCK_REALTIME_RES,
-		.abs_struct = NULL,
-		.clock_get = do_posix_clock_process_gettime,
-		.clock_set = do_posix_clock_nosettime,
-		.timer_create = do_posix_clock_notimer_create,
-		.nsleep = do_posix_clock_nonanosleep
-	};
 
 	register_posix_clock(CLOCK_REALTIME, &clock_realtime);
 	register_posix_clock(CLOCK_MONOTONIC, &clock_monotonic);
-	register_posix_clock(CLOCK_PROCESS_CPUTIME_ID, &clock_process);
-	register_posix_clock(CLOCK_THREAD_CPUTIME_ID, &clock_thread);
 
 	posix_timers_cache = kmem_cache_create("posix_timers_cache",
 					sizeof (struct k_itimer), 0, 0, NULL, NULL);
@@ -1224,65 +1210,10 @@ int do_posix_clock_notimer_create(struct
 	return -EINVAL;
 }
 
-int do_posix_clock_nonanosleep(int which_lock, int flags,struct timespec * t) {
-/* Single Unix specficiation says to return ENOTSUP but we do not have that */
-	return -EINVAL;
-}
-
-static unsigned long process_ticks(task_t *p) {
-	unsigned long ticks;
-	task_t *t;
-
-	spin_lock(&p->sighand->siglock);
-	/* The signal structure is shared between all threads */
-	ticks = p->signal->utime + p->signal->stime;
-
-	/* Add up the cpu time for all the still running threads of this process */
-	t = p;
-	do {
-		ticks += t->utime + t->stime;
-		t = next_thread(t);
-	} while (t != p);
-
-	spin_unlock(&p->sighand->siglock);
-	return ticks;
-}
-
-static inline unsigned long thread_ticks(task_t *p) {
-	return p->utime + current->stime;
-}
-
-/*
- * Single Unix Specification V3:
- *
- * Implementations shall also support the special clockid_t value
- * CLOCK_THREAD_CPUTIME_ID, which represents the CPU-time clock of the calling
- * thread when invoking one of the clock_*() or timer_*() functions. For these
- * clock IDs, the values returned by clock_gettime() and specified by
- * clock_settime() shall represent the amount of execution time of the thread
- * associated with the clock.
- */
-static int do_posix_clock_thread_gettime(struct timespec *tp)
-{
-	jiffies_to_timespec(thread_ticks(current), tp);
-	return 0;
-}
-
-/*
- * Single Unix Specification V3:
- *
- * Implementations shall also support the special clockid_t value
- * CLOCK_PROCESS_CPUTIME_ID, which represents the CPU-time clock of the
- * calling process when invoking one of the clock_*() or timer_*() functions.
- * For these clock IDs, the values returned by clock_gettime() and specified
- * by clock_settime() represent the amount of execution time of the process
- * associated with the clock.
- */
-
-static int do_posix_clock_process_gettime(struct timespec *tp)
+int do_posix_clock_nonanosleep(clockid_t which_clock, int flags,
+			       struct timespec *t)
 {
-	jiffies_to_timespec(process_ticks(current), tp);
-	return 0;
+	return -EOPNOTSUPP;
 }
 
 asmlinkage long
@@ -1290,9 +1221,8 @@ sys_clock_settime(clockid_t which_clock,
 {
 	struct timespec new_tp;
 
-	/* Cannot set process specific clocks */
-	if (which_clock<0)
-		return -EINVAL;
+	if (which_clock < 0)
+		return posix_cpu_clock_settime(which_clock, tp);
 
 	if ((unsigned) which_clock >= MAX_CLOCKS ||
 					!posix_clocks[which_clock].res)
@@ -1305,45 +1235,20 @@ sys_clock_settime(clockid_t which_clock,
 	return do_sys_settimeofday(&new_tp, NULL);
 }
 
-static int do_clock_gettime(clockid_t which_clock, struct timespec *tp)
-{
-	/* Process process specific clocks */
-	if (which_clock < 0) {
-		task_t *t;
-		int pid = -which_clock;
-
-		if (pid < PID_MAX_LIMIT) {
-			if ((t = find_task_by_pid(pid))) {
-				jiffies_to_timespec(process_ticks(t), tp);
-				return 0;
-			}
-			return -EINVAL;
-		}
-		if (pid < 2*PID_MAX_LIMIT) {
-			if ((t = find_task_by_pid(pid - PID_MAX_LIMIT))) {
-				jiffies_to_timespec(thread_ticks(t), tp);
-				return 0;
-			}
-			return -EINVAL;
-		}
-		/* More process specific clocks could follow here */
-		return -EINVAL;
-	}
-
-	if ((unsigned) which_clock >= MAX_CLOCKS ||
-					!posix_clocks[which_clock].res)
-		return -EINVAL;
-
-	return do_posix_gettime(&posix_clocks[which_clock], tp);
-}
-
 asmlinkage long
 sys_clock_gettime(clockid_t which_clock, struct timespec __user *tp)
 {
 	struct timespec kernel_tp;
 	int error;
 
-	error = do_clock_gettime(which_clock, &kernel_tp);
+	if (which_clock < 0)
+		return posix_cpu_clock_gettime(which_clock, tp);
+
+	if ((unsigned) which_clock >= MAX_CLOCKS ||
+	    !posix_clocks[which_clock].res)
+		return -EINVAL;
+
+	error = do_posix_gettime(&posix_clocks[which_clock], &kernel_tp);
 	if (!error && copy_to_user(tp, &kernel_tp, sizeof (kernel_tp)))
 		error = -EFAULT;
 
@@ -1356,8 +1261,8 @@ sys_clock_getres(clockid_t which_clock, 
 {
 	struct timespec rtn_tp;
 
-	/* All process clocks have the resolution of CLOCK_PROCESS_CPUTIME_ID */
-	if (which_clock < 0 ) which_clock = CLOCK_PROCESS_CPUTIME_ID;
+	if (which_clock < 0)
+		return posix_cpu_clock_getres(which_clock, tp);
 
 	if ((unsigned) which_clock >= MAX_CLOCKS ||
 					!posix_clocks[which_clock].res)
@@ -1495,6 +1400,9 @@ sys_clock_nanosleep(clockid_t which_cloc
 	    &(current_thread_info()->restart_block);
 	int ret;
 
+	if (which_clock < 0)	/* CPU time clocks */
+		return -EOPNOTSUPP;
+
 	if ((unsigned) which_clock >= MAX_CLOCKS ||
 					!posix_clocks[which_clock].res)
 		return -EINVAL;
--- linux-2.6/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2259,6 +2259,32 @@ DEFINE_PER_CPU(struct kernel_stat, kstat
 EXPORT_PER_CPU_SYMBOL(kstat);
 
 /*
+ * This is called on clock ticks and on context switches.
+ * Bank in p->sched_time the ns elapsed since the last tick or switch.
+ */
+static void update_cpu_clock(task_t *p, runqueue_t *rq,
+			     unsigned long long now)
+{
+	unsigned long long last = max(p->timestamp, rq->timestamp_last_tick);
+	p->sched_time += now - last;
+}
+
+/*
+ * Return current->sched_time plus any more ns on the sched_clock
+ * that have not yet been banked.
+ */
+unsigned long long current_sched_time(const task_t *tsk)
+{
+	unsigned long long ns;
+	unsigned long flags;
+	local_irq_save(flags);
+	ns = max(tsk->timestamp, task_rq(tsk)->timestamp_last_tick);
+	ns = tsk->sched_time + (sched_clock() - ns);
+	local_irq_restore(flags);
+	return ns;
+}
+
+/*
  * We place interactive tasks back into the active array, if possible.
  *
  * To guarantee that this does not starve expired tasks we ignore the
@@ -2287,8 +2313,11 @@ void scheduler_tick(int user_ticks, int 
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
 	runqueue_t *rq = this_rq();
 	task_t *p = current;
+	unsigned long long now = sched_clock();
+
+	update_cpu_clock(p, rq, now);
 
-	rq->timestamp_last_tick = sched_clock();
+	rq->timestamp_last_tick = now;
 
 	if (rcu_pending(cpu))
 		rcu_check_callbacks(cpu, user_ticks);
@@ -2669,6 +2698,8 @@ switch_tasks:
 	clear_tsk_need_resched(prev);
 	rcu_qsctr_inc(task_cpu(prev));
 
+	update_cpu_clock(prev, rq, now);
+
 	prev->sleep_avg -= run_time;
 	if ((long)prev->sleep_avg <= 0) {
 		prev->sleep_avg = 0;
--- linux-2.6/kernel/signal.c
+++ linux-2.6/kernel/signal.c
@@ -386,6 +386,7 @@ void __exit_signal(struct task_struct *t
 		sig->maj_flt += tsk->maj_flt;
 		sig->nvcsw += tsk->nvcsw;
 		sig->nivcsw += tsk->nivcsw;
+		sig->sched_time += tsk->sched_time;
 		spin_unlock(&sighand->siglock);
 		sig = NULL;	/* Marker for below.  */
 	}
--- /dev/null
+++ linux-2.6/kernel/posix-cpu-timers.c
@@ -0,0 +1,236 @@
+/*
+ * Implement CPU time clocks for the POSIX clock interface.
+ */
+
+#include <linux/sched.h>
+#include <linux/posix-timers.h>
+#include <asm/uaccess.h>
+#include <linux/errno.h>
+
+union cpu_time_count {
+	unsigned long cpu;
+	unsigned long long sched;
+};
+
+static int check_clock(clockid_t which_clock)
+{
+	int error = 0;
+	struct task_struct *p;
+	const pid_t pid = CPUCLOCK_PID(which_clock);
+
+	if (CPUCLOCK_WHICH(which_clock) >= CPUCLOCK_MAX)
+		return -EINVAL;
+
+	if (pid == 0)
+		return 0;
+
+	read_lock(&tasklist_lock);
+	p = find_task_by_pid(pid);
+	if (!p || (CPUCLOCK_PERTHREAD(which_clock) ?
+		   p->tgid != current->tgid : p->tgid != pid)) {
+		error = -EINVAL;
+	}
+	read_unlock(&tasklist_lock);
+
+	return error;
+}
+
+static void sample_to_timespec(clockid_t which_clock,
+			       union cpu_time_count cpu,
+			       struct timespec *tp)
+{
+	if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
+		tp->tv_sec = div_long_long_rem(cpu.sched,
+					       NSEC_PER_SEC, &tp->tv_nsec);
+	} else {
+		jiffies_to_timespec(cpu.cpu, tp);
+	}
+}
+
+static int sample_to_user(clockid_t which_clock,
+			  union cpu_time_count cpu,
+			  struct timespec __user *tp)
+{
+	struct timespec ts;
+	sample_to_timespec(which_clock, cpu, &ts);
+	return copy_to_user(tp, &ts, sizeof *tp) ? -EFAULT : 0;
+}
+
+static inline unsigned long prof_ticks(struct task_struct *p)
+{
+	return p->utime + p->stime;
+}
+static inline unsigned long virt_ticks(struct task_struct *p)
+{
+	return p->utime;
+}
+static inline unsigned long long sched_ns(struct task_struct *p)
+{
+	return (p == current) ? current_sched_time(p) : p->sched_time;
+}
+
+int posix_cpu_clock_getres(clockid_t which_clock, struct timespec __user *tp)
+{
+	int error = check_clock(which_clock);
+	if (!error && tp) {
+		struct timespec rtn_tp = { 0, ((NSEC_PER_SEC + HZ - 1) / HZ) };
+		if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
+			/*
+			 * If sched_clock is using a cycle counter, we
+			 * don't have any idea of its true resolution
+			 * exported, but it is much more than 1s/HZ.
+			 */
+			rtn_tp.tv_nsec = 1;
+		}
+		if (copy_to_user(tp, &rtn_tp, sizeof *tp))
+			error = -EFAULT;
+	}
+	return error;
+}
+
+int posix_cpu_clock_settime(clockid_t which_clock,
+			    const struct timespec __user *tp)
+{
+	/*
+	 * You can never reset a CPU clock, but we check for other errors
+	 * in the call before failing with EPERM.
+	 */
+	int error = check_clock(which_clock);
+	if (error == 0) {
+		struct timespec new_tp;
+		error = -EPERM;
+		if (copy_from_user(&new_tp, tp, sizeof *tp))
+			error = -EFAULT;
+	}
+	return error;
+}
+
+
+/*
+ * Sample a per-thread clock for the given task.
+ */
+static int cpu_clock_sample(clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	switch (CPUCLOCK_WHICH(which_clock)) {
+	default:
+		return -EINVAL;
+	case CPUCLOCK_PROF:
+		cpu->cpu = prof_ticks(p);
+		break;
+	case CPUCLOCK_VIRT:
+		cpu->cpu = virt_ticks(p);
+		break;
+	case CPUCLOCK_SCHED:
+		cpu->sched = sched_ns(p);
+		break;
+	}
+	return 0;
+}
+
+/*
+ * Sample a process (thread group) clock for the given group_leader task.
+ * Must be called with tasklist_lock held for reading.
+ */
+static int cpu_clock_sample_group(clockid_t which_clock,
+				  struct task_struct *p,
+				  union cpu_time_count *cpu)
+{
+	struct task_struct *t = p;
+	unsigned long flags;
+	switch (CPUCLOCK_WHICH(which_clock)) {
+	default:
+		return -EINVAL;
+	case CPUCLOCK_PROF:
+		spin_lock_irqsave(&p->sighand->siglock, flags);
+		cpu->cpu = p->signal->utime + p->signal->stime;
+		do {
+			cpu->cpu += prof_ticks(t);
+			t = next_thread(t);
+		} while (t != p);
+		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		break;
+	case CPUCLOCK_VIRT:
+		spin_lock_irqsave(&p->sighand->siglock, flags);
+		cpu->cpu = p->signal->utime;
+		do {
+			cpu->cpu += virt_ticks(t);
+			t = next_thread(t);
+		} while (t != p);
+		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		break;
+	case CPUCLOCK_SCHED:
+		spin_lock_irqsave(&p->sighand->siglock, flags);
+		cpu->sched = p->signal->sched_time;
+		/* Add in each other live thread.  */
+		while ((t = next_thread(t)) != p) {
+			cpu->sched += t->sched_time;
+		}
+		if (p->tgid == current->tgid) {
+			/*
+			 * We're sampling ourselves, so include the
+			 * cycles not yet banked.  We still omit
+			 * other threads running on other CPUs,
+			 * so the total can always be behind as
+			 * much as max(nthreads-1,ncpus) * (NSEC_PER_SEC/HZ).
+			 */
+			cpu->sched += current_sched_time(current);
+		} else {
+			cpu->sched += p->sched_time;
+		}
+		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		break;
+	}
+	return 0;
+}
+
+
+int posix_cpu_clock_gettime(clockid_t which_clock, struct timespec __user *tp)
+{
+	const pid_t pid = CPUCLOCK_PID(which_clock);
+	int error = -EINVAL;
+	union cpu_time_count rtn;
+
+	if (pid == 0) {
+		/*
+		 * Special case constant value for our own clocks.
+		 * We don't have to do any lookup to find ourselves.
+		 */
+		if (CPUCLOCK_PERTHREAD(which_clock)) {
+			/*
+			 * Sampling just ourselves we can do with no locking.
+			 */
+			error = cpu_clock_sample(which_clock,
+						 current, &rtn);
+		} else {
+			read_lock(&tasklist_lock);
+			error = cpu_clock_sample_group(which_clock,
+						       current, &rtn);
+			read_unlock(&tasklist_lock);
+		}
+	} else {
+		/*
+		 * Find the given PID, and validate that the caller
+		 * should be able to see it.
+		 */
+		struct task_struct *p;
+		read_lock(&tasklist_lock);
+		p = find_task_by_pid(pid);
+		if (p) {
+			if (CPUCLOCK_PERTHREAD(which_clock)) {
+				if (p->tgid == current->tgid) {
+					error = cpu_clock_sample(which_clock,
+								 p, &rtn);
+				}
+			} else if (p->tgid == pid && p->signal) {
+				error = cpu_clock_sample_group(which_clock,
+							       p, &rtn);
+			}
+		}
+		read_unlock(&tasklist_lock);
+	}
+
+	if (error)
+		return error;
+	return sample_to_user(which_clock, rtn, tp);
+}


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14  3:55 [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls Roland McGrath
@ 2004-12-14 18:36 ` Christoph Lameter
  2004-12-14 21:38   ` Ulrich Drepper
  2004-12-14 22:14   ` Roland McGrath
  0 siblings, 2 replies; 15+ messages in thread
From: Christoph Lameter @ 2004-12-14 18:36 UTC (permalink / raw)
  To: Roland McGrath; +Cc: akpm, torvalds, linux-kernel, Ulrich Drepper

On Mon, 13 Dec 2004, Roland McGrath wrote:

> This patch provides support for thread and process CPU time clocks in
> the POSIX clock interface.  Both the existing utime and utime+stime
> information (already available via getrusage et al) can be used, as well
> as a new (potentially) more precise and accurate clock (which cannot
> distinguish user from system time).  The clock used is that provided by
> the `sched_clock' function already used internally by the scheduler.
> This gives a way for platforms to provide the highest-resolution CPU
> time tracking that is available cheaply, and some already do so (such as
> x86 using TSC).  Because this clock is already sampled internally by the
> scheduler, this new tracking adds only the tiniest new overhead to
> accomplish the bookkeeping.

Sounds good.

> This replaces the support contributed by Christoph Lameter, with the same
> goals.  It improves on that support in these ways:
>
> 1. The ABI encoding of clockid_t for CPU time clocks is changed.
>    The current code encodes PID_MAX_LIMIT as part of the ABI; this is a
>    poor ABI, since that has heretofore been an internal implementation
>    limit never exposed directly to userland as an ABI constraint.
>    The new ABI is expressed in the macros defined in <linux/posix-timers.h>.
>    Note that the small-integer constants are not supported by the kernel ABI.
>    Userland never wants to pass these anyway; they will always be
>    translated by compatibility code in glibc.  The caller's own thread
>    or process clock is expressed by using the encoding with a PID of zero.

This yields some additional functionality and is an improvement to the
ABI. Why are CLOCK_*_CPUTIME_ID etc. not directly supported by
the kernel API? Otherwise the API will accept only some of the POSIX
clocks defined in the kernel, which may surprise authors of other C
libraries.

> This allows per-thread clocks to be accessed only by other threads in
> the same process.  The only POSIX calls that access these are defined
> only for in-process use, and having this check is necessary for the
> userland implementations of the POSIX clock functions to robustly refuse
> stale clockid_t's in the face of potential PID reuse.

Posix does not prescribe any access limitations for those clocks and as
far as I understand the standard, access to all process clocks needs to
be possible. Access to process time information is already available,
in some sense, via the standard ps command, and it's not restricted
at all.

> When this code is merged, it will be supported in glibc.
> I have written the support and added some test programs for glibc,
> which are what I mainly used to test the new kernel code.
> You can get those here:
> 	http://people.redhat.com/roland/glibc/kernel-cpuclocks.patch

It seems that this code does now support extra clocks like
CLOCK_REALTIME_HR, CLOCK_MONOTONIC_HR and CLOCK_SGI_CYCLE right?

Also glibc continues to provide an implementation of CLOCK_*_CPUTIME_ID
for other platforms and for older kernel releases that returns real time
instead of CPU time. You are continuing to make people write code that is
not portable. No willingness to make a clean break and confess your sins
against the POSIX standard?

The patches look fine to me. Still have to find time to test them
though.


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 18:36 ` Christoph Lameter
@ 2004-12-14 21:38   ` Ulrich Drepper
  2004-12-14 21:50     ` Roland McGrath
  2004-12-14 22:14   ` Roland McGrath
  1 sibling, 1 reply; 15+ messages in thread
From: Ulrich Drepper @ 2004-12-14 21:38 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Roland McGrath, akpm, torvalds, linux-kernel

Christoph Lameter wrote:

> Posix does not prescribe any access limitations for those clocks and as
> far as I understand the standard, access to all process clocks needs to
> be possible.

And how exactly do you plan to address clocks of various threads in 
another process?  Threads are only identified by the pthread_t 
descriptor.  These values have no meaning outside the process the 
threads are in.  The TIDs we use in the implementation cannot be used. 
They are an implementation detail and a thread might very well have 
different TIDs over time in future versions of the thread library.

pthread_getcpuclockid() and similar interfaces return clock IDs which are 
only meaningful in the calling process.  Using the value in another 
process has undefined results.  I.e., what Roland says is correct, the 
limitation is needed.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 21:38   ` Ulrich Drepper
@ 2004-12-14 21:50     ` Roland McGrath
  2004-12-14 21:58       ` Christoph Lameter
  2004-12-14 22:18       ` Linus Torvalds
  0 siblings, 2 replies; 15+ messages in thread
From: Roland McGrath @ 2004-12-14 21:50 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Christoph Lameter, akpm, torvalds, linux-kernel

I believe Christoph may have been referring exclusively to the per-process
clocks, not the per-thread clocks.  I'm not entirely sure since he used the
term "process" exclusively, but quoted my paragraph about the per-thread
clock access.  Ulrich's reply is apropos to the individual thread clocks.

It was about the per-process clocks that I raised the question.  For those,
POSIX says it's implementation-defined what process can see the CPU clock
of another process.  That means we can make it as restricted or as free as
we like, but we are obliged to document up front for the users what the
semantics are.  That's why I would like to make sure we have thought a
little about the choices now, rather than someone coming along later and
deciding we really ought to impose security on it (which might be
changing the story after developers reasonably coded to the
implementation-defined behavior we documented in the first place).


Thanks,
Roland


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 21:50     ` Roland McGrath
@ 2004-12-14 21:58       ` Christoph Lameter
  2004-12-14 22:18       ` Linus Torvalds
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2004-12-14 21:58 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Ulrich Drepper, akpm, torvalds, linux-kernel

On Tue, 14 Dec 2004, Roland McGrath wrote:

> I believe Christoph may have been referring exclusively to the per-process
> clocks, not the per-thread clocks.  I'm not entirely sure since he used the
> term "process" exclusively, but quoted my paragraph about the per-thread
> clock access.  Ulrich's reply is apropos to the individual thread clocks.
>
> It was about the per-process clocks that I raised the question.  For those,
> POSIX says it's implementation-defined what process can see the CPU clock
> of another process.  That means we can make it as restricted or as free as
> we like, but we are obliged to document up front for the users what the
> semantics are.  That's why I would like to make sure we have thought a
> little about the choices now, rather than someone coming along later and
> deciding we really ought to impose security on it (which might be
> changing the story after developers reasonably coded to the
> implementation-defined behavior we documented in the first place).

Threads are processes under Linux and thus may also be identified by a
pid. You are right, this is a Linux-centric view and this identification
is not applicable to other systems.


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 18:36 ` Christoph Lameter
  2004-12-14 21:38   ` Ulrich Drepper
@ 2004-12-14 22:14   ` Roland McGrath
  2004-12-14 22:23     ` Christoph Lameter
  2004-12-15 18:43     ` Christoph Lameter
  1 sibling, 2 replies; 15+ messages in thread
From: Roland McGrath @ 2004-12-14 22:14 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, torvalds, linux-kernel, Ulrich Drepper

> Sounds good.

I'm glad you like it.  Your prior interest and work in this area stimulated
general interest and active discussion, and I think that has contributed to
the likelihood of getting useful functionality available to users ASAP.

> This yields some additional functionality and is an improvement to the
> ABI. Why is CLOCK_*_CPUTIME_ID etc not directly supporting using
> the kernel API? Otherwise the API will take only some of the posix
> clocks defined in the kernel which may surprise authors of other c
> libraries.

It's mainly just to simplify the code.  The CPU clocks did not quite fit
into the k_clock function-table model, though I think that was really the
case more before some k_clock interface cleanups happened.  Now I think it
might be possible to write k_clock hook functions that call into the
posix_cpu_* functions for those small-integer constant clockid_t cases,
though off hand what I would be more confident of would be to handle them
in the existing special case calls.  i.e., if they are supported at all it
might best be by having every call translate CLOCK_*_CPUTIME_ID to
MAKE_*_CPUCLOCK(SCHED, 0) before the < 0 check.  I really don't see the
need to support those values in the kernel interface, which never did
before (though having them in linux/time.h).  But I would not object to it.
I just strongly doubt anyone would ever use it.
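
A minimal C sketch of the translation idea described above, for
illustration only.  The sketch_* names and the bit layout are assumptions
modeled loosely on the macros the patch puts in <linux/posix-timers.h>;
the small-integer values 2 and 3 are those of CLOCK_PROCESS_CPUTIME_ID and
CLOCK_THREAD_CPUTIME_ID in linux/time.h.  This is not the patch's code.

```c
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
typedef int32_t sketch_clockid_t;

#define SKETCH_CLOCK_PROCESS_CPUTIME_ID 2  /* small-integer constants */
#define SKETCH_CLOCK_THREAD_CPUTIME_ID  3

#define SKETCH_CPUCLOCK_SCHED     2        /* assumed clock-type bits */
#define SKETCH_CPUCLOCK_PERTHREAD 4        /* assumed per-thread flag */

/*
 * Assumed encoding: ~pid in the upper bits, clock type in the low three
 * bits.  For any pid >= 0 the result is negative, so it can never
 * collide with the small non-negative clock ids.
 */
static sketch_clockid_t sketch_make_cpuclock(int32_t pid, int32_t which)
{
    return (sketch_clockid_t)(((uint32_t)~pid << 3) | (uint32_t)which);
}

/* Map the small-integer ids onto "own clock" encodings (pid of zero). */
static sketch_clockid_t sketch_translate(sketch_clockid_t id)
{
    switch (id) {
    case SKETCH_CLOCK_PROCESS_CPUTIME_ID:
        return sketch_make_cpuclock(0, SKETCH_CPUCLOCK_SCHED);
    case SKETCH_CLOCK_THREAD_CPUTIME_ID:
        return sketch_make_cpuclock(0, SKETCH_CPUCLOCK_SCHED |
                                       SKETCH_CPUCLOCK_PERTHREAD);
    default:
        return id;  /* every other clock id passes through untouched */
    }
}

/* The "< 0 check": encoded CPU clocks are the negative ids. */
static int sketch_is_cpu_clock(sketch_clockid_t id)
{
    return id < 0;
}
```

Running the translation before the sign check means the small-integer
constants and the encoded forms reach the same CPU-clock code path, while
the other clock ids are dispatched as they were.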

> It seems that this code does now support extra clocks like
> CLOCK_REALTIME_HR, CLOCK_MONOTONIC_HR and CLOCK_SGI_CYCLE right?

Here I think you are talking about glibc, not kernel code.  If you mean
this about the kernel code, then please clarify (and I'm not sure what the
question means in that context).  If you are asking about the glibc code,
please take that to the libc mailing list and we won't bother the kernel
folks with that discussion.


Thanks,
Roland


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 21:50     ` Roland McGrath
  2004-12-14 21:58       ` Christoph Lameter
@ 2004-12-14 22:18       ` Linus Torvalds
  2004-12-14 22:26         ` Ulrich Drepper
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2004-12-14 22:18 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Ulrich Drepper, Christoph Lameter, akpm, linux-kernel



On Tue, 14 Dec 2004, Roland McGrath wrote:
>
> I believe Christoph may have been referring exclusively to the per-process
> clocks, not the per-thread clocks.

Please do not confuse things.

There still are no such things as "threads" vs "processes" as far as the 
kernel is concerned.

They are all the same thing, and they are all threads or processes or
whatever you want to call them. I've tried to call them "contexts of
execution" just to clarify the fact that they are _not_ threads of
processes. And they all have a unified ID space.

They just happen to share different things. We should try to avoid at all
cost to take on stupidities from legacy systems. We've got a unified
process/thread/whatever space, and that's a good thing.

Yes, when you share the signal state (and you have to share the VM and
signal handlers to do so), you end up looking like a pthreads "process".
But dammit, people should NOT think that that is all that special from a
kernel standpoint.

And no kernel interface should really care about some pthreads rules. The 
interfaces should work with old linux-threads, and with pure "clone()" 
things too.

Of course, in this case, it's doubtful whether we want to expose non-local
clocks to anybody else, making the whole point pretty moot. I'd vote for
not exposing them any more than necessary (ie the current incidental "ps"  
interface is quite enough), at least until somebody can come up with a
very powerful example of why exposing them is a good idea.

		Linus


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 22:14   ` Roland McGrath
@ 2004-12-14 22:23     ` Christoph Lameter
  2004-12-15 18:43     ` Christoph Lameter
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2004-12-14 22:23 UTC (permalink / raw)
  To: Roland McGrath; +Cc: akpm, torvalds, linux-kernel, Ulrich Drepper, george

On Tue, 14 Dec 2004, Roland McGrath wrote:

> > It seems that this code does now support extra clocks like
> > CLOCK_REALTIME_HR, CLOCK_MONOTONIC_HR and CLOCK_SGI_CYCLE right?
>
> Here I think you are talking about glibc, not kernel code.  If you mean
> this about the kernel code, then please clarify (and I'm not sure what the
> question means in that context).  If you are asking about the glibc code,
> please take that to the libc mailing list and we won't bother the kernel
> folks with that discussion.

The clock_id's are now passed through since you also have to pass your
bitmapped performance clocks through. I think this may also satisfy
George Anzinger's long standing request for the possibility of additional
clock support in glibc. The clocks are now defined only in the
kernel header files though.

Could you make the three clocks also available in glibc header files?
What do we do if additional clocks become available through the posix
layer in Linux?



* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 22:18       ` Linus Torvalds
@ 2004-12-14 22:26         ` Ulrich Drepper
  2004-12-14 22:44           ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Ulrich Drepper @ 2004-12-14 22:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roland McGrath, Christoph Lameter, akpm, linux-kernel


Linus Torvalds wrote:
> I'd vote for
> not exposing them any more than necessary (ie the current incidental "ps"  
> interface is quite enough), at least until somebody can come up with a
> very powerful example of why exposing them is a good idea.

Indeed.  It's so much easier to grant additional rights at a later time 
than to take something away for whatever reasons.

Globally accessible clocks would need to have their semantics carefully 
defined, SELinux hooks would have to be added, etc.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 22:26         ` Ulrich Drepper
@ 2004-12-14 22:44           ` Linus Torvalds
  2004-12-14 23:08             ` Roland McGrath
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2004-12-14 22:44 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Roland McGrath, Christoph Lameter, akpm, linux-kernel



On Tue, 14 Dec 2004, Ulrich Drepper wrote:
> 
> Indeed.  It's so much easier to grant additional rights at a later time 
> than to take something away for whatever reasons.

Yes.

> Globally accessible clocks would need to have the semantic carefully 
> defined, SELinux hooks would have to be added etc.

More interestingly (where "interesting" is defined as "could be really 
nasty") it's likely to interact very badly in cases where we have some 
_physically_ local clocks. Ie we might have some situation where we do 
some node-local thing for intra-node scheduling, with some other clock for 
inter-node scheduling. Exposing such a clock to a process that isn't 
actually using it could result in the node-local clock source suddenly 
needing to be exposed outside the node.

Think single-image clusters etc.

So in general, it's better to try to keep things as local as possible, 
even if it's not a visibility issue today. Some day you might be happy you 
did..

		Linus


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 22:44           ` Linus Torvalds
@ 2004-12-14 23:08             ` Roland McGrath
  0 siblings, 0 replies; 15+ messages in thread
From: Roland McGrath @ 2004-12-14 23:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ulrich Drepper, Christoph Lameter, akpm, linux-kernel

> More interestingly (where "interesting" is defined as "could be really 
> nasty") it's likely to interact very badly in cases where we have some 
> _physically_ local clocks. Ie we might have some situation where we do 
> some node-local thing for intra-node scheduling, with some other clock for 
> inter-node scheduling. Exposing such a clock to a process that isn't 
> actually using it could result in the node-local clock source suddenly 
> needing to be exposed outside the node.

Not with the support I've written.  When you get the thread CPU time of any
thread but yourself, you just take ->sched_time as already stored and don't
do a sched_clock sample right then.  So you only see the amount of time
that has accrued as of that thread's last context switch or timer interrupt
(scheduler_tick), and not whatever amount is known only on the local CPU
where it is running right now.
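
The distinction above can be modeled in a few lines of userspace C.  This
is only a sketch under stated assumptions: struct task_model, its fields,
and both function names are hypothetical, and the "now" argument stands in
for a sched_clock sample.  It is not the kernel code from the patch.

```c
#include <stdint.h>

/* Hypothetical userspace model of one task's CPU-time bookkeeping. */
struct task_model {
    uint64_t sched_time;   /* ns banked as of the last tick or switch */
    uint64_t last_update;  /* clock value at the time of that banking */
};

/* Models what happens on a timer tick or context switch. */
static void bank_cpu_time(struct task_model *t, uint64_t now)
{
    t->sched_time += now - t->last_update;
    t->last_update = now;
}

/*
 * Reading your own clock may add a fresh sample taken on the local CPU.
 * Reading another task's clock returns only the banked value, so no
 * CPU-local clock ever has to be consulted from elsewhere.
 */
static uint64_t read_cpu_time(const struct task_model *t, int is_self,
                              uint64_t now)
{
    if (is_self)
        return t->sched_time + (now - t->last_update);
    return t->sched_time;
}
```

The cost of this choice is staleness: a remote reader sees a value that
lags by at most the time since the target's last tick or context switch.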


Thanks,
Roland


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 22:14   ` Roland McGrath
  2004-12-14 22:23     ` Christoph Lameter
@ 2004-12-15 18:43     ` Christoph Lameter
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2004-12-15 18:43 UTC (permalink / raw)
  To: Roland McGrath; +Cc: akpm, torvalds, linux-kernel, Ulrich Drepper

On Tue, 14 Dec 2004, Roland McGrath wrote:

> > This yields some additional functionality and is an improvement to the
> > ABI. Why is CLOCK_*_CPUTIME_ID etc not directly supporting using
> > the kernel API? Otherwise the API will take only some of the posix
> > clocks defined in the kernel which may surprise authors of other c
> > libraries.
>
> It's mainly just to simplify the code.  The CPU clocks did not quite fit
> into the k_clock function-table model, though I think that was really the
> case more before some k_clock interface cleanups happened.  Now I think it
> might be possible to write k_clock hook functions that call into the
> posix_cpu_* functions for those small-integer constant clockid_t cases,
> though off hand what I would be more confident of would be to handle them
> in the existing special case calls.  i.e., if they are supported at all it
> might best be by having every call translate CLOCK_*_CPUTIME_ID to
> MAKE_*_CPUCLOCK(SCHED, 0) before the < 0 check.  I really don't see the
> need to support those values in the kernel interface, which never did
> before (though having them in linux/time.h).  But I would not object to it.
> I just strongly doubt anyone would ever use it.

I just reviewed the glibc patches and the kernel side and I would think
that your glibc code would be simplified by passing all CLOCK_* values
straight through and then calling your cputimer functions as needed in
the kernel as before. As also said before this would also make the
interface more intuitive and keep the existing semantics for clock_id's >
0. It would separate the most common use from the special functionality
intended for access to clocks of other processes/threads/(whatever you
want to call them).

The glibc side already is a mess and I would think that doing so
also would keep things cleaner on that side.

Also

Why are the hooks in sys_gettime and not in do_gettime? Is there any
reason that the existing user space transfer code in do_gettime cannot be
used? I would like to have all user space interfacing centralized in
posix-timers.c. Clock-specific code should not deal with user space
pointers. Do not merge do_clock_gettime and sys_clock_gettime.

posix-cpu-timers.c needs to have all user space interaction removed. If
you would like to make improvements to the code transferring to / from
user space then make them for the general case in posix-timers.c.

Ideally the integration of the cpu timers into posix-timers.c would
follow the general way that clock specific code is called as closely as
possible.


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 22:44   ` Ulrich Drepper
@ 2004-12-15  4:51     ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2004-12-15  4:51 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, Roland McGrath, linux-kernel

On Tue, Dec 14, 2004 at 02:44:17PM -0800, Ulrich Drepper wrote:
> On 14 Dec 2004 20:07:39 +0100, Andi Kleen <ak@suse.de> wrote:
> 
> > I don't think this should be merged until a clear need from a useful
> > application is demonstrated for it.
> 
> This is something which is requested countless times.  Everybody doing
> development of sophisticated programs adds some kind of self
> monitoring.  And there is of course profiling.  The most widely used

You mean statistical profiling? I don't see why you need a bigger
resolution than HZ for that. 

> program which needs this is probably the JVM.  Don't ask me for the
> specific class, but the JVM developers asked for these clocks in this
> form.  Without the support available the Linux JVM is never going to

They didn't ask linux-kernel at least. Perhaps their requirements
should be discussed here first before messing up all kinds of
fast paths with dubious changes? 

-Andi


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
  2004-12-14 19:07 ` Andi Kleen
@ 2004-12-14 22:44   ` Ulrich Drepper
  2004-12-15  4:51     ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Ulrich Drepper @ 2004-12-14 22:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Roland McGrath, linux-kernel

On 14 Dec 2004 20:07:39 +0100, Andi Kleen <ak@suse.de> wrote:

> I don't think this should be merged until a clear need from a useful
> application is demonstrated for it.

This is something which is requested countless times.  Everybody doing
development of sophisticated programs adds some kind of self
monitoring.  And there is of course profiling.  The most widely used
program which needs this is probably the JVM.  Don't ask me for the
specific class, but the JVM developers asked for these clocks in this
form.  Without the support available the Linux JVM is never going to
reach par.


* Re: [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls
       [not found] <200412140355.iBE3t7KL008040@magilla.sf.frob.com.suse.lists.linux.kernel>
@ 2004-12-14 19:07 ` Andi Kleen
  2004-12-14 22:44   ` Ulrich Drepper
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2004-12-14 19:07 UTC (permalink / raw)
  To: Roland McGrath; +Cc: linux-kernel

Roland McGrath <roland@redhat.com> writes:
>  
>  /*
> + * This is called on clock ticks and on context switches.
> + * Bank in p->sched_time the ns elapsed since the last tick or switch.
> + */
> +static void update_cpu_clock(task_t *p, runqueue_t *rq,
> +			     unsigned long long now)
> +{
> +	unsigned long long last = max(p->timestamp, rq->timestamp_last_tick);
> +	p->sched_time += now - last;
> +}

This will completely mess up the register allocation in schedule().
long long on i386 forces basically everything else out onto the stack
because it needs 4 aligned registers.

I suspect when you benchmark it it will become visibly slower.

In general it seems like a bad idea to pollute the extremely critical
fast paths in schedule with support for such an obscure operation.
Is there really any real need for such a high resolution per process
timer anyways? I have my doubts about it, I would suspect most apps
are more interested in wall clock time.

I don't think this should be merged until a clear need from a useful
application is demonstrated for it.

-Andi



end of thread, other threads:[~2004-12-15 18:44 UTC | newest]

Thread overview: 15+ messages
2004-12-14  3:55 [PATCH 1/7] cpu-timers: high-resolution CPU clocks for POSIX clock_* syscalls Roland McGrath
2004-12-14 18:36 ` Christoph Lameter
2004-12-14 21:38   ` Ulrich Drepper
2004-12-14 21:50     ` Roland McGrath
2004-12-14 21:58       ` Christoph Lameter
2004-12-14 22:18       ` Linus Torvalds
2004-12-14 22:26         ` Ulrich Drepper
2004-12-14 22:44           ` Linus Torvalds
2004-12-14 23:08             ` Roland McGrath
2004-12-14 22:14   ` Roland McGrath
2004-12-14 22:23     ` Christoph Lameter
2004-12-15 18:43     ` Christoph Lameter
     [not found] <200412140355.iBE3t7KL008040@magilla.sf.frob.com.suse.lists.linux.kernel>
2004-12-14 19:07 ` Andi Kleen
2004-12-14 22:44   ` Ulrich Drepper
2004-12-15  4:51     ` Andi Kleen
