* [RFC][Patch 0/5] Per-task delay accounting
@ 2005-12-07 22:08 Shailabh Nagar
  2005-12-07 22:13 ` [RFC][Patch 1/5] nanosecond timestamps and diffs Shailabh Nagar
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-07 22:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: elsa-devel, lse-tech, ckrm-tech, Guillaume Thouvenin, Jay Lan,
	Jens Axboe

The following patches add accounting for the delays seen by a task in
a) waiting for a CPU (while being runnable)
b) completion of synchronous block I/O initiated by the task
c) swapping in pages (i.e. capacity misses).

Such delays provide feedback for a task's cpu priority, io priority and
rss limit values. Long delays, especially relative to other tasks, can be
a trigger for changing a task's cpu/io priorities and modifying its rss usage
(either directly through sys_getprlimit() that was proposed earlier on lkml,
or by throttling cpu consumption, or by the process calling sys_setrlimit, etc.)

There are quite a few differences from the earlier posting of these patches
(http://www.uwsg.indiana.edu/hypermail/linux/kernel/0511.1/2275.html):

- block I/O is (hopefully) being accounted properly now, instead of just counting the
time spent in io_schedule() as done earlier.

- instead of accounting for time spent in all page faults, only swapping in of pages
is being counted since that's the only part that one can really control (capacity misses
vs. compulsory misses)

- a /proc interface is being used instead of a connector-based interface. Andrew Morton
suggested a generic connector-based interface useful for future usage of
connectors for stats. This revised connector-based interface will be posted separately
since it's useful for efficient delivery of any per-task statistics, not just the ones
being introduced by these patches.

- the timestamping code has been made generic (following the suggestions to Matt Helsley's
patches to add timestamps to process events connectors)


More comments in individual patches.

Series

nstimestamp-diff.patch
delayacct-init.patch
delayacct-blkio.patch
delayacct-swapin.patch
delayacct-procfs.patch



* [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-07 22:08 [RFC][Patch 0/5] Per-task delay accounting Shailabh Nagar
@ 2005-12-07 22:13 ` Shailabh Nagar
  2005-12-12 18:50   ` [Lse-tech] " Christoph Lameter
  2005-12-07 22:15 ` [RFC][Patch 2/5] Per-task delay accounting: Initialization, dynamic turn on/off Shailabh Nagar
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-07 22:13 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe

Add kernel utility functions for
- nanosecond resolution timestamps, adjusted for lost ticks
- interval (diff) between two such timestamps, in nanoseconds, adjusting
  for overflow
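
For readers who want to try the interval arithmetic outside the kernel,
here is a minimal userspace sketch of the diff helper described above.
The name nsdiff() is hypothetical and not part of the patch; the sketch
assumes the same semantics as the description (result in nanoseconds,
zero when the end timestamp precedes the start) and widens to 64 bits
before multiplying so the seconds term cannot overflow a 32-bit long.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical userspace analogue of the timespec diff helper:
 * returns end - start in nanoseconds, or 0 if end is earlier than start.
 */
static uint64_t nsdiff(const struct timespec *start, const struct timespec *end)
{
	/* Cast before multiplying so the product is computed in 64 bits. */
	int64_t s = (int64_t)start->tv_sec * NSEC_PER_SEC + start->tv_nsec;
	int64_t e = (int64_t)end->tv_sec * NSEC_PER_SEC + end->tv_nsec;

	return e > s ? (uint64_t)(e - s) : 0;
}
```

In userspace the two timestamps would typically come from
clock_gettime(CLOCK_MONOTONIC, ...), which plays roughly the role that
the monotonic-adjusted timestamp plays in this patch.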

The timestamp part of this patch is identical to the one proposed by
Matt Helsley (as part of adding timestamps to process event connectors)
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0512.0/1373.html

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>

 include/linux/time.h |   16 ++++++++++++++++
 kernel/time.c        |   22 ++++++++++++++++++++++
 2 files changed, 38 insertions(+)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -95,6 +95,7 @@ struct itimerval;
 extern int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue);
 extern int do_getitimer(int which, struct itimerval *value);
 extern void getnstimeofday (struct timespec *tv);
+extern void getnstimestamp(struct timespec *ts);

 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);

@@ -113,6 +114,21 @@ set_normalized_timespec (struct timespec
 	ts->tv_nsec = nsec;
 }

+/*
+ * timespec_nsdiff - Return difference of two timestamps in nanoseconds
+ * In the rare case of @end being earlier than @start, return zero
+ */
+static inline unsigned long long
+timespec_nsdiff(struct timespec *start, struct timespec *end)
+{
+	long long ret;
+
+	ret = (long long)end->tv_sec * 1000000000LL + end->tv_nsec;
+	ret -= (long long)start->tv_sec * 1000000000LL + start->tv_nsec;
+	if (ret < 0)
+		return 0;
+	return ret;
+}
 #endif /* __KERNEL__ */

 #define NFDBITS			__NFDBITS
Index: linux-2.6.15-rc5/kernel/time.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/time.c
+++ linux-2.6.15-rc5/kernel/time.c
@@ -561,6 +561,28 @@ void getnstimeofday(struct timespec *tv)
 EXPORT_SYMBOL_GPL(getnstimeofday);
 #endif

+void getnstimestamp(struct timespec *ts)
+{
+	unsigned int seq;
+	struct timespec wall2mono;
+
+	/* synchronize with settimeofday() changes */
+	do {
+		seq = read_seqbegin(&xtime_lock);
+		getnstimeofday(ts);
+		wall2mono = wall_to_monotonic;
+	} while(unlikely(read_seqretry(&xtime_lock, seq)));
+
+	/* adjust to monotonically-increasing values */
+	ts->tv_sec += wall2mono.tv_sec;
+	ts->tv_nsec += wall2mono.tv_nsec;
+	while (unlikely(ts->tv_nsec >= NSEC_PER_SEC)) {
+		ts->tv_nsec -= NSEC_PER_SEC;
+		ts->tv_sec++;
+	}
+}
+EXPORT_SYMBOL_GPL(getnstimestamp);
+
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
 {


* [RFC][Patch 2/5] Per-task delay accounting: Initialization, dynamic turn on/off
  2005-12-07 22:08 [RFC][Patch 0/5] Per-task delay accounting Shailabh Nagar
  2005-12-07 22:13 ` [RFC][Patch 1/5] nanosecond timestamps and diffs Shailabh Nagar
@ 2005-12-07 22:15 ` Shailabh Nagar
  2005-12-07 22:23 ` [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays Shailabh Nagar
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-07 22:15 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe

Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- kernel param, sysctl option to control delay stats collection (parag)
- better CONFIG parameter name (parag)

11/14/05: First post

delayacct-init.patch

Initialization code related to collection of per-task "delay"
statistics which measure how long a task had to wait for cpu,
sync block io, swapping etc. The collection of statistics and
the interface are in other patches. This patch sets up the data
structures and allows the statistics collection to be dynamically
enabled (through a kernel boot parameter and through
/proc/sys/kernel/delayacct).
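
As a rough illustration of the runtime switch, the sketch below toggles
and reads back an integer sysctl file from userspace. It assumes the
/proc/sys/kernel/delayacct path named above and the plain integer format
that proc_dointvec provides; the helper names delayacct_set() and
delayacct_get() are made up for this example.

```c
#include <stdio.h>

/* Write "1" or "0" to the sysctl file. Returns 0 on success, -1 on error. */
static int delayacct_set(const char *path, int on)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", on ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the switch back. Returns 1 (on), 0 (off), or -1 on error. */
static int delayacct_get(const char *path)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val < 0 ? -1 : (val != 0);
}
```

A caller would pass "/proc/sys/kernel/delayacct" as the path, for
example delayacct_set("/proc/sys/kernel/delayacct", 1), which needs
root given the 0644 mode on the entry.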


Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>

 Documentation/kernel-parameters.txt |    2 ++
 include/linux/delayacct.h           |   26 ++++++++++++++++++++++++++
 include/linux/sched.h               |   11 +++++++++++
 include/linux/sysctl.h              |    1 +
 init/Kconfig                        |   13 +++++++++++++
 kernel/Makefile                     |    1 +
 kernel/delayacct.c                  |   36 ++++++++++++++++++++++++++++++++++++
 kernel/fork.c                       |    2 ++
 kernel/sysctl.c                     |   14 ++++++++++++++
 9 files changed, 106 insertions(+)

Index: linux-2.6.15-rc5/init/Kconfig
===================================================================
--- linux-2.6.15-rc5.orig/init/Kconfig
+++ linux-2.6.15-rc5/init/Kconfig
@@ -162,6 +162,19 @@ config BSD_PROCESS_ACCT_V3
 	  for processing it. A preliminary version of these tools is available
 	  at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>.

+config TASK_DELAY_ACCT
+	bool "Enable per-task delay accounting (EXPERIMENTAL)"
+	help
+	  Collect information on time spent by a task waiting for system
+	  resources like cpu, synchronous block I/O completion and swapping
+	  in pages. Such statistics can help in setting a task's priorities
+	  relative to other tasks for cpu, io, rss limits etc.
+
+	  Unlike BSD process accounting, this information is available
+	  continuously during the lifetime of a task.
+
+	  Say N if unsure.
+
 config SYSCTL
 	bool "Sysctl support"
 	---help---
Index: linux-2.6.15-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sched.h
+++ linux-2.6.15-rc5/include/linux/sched.h
@@ -541,6 +541,14 @@ struct sched_info {
 extern struct file_operations proc_schedstat_operations;
 #endif

+#ifdef CONFIG_TASK_DELAY_ACCT
+struct task_delay_info {
+	spinlock_t	lock;
+
+	/* Add stats in pairs: uint64_t delay, uint32_t count */
+};
+#endif
+
 enum idle_type
 {
 	SCHED_IDLE,
@@ -857,6 +865,9 @@ struct task_struct {
 	int cpuset_mems_generation;
 #endif
 	atomic_t fs_excl;	/* holding fs exclusive resources */
+#ifdef	CONFIG_TASK_DELAY_ACCT
+	struct task_delay_info delays;
+#endif
 };

 static inline pid_t process_group(struct task_struct *tsk)
Index: linux-2.6.15-rc5/kernel/fork.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/fork.c
+++ linux-2.6.15-rc5/kernel/fork.c
@@ -43,6 +43,7 @@
 #include <linux/rmap.h>
 #include <linux/acct.h>
 #include <linux/cn_proc.h>
+#include <linux/delayacct.h>

 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -923,6 +924,7 @@ static task_t *copy_process(unsigned lon
 	if (p->binfmt && !try_module_get(p->binfmt->module))
 		goto bad_fork_cleanup_put_domain;

+	delayacct_tsk_init(p);
 	p->did_exec = 0;
 	copy_flags(clone_flags, p);
 	p->pid = pid;
Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -0,0 +1,26 @@
+/* delayacct.h - per-task delay accounting
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2005
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ */
+
+#ifndef _LINUX_TASKDELAYS_H
+#define _LINUX_TASKDELAYS_H
+
+#include <linux/sched.h>
+
+#ifdef CONFIG_TASK_DELAY_ACCT
+extern int delayacct_on;	/* Delay accounting turned on/off */
+extern void delayacct_tsk_init(struct task_struct *tsk);
+#else
+static inline void delayacct_tsk_init(struct task_struct *tsk)
+{}
+#endif /* CONFIG_TASK_DELAY_ACCT */
+#endif /* _LINUX_TASKDELAYS_H */
Index: linux-2.6.15-rc5/kernel/sysctl.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/sysctl.c
+++ linux-2.6.15-rc5/kernel/sysctl.c
@@ -124,6 +124,10 @@ extern int sysctl_hz_timer;
 extern int acct_parm[];
 #endif

+#ifdef CONFIG_TASK_DELAY_ACCT
+extern int delayacct_on;
+#endif
+
 int randomize_va_space = 1;

 static int parse_table(int __user *, int, void __user *, size_t __user *, void __user *, size_t,
@@ -656,6 +660,16 @@ static ctl_table kern_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+#if defined(CONFIG_TASK_DELAY_ACCT)
+	{
+		.ctl_name	= KERN_TASK_DELAY_ACCT,
+		.procname	= "delayacct",
+		.data		= &delayacct_on,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
 	{ .ctl_name = 0 }
 };

Index: linux-2.6.15-rc5/include/linux/sysctl.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sysctl.h
+++ linux-2.6.15-rc5/include/linux/sysctl.h
@@ -146,6 +146,7 @@ enum
 	KERN_RANDOMIZE=68, /* int: randomize virtual address space */
 	KERN_SETUID_DUMPABLE=69, /* int: behaviour of dumps for setuid core */
 	KERN_SPIN_RETRY=70,	/* int: number of spinlock retries */
+	KERN_TASK_DELAY_ACCT=71,	/* turn task delay accounting on/off */
 };


Index: linux-2.6.15-rc5/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.15-rc5.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.15-rc5/Documentation/kernel-parameters.txt
@@ -410,6 +410,8 @@ running once the system is up.
 			Format: <area>[,<node>]
 			See also Documentation/networking/decnet.txt.

+	delayacct	[KNL] Enable per-task delay accounting
+
 	devfs=		[DEVFS]
 			See Documentation/filesystems/devfs/boot-options.

Index: linux-2.6.15-rc5/kernel/Makefile
===================================================================
--- linux-2.6.15-rc5.orig/kernel/Makefile
+++ linux-2.6.15-rc5/kernel/Makefile
@@ -32,6 +32,7 @@ obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
+obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o

 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6.15-rc5/kernel/delayacct.c
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/kernel/delayacct.c
@@ -0,0 +1,36 @@
+/* delayacct.c - per-task delay accounting
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2005
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ */
+
+#include <linux/sched.h>
+
+int delayacct_on;	/* Delay accounting turned on/off */
+
+int __init delayacct_setup_enable(char *str)
+{
+	delayacct_on = 1;
+	return 1;
+}
+__setup("delayacct", delayacct_setup_enable);
+
+inline void delayacct_tsk_init(struct task_struct *tsk)
+{
+	memset(&tsk->delays, 0, sizeof(tsk->delays));
+	spin_lock_init(&tsk->delays.lock);
+}
+
+static int __init delayacct_init(void)
+{
+	delayacct_tsk_init(&init_task);
+	return 0;
+}
+core_initcall(delayacct_init);


* [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays
  2005-12-07 22:08 [RFC][Patch 0/5] Per-task delay accounting Shailabh Nagar
  2005-12-07 22:13 ` [RFC][Patch 1/5] nanosecond timestamps and diffs Shailabh Nagar
  2005-12-07 22:15 ` [RFC][Patch 2/5] Per-task delay accounting: Initialization, dynamic turn on/off Shailabh Nagar
@ 2005-12-07 22:23 ` Shailabh Nagar
  2005-12-07 22:33   ` [ckrm-tech] " Dave Hansen
  2005-12-07 22:28 ` [RFC][Patch 4/5] Per-task delay accounting: Swap in delays Shailabh Nagar
  2005-12-07 22:29 ` [RFC][Patch 5/5] Per-task delay accounting: procfs interface Shailabh Nagar
  4 siblings, 1 reply; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-07 22:23 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe, Suparna Bhattacharya

This patch attempts to record all the time spent by a task
waiting for completion of (user-initiated) block I/O. Ideally, it
would also record the time spent by a task waiting for I/O events
related to async block I/O. That can be done now (by measuring time
spent in wait_for_async_kiocb), but once (if?) network aio is
implemented it won't, AFAIK, be possible to distinguish async block
and network aio events (and I suspect async I/O to pipes too...),
so async block I/O gets ignored for now.

Suggestions on how async block I/O wait can be accounted accurately would
be welcome.




Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- collect stats only if delay accounting enabled (parag)
- stats collected for delays in all userspace-initiated block I/O
including fsync/fdatasync but not counting waits for async block io events.

11/14/05: First post


delayacct-blkio.patch

Record time spent by a task waiting for completion of
userspace initiated synchronous block I/O. This can help
determine the right I/O priority for the task.
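
The accounting pattern itself is small: bracket the blocking call with
two timestamps, then add the interval to a running total and bump a
counter. Below is a single-threaded userspace sketch of that
accumulation step. The names struct delay_stat, accounting_on,
delay_stat_add() and delay_stat_avg() are hypothetical stand-ins, and
the per-task spinlock the patch takes around the update is deliberately
left out here.

```c
#include <stdint.h>

/* Mirrors the "delay, count" pairing used for each per-task statistic. */
struct delay_stat {
	uint64_t delay_ns;	/* total time spent waiting, in nanoseconds */
	uint32_t count;		/* number of waits recorded */
};

static int accounting_on;	/* stand-in for the global on/off switch */

/* Record one wait of delay_ns nanoseconds, only if accounting is enabled. */
static void delay_stat_add(struct delay_stat *st, uint64_t delay_ns)
{
	if (!accounting_on)
		return;
	st->delay_ns += delay_ns;
	st->count++;
}

/* Average wait per event in nanoseconds; 0 when nothing was recorded. */
static uint64_t delay_stat_avg(const struct delay_stat *st)
{
	return st->count ? st->delay_ns / st->count : 0;
}
```

The average is one way a consumer might compare tasks against each
other; the patch only exports the raw totals and counts.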

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>

 fs/buffer.c               |    6 ++++++
 fs/read_write.c           |   10 +++++++++-
 include/linux/delayacct.h |    4 ++++
 include/linux/sched.h     |    2 ++
 kernel/delayacct.c        |   31 +++++++++++++++++++++++++++++++
 mm/filemap.c              |   10 +++++++++-
 mm/memory.c               |   17 +++++++++++++++--
 7 files changed, 76 insertions(+), 4 deletions(-)

Index: linux-2.6.15-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sched.h
+++ linux-2.6.15-rc5/include/linux/sched.h
@@ -546,6 +546,8 @@ struct task_delay_info {
 	spinlock_t	lock;

 	/* Add stats in pairs: uint64_t delay, uint32_t count */
+	uint64_t blkio_delay;	/* wait for sync block io completion */
+	uint32_t blkio_count;
 };
 #endif

Index: linux-2.6.15-rc5/fs/read_write.c
===================================================================
--- linux-2.6.15-rc5.orig/fs/read_write.c
+++ linux-2.6.15-rc5/fs/read_write.c
@@ -14,6 +14,8 @@
 #include <linux/security.h>
 #include <linux/module.h>
 #include <linux/syscalls.h>
+#include <linux/time.h>
+#include <linux/delayacct.h>

 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -224,8 +226,14 @@ ssize_t do_sync_read(struct file *filp,
 		(ret = filp->f_op->aio_read(&kiocb, buf, len, kiocb.ki_pos)))
 		wait_on_retry_sync_kiocb(&kiocb);

-	if (-EIOCBQUEUED == ret)
+	if (-EIOCBQUEUED == ret) {
+		__attribute__((unused)) struct timespec start, end;
+
+		getnstimestamp(&start);
 		ret = wait_on_sync_kiocb(&kiocb);
+		getnstimestamp(&end);
+		delayacct_blkio(&start, &end);
+	}
 	*ppos = kiocb.ki_pos;
 	return ret;
 }
Index: linux-2.6.15-rc5/mm/filemap.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/filemap.c
+++ linux-2.6.15-rc5/mm/filemap.c
@@ -28,6 +28,8 @@
 #include <linux/blkdev.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/time.h>
+#include <linux/delayacct.h>
 #include "filemap.h"
 /*
  * FIXME: remove all knowledge of the buffer layer from the core VM
@@ -1062,8 +1064,14 @@ generic_file_read(struct file *filp, cha

 	init_sync_kiocb(&kiocb, filp);
 	ret = __generic_file_aio_read(&kiocb, &local_iov, 1, ppos);
-	if (-EIOCBQUEUED == ret)
+	if (-EIOCBQUEUED == ret) {
+		__attribute__((unused)) struct timespec start, end;
+
+		getnstimestamp(&start);
 		ret = wait_on_sync_kiocb(&kiocb);
+		getnstimestamp(&end);
+		delayacct_blkio(&start, &end);
+	}
 	return ret;
 }

Index: linux-2.6.15-rc5/mm/memory.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/memory.c
+++ linux-2.6.15-rc5/mm/memory.c
@@ -48,6 +48,8 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/time.h>
+#include <linux/delayacct.h>

 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -2200,11 +2202,22 @@ static inline int handle_pte_fault(struc
 	old_entry = entry = *pte;
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
+			int ret;
+			__attribute__((unused)) struct timespec start, end;
+
 			if (!vma->vm_ops || !vma->vm_ops->nopage)
 				return do_anonymous_page(mm, vma, address,
 					pte, pmd, write_access);
-			return do_no_page(mm, vma, address,
-					pte, pmd, write_access);
+
+			if (vma->vm_file)
+				getnstimestamp(&start);
+			ret = do_no_page(mm, vma, address,
+					 pte, pmd, write_access);
+			if (vma->vm_file) {
+				getnstimestamp(&end);
+				delayacct_blkio(&start, &end);
+			}
+			return ret;
 		}
 		if (pte_file(entry))
 			return do_file_page(mm, vma, address,
Index: linux-2.6.15-rc5/fs/buffer.c
===================================================================
--- linux-2.6.15-rc5.orig/fs/buffer.c
+++ linux-2.6.15-rc5/fs/buffer.c
@@ -41,6 +41,8 @@
 #include <linux/bitops.h>
 #include <linux/mpage.h>
 #include <linux/bit_spinlock.h>
+#include <linux/time.h>
+#include <linux/delayacct.h>

 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static void invalidate_bh_lrus(void);
@@ -337,6 +339,7 @@ static long do_fsync(unsigned int fd, in
 	struct file * file;
 	struct address_space *mapping;
 	int ret, err;
+	__attribute__((unused)) struct timespec start, end;

 	ret = -EBADF;
 	file = fget(fd);
@@ -349,6 +352,7 @@ static long do_fsync(unsigned int fd, in
 		goto out_putf;
 	}

+	getnstimestamp(&start);
 	mapping = file->f_mapping;

 	current->flags |= PF_SYNCWRITE;
@@ -371,6 +375,8 @@ static long do_fsync(unsigned int fd, in
 out_putf:
 	fput(file);
 out:
+	getnstimestamp(&end);
+	delayacct_blkio(&start, &end);
 	return ret;
 }

Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/delayacct.h
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -19,8 +19,12 @@
 #ifdef CONFIG_TASK_DELAY_ACCT
 extern int delayacct_on;	/* Delay accounting turned on/off */
 extern void delayacct_tsk_init(struct task_struct *tsk);
+extern void delayacct_blkio(struct timespec *start, struct timespec *end);
 #else
 static inline void delayacct_tsk_init(struct task_struct *tsk)
 {}
+static inline void delayacct_blkio(struct timespec *start, struct timespec *end)
+{}
+
 #endif /* CONFIG_TASK_DELAY_ACCT */
 #endif /* _LINUX_TASKDELAYS_H */
Index: linux-2.6.15-rc5/kernel/delayacct.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/delayacct.c
+++ linux-2.6.15-rc5/kernel/delayacct.c
@@ -12,6 +12,7 @@
  */

 #include <linux/sched.h>
+#include <linux/time.h>

 int delayacct_on;	/* Delay accounting turned on/off */

@@ -34,3 +35,33 @@ static int __init delayacct_init(void)
 	return 0;
 }
 core_initcall(delayacct_init);
+
+inline void delayacct_blkio(struct timespec *start, struct timespec *end)
+{
+	unsigned long long delay;
+
+	if (!delayacct_on)
+		return;
+
+	delay = timespec_nsdiff(start, end);
+
+	spin_lock(&current->delays.lock);
+	current->delays.blkio_delay += delay;
+	current->delays.blkio_count++;
+	spin_unlock(&current->delays.lock);
+}
+
+inline void delayacct_swapin(struct timespec *start, struct timespec *end)
+{
+	unsigned long long delay;
+
+	if (!delayacct_on)
+		return;
+
+	delay = timespec_nsdiff(start, end);
+
+	spin_lock(&current->delays.lock);
+	current->delays.swapin_delay += delay;
+	current->delays.swapin_count++;
+	spin_unlock(&current->delays.lock);
+}


* [RFC][Patch 4/5] Per-task delay accounting: Swap in delays
  2005-12-07 22:08 [RFC][Patch 0/5] Per-task delay accounting Shailabh Nagar
                   ` (2 preceding siblings ...)
  2005-12-07 22:23 ` [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays Shailabh Nagar
@ 2005-12-07 22:28 ` Shailabh Nagar
  2005-12-07 22:29 ` [RFC][Patch 5/5] Per-task delay accounting: procfs interface Shailabh Nagar
  4 siblings, 0 replies; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-07 22:28 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe

Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
  instead of sched_clock (akpm, andi, marcelo)
- collect stats only if delay accounting enabled (parag)
- collect delays for only swapin page faults instead of all page faults.

11/14/05: First post


delayacct-swapin.patch

Record time spent by a task waiting for its pages to be swapped in.
This statistic can help in adjusting the rss limits of
tasks (processes), especially relative to each other, when the system is
under memory pressure.

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>

 include/linux/delayacct.h |    3 +++
 include/linux/sched.h     |    2 ++
 mm/memory.c               |   16 +++++++++-------
 3 files changed, 14 insertions(+), 7 deletions(-)

Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/delayacct.h
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -20,11 +20,14 @@
 extern int delayacct_on;	/* Delay accounting turned on/off */
 extern void delayacct_tsk_init(struct task_struct *tsk);
 extern void delayacct_blkio(struct timespec *start, struct timespec *end);
+extern void delayacct_swapin(struct timespec *start, struct timespec *end);
 #else
 static inline void delayacct_tsk_init(struct task_struct *tsk)
 {}
 static inline void delayacct_blkio(struct timespec *start, struct timespec *end)
 {}
+static inline void delayacct_swapin(struct timespec *start, struct timespec *end)
+{}

 #endif /* CONFIG_TASK_DELAY_ACCT */
 #endif /* _LINUX_TASKDELAYS_H */
Index: linux-2.6.15-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sched.h
+++ linux-2.6.15-rc5/include/linux/sched.h
@@ -548,6 +548,8 @@ struct task_delay_info {
 	/* Add stats in pairs: uint64_t delay, uint32_t count */
 	uint64_t blkio_delay;	/* wait for sync block io completion */
 	uint32_t blkio_count;
+	uint64_t swapin_delay;	/* wait for pages to be swapped in */
+	uint32_t swapin_count;
 };
 #endif

Index: linux-2.6.15-rc5/mm/memory.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/memory.c
+++ linux-2.6.15-rc5/mm/memory.c
@@ -2201,16 +2201,15 @@ static inline int handle_pte_fault(struc

 	old_entry = entry = *pte;
 	if (!pte_present(entry)) {
-		if (pte_none(entry)) {
-			int ret;
-			__attribute__((unused)) struct timespec start, end;
+		int ret;
+		__attribute__((unused)) struct timespec start, end;

+		getnstimestamp(&start);
+		if (pte_none(entry)) {
 			if (!vma->vm_ops || !vma->vm_ops->nopage)
 				return do_anonymous_page(mm, vma, address,
 					pte, pmd, write_access);

-			if (vma->vm_file)
-				getnstimestamp(&start);
 			ret = do_no_page(mm, vma, address,
 					 pte, pmd, write_access);
 			if (vma->vm_file) {
@@ -2222,8 +2221,11 @@ static inline int handle_pte_fault(struc
 		if (pte_file(entry))
 			return do_file_page(mm, vma, address,
 					pte, pmd, write_access, entry);
-		return do_swap_page(mm, vma, address,
-					pte, pmd, write_access, entry);
+		ret = do_swap_page(mm, vma, address,
+				   pte, pmd, write_access, entry);
+		getnstimestamp(&end);
+ 		delayacct_swapin(&start, &end);
+		return ret;
 	}

 	ptl = pte_lockptr(mm, pmd);


* [RFC][Patch 5/5] Per-task delay accounting: procfs interface
  2005-12-07 22:08 [RFC][Patch 0/5] Per-task delay accounting Shailabh Nagar
                   ` (3 preceding siblings ...)
  2005-12-07 22:28 ` [RFC][Patch 4/5] Per-task delay accounting: Swap in delays Shailabh Nagar
@ 2005-12-07 22:29 ` Shailabh Nagar
  4 siblings, 0 replies; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-07 22:29 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe

Creates a /proc/<pid>/delay interface for getting per-task
delay statistics (time spent by a task waiting for cpu,
sync block I/O completion, swapping in pages etc.). The cpu
stats are available only if CONFIG_SCHEDSTATS is enabled.

The interface allows a task's delay stats (excluding cpu)
to be reset to zero. This is particularly useful if
delay accounting is being turned on/off dynamically.
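
A userspace consumer could parse the single line returned by the
interface along the lines of the sketch below. The field order follows
the snprintf() format in this patch (run count, cpu run time, cpu
delay, then count/delay pairs for block I/O and swapin). The struct and
function names are hypothetical, and the units are assumed to be
nanoseconds for the time fields.

```c
#include <stdio.h>

/* Hypothetical container for one line read from /proc/<pid>/delay. */
struct task_delays {
	unsigned long run_count;		/* times the task ran on a cpu */
	unsigned long long cpu_run;		/* time spent running */
	unsigned long long cpu_delay;		/* time spent waiting for a cpu */
	unsigned int blkio_count;		/* sync block I/O waits */
	unsigned long long blkio_delay;		/* time in those waits */
	unsigned int swapin_count;		/* swapin page faults */
	unsigned long long swapin_delay;	/* time spent swapping in */
};

/* Parse one line of seven fields. Returns 0 on success, -1 on a short line. */
static int parse_delay_line(const char *line, struct task_delays *d)
{
	int n = sscanf(line, "%lu %llu %llu %u %llu %u %llu",
		       &d->run_count, &d->cpu_run, &d->cpu_delay,
		       &d->blkio_count, &d->blkio_delay,
		       &d->swapin_count, &d->swapin_delay);

	return n == 7 ? 0 : -1;
}
```

Without CONFIG_SCHEDSTATS the run count and cpu delay fields would
simply read as zero, so a parser does not need a separate code path.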

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>

 fs/proc/base.c            |   65 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/delayacct.h |    6 ++++
 kernel/delayacct.c        |   33 +++++++++++++++++++++++
 3 files changed, 104 insertions(+)

Index: linux-2.6.15-rc5/fs/proc/base.c
===================================================================
--- linux-2.6.15-rc5.orig/fs/proc/base.c
+++ linux-2.6.15-rc5/fs/proc/base.c
@@ -71,6 +71,8 @@
 #include <linux/cpuset.h>
 #include <linux/audit.h>
 #include <linux/poll.h>
+#include <linux/delayacct.h>
+#include <linux/kernel.h>
 #include "internal.h"

 /*
@@ -166,6 +168,10 @@ enum pid_directory_inos {
 	PROC_TID_OOM_SCORE,
 	PROC_TID_OOM_ADJUST,

+#ifdef CONFIG_TASK_DELAY_ACCT
+        PROC_TID_DELAY_ACCT,
+        PROC_TGID_DELAY_ACCT,
+#endif
 	/* Add new entries before this */
 	PROC_TID_FD_DIR = 0x8000,	/* 0x8000-0xffff */
 };
@@ -220,6 +226,9 @@ static struct pid_entry tgid_base_stuff[
 #ifdef CONFIG_AUDITSYSCALL
 	E(PROC_TGID_LOGINUID, "loginuid", S_IFREG|S_IWUSR|S_IRUGO),
 #endif
+#ifdef CONFIG_TASK_DELAY_ACCT
+	E(PROC_TGID_DELAY_ACCT,"delay",   S_IFREG|S_IRUGO),
+#endif
 	{0,0,NULL,0}
 };
 static struct pid_entry tid_base_stuff[] = {
@@ -262,6 +271,9 @@ static struct pid_entry tid_base_stuff[]
 #ifdef CONFIG_AUDITSYSCALL
 	E(PROC_TID_LOGINUID, "loginuid", S_IFREG|S_IWUSR|S_IRUGO),
 #endif
+#ifdef CONFIG_TASK_DELAY_ACCT
+	E(PROC_TID_DELAY_ACCT,"delay",   S_IFREG|S_IRUGO),
+#endif
 	{0,0,NULL,0}
 };

@@ -1066,6 +1078,53 @@ static struct file_operations proc_secco
 };
 #endif /* CONFIG_SECCOMP */

+#ifdef CONFIG_TASK_DELAY_ACCT
+ssize_t proc_delayacct_write(struct file *file, const char __user *buffer,
+				size_t count, loff_t *ppos)
+{
+	struct task_struct *tsk = proc_task(file->f_dentry->d_inode);
+	char kbuf[DELAYACCT_PROC_MAX_WRITE + 1] = { 0 };	/* stays NUL-terminated for simple_strtoul() */
+	int cmd, ret;
+
+	if (count > DELAYACCT_PROC_MAX_WRITE)
+		return -EINVAL;
+	if (copy_from_user(&kbuf, buffer, count))
+        	return -EFAULT;
+
+	cmd = simple_strtoul(kbuf, NULL, 10);
+	ret = delayacct_task_write(tsk, cmd);
+
+	if (ret)
+		return ret;
+	return count;
+}
+
+ssize_t proc_delayacct_read(struct file *file, char __user *buffer,
+				size_t count, loff_t *ppos)
+{
+	struct task_struct *tsk = proc_task(file->f_dentry->d_inode);
+	char kbuf[DELAYACCT_PROC_MAX_READ + 1];
+	size_t len;
+	loff_t __ppos = *ppos;
+
+	len = delayacct_task_read(tsk, kbuf);
+
+	if (__ppos >= len)
+		return 0;
+	if (count > len-__ppos)
+		count = len-__ppos;
+	if (copy_to_user(buffer, kbuf + __ppos, count))
+		return -EFAULT;
+	*ppos = __ppos + count;
+	return count;
+}
+
+static struct file_operations proc_delayacct_operations = {
+        .read           = proc_delayacct_read,
+        .write          = proc_delayacct_write,
+};
+#endif
+
 static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	struct inode *inode = dentry->d_inode;
@@ -1786,6 +1845,12 @@ static struct dentry *proc_pident_lookup
 			inode->i_fop = &proc_loginuid_operations;
 			break;
 #endif
+#ifdef CONFIG_TASK_DELAY_ACCT
+		case PROC_TID_DELAY_ACCT:
+		case PROC_TGID_DELAY_ACCT:
+			inode->i_fop = &proc_delayacct_operations;
+			break;
+#endif
 		default:
 			printk("procfs: impossible type (%d)",p->type);
 			iput(inode);
Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/delayacct.h
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -16,11 +16,17 @@

 #include <linux/sched.h>

+/* Maximum data that a user can read/write from/to /proc/<tgid>/delay */
+#define DELAYACCT_PROC_MAX_READ	256
+#define DELAYACCT_PROC_MAX_WRITE	8
+
 #ifdef CONFIG_TASK_DELAY_ACCT
 extern int delayacct_on;	/* Delay accounting turned on/off */
 extern void delayacct_tsk_init(struct task_struct *tsk);
 extern void delayacct_blkio(struct timespec *start, struct timespec *end);
 extern void delayacct_swapin(struct timespec *start, struct timespec *end);
+extern int delayacct_task_write(struct task_struct *tsk, int cmd);
+extern size_t delayacct_task_read(struct task_struct *tsk, char *buf);
 #else
 static inline void delayacct_tsk_init(struct task_struct *tsk)
 {}
Index: linux-2.6.15-rc5/kernel/delayacct.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/delayacct.c
+++ linux-2.6.15-rc5/kernel/delayacct.c
@@ -13,6 +13,7 @@

 #include <linux/sched.h>
 #include <linux/time.h>
+#include <linux/delayacct.h>

 int delayacct_on;	/* Delay accounting turned on/off */

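@@ -65,3 +66,39 @@ inline void delayacct_swapin(struct time
 	current->delays.swapin_count++;
 	spin_unlock(&current->delays.lock);
 }
+
+/* User writes @cmd to /proc/<tgid>/delay */
+inline int delayacct_task_write(struct task_struct *tsk, int cmd)
+{
+	if (cmd == 0) {
+		spin_lock(&tsk->delays.lock);
+		/* Reset the reported counters one by one: a memset() over the
+		 * whole struct would also zero the lock being held. */
+		tsk->delays.blkio_count = 0;
+		tsk->delays.blkio_delay = 0;
+		tsk->delays.swapin_count = 0;
+		tsk->delays.swapin_delay = 0;
+		spin_unlock(&tsk->delays.lock);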
+	}
+	return 0;
+}
+
+/* User reads from /proc/<tgid>/delay */
+inline size_t delayacct_task_read(struct task_struct *tsk, char *buf)
+{
+	unsigned long long run_delay = 0;
+	unsigned long run_count = 0;
+
+#ifdef CONFIG_SCHEDSTATS
+	run_delay = jiffies_to_usecs(tsk->sched_info.run_delay) * 1000ULL;
+	run_count = tsk->sched_info.pcnt;
+#endif
+	return snprintf(buf, DELAYACCT_PROC_MAX_READ,
+		 "%lu %llu %llu %u %llu %u %llu\n",
+		 run_count,
+		 (unsigned long long) current_sched_time(tsk),
+		 run_delay,
+		 (unsigned int) tsk->delays.blkio_count,
+		 (unsigned long long) tsk->delays.blkio_delay,
+		 (unsigned int) tsk->delays.swapin_count,
+		 (unsigned long long) tsk->delays.swapin_delay);
+}
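
For anyone who wants to consume this file from userspace, here is a minimal sketch of a parser for the single line that delayacct_task_read() emits. The struct and function names are hypothetical and not part of the patch. The field order simply follows the snprintf() format above, and the units of the blkio and swapin delays are not visible in this hunk, so the sketch stores them as raw numbers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names; the order mirrors the snprintf() format in the patch. */
struct delay_sample {
	unsigned long run_count;	/* times the task ran on a CPU */
	unsigned long long run_time;	/* CPU time consumed */
	unsigned long long run_delay;	/* time spent runnable but waiting */
	unsigned int blkio_count;	/* synchronous block I/O waits */
	unsigned long long blkio_delay;	/* total time in those waits */
	unsigned int swapin_count;	/* swap-in waits */
	unsigned long long swapin_delay;/* total time in those waits */
};

/* Parse one line of the delay file; returns 0 on success, -1 otherwise. */
int parse_delay_line(const char *line, struct delay_sample *s)
{
	int n;

	assert(line != NULL && s != NULL);
	n = sscanf(line, "%lu %llu %llu %u %llu %u %llu",
		   &s->run_count, &s->run_time, &s->run_delay,
		   &s->blkio_count, &s->blkio_delay,
		   &s->swapin_count, &s->swapin_delay);
	return n == 7 ? 0 : -1;
}
```

A monitoring tool could read the file twice, parse both lines, and subtract the fields to get per-interval delays.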


* Re: [ckrm-tech] [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays
  2005-12-07 22:23 ` [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays Shailabh Nagar
@ 2005-12-07 22:33   ` Dave Hansen
  2005-12-07 23:06     ` Shailabh Nagar
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2005-12-07 22:33 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: linux-kernel, elsa-devel, LSE, ckrm-tech, Guillaume Thouvenin,
	Jay Lan, Jens Axboe, Suparna Bhattacharya

On Wed, 2005-12-07 at 22:23 +0000, Shailabh Nagar wrote:
> 
> +       if (-EIOCBQUEUED == ret) {
> +               __attribute__((unused)) struct timespec start, end;
> +

Those "unused" things suck.  They're really ugly.

Doesn't making your delay functions into static inlines make the unused
warnings go away?

-- Dave



* Re: [ckrm-tech] [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays
  2005-12-07 22:33   ` [ckrm-tech] " Dave Hansen
@ 2005-12-07 23:06     ` Shailabh Nagar
  0 siblings, 0 replies; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-07 23:06 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, elsa-devel, LSE, ckrm-tech, Guillaume Thouvenin,
	Jay Lan, Jens Axboe, Suparna Bhattacharya

Dave Hansen wrote:
> On Wed, 2005-12-07 at 22:23 +0000, Shailabh Nagar wrote:
> 
>>+       if (-EIOCBQUEUED == ret) {
>>+               __attribute__((unused)) struct timespec start, end;
>>+
> 
> 
> Those "unused" things suck.  They're really ugly.
> 
> Doesn't making your delay functions into static inlines make the unused
> warnings go away?

It does indeed. Thanks!
It was a holdover from when the delay funcs were macros. Will fix everywhere.
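
As a rough illustration of the point being agreed on here, the sketch below shows why an empty static inline stub avoids the warning where an empty macro does not: the call still takes the address of both variables, so the compiler counts them as used. DELAY_ACCT_ENABLED, record_blkio_delay() and wait_for_io() are hypothetical userspace stand-ins, not names from the patch.

```c
#include <time.h>

/* Hypothetical stand-in for the kernel config switch. */
#ifdef DELAY_ACCT_ENABLED
void record_blkio_delay(const struct timespec *start,
			const struct timespec *end);
#else
/* Empty stub: its arguments still count as used at every call site. */
static inline void record_blkio_delay(const struct timespec *start,
				      const struct timespec *end)
{
	(void)start;
	(void)end;
}
#endif

/* Caller needs no __attribute__((unused)) on start and end. */
int wait_for_io(int ret)
{
	struct timespec start = { 0, 0 }, end = { 0, 0 };

	record_blkio_delay(&start, &end);
	return ret;
}
```

With the stub compiled in, an optimizing compiler can drop the call and the two variables entirely, so the disabled configuration should cost nothing at run time.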

--Shailabh


> 
> -- Dave


* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-07 22:13 ` [RFC][Patch 1/5] nanosecond timestamps and diffs Shailabh Nagar
@ 2005-12-12 18:50   ` Christoph Lameter
  2005-12-12 19:31     ` Shailabh Nagar
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2005-12-12 18:50 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe


On Wed, 7 Dec 2005, Shailabh Nagar wrote:

> +void getnstimestamp(struct timespec *ts)

There is already getnstimeofday in the kernel.



* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-12 18:50   ` [Lse-tech] " Christoph Lameter
@ 2005-12-12 19:31     ` Shailabh Nagar
  2005-12-12 19:49       ` john stultz
  0 siblings, 1 reply; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-12 19:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe, John Stultz

Christoph Lameter wrote:
> On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> 
> 
>>+void getnstimestamp(struct timespec *ts)
> 
> 
> There is already getnstimeofday in the kernel.
> 
> 

Yes, and the proposed getnstimestamp() uses that function internally.
However, John Stultz advised that getnstimeofday() can be affected by calls to
settimeofday() and recommended adjusting its value with wall_to_monotonic.

John, could you elaborate?

Thanks,
Shailabh


* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-12 19:31     ` Shailabh Nagar
@ 2005-12-12 19:49       ` john stultz
  2005-12-12 20:00         ` Shailabh Nagar
  2005-12-13 18:35         ` Jay Lan
  0 siblings, 2 replies; 20+ messages in thread
From: john stultz @ 2005-12-12 19:49 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: Christoph Lameter, linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe

On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> Christoph Lameter wrote:
> > On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> > 
> > 
> >>+void getnstimestamp(struct timespec *ts)
> > 
> > 
> > There is already getnstimeofday in the kernel.
> > 
> 
> Yes, and that function is being used within the getnstimestamp() being proposed.
> However, John Stultz had advised that getnstimeofday could get affected by calls to
> settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> 
> John, could you elaborate ?

I think you pretty well have it covered. 

getnstimeofday + wall_to_monotonic should be higher-res and more
reliable (than TSC-based sched_clock(), for example) for getting a
timestamp.

There may be performance concerns, as you have to access the clock
hardware in getnstimeofday(), but there really is no other way to get
reliable, fine-grained, monotonically increasing timestamps.
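
The "wall time plus offset" step described above can be sketched in userspace as a normalized timespec addition. This is only an illustration under stated assumptions: ts_add() is a hypothetical helper, not a kernel function, and it assumes both inputs are already normalized, so a negative offset is represented with a negative tv_sec and a tv_nsec in the range 0 to 999999999.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add a wall-clock reading and a (possibly negative) offset, keeping
 * tv_nsec within [0, NSEC_PER_SEC). Both inputs must be normalized. */
struct timespec ts_add(struct timespec wall, struct timespec offset)
{
	struct timespec sum;

	assert(wall.tv_nsec >= 0 && wall.tv_nsec < NSEC_PER_SEC);
	assert(offset.tv_nsec >= 0 && offset.tv_nsec < NSEC_PER_SEC);

	sum.tv_sec = wall.tv_sec + offset.tv_sec;
	sum.tv_nsec = wall.tv_nsec + offset.tv_nsec;
	if (sum.tv_nsec >= NSEC_PER_SEC) {
		/* Carry one second out of the nanosecond field. */
		sum.tv_nsec -= NSEC_PER_SEC;
		sum.tv_sec++;
	}
	return sum;
}
```

Because the offset is adjusted whenever the wall clock is set, the sum keeps increasing even if the wall-clock reading itself jumps.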

thanks
-john



* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-12 19:49       ` john stultz
@ 2005-12-12 20:00         ` Shailabh Nagar
  2005-12-12 20:07           ` john stultz
  2005-12-13 18:35         ` Jay Lan
  1 sibling, 1 reply; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-12 20:00 UTC (permalink / raw)
  To: john stultz
  Cc: Christoph Lameter, linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe

john stultz wrote:
> On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> 
>>Christoph Lameter wrote:
>>
>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>
>>>
>>>
>>>>+void getnstimestamp(struct timespec *ts)
>>>
>>>
>>>There is already getnstimeofday in the kernel.
>>>
>>
>>Yes, and that function is being used within the getnstimestamp() being proposed.
>>However, John Stultz had advised that getnstimeofday could get affected by calls to
>>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
>>
>>John, could you elaborate ?
> 
> 
> I think you pretty well have it covered. 
> 
> getnstimeofday + wall_to_monotonic should be higher-res and more
> reliable (then TSC based sched_clock(), for example) for getting a
> timestamp.
> 
> There may be performance concerns as you have to access the clock
> hardware in getnstimeofday(), but there really is no other way for
> reliable finely grained monotonically increasing timestamps.
> 
> thanks
> -john

Thanks, that clarifies things. I guess the other underlying concern here is whether these
improvements (in resolution and reliability) should go into getnstimeofday()
itself rather than into a new function. Or is it better to leave
getnstimeofday() as it is?

Thanks,
Shailabh



* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-12 20:00         ` Shailabh Nagar
@ 2005-12-12 20:07           ` john stultz
  2005-12-13  0:54             ` George Anzinger
  0 siblings, 1 reply; 20+ messages in thread
From: john stultz @ 2005-12-12 20:07 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: Christoph Lameter, linux-kernel, elsa-devel, lse-tech, ckrm-tech,
	Guillaume Thouvenin, Jay Lan, Jens Axboe

On Mon, 2005-12-12 at 20:00 +0000, Shailabh Nagar wrote:
> john stultz wrote:
> > On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> > 
> >>Christoph Lameter wrote:
> >>
> >>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >>>>+void getnstimestamp(struct timespec *ts)
> >>>
> >>>There is already getnstimeofday in the kernel.
> >>
> >>Yes, and that function is being used within the getnstimestamp() being proposed.
> >>However, John Stultz had advised that getnstimeofday could get affected by calls to
> >>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> >>
> >>John, could you elaborate ?
> > 
> > I think you pretty well have it covered. 
> > 
> > getnstimeofday + wall_to_monotonic should be higher-res and more
> > reliable (then TSC based sched_clock(), for example) for getting a
> > timestamp.
> > 
> > There may be performance concerns as you have to access the clock
> > hardware in getnstimeofday(), but there really is no other way for
> > reliable finely grained monotonically increasing timestamps.
> > 

> Thanks, that clarifies. I guess the other underlying concern here would be whether these
> improvements (in resolution and reliability) should be going into getnstimeofday()
> itself (rather than creating a new func for the same) ? Or is it better to leave
> getnstimeofday as it is ?

No, getnstimeofday() is very much needed to get a nanosecond grained
wall-time clock, so a new function is needed for the monotonic clock.

In my timeofday re-work I have used the name "get_monotonic_clock()" and
"get_monotonic_clock_ts()" for basically the same functionality
(providing a ktime and a timespec respectively). You might consider
naming it as such, but resolving these naming collisions shouldn't be
too difficult either way.

thanks
-john



* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-12 20:07           ` john stultz
@ 2005-12-13  0:54             ` George Anzinger
  2005-12-13  3:48               ` Nish Aravamudan
  0 siblings, 1 reply; 20+ messages in thread
From: George Anzinger @ 2005-12-13  0:54 UTC (permalink / raw)
  To: john stultz
  Cc: Shailabh Nagar, Christoph Lameter, linux-kernel, elsa-devel,
	lse-tech, ckrm-tech, Guillaume Thouvenin, Jay Lan, Jens Axboe

john stultz wrote:
> On Mon, 2005-12-12 at 20:00 +0000, Shailabh Nagar wrote:
> 
>>john stultz wrote:
>>
>>>On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
>>>
>>>
>>>>Christoph Lameter wrote:
>>>>
>>>>
>>>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>>>
>>>>>>+void getnstimestamp(struct timespec *ts)
>>>>>
>>>>>There is already getnstimeofday in the kernel.
>>>>
>>>>Yes, and that function is being used within the getnstimestamp() being proposed.
>>>>However, John Stultz had advised that getnstimeofday could get affected by calls to
>>>>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
>>>>
>>>>John, could you elaborate ?
>>>
>>>I think you pretty well have it covered. 
>>>
>>>getnstimeofday + wall_to_monotonic should be higher-res and more
>>>reliable (then TSC based sched_clock(), for example) for getting a
>>>timestamp.
>>>
>>>There may be performance concerns as you have to access the clock
>>>hardware in getnstimeofday(), but there really is no other way for
>>>reliable finely grained monotonically increasing timestamps.
>>>
> 
> 
>>Thanks, that clarifies. I guess the other underlying concern here would be whether these
>>improvements (in resolution and reliability) should be going into getnstimeofday()
>>itself (rather than creating a new func for the same) ? Or is it better to leave
>>getnstimeofday as it is ?
> 
> 
> No, getnstimeofday() is very much needed to get a nanosecond grained
> wall-time clock, so a new function is needed for the monotonic clock.
> 
> In my timeofday re-work I have used the name "get_monotonic_clock()" and
> "get_monotonic_clock_ts()" for basically the same functionality
> (providing a ktime and a timespec respectively). You might consider
> naming it as such, but resolving these naming collisions shouldn't be
> too difficult either way.

Indeed.  Let's use a name with "monotonic" in it, please.  And, 
possibly not "clock".  How about get_nsmonotonic_time() or some such?


-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-13  0:54             ` George Anzinger
@ 2005-12-13  3:48               ` Nish Aravamudan
  0 siblings, 0 replies; 20+ messages in thread
From: Nish Aravamudan @ 2005-12-13  3:48 UTC (permalink / raw)
  To: george
  Cc: john stultz, Shailabh Nagar, Christoph Lameter, linux-kernel,
	elsa-devel, lse-tech, ckrm-tech, Guillaume Thouvenin, Jay Lan,
	Jens Axboe

On 12/12/05, George Anzinger <george@mvista.com> wrote:
> john stultz wrote:
> > On Mon, 2005-12-12 at 20:00 +0000, Shailabh Nagar wrote:
> >
> >>john stultz wrote:
> >>
> >>>On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> >>>
> >>>
> >>>>Christoph Lameter wrote:
> >>>>
> >>>>
> >>>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >>>>>
> >>>>>>+void getnstimestamp(struct timespec *ts)
> >>>>>
> >>>>>There is already getnstimeofday in the kernel.
> >>>>
> >>>>Yes, and that function is being used within the getnstimestamp() being proposed.
> >>>>However, John Stultz had advised that getnstimeofday could get affected by calls to
> >>>>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> >>>>
> >>>>John, could you elaborate ?
> >>>
> >>>I think you pretty well have it covered.
> >>>
> >>>getnstimeofday + wall_to_monotonic should be higher-res and more
> >>>reliable (then TSC based sched_clock(), for example) for getting a
> >>>timestamp.
> >>>
> >>>There may be performance concerns as you have to access the clock
> >>>hardware in getnstimeofday(), but there really is no other way for
> >>>reliable finely grained monotonically increasing timestamps.
> >>>
> >
> >
> >>Thanks, that clarifies. I guess the other underlying concern here would be whether these
> >>improvements (in resolution and reliability) should be going into getnstimeofday()
> >>itself (rather than creating a new func for the same) ? Or is it better to leave
> >>getnstimeofday as it is ?
> >
> >
> > No, getnstimeofday() is very much needed to get a nanosecond grained
> > wall-time clock, so a new function is needed for the monotonic clock.
> >
> > In my timeofday re-work I have used the name "get_monotonic_clock()" and
> > "get_monotonic_clock_ts()" for basically the same functionality
> > (providing a ktime and a timespec respectively). You might consider
> > naming it as such, but resolving these naming collisions shouldn't be
> > too difficult either way.
>
> Indeed.  Lets use a name with "monotonic" in it, please.  And,
> possibly not "clock".  How about get_nsmonotonic_time() or some such?

I agree -- personal preference, though, I prefer units at the end,
i.e. get_monotonic_time_ns() or get_monotonic_time_nsecs().

Thanks,
Nish


* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-12 19:49       ` john stultz
  2005-12-12 20:00         ` Shailabh Nagar
@ 2005-12-13 18:35         ` Jay Lan
  2005-12-13 21:16           ` john stultz
                             ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Jay Lan @ 2005-12-13 18:35 UTC (permalink / raw)
  To: john stultz
  Cc: Shailabh Nagar, Christoph Lameter, linux-kernel, elsa-devel,
	lse-tech, ckrm-tech, Guillaume Thouvenin, Jay Lan, Jens Axboe

john stultz wrote:
> On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> 
>>Christoph Lameter wrote:
>>
>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>
>>>
>>>
>>>>+void getnstimestamp(struct timespec *ts)
>>>
>>>
>>>There is already getnstimeofday in the kernel.
>>>
>>
>>Yes, and that function is being used within the getnstimestamp() being proposed.
>>However, John Stultz had advised that getnstimeofday could get affected by calls to
>>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
>>
>>John, could you elaborate ?
> 
> 
> I think you pretty well have it covered. 
> 
> getnstimeofday + wall_to_monotonic should be higher-res and more
> reliable (then TSC based sched_clock(), for example) for getting a
> timestamp.

How is this proposed function different from
do_posix_clock_monotonic_gettime()?
It calls getnstimeofday() and also adjusts with wall_to_monotonic.

It seems to me we just need to EXPORT_SYMBOL_GPL
do_posix_clock_monotonic_gettime()?

Thanks,
  - jay

> 
> There may be performance concerns as you have to access the clock
> hardware in getnstimeofday(), but there really is no other way for
> reliable finely grained monotonically increasing timestamps.
> 
> thanks
> -john
> 



* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-13 18:35         ` Jay Lan
@ 2005-12-13 21:16           ` john stultz
  2005-12-13 21:44           ` Shailabh Nagar
  2005-12-13 23:05           ` [ckrm-tech] " Matt Helsley
  2 siblings, 0 replies; 20+ messages in thread
From: john stultz @ 2005-12-13 21:16 UTC (permalink / raw)
  To: Jay Lan
  Cc: Shailabh Nagar, Christoph Lameter, linux-kernel, elsa-devel,
	lse-tech, ckrm-tech, Guillaume Thouvenin, Jay Lan, Jens Axboe

On Tue, 2005-12-13 at 10:35 -0800, Jay Lan wrote:
> john stultz wrote:
> > On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> >>Christoph Lameter wrote:
> >>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >>>>+void getnstimestamp(struct timespec *ts)
> >>>
> >>>There is already getnstimeofday in the kernel.
> >>
> >>Yes, and that function is being used within the getnstimestamp() being proposed.
> >>However, John Stultz had advised that getnstimeofday could get affected by calls to
> >>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> >>
> >>John, could you elaborate ?
> > 
> > I think you pretty well have it covered. 
> > 
> > getnstimeofday + wall_to_monotonic should be higher-res and more
> > reliable (then TSC based sched_clock(), for example) for getting a
> > timestamp.
> 
> How is this proposed function different from
> do_posix_clock_monotonic_gettime()?
> It calls getnstimeofday(), it also adjusts with wall_to_monotinic.
> 
> It seems to me we just need to EXPORT_SYMBOL_GPL the
> do_posix_clock_monotonic_gettime()?

Indeed, this would be the same.

thanks
-john



* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-13 18:35         ` Jay Lan
  2005-12-13 21:16           ` john stultz
@ 2005-12-13 21:44           ` Shailabh Nagar
  2005-12-13 22:13             ` George Anzinger
  2005-12-13 23:05           ` [ckrm-tech] " Matt Helsley
  2 siblings, 1 reply; 20+ messages in thread
From: Shailabh Nagar @ 2005-12-13 21:44 UTC (permalink / raw)
  To: Jay Lan
  Cc: john stultz, Christoph Lameter, linux-kernel, elsa-devel,
	lse-tech, ckrm-tech, Guillaume Thouvenin, Jay Lan, Jens Axboe

Jay Lan wrote:
> john stultz wrote:
> 
>> On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
>>
>>> Christoph Lameter wrote:
>>>
>>>> On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>>
>>>>
>>>>
>>>>> +void getnstimestamp(struct timespec *ts)
>>>>
>>>>
>>>>
>>>> There is already getnstimeofday in the kernel.
>>>>
>>>
>>> Yes, and that function is being used within the getnstimestamp()
>>> being proposed.
>>> However, John Stultz had advised that getnstimeofday could get
>>> affected by calls to
>>> settimeofday and had recommended adjusting the getnstimeofday value
>>> with wall_to_monotonic.
>>>
>>> John, could you elaborate ?
>>
>>
>>
>> I think you pretty well have it covered.
>> getnstimeofday + wall_to_monotonic should be higher-res and more
>> reliable (then TSC based sched_clock(), for example) for getting a
>> timestamp.
> 
> 
> How is this proposed function different from
> do_posix_clock_monotonic_gettime()?
> It calls getnstimeofday(), it also adjusts with wall_to_monotinic.
> 
> It seems to me we just need to EXPORT_SYMBOL_GPL the
> do_posix_clock_monotonic_gettime()?
> 
> Thanks,
>  - jay
> 

Hmmm. Looks like do_posix_clock_monotonic_gettime() will suffice for this patch.

I wonder why the clock parameter to do_posix_clock_monotonic_gettime() is needed?
It doesn't seem to be used.

Is there any possibility of this set of functions changing its behaviour?

-- Shailabh







>>
>> There may be performance concerns as you have to access the clock
>> hardware in getnstimeofday(), but there really is no other way for
>> reliable finely grained monotonically increasing timestamps.
>>
>> thanks
>> -john
>>
> 
> 



* Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-13 21:44           ` Shailabh Nagar
@ 2005-12-13 22:13             ` George Anzinger
  0 siblings, 0 replies; 20+ messages in thread
From: George Anzinger @ 2005-12-13 22:13 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: Jay Lan, john stultz, Christoph Lameter, linux-kernel,
	elsa-devel, lse-tech, ckrm-tech, Guillaume Thouvenin, Jay Lan,
	Jens Axboe

Shailabh Nagar wrote:
> Jay Lan wrote:
> 
>>john stultz wrote:
>>
>>
>>>On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
>>>
>>>
>>>>Christoph Lameter wrote:
>>>>
>>>>
>>>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>+void getnstimestamp(struct timespec *ts)
>>>>>
>>>>>
>>>>>
>>>>>There is already getnstimeofday in the kernel.
>>>>>
>>>>
>>>>Yes, and that function is being used within the getnstimestamp()
>>>>being proposed.
>>>>However, John Stultz had advised that getnstimeofday could get
>>>>affected by calls to
>>>>settimeofday and had recommended adjusting the getnstimeofday value
>>>>with wall_to_monotonic.
>>>>
>>>>John, could you elaborate ?
>>>
>>>
>>>
>>>I think you pretty well have it covered.
>>>getnstimeofday + wall_to_monotonic should be higher-res and more
>>>reliable (then TSC based sched_clock(), for example) for getting a
>>>timestamp.
>>
>>
>>How is this proposed function different from
>>do_posix_clock_monotonic_gettime()?
>>It calls getnstimeofday(), it also adjusts with wall_to_monotinic.
>>
>>It seems to me we just need to EXPORT_SYMBOL_GPL the
>>do_posix_clock_monotonic_gettime()?
>>
>>Thanks,
>> - jay
>>
> 
> 
> Hmmm. Looks like do_posix_clock_monotonic_gettime will suffice for this patch.
> 
> Wonder why the clock parameter to do_posix_clock_monotonic_get is needed ?

Because it is called indirectly by the table driven posix clocks and 
timers code where the clock, usually, is needed.

> Doesn't seem to be used.
> 
> Any possibility of these set of functions changing their behaviour ?

Always :), but things are pretty stable now.  Might want to add a 
comment that it is being used outside of the posix "box".


-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [ckrm-tech] Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs
  2005-12-13 18:35         ` Jay Lan
  2005-12-13 21:16           ` john stultz
  2005-12-13 21:44           ` Shailabh Nagar
@ 2005-12-13 23:05           ` Matt Helsley
  2 siblings, 0 replies; 20+ messages in thread
From: Matt Helsley @ 2005-12-13 23:05 UTC (permalink / raw)
  To: Jay Lan
  Cc: john stultz, Shailabh Nagar, Christoph Lameter, linux-kernel,
	elsa-devel, lse-tech, CKRM-Tech, Guillaume Thouvenin, Jay Lan,
	Jens Axboe

On Tue, 2005-12-13 at 10:35 -0800, Jay Lan wrote:
> john stultz wrote:
> > On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> > 
> >>Christoph Lameter wrote:
> >>
> >>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >>>
> >>>
> >>>
> >>>>+void getnstimestamp(struct timespec *ts)
> >>>
> >>>
> >>>There is already getnstimeofday in the kernel.
> >>>
> >>
> >>Yes, and that function is being used within the getnstimestamp() being proposed.
> >>However, John Stultz had advised that getnstimeofday could get affected by calls to
> >>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> >>
> >>John, could you elaborate ?
> > 
> > 
> > I think you pretty well have it covered. 
> > 
> > getnstimeofday + wall_to_monotonic should be higher-res and more
> > reliable (then TSC based sched_clock(), for example) for getting a
> > timestamp.
> 
> How is this proposed function different from
> do_posix_clock_monotonic_gettime()?
> It calls getnstimeofday(), it also adjusts with wall_to_monotinic.
> 
> It seems to me we just need to EXPORT_SYMBOL_GPL the
> do_posix_clock_monotonic_gettime()?
> 
> Thanks,
>   - jay

Ah, yes. I should've searched for gettime rather than gettimeofday when
I was looking for a suitable function.

Two minor differences exist:

1) getnstimestamp does not fetch an unused copy of jiffies_64
2) getnstimestamp uses and advertises an explicit maximum resolution

I don't think either of these really matters, so I'll post a series of
patches:

1) EXPORTing (_SYMBOL_GPL) do_posix_clock_monotonic_gettime()
2) using do_posix_clock_monotonic_gettime() as a timestamp
3) removing getnstimestamp()
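
For comparison, the same "monotonic timestamp and diff" pattern can be sketched in userspace with the POSIX clock_gettime(CLOCK_MONOTONIC) call. This is an analogue only, not the in-kernel interface discussed above; ts_delta_ns() and measure_delay_ns() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from start to end; both readings must come from the
 * same monotonic clock, so the result is never negative. */
uint64_t ts_delta_ns(const struct timespec *start,
		     const struct timespec *end)
{
	int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
	int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;
	int64_t total = sec * 1000000000LL + nsec;

	assert(total >= 0);
	return (uint64_t)total;
}

/* Time one call to fn() with a pair of monotonic timestamps. */
uint64_t measure_delay_ns(void (*fn)(void))
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	fn();
	clock_gettime(CLOCK_MONOTONIC, &end);
	return ts_delta_ns(&start, &end);
}
```

Using a monotonic source here means a settimeofday() call between the two readings cannot produce a negative or inflated delay.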

Thanks,
	-Matt Helsley



Thread overview: 20+ messages
2005-12-07 22:08 [RFC][Patch 0/5] Per-task delay accounting Shailabh Nagar
2005-12-07 22:13 ` [RFC][Patch 1/5] nanosecond timestamps and diffs Shailabh Nagar
2005-12-12 18:50   ` [Lse-tech] " Christoph Lameter
2005-12-12 19:31     ` Shailabh Nagar
2005-12-12 19:49       ` john stultz
2005-12-12 20:00         ` Shailabh Nagar
2005-12-12 20:07           ` john stultz
2005-12-13  0:54             ` George Anzinger
2005-12-13  3:48               ` Nish Aravamudan
2005-12-13 18:35         ` Jay Lan
2005-12-13 21:16           ` john stultz
2005-12-13 21:44           ` Shailabh Nagar
2005-12-13 22:13             ` George Anzinger
2005-12-13 23:05           ` [ckrm-tech] " Matt Helsley
2005-12-07 22:15 ` [RFC][Patch 2/5] Per-task delay accounting: Initialization, dynamic turn on/off Shailabh Nagar
2005-12-07 22:23 ` [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays Shailabh Nagar
2005-12-07 22:33   ` [ckrm-tech] " Dave Hansen
2005-12-07 23:06     ` Shailabh Nagar
2005-12-07 22:28 ` [RFC][Patch 4/5] Per-task delay accounting: Swap in delays Shailabh Nagar
2005-12-07 22:29 ` [RFC][Patch 5/5] Per-task delay accounting: procfs interface Shailabh Nagar
