linux-kernel.vger.kernel.org archive mirror
* [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
@ 2002-10-09 22:47 george anzinger
  2002-10-09 23:14 ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: george anzinger @ 2002-10-09 22:47 UTC
  To: Linus Torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1516 bytes --]

(Sigh) For some reason the earlier version of this message
was truncated near the end of the patch.  Let's try again.

This patch, in conjunction with the "core" high-res-timers
patch, implements high resolution timers on the i386
platforms.  The high-res-timers use the periodic interrupt
to "remind" the system to look at the clock.  The clock
should be relatively high resolution (1 microsecond or
better).  This patch allows configuring one of three possible
clocks: the TSC, the ACPI pm timer, or the Programmable
Interval Timer (PIT).  Most of the changes in this patch
are in the arch/i386/kernel/time.c code.
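
As a rough illustration of the resolution requirement, the sketch
below works out the nanoseconds per count for the PIT and the ACPI
pm timer.  The two input frequencies are the usual PC values and are
assumptions here (the patch text only quotes the resulting
resolutions), and ns_per_count() is a hypothetical helper, not part
of the patch.

```c
#include <stdint.h>

/* Nominal input clocks in Hz.  These are the customary PC values and
 * are assumed for illustration; they are not taken from the patch. */
#define PIT_HZ      1193182u   /* 8254 PIT input clock */
#define ACPI_PM_HZ  3579545u   /* ACPI pm timer clock */

/* Nanoseconds per count for a clock running at hz, rounded to the
 * nearest nanosecond.  Hypothetical helper. */
static uint32_t ns_per_count(uint32_t hz)
{
    return (uint32_t)((1000000000ull + hz / 2) / hz);
}
```

Both values come out under 1000 ns, so either clock meets the
"1 microsecond or better" requirement; the TSC resolution depends on
the CPU clock and is not modelled here.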

This patch uses (if available) the APIC timer(s) to generate
1/HZ ticks and sub 1/HZ ticks as needed.  The PIT still
interrupts, but if the APIC timer is available, it just causes
the wall clock update.  No attempt is made to make this
interrupt happen on jiffie boundaries; however, the APIC
timers are disciplined to expire on 1/HZ boundaries to give
consistent timer latencies with respect to the system time.
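
The idea of expiring on 1/HZ boundaries can be sketched roughly as
below.  This is a simplified illustration only: cycles_until() and
its parameters are hypothetical names, and the real code also has to
deal with locking, the timer hardware and a minimum programmable
interval.

```c
/* Number of clock cycles to program so the timer fires at target_sub
 * (cycles into the current jiffie), or at the next jiffie boundary
 * when target_sub is negative.  Returns -1 if that point has already
 * passed.  Hypothetical helper for illustration. */
static long cycles_until(long cycles_per_jiffie, long now_sub,
                         long target_sub)
{
    long target = (target_sub < 0) ? cycles_per_jiffie : target_sub;
    long delta = target - now_sub;

    return (delta > 0) ? delta : -1;
}
```

For example, with 1000 cycles per jiffie and the clock 250 cycles
into the current jiffie, a request for the next boundary programs
750 cycles, so the expiry lands on the 1/HZ boundary regardless of
when the request was made.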

With this patch applied and enabled (at config time in the
processor feature section), the system clock will be the
specified clock.  The PIT is not used to keep track of time,
but only to remind the system to look at the clock.  Sub
jiffies are kept and available for code that knows how to
use them.
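
A minimal sketch of how code might combine a jiffie count with a sub
jiffie count (in clock cycles) to get microseconds.  The function and
parameter names are made up for illustration, and it assumes a jiffie
is exactly 1/HZ, which is only approximately true in the kernel.

```c
#include <stdint.h>

/* Microseconds represented by a jiffie count plus a sub jiffie count
 * in clock cycles.  Assumes one jiffie is exactly 1/hz seconds.
 * Hypothetical helper, not part of the patch. */
static uint64_t jiffies_to_usec(uint64_t jiffies, uint32_t sub_cycles,
                                uint32_t hz, uint32_t cycles_per_jiffie)
{
    uint64_t usec_per_jiffie = 1000000u / hz;

    return jiffies * usec_per_jiffie +
           ((uint64_t)sub_cycles * usec_per_jiffie) / cycles_per_jiffie;
}
```

With HZ at 100 and 11932 cycles per jiffie, 3 jiffies plus 5966
cycles is 3.5 jiffies, i.e. 35000 microseconds.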

Patch is against 2.5.41-bk2
-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

[-- Attachment #2: hrtimers-i386-2.5.41-bk2-1.0.patch --]
[-- Type: text/plain, Size: 81626 bytes --]

diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/Config.help linux/arch/i386/Config.help
--- linux-2.5.41-bk2-core/arch/i386/Config.help	Wed Oct  9 13:55:48 2002
+++ linux/arch/i386/Config.help	Wed Oct  9 14:08:47 2002
@@ -52,6 +52,75 @@
   Say Y here if you are building a kernel for a desktop, embedded
   or real-time system.  Say N if you are unsure.
 
+High-res-timers
+CONFIG_HIGH_RES_TIMERS
+  POSIX timers are available by default.  This option enables high
+  resolution POSIX timers.  With this option the resolution is at
+  least 1 microsecond.  High resolution is not free.  If enabled this
+  option will add a small overhead each time a timer expires that is
+  not on a 1/HZ tick boundary.  If no such timers are used the overhead
+  is nil.
+
+  This option enables two additional POSIX CLOCKS, CLOCK_REALTIME_HR
+  and CLOCK_MONOTONIC_HR.  Note that this option does not change the
+  resolution of CLOCK_REALTIME or CLOCK_MONOTONIC which remain at 1/HZ
+  resolution.
+
+High-res-timers clock
+CONFIG_HIGH_RES_TIMER_ACPI_PM 
+  This option allows you to choose the wall clock timer for your system.
+  With high resolution timers on the x86 platforms it is best to keep
+  the interrupt generating timer separate from the time keeping timer.
+  On x86 platforms there are three possible sources implemented for the
+  wall clock.  These are:
+ 
+  <timer>				<resolution>
+  ACPI power management (pm) timer	~280 nanoseconds
+  TSC (Time Stamp Counter)		1/CPU clock
+  PIT (Programmable Interval Timer)	~838 nanoseconds
+
+  The PIT is used to generate interrupts and at any given time will be
+  programmed to interrupt when the next timer is to expire or on the
+  next 1/HZ tick.  For this reason it is best to not use this timer as
+  the wall clock timer.  This timer has a resolution of about 838
+  nanoseconds.  THIS OPTION SHOULD ONLY BE USED IF BOTH ACPI AND TSC ARE
+  NOT AVAILABLE.
+
+  The TSC runs at the cpu clock rate (i.e. its resolution is 1/CPU
+  clock) and it has a very low access time.  However, it is subject,
+  in some (incorrect) processors, to throttling to cool the cpu, and
+  to other slow downs during power management.  If your cpu is correct
+  and does not change the TSC frequency for throttling or power
+  management this is the best clock timer.
+
+  The ACPI pm timer is available on systems with Advanced Configuration
+  and Power Interface support.  The pm timer is available on these
+  systems even if you don't use or enable ACPI in the software or the
+  BIOS (but see Default ACPI pm timer address).  The timer has a
+  resolution of about 280 nanoseconds; however, the access time is a bit
+  higher than that of the TSC.  Since it is part of ACPI it is intended
+  to keep track of time while the system is under power management, thus
+  it is not subject to the problems of the TSC.
+
+  If you enable the ACPI pm timer and it can not be found, it is
+  possible that your BIOS is not producing the ACPI table or that your
+  machine does not support ACPI.  In the former case, see "Default ACPI
+  pm timer address".  If the timer is not found the boot will fail when
+  trying to calibrate the delay timer.
+
+Default ACPI pm timer address
+CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD
+  This option is available for use on systems where the BIOS does not
+  generate the ACPI tables if ACPI is not enabled.  For example some
+  BIOSes will not generate the ACPI tables if APM is enabled.  The ACPI
+  pm timer is still available but can not be found by the software.
+  This option allows you to supply the needed address.  When the high
+  resolution timers code finds a valid ACPI pm timer address it reports
+  it in the boot messages log (look for lines that begin with
+  "High-res-timers:").  You can turn on the ACPI support in the BIOS,
+  boot the system and find this value.  You can then enter it at
+  configure time.  Both the report and the entry are in decimal.
+
 CONFIG_X86
   This is Linux's home port.  Linux was originally native to the Intel
   386, and runs on all the later x86 processors including the Intel
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/config.in linux/arch/i386/config.in
--- linux-2.5.41-bk2-core/arch/i386/config.in	Wed Oct  9 14:01:44 2002
+++ linux/arch/i386/config.in	Wed Oct  9 14:08:47 2002
@@ -156,6 +156,23 @@
 bool 'Huge TLB Page Support' CONFIG_HUGETLB_PAGE
 
 bool 'Symmetric multi-processing support' CONFIG_SMP
+bool 'Configure High-Resolution-Timers' CONFIG_HIGH_RES_TIMERS
+#
+# We assume that if the box doesn't have a TSC it doesn't have ACPI either.
+#
+if [ "$CONFIG_HIGH_RES_TIMERS" = "y" -a "$CONFIG_X86_TSC" = "y" ]; then
+	choice 'Clock source?' \
+		"ACPI-pm-timer  CONFIG_HIGH_RES_TIMER_ACPI_PM  \
+		Time-stamp-counter/TSC  CONFIG_HIGH_RES_TIMER_TSC \
+		Programmable-interval-timer/PIT CONFIG_HIGH_RES_TIMER_PIT" Time-stamp-counter/TSC
+else
+	if [ "$CONFIG_HIGH_RES_TIMERS" = "y" ]; then
+		define_bool CONFIG_HIGH_RES_TIMER_PIT y
+	fi
+fi
+if [ "$CONFIG_HIGH_RES_TIMER_ACPI_PM" = "y" ]; then
+	int 'Default ACPI pm timer address' CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD 0
+fi 
 bool 'Preemptible Kernel' CONFIG_PREEMPT
 if [ "$CONFIG_SMP" != "y" ]; then
    bool 'Local APIC support on uniprocessors' CONFIG_X86_UP_APIC
@@ -350,6 +367,7 @@
 else
    define_bool CONFIG_BLK_DEV_HD n
 fi
+
 endmenu
 
 mainmenu_option next_comment
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/kernel/Makefile linux/arch/i386/kernel/Makefile
--- linux-2.5.41-bk2-core/arch/i386/kernel/Makefile	Wed Oct  9 14:01:44 2002
+++ linux/arch/i386/kernel/Makefile	Wed Oct  9 14:08:47 2002
@@ -17,6 +17,7 @@
 obj-$(CONFIG_KGDB)		+= kgdb_stub.o 
 obj-$(CONFIG_X86_MSR)		+= msr.o
 obj-$(CONFIG_X86_CPUID)		+= cpuid.o
+obj-$(CONFIG_HIGH_RES_TIMER_ACPI_PM) += high-res-tbxfroot.o
 obj-$(CONFIG_MICROCODE)		+= microcode.o
 obj-$(CONFIG_APM)		+= apm.o
 obj-$(CONFIG_ACPI)		+= acpi.o
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/kernel/apic.c linux/arch/i386/kernel/apic.c
--- linux-2.5.41-bk2-core/arch/i386/kernel/apic.c	Wed Oct  9 13:55:48 2002
+++ linux/arch/i386/kernel/apic.c	Wed Oct  9 14:08:47 2002
@@ -801,7 +801,7 @@
  * P5 APIC double write bug.
  */
 
-#define APIC_DIVISOR 16
+#define APIC_DIVISOR 1
 
 void __setup_APIC_LVTT(unsigned int clocks)
 {
@@ -812,12 +812,12 @@
 	apic_write_around(APIC_LVTT, lvtt1_value);
 
 	/*
-	 * Divide PICLK by 16
+	 * Divide PICLK by 1
 	 */
 	tmp_value = apic_read(APIC_TDCR);
 	apic_write_around(APIC_TDCR, (tmp_value
 				& ~(APIC_TDR_DIV_1 | APIC_TDR_DIV_TMBASE))
-				| APIC_TDR_DIV_16);
+				| APIC_TDR_DIV_1);
 
 	apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
 }
@@ -1030,10 +1030,20 @@
 		 * Interrupts are already masked off at this point.
 		 */
 		prof_counter[cpu] = prof_multiplier[cpu];
+		/* 
+		 * deal with profiling later...
+		 */
+#ifndef CONFIG_HIGH_RES_TIMERS
 		if (prof_counter[cpu] != prof_old_multiplier[cpu]) {
 			__setup_APIC_LVTT(calibration_result/prof_counter[cpu]);
 			prof_old_multiplier[cpu] = prof_counter[cpu];
 		}
+#else
+		/*
+		* This is the 1/HZ count, can be changed by HRT code.
+		*/
+		__setup_APIC_LVTT(calibration_result);
+#endif
 
 #ifdef CONFIG_SMP
 		update_process_times(user);
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/kernel/high-res-tbxfroot.c linux/arch/i386/kernel/high-res-tbxfroot.c
--- linux-2.5.41-bk2-core/arch/i386/kernel/high-res-tbxfroot.c	Wed Dec 31 16:00:00 1969
+++ linux/arch/i386/kernel/high-res-tbxfroot.c	Wed Oct  9 14:08:47 2002
@@ -0,0 +1,272 @@
+/******************************************************************************
+ *
+ * Module Name: tbxfroot - Find the root ACPI table (RSDT)
+ *              $Revision: 49 $
+ *
+ *****************************************************************************/
+
+/*
+ *  Copyright (C) 2000, 2001 R. Byron Moore
+
+ *  This code purloined and modified by George Anzinger
+ *                          Copyright (C) 2002 by MontaVista Software.
+ *  It is part of the high-res-timers ACPI option and its sole purpose is
+ *  to find the darn timer.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+/* This is most annoying!  We want to find the address of the pm timer in the
+ * ACPI hardware package.  We know there is one if ACPI is available at all 
+ * as it is part of the basic ACPI hardware set. 
+ * However, the powers that be have conspired to make it a real
+ * pain to find the address.  We have written a minimal search routine
+ * that we use only once on boot up.  We try to cover all the bases including
+ * checksum, and version.  We will try to get some constants and structures
+ * from the ACPI code in an attempt to follow it, but darn, what a mess.
+ *
+ * First problem, the include files are in the driver package....
+ * and what a mess they are.  We pick up the kernel string and types first.
+
+ * But then there is the COMPILER_DEPENDENT_UINT64 ...
+ */
+
+#define COMPILER_DEPENDENT_UINT64   unsigned long long
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <../drivers/acpi/include/actypes.h>
+#include <../drivers/acpi/include/actbl.h>
+#include <../drivers/acpi/include/acconfig.h>
+#include <linux/init.h>
+#include <asm/page.h>
+
+#define STRNCMP(d,s,n)  strncmp((d), (s), (NATIVE_INT)(n))
+#define RSDP_CHECKSUM_LENGTH 20
+
+#ifndef CONFIG_ACPI
+/*******************************************************************************
+ *
+ * FUNCTION:    hrt_acpi_checksum
+ *
+ * PARAMETERS:  Buffer              - Buffer to checksum
+ *              Length              - Size of the buffer
+ *
+ * RETURNS      8 bit checksum of buffer
+ *
+ * DESCRIPTION: Computes an 8 bit checksum of the buffer(length) and returns it.
+ *
+ ******************************************************************************/
+static  __init
+u8
+hrt_acpi_checksum (
+	void                    *buffer,
+	u32                     length)
+{
+	u8                      *limit;
+	u8                      *rover;
+	u8                      sum = 0;
+
+
+	if (buffer && length) {
+		/*  Buffer and Length are valid   */
+
+		limit = (u8 *) buffer + length;
+
+		for (rover = buffer; rover < limit; rover++) {
+			sum = (u8) (sum + *rover);
+		}
+	}
+
+	return (sum);
+}
+
+/*******************************************************************************
+ *
+ * FUNCTION:    hrt_acpi_scan_memory_for_rsdp
+ *
+ * PARAMETERS:  Start_address       - Starting pointer for search
+ *              Length              - Maximum length to search
+ *
+ * RETURN:      Pointer to the RSDP if found, otherwise NULL.
+ *
+ * DESCRIPTION: Search a block of memory for the RSDP signature
+ *
+ ******************************************************************************/
+static  __init
+u8 *
+hrt_acpi_scan_memory_for_rsdp (
+	u8                      *start_address,
+	u32                     length)
+{
+	u32                     offset;
+	u8                      *mem_rover;
+
+
+	/* Search from given start addr for the requested length  */
+
+	for (offset = 0, mem_rover = start_address;
+		 offset < length;
+		 offset += RSDP_SCAN_STEP, mem_rover += RSDP_SCAN_STEP) {
+
+		/* The signature and checksum must both be correct */
+
+		if (STRNCMP ((NATIVE_CHAR *) mem_rover,
+				RSDP_SIG, sizeof (RSDP_SIG)-1) == 0 &&
+			hrt_acpi_checksum (mem_rover, RSDP_CHECKSUM_LENGTH) == 0) {
+			/* If so, we have found the RSDP */
+
+
+			return (mem_rover);
+		}
+	}
+
+	/* Searched entire block, no RSDP was found */
+
+
+	return (NULL);
+}
+
+
+/*******************************************************************************
+ *
+ * FUNCTION:    hrt_find_acpi_rsdp
+ *
+ * PARAMETERS: 
+ *
+ * RETURN:      Logical address of rsdp
+ *
+ * DESCRIPTION: Search lower 1 MByte of memory for the root system descriptor
+ *              pointer structure.  If it is found, return its address,
+ *              else return 0.
+ *
+ *              NOTE: The RSDP must be either in the first 1 KByte of the Extended
+ *              BIOS Data Area or between E0000 and FFFFF (ACPI 1.0 section
+ *              5.2.2; assertion #421).
+ *
+ ******************************************************************************/
+/* Constants used in searching for the RSDP in low memory */
+
+#define LO_RSDP_WINDOW_BASE         0           /* Physical Address */
+#define HI_RSDP_WINDOW_BASE         0xE0000     /* Physical Address */
+#define LO_RSDP_WINDOW_SIZE         0x400
+#define HI_RSDP_WINDOW_SIZE         0x20000
+#define RSDP_SCAN_STEP              16
+
+static  __init
+RSDP_DESCRIPTOR *
+hrt_find_acpi_rsdp (void)
+{
+	u8                      *mem_rover;
+
+
+        /*
+         * 1) Search EBDA (low memory) paragraphs
+         */
+        mem_rover = hrt_acpi_scan_memory_for_rsdp((u8 *)__va(LO_RSDP_WINDOW_BASE),
+                                                     LO_RSDP_WINDOW_SIZE);
+
+        if (!mem_rover) {
+                /*
+                 * 2) Search upper memory: 
+                 *    16-byte boundaries in E0000h-FFFFFh
+                 */
+                mem_rover = hrt_acpi_scan_memory_for_rsdp((u8 *)__va(HI_RSDP_WINDOW_BASE),
+                                                         HI_RSDP_WINDOW_SIZE);
+        }
+
+        if (mem_rover) {
+                /* Found it, return the logical address */
+
+                return (RSDP_DESCRIPTOR *)mem_rover;
+        }
+        return (RSDP_DESCRIPTOR *)0;
+}
+
+__init
+u32
+hrt_get_acpi_pm_ptr(void)
+{
+        fadt_descriptor_rev2 *fadt;
+        RSDT_DESCRIPTOR_REV2 *rsdt;
+        XSDT_DESCRIPTOR_REV2 *xsdt;
+        RSDP_DESCRIPTOR *rsdp = hrt_find_acpi_rsdp ();
+
+        if ( ! rsdp){
+                printk("ACPI: System description tables not found\n");
+                return 0;
+        }
+        /*
+         * Now that we have that problem out of the way, lets set up this
+         * timer.  We need to figure the addresses based on the revision
+         * of ACPI, which is in this here table we just found.
+         * We will not check the RSDT checksum, but will the FADT.
+         */
+        if ( rsdp->revision == 2){
+                xsdt = (XSDT_DESCRIPTOR_REV2 *)__va(rsdp->xsdt_physical_address);
+                fadt = (fadt_descriptor_rev2 *)__va(xsdt->table_offset_entry [0]);
+        }else{
+                rsdt = (RSDT_DESCRIPTOR_REV2 *)__va(rsdp->rsdt_physical_address);
+                fadt = (fadt_descriptor_rev2 *)__va(rsdt->table_offset_entry [0]);
+        }
+        /*
+         * Verify the signature and the checksum
+         */
+        if (STRNCMP ((NATIVE_CHAR *) fadt->header.signature ,
+                     FADT_SIG, sizeof (FADT_SIG)-1) == 0 &&
+            hrt_acpi_checksum ((NATIVE_CHAR *)fadt, fadt->header.length) == 0) {
+                /*
+                 * looks good.  Again, based on revision,
+                 * pluck the addresses we want and get out.
+                 */
+                if ( rsdp->revision == 2){
+                        return (u32 )fadt->Xpm_tmr_blk.address;
+                }else{
+                        return (u32 )fadt->V1_pm_tmr_blk;
+                }
+        }
+        printk("ACPI: Signature or checksum failed on FADT\n");
+        return 0;
+}
+
+#else
+int acpi_get_firmware_table (
+	acpi_string             signature,
+	u32                     instance,
+	u32                     flags,
+	acpi_table_header       **table_pointer);
+
+extern  fadt_descriptor_rev2 acpi_fadt;
+__init
+u32
+hrt_get_acpi_pm_ptr(void)
+{
+        fadt_descriptor_rev2 *fadt = &acpi_fadt;
+        fadt_descriptor_rev2 local_fadt;
+
+        if (! fadt || !fadt->header.signature[0]){
+                fadt = &local_fadt;
+                acpi_get_firmware_table("FACP",1,0,(acpi_table_header **)&fadt);
+        }
+        if ( ! fadt|| !fadt->header.signature[0]){
+                printk("ACPI: Could not find the ACPI pm timer.\n");
+                return 0;
+        }
+        if ( fadt->header.revision == 2){
+                        return (u32)fadt->Xpm_tmr_blk.address;
+        }else{
+                        return (u32 )fadt->V1_pm_tmr_blk;
+        }
+}
+#endif
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/kernel/time.c linux/arch/i386/kernel/time.c
--- linux-2.5.41-bk2-core/arch/i386/kernel/time.c	Thu Oct  3 10:41:57 2002
+++ linux/arch/i386/kernel/time.c	Wed Oct  9 14:08:47 2002
@@ -29,7 +29,10 @@
  *	Fixed a xtime SMP race (we need the xtime_lock rw spinlock to
  *	serialize accesses to xtime/lost_ticks).
  */
-
+/* 2002-8-13 George Anzinger  Modified for High res timers: 
+ *                            Copyright (C) 2002 MontaVista Software
+*/
+#define _INCLUDED_FROM_TIME_C
 #include <linux/errno.h>
 #include <linux/sched.h>
 #include <linux/kernel.h>
@@ -62,19 +65,20 @@
 
 extern spinlock_t i8259A_lock;
 
-#include "do_timer.h"
 
 /*
  * for x86_do_profile()
  */
 #include <linux/irq.h>
+#include <asm/sc_math.h>
+#include <linux/hrtime.h>
 
+#include "do_timer.h"
 u64 jiffies_64;
 
 unsigned long cpu_khz;	/* Detected as we calibrate the TSC */
 
-/* Number of usecs that the last interrupt was delayed */
-static int delay_at_last_interrupt;
+static __initdata unsigned long tsc_cycles_per_5_jiffies; /* set only if TSC */
 
 static unsigned long last_tsc_low; /* lsb 32 bits of Time Stamp Counter */
 
@@ -88,7 +92,24 @@
 extern rwlock_t xtime_lock;
 extern unsigned long wall_jiffies;
 
+
+#ifndef CONFIG_HIGH_RES_TIMERS
+
+/* Number of usecs that the last interrupt was delayed */
+static int delay_at_last_interrupt;
+
+#endif  /* CONFIG_HIGH_RES_TIMERS */
+
 spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED;
+/*
+ * We have three of these do_xxx_gettimeoffset() routines:
+ * do_fast_gettimeoffset(void) for TSC systems without high-res-timers
+ * do_slow_gettimeoffset(void) for non-TSC systems without high-res-timers
+ * do_highres_gettimeoffset(void) for systems with high-res-timers
+ *
+ * Pick the desired one at compile time...
+ */
+#if ! defined(CONFIG_HIGH_RES_TIMERS) && defined(CONFIG_X86_TSC)
 
 static inline unsigned long do_fast_gettimeoffset(void)
 {
@@ -109,23 +130,19 @@
 	 * Using a mull instead of a divl saves up to 31 clock cycles
 	 * in the critical path.
          */
-
-	__asm__("mull %2"
-		:"=a" (eax), "=d" (edx)
-		:"rm" (fast_gettimeoffset_quotient),
-		 "0" (eax));
-
+	edx = mpy_sc32(eax, fast_gettimeoffset_quotient);
 	/* our adjusted time offset in microseconds */
 	return delay_at_last_interrupt + edx;
 }
+#define do_gettimeoffset() do_fast_gettimeoffset()
+#endif
 
 #define TICK_SIZE (tick_nsec / 1000)
 
 spinlock_t i8253_lock = SPIN_LOCK_UNLOCKED;
 EXPORT_SYMBOL(i8253_lock);
 
-#ifndef CONFIG_X86_TSC
-
+#if  ! defined(CONFIG_HIGH_RES_TIMERS) && ! defined(CONFIG_X86_TSC)
 /* This function must be called with interrupts disabled 
  * It was inspired by Steve McCanne's microtime-i386 for BSD.  -- jrs
  * 
@@ -223,10 +240,21 @@
 
 static unsigned long (*do_gettimeoffset)(void) = do_slow_gettimeoffset;
 
-#else
+#endif
+
+#ifdef CONFIG_HIGH_RES_TIMERS
 
-#define do_gettimeoffset()	do_fast_gettimeoffset()
+static unsigned long do_highres_gettimeoffset(void)
+{
+        /*
+         * We are under the xtime_lock here.
+         */
+        long tmp = quick_get_cpuctr();
+        long rtn = arch_cycles_to_usec(tmp + sub_jiffie());
+	return rtn;
+}
 
+#define do_gettimeoffset() do_highres_gettimeoffset()
 #endif
 
 /*
@@ -241,16 +269,25 @@
 	read_lock_irqsave(&xtime_lock, flags);
 	usec = do_gettimeoffset();
 	{
+                /*
+                 * FIX ME***** Due to adjtime and such
+                 * this should be changed to actually update
+                 * wall time using the proper routine.
+                 * Otherwise we run the risk of time moving
+                 * backward due to different interpretations
+                 * of the jiffie.  I.e. jiffie != 1/HZ
+                 * (but it is close).
+                 */
 		unsigned long lost = jiffies - wall_jiffies;
 		if (lost)
-			usec += lost * (1000000 / HZ);
+			usec += lost * (USEC_PER_SEC / HZ);
 	}
 	sec = xtime.tv_sec;
 	usec += (xtime.tv_nsec / 1000);
 	read_unlock_irqrestore(&xtime_lock, flags);
 
-	while (usec >= 1000000) {
-		usec -= 1000000;
+	while (usec >= USEC_PER_SEC) {
+		usec -= USEC_PER_SEC;
 		sec++;
 	}
 
@@ -268,10 +305,10 @@
 	 * made, and then undo it!
 	 */
 	tv->tv_usec -= do_gettimeoffset();
-	tv->tv_usec -= (jiffies - wall_jiffies) * (1000000 / HZ);
+	tv->tv_usec -= (jiffies - wall_jiffies) * (USEC_PER_SEC / HZ);
 
 	while (tv->tv_usec < 0) {
-		tv->tv_usec += 1000000;
+		tv->tv_usec += USEC_PER_SEC;
 		tv->tv_sec--;
 	}
 
@@ -361,7 +398,7 @@
  * timer_interrupt() needs to keep up the real-time clock,
  * as well as call the "do_timer()" routine every clocktick
  */
-static inline void do_timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
+static inline void do_timer_interrupt(int irq, struct pt_regs *regs)
 {
 #ifdef CONFIG_X86_IO_APIC
 	if (timer_ack) {
@@ -381,36 +418,29 @@
 
 	do_timer_interrupt_hook(regs);
 
-	/*
+        /* 
+         * This is dumb for two reasons.  
+         * 1.) it is based on wall time which has not yet been updated.
+         * 2.) it is checked each tick for something that happens each
+         *     10 min.  Why not use a timer for it?  Much lower overhead,
+         *     in fact, zero if STA_UNSYNC is set.
+         */
+        /*
 	 * If we have an externally synchronized Linux clock, then update
 	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
 	if ((time_status & STA_UNSYNC) == 0 &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
-	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
-	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
+	    (xtime.tv_nsec ) >= 500000000 - ((unsigned) tick_nsec) / 2 &&
+	    (xtime.tv_nsec ) <= 500000000 + ((unsigned) tick_nsec) / 2) {
 		if (set_rtc_mmss(xtime.tv_sec) == 0)
 			last_rtc_update = xtime.tv_sec;
 		else
-			last_rtc_update = xtime.tv_sec - 600; /* do it again in 60 s */
+                        /* do it again in 60 s */	
+			last_rtc_update = xtime.tv_sec - 600; 
 	}
 	    
-#ifdef CONFIG_MCA
-	if( MCA_bus ) {
-		/* The PS/2 uses level-triggered interrupts.  You can't
-		turn them off, nor would you want to (any attempt to
-		enable edge-triggered interrupts usually gets intercepted by a
-		special hardware circuit).  Hence we have to acknowledge
-		the timer interrupt.  Through some incredibly stupid
-		design idea, the reset for IRQ 0 is done by setting the
-		high bit of the PPI port B (0x61).  Note that some PS/2s,
-		notably the 55SX, work fine if this is removed.  */
-
-		irq = inb_p( 0x61 );	/* read the current state */
-		outb_p( irq|0x80, 0x61 );	/* reset the IRQ */
-	}
-#endif
 }
 
 static int use_tsc;
@@ -422,24 +452,28 @@
  */
 void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
-	int count;
-
 	/*
 	 * Here we are in the timer irq handler. We just have irqs locally
 	 * disabled but we don't know if the timer_bh is running on the other
-	 * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
+	 * CPU. We need to avoid to SMP race with it. NOTE: we don't need
 	 * the irq version of write_lock because as just said we have irq
 	 * locally disabled. -arca
 	 */
 	write_lock(&xtime_lock);
 
+#ifndef CONFIG_HIGH_RES_TIMERS
 	if (use_tsc)
 	{
+		int count;
 		/*
 		 * It is important that these two operations happen almost at
 		 * the same time. We do the RDTSC stuff first, since it's
 		 * faster. To avoid any inconsistencies, we need interrupts
 		 * disabled locally.
+                 * Note: It is dumb to put the spin_lock() between these two
+                 * operations since we are trying to sync the two clocks.
+                 * Also, the rdtscl is so fast, no one will know the
+                 * difference.
 		 */
 
 		/*
@@ -447,11 +481,11 @@
 		 * has the SA_INTERRUPT flag set. -arca
 		 */
 	
-		/* read Pentium cycle counter */
 
+		spin_lock(&i8253_lock);
+		/* read Pentium cycle counter */
 		rdtscl(last_tsc_low);
 
-		spin_lock(&i8253_lock);
 		outb_p(0x00, 0x43);     /* latch the count ASAP */
 
 		count = inb_p(0x40);    /* read the latched count */
@@ -461,13 +495,95 @@
 		count = ((LATCH-1) - count) * TICK_SIZE;
 		delay_at_last_interrupt = (count + LATCH/2) / LATCH;
 	}
- 
-	do_timer_interrupt(irq, NULL, regs);
+#endif /* ! CONFIG_HIGH_RES_TIMERS */ 
+	do_timer_interrupt(irq, regs);
 
+#ifdef CONFIG_MCA
+        /*
+         * This code moved here from do_timer_interrupt() as part of the
+         * high-res timers change because it should be done every interrupt
+         * but do_timer_interrupt() wants to return early if it is not a 
+         * "1/HZ" tick interrupt.  For non-high-res systems the code is in
+         * exactly the same location (i.e. it is moved from the tail of the
+         * above called function to the next thing after the function).
+         */
+	if( MCA_bus ) {
+		/* The PS/2 uses level-triggered interrupts.  You can't
+		turn them off, nor would you want to (any attempt to
+		enable edge-triggered interrupts usually gets intercepted by a
+		special hardware circuit).  Hence we have to acknowledge
+		the timer interrupt.  Through some incredibly stupid
+		design idea, the reset for IRQ 0 is done by setting the
+		high bit of the PPI port B (0x61).  Note that some PS/2s,
+		notably the 55SX, work fine if this is removed.  */
+
+		irq = inb_p( 0x61 );	/* read the current state */
+		outb_p( irq|0x80, 0x61 );	/* reset the IRQ */
+	}
+#endif
 	write_unlock(&xtime_lock);
 
 }
 
+#ifdef CONFIG_HIGH_RES_TIMERS
+/*
+ * ALL_PERIODIC mode is used if we MUST support the NMI watchdog.  In this
+ * case we must continue to provide interrupts even if they are not serviced.
+ * In this mode, we leave the chip in periodic mode programmed to interrupt
+ * every jiffie.  This is done by, for short intervals, programming a short
+ * time, waiting till it is loaded and then programming the 1/HZ.  The chip
+ * will not load the 1/HZ count till the short count expires.  If the last
+ * interrupt was programmed to be short, we need to program another short
+ * to cover the remaining part of the jiffie and can then just leave the
+ * chip alone.  Note that this is also a low overhead way of doing things as
+ * we do not have to mess with the chip MOST of the time.
+ */
+
+int _schedule_next_int(unsigned long jiffie_f,long sub_jiffie_in, int always)
+{
+        long sub_jiff_offset; 
+        IF_ALL_PERIODIC( 
+		int * last_was_long = &_last_was_long[smp_processor_id()];
+		if ((sub_jiffie_in == -1) && *last_was_long) return 0);
+        /* 
+         * First figure where we are in time. 
+         * A note on locking.  We are under the timerlist_lock here.  This
+         * means that interrupts are off already, so don't use irq versions.
+         */
+        if_SMP( read_lock(&xtime_lock));
+
+        sub_jiff_offset = quick_update_jiffies_sub(jiffie_f);
+
+        if_SMP( read_unlock(&xtime_lock));
+
+
+        if ((IF_ALL_PERIODIC( *last_was_long =) (sub_jiffie_in == -1 ))) {
+
+                sub_jiff_offset = cycles_per_jiffies - sub_jiff_offset;
+
+        }else{
+                 sub_jiff_offset = sub_jiffie_in - sub_jiff_offset;
+        }
+        /*
+         * If time is already passed, just return saying so.
+         */
+        if (! always && (sub_jiff_offset <  high_res_test_val)){
+                IF_ALL_PERIODIC( *last_was_long = 0);
+                return 1;
+        }
+        reload_timer_chip(sub_jiff_offset);
+        return 0;
+}
+
+#ifdef CONFIG_APM
+void restart_timer(void)
+{
+        start_PIT();
+}
+#endif /* CONFIG_APM */
+#endif /* CONFIG_HIGH_RES_TIMERS */
+
+
 /* not static: needed by APM */
 unsigned long get_cmos_time(void)
 {
@@ -510,6 +626,26 @@
 	return mktime(year, mon, day, hour, min, sec);
 }
 
+#define CAL_JIFS 5
+#define CALIBRATE_LATCH	(((CAL_JIFS * CLOCK_TICK_RATE) + HZ/2)/HZ)
+#define CALIBRATE_TIME	((CAL_JIFS * USEC_PER_SEC)/HZ)
+#define CALIBRATE_TIME_NSEC (CAL_JIFS * (NSEC_PER_SEC/HZ))
+
+#ifdef CONFIG_HIGH_RES_TIMERS
+
+void __init hrtimer_init(void)
+{
+        /*
+         * The init_hrtimers macro is in the chosen support package
+         * depending on the clock source, PIT, TSC, or ACPI pm timer.
+	 */
+        init_hrtimers();
+        start_PIT();
+}
+#else
+#define hrtimer_init()
+#endif /* ! CONFIG_HIGH_RES_TIMERS */ 
+
 /* ------ Calibrate the TSC ------- 
  * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
  * Too much 64-bit arithmetic here to do this cleanly in C, and for
@@ -519,8 +655,6 @@
  * device.
  */
 
-#define CALIBRATE_LATCH	(5 * LATCH)
-#define CALIBRATE_TIME	(5 * 1000020/HZ)
 
 #ifdef CONFIG_X86_TSC
 static unsigned long __init calibrate_tsc(void)
@@ -571,6 +705,14 @@
 		/* Error: ECPUTOOSLOW */
 		if (endlow <= CALIBRATE_TIME)
 			goto bad_ctc;
+                /*
+                 * endlow at this point is CAL_JIFS*arch clocks
+                 * per jiffie.  Set up the value for 
+                 * high_res use. Note: keep the whole
+                 * value for now, hrtimer_init will do
+                 * the divide (want that precision).
+                 */
+                tsc_cycles_per_5_jiffies = endlow;
 
 		__asm__("divl %2"
 			:"=a" (endlow), "=d" (endhigh)
@@ -585,6 +727,9 @@
 	 * 32 bits..
 	 */
 bad_ctc:
+#ifdef CONFIG_HIGH_RES_TIMERS
+        printk("******************** TSC calibrate failed!\n");
+#endif
 	return 0;
 }
 #endif /* CONFIG_X86_TSC */
@@ -658,6 +803,7 @@
 	
 	xtime.tv_sec = get_cmos_time();
 	xtime.tv_nsec = 0;
+        IF_HIGH_RES(tick_nsec = NSEC_PER_SEC / HZ);
 
 /*
  * If we have APM enabled or the CPU clock speed is variable
@@ -700,17 +846,19 @@
 #ifndef do_gettimeoffset
 			do_gettimeoffset = do_fast_gettimeoffset;
 #endif
+                        /*
+                         * Kick off the high res timers
+                         */
+                        hrtimer_init();
 
 			/* report CPU clock rate in Hz.
 			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
 			 * clock/second. Our precision is about 100 ppm.
 			 */
-			{	unsigned long eax=0, edx=1000;
-				__asm__("divl %2"
-		       		:"=a" (cpu_khz), "=d" (edx)
-        	       		:"r" (tsc_quotient),
-	                	"0" (eax), "1" (edx));
-				printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
+			cpu_khz = div_sc32( 1000, tsc_quotient);
+			{	
+				printk("Detected %lu.%03lu MHz processor.\n", 
+				       cpu_khz / 1000, cpu_khz % 1000);
 			}
 #ifdef CONFIG_CPU_FREQ
 			cpufreq_register_notifier(&time_cpufreq_notifier_block, CPUFREQ_TRANSITION_NOTIFIER);
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/mach-generic/do_timer.h linux/arch/i386/mach-generic/do_timer.h
--- linux-2.5.41-bk2-core/arch/i386/mach-generic/do_timer.h	Thu Sep 26 11:23:49 2002
+++ linux/arch/i386/mach-generic/do_timer.h	Wed Oct  9 14:08:47 2002
@@ -14,6 +14,11 @@
 static inline void do_timer_interrupt_hook(struct pt_regs *regs)
 {
 	do_timer(regs);
+        IF_HIGH_RES(
+                if (!(new_jiffie() & 1))
+                        return;
+                jiffies_intr = 0;
+                )
 /*
  * In the SMP case we use the local APIC timer interrupt to do the
  * profiling, except when we simulate SMP mode on a uniprocessor
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/hrtime-M386.h linux/include/asm-i386/hrtime-M386.h
--- linux-2.5.41-bk2-core/include/asm-i386/hrtime-M386.h	Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/hrtime-M386.h	Wed Oct  9 14:08:47 2002
@@ -0,0 +1,247 @@
+/*
+ *
+ * File: include/asm-i386/hrtime-M386.h
+ * Copyright (C) 1999 by the University of Kansas Center for Research, Inc.
+ * Copyright (C) 2001 by MontaVista Software.
+ *
+ * This software was developed by the Information and
+ * Telecommunication Technology Center (ITTC) at the University of
+ * Kansas.  Partial funding for this project was provided by Sprint. This
+ * software may be used and distributed according to the terms of the GNU
+ * Public License, incorporated herein by reference.  Neither ITTC nor
+ * Sprint accept any liability whatsoever for this product.
+ *
+ * This project was developed under the direction of Dr. Douglas Niehaus.
+ * 
+ * Authors: Balaji S., Raghavan Menon
+ *	    Furquan Ansari, Jason Keimig, Apurva Sheth
+ *
+ * Thanx to Michael Barabanov for helping me with the non-pentium code.
+ *
+ * Please send bug-reports/suggestions/comments to utime@ittc.ukans.edu
+ * 
+ * Further details about this project can be obtained at
+ *    http://hegel.ittc.ukans.edu/projects/utime/ 
+ *    or in the file Documentation/utime.txt
+ */
+/* This is in case it's not a Pentium or a PPro;
+ * we don't have access to the cycle counters.
+ */
+/* 
+ * This code swiped from the utime project to support high res timers
+ * Principal thief George Anzinger george@mvista.com
+ */
+#ifndef _ASM_HRTIME_M386_H
+#define _ASM_HRTIME_M386_H
+
+#ifdef __KERNEL__
+
+
+extern int base_c0,base_c0_offset;
+#define timer_latch_reset(x) _timer_latch_reset = x
+extern int _timer_latch_reset;
+
+/*
+ * Never call this routine with local ints on.
+ * update_jiffies_sub()
+ */
+
+extern inline unsigned int read_timer_chip(void)
+{
+	unsigned int next_intr;
+
+	LATCH_CNT0();
+	READ_CNT0(next_intr);
+	return next_intr;
+}
+
+#define HR_SCALE_ARCH_NSEC 20
+#define HR_SCALE_ARCH_USEC 30
+#define HR_SCALE_NSEC_ARCH 32
+#define HR_SCALE_USEC_ARCH 29
+
+#define cf_arch_to_usec (SC_n(HR_SCALE_ARCH_USEC,1000000)/ \
+                           (long long)CLOCK_TICK_RATE)
+
+extern inline int arch_cycles_to_usec(long update)
+{
+	return (mpy_sc_n(HR_SCALE_ARCH_USEC, update ,arch_to_usec));
+}
+#define cf_arch_to_nsec (SC_n(HR_SCALE_ARCH_NSEC,1000000000)/ \
+                           (long long)CLOCK_TICK_RATE)
+
+extern inline int arch_cycles_to_nsec(long update)
+{
+        return mpy_sc_n(HR_SCALE_ARCH_NSEC,  update, arch_to_nsec);
+}
+/* 
+ * And the other way...
+ */
+#define cf_usec_to_arch (SC_n( HR_SCALE_USEC_ARCH,CLOCK_TICK_RATE)/ \
+                                            (long long)1000000)
+extern inline int usec_to_arch_cycles(unsigned long usec)
+{
+        return mpy_sc_n(HR_SCALE_USEC_ARCH,usec,usec_to_arch);
+}
+#define cf_nsec_to_arch (SC_n( HR_SCALE_NSEC_ARCH,CLOCK_TICK_RATE)/ \
+                                            (long long)1000000000)
+extern inline int nsec_to_arch_cycles(long nsec)
+{
+        return (mpy_ex32(nsec,nsec_to_arch));
+}
+/*
+ * If this is defined otherwise to allow NTP adjusting, it should
+ * be scaled by about 16 bits (or so) to allow small percentage
+ * changes
+ */
+#define arch_cycles_to_latch(x) x
+/*
+ * This function updates base_c0
+ * This function is always called under the write_lock_irq(&xtime_lock)
+ * It returns the number of "clocks" since the last call to it.
+ *
+ * There is a problem having a counter whose period is the same as the
+ * interval at which it is interrogated, i.e. did it just roll over or has
+ * a very short time really elapsed?  (One of the reasons one should not use
+ * the PIT for both ints and time.)  We will take the occurrence of an
+ * interrupt since last time to indicate that the counter has reset.  This
+ * will work for the get_cpuctr() code but is flawed for quick_get_cpuctr()
+ * as it is called whenever time is requested.  For that code, we make sure
+ * that we never move backward in time.
+ */
+extern inline  unsigned long get_cpuctr(void)
+{
+	int c0;
+	long rtn;
+
+	spin_lock(&i8253_lock);
+	c0 = read_timer_chip();
+
+        rtn = base_c0 - c0 + _timer_latch_reset;
+
+//	if (rtn < 0) {
+//                rtn += _timer_latch_reset;
+//        }
+	base_c0 = c0;
+        base_c0_offset = 0;
+	spin_unlock(&i8253_lock);
+
+	return rtn;
+}
+/*
+ * In an SMP system this is called under the read_lock_irq(xtime_lock)
+ * In a UP system it is also called with this lock (PIT case only)
+ * It returns the number of "clocks" since the last call to get_cpuctr (above).
+ */
+extern inline unsigned long quick_get_cpuctr(void)
+{
+	register  int c0;
+        long rtn;
+
+	spin_lock(&i8253_lock);
+	c0 = read_timer_chip();
+        /*
+         * If the new count is greater than 
+         * the last one (base_c0) the chip has just rolled and an 
+         * interrupt is pending.  To get the time right, we need to add
+         * _timer_latch_reset to the answer.  All this is true if only
+         * one roll is involved, but base_c0 should be updated at least
+         * every 1/HZ.
+         */
+        rtn = base_c0 - c0;
+	if (rtn < base_c0_offset) {
+                rtn += _timer_latch_reset;
+        }
+        base_c0_offset = rtn;
+	spin_unlock(&i8253_lock);
+        return rtn;
+}
+
+#ifdef _INCLUDED_FROM_TIME_C
+int base_c0 = 0;
+int base_c0_offset = 0;
+struct timer_conversion_bits timer_conversion_bits = {
+        _cycles_per_jiffies: (LATCH),
+        _nsec_to_arch:       cf_nsec_to_arch,
+        _usec_to_arch:       cf_usec_to_arch,
+        _arch_to_nsec:       cf_arch_to_nsec,
+        _arch_to_usec:       cf_arch_to_usec,
+        _arch_to_latch:      1
+};
+int _timer_latch_reset;
+
+#define set_last_timer_cc() (void)(1)
+
+/* This returns the correct cycles_per_sec from a calibrated one
+ */
+#define arch_hrtime_init(x) (CLOCK_TICK_RATE)
+
+/*
+ * The reload_timer_chip routine is called under the timerlist lock (irq off)
+ * and, in SMP, the xtime_lock.  We also take the i8253_lock for the chip access
+ */
+
+extern inline void reload_timer_chip( int new_latch_value)
+{
+	int c1, c1new, delta;
+        unsigned char pit_status;
+	/*
+         * Input value is in timer units for the 386 platform.
+         * We must be called with irq disabled.
+	 */
+	spin_lock(&i8253_lock);
+	/*
+         * we need to get this last value of the timer chip
+	 */
+	LATCH_CNT0_AND_CNT1();
+	READ_CNT0(delta);
+	READ_CNT1(c1);
+	base_c0 -= delta;
+
+	new_latch_value = arch_cycles_to_latch( new_latch_value );
+        if (new_latch_value < TIMER_DELTA){
+                new_latch_value = TIMER_DELTA;
+        }
+        IF_ALL_PERIODIC( put_timer_in_periodic_mode());
+        outb_p(new_latch_value & 0xff, PIT0);	/* LSB */
+	outb(new_latch_value >> 8, PIT0);	/* MSB */
+        do {
+                outb_p(PIT0_LATCH_STATUS,PIT_COMMAND);
+                pit_status = inb(PIT0);
+        }while (pit_status & PIT_NULL_COUNT);
+        do {
+                LATCH_CNT0_AND_CNT1();
+                READ_CNT0(delta);
+                READ_CNT1(c1new);
+        } while (!(((new_latch_value-delta)&0xffff) < 15));
+
+        IF_ALL_PERIODIC(
+                outb_p(LATCH & 0xff, PIT0);	/* LSB */
+                outb(LATCH >> 8, PIT0);	        /* MSB */
+                )
+
+	/*
+         * This assumes that counter one is loaded with
+	 * 18 as its reload value.
+	 * Most BIOSes do this, I guess....
+	 */
+        //IF_DEBUG(if (delta > 50000) BREAKPOINT);
+        c1 -= c1new;
+	base_c0 += ((c1 < 0) ? (c1 + 18) : (c1)) + delta;
+        if ( base_c0 < 0 ){
+                base_c0 += _timer_latch_reset;
+        }
+	spin_unlock(&i8253_lock);
+	return;
+}
+/*
+ * No run time conversion factors need to be set up as the PIT has a fixed
+ * speed.
+ */
+#define init_hrtimers()
+
+#endif /* _INCLUDED_FROM_TIME_C */
+#define final_clock_init()
+#endif /* __KERNEL__ */
+#endif /* _ASM_HRTIME_M386_H */
+
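The rollover handling in quick_get_cpuctr() above can be looked at in isolation. What follows is a minimal user-space sketch, assuming a down-counting timer that reloads to a fixed latch value once per 1/HZ; struct pit_state and pit_elapsed() are hypothetical names used only for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for base_c0, base_c0_offset and
 * _timer_latch_reset from hrtime-M386.h. */
struct pit_state {
	int base;        /* counter reading at the last full update */
	int last_offset; /* elapsed value returned by the previous call */
	int latch;       /* reload value, i.e. counts per 1/HZ */
};

/*
 * Return the counts elapsed since s->base, given the current reading c0
 * of a counter that counts DOWN and reloads to s->latch.  If the result
 * would be smaller than the previous one, the counter must have rolled
 * once, so one latch period is added to keep time from moving backward.
 */
static long pit_elapsed(struct pit_state *s, int c0)
{
	long rtn = s->base - c0;

	if (rtn < s->last_offset)
		rtn += s->latch;
	s->last_offset = rtn;
	return rtn;
}
```

With latch = 11932 and base = 10000, a reading of 9000 gives 1000 elapsed counts; a later reading of 11000 (the counter has passed zero and reloaded) gives 10932 rather than a negative value. As the comment in the header notes, this only holds while no more than one roll separates updates of the base.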
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/hrtime-M586.h linux/include/asm-i386/hrtime-M586.h
--- linux-2.5.41-bk2-core/include/asm-i386/hrtime-M586.h	Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/hrtime-M586.h	Wed Oct  9 14:08:47 2002
@@ -0,0 +1,165 @@
+/*
+ * UTIME: On-demand Microsecond Resolution Timers
+ * ----------------------------------------------
+ *
+ * File: include/asm-i386/hrtime-M586.h
+ * Copyright (C) 1999 by the University of Kansas Center for Research, Inc.
+ * Copyright (C) 2001 by MontaVista Software.
+ *
+ * This software was developed by the Information and
+ * Telecommunication Technology Center (ITTC) at the University of
+ * Kansas.  Partial funding for this project was provided by Sprint. This
+ * software may be used and distributed according to the terms of the GNU
+ * Public License, incorporated herein by reference.  Neither ITTC nor
+ * Sprint accept any liability whatsoever for this product.
+ *
+ * This project was developed under the direction of Dr. Douglas Niehaus.
+ * 
+ * Authors: Balaji S., Raghavan Menon
+ *	    Furquan Ansari, Jason Keimig, Apurva Sheth
+ *
+ * Please send bug-reports/suggestions/comments to utime@ittc.ukans.edu
+ * 
+ * Further details about this project can be obtained at
+ *    http://hegel.ittc.ukans.edu/projects/utime/ 
+ *    or in the file Documentation/utime.txt
+ */
+/* 
+ * This code swiped from the utime project to support high res timers
+ * Principal thief George Anzinger george@mvista.com
+ */
+#include <asm/msr.h>
+#ifndef _ASM_HRTIME_M586_H
+#define _ASM_HRTIME_M586_H
+
+#ifdef __KERNEL__
+
+#ifdef _INCLUDED_FROM_TIME_C
+/*
+ * This gets redefined when we calibrate the TSC
+ */
+struct timer_conversion_bits timer_conversion_bits = {
+        _cycles_per_jiffies: LATCH
+};
+#endif
+
+/*
+ * This define avoids an ugly ifdef in time.c
+ */
+#define get_cpuctr_from_timer_interrupt()
+#define timer_latch_reset(s)
+
+/* NOTE: When trying to port this to other architectures define
+ * this to be (void)(1) (ie. #define set_last_timer_cc() (void)(1))
+ * otherwise sched.c would give an undefined reference
+ */
+
+// think this is old cruft... extern void set_last_timer_cc(void);
+/*
+ * These are specific to the pentium counters
+ */
+extern inline unsigned long get_cpuctr(void)
+{
+        /*
+         * We are interested only in deltas so we just use the low bits
+         * at 1GHZ this should be good for 4.2 seconds, at 100GHZ 42 ms
+         */
+	unsigned long old = last_update;
+        rdtscl(last_update);
+	return last_update - old;
+}
+extern inline unsigned long quick_get_cpuctr(void)
+{
+	unsigned long value;
+        rdtscl(value);
+	return value - last_update;
+}
+#define arch_hrtime_init(x) (x)
+
+extern unsigned long long base_cpuctr;
+extern unsigned long base_jiffies;
+/* 
+ * We use various scaling.  The sc32 scales by 2**32, sc_n by the first parm.
+ * When working with constants, choose a scale such that x/n->(32-scale)< 1/2.
+ * So 1/3 is < 1/2, giving a scale of 32, whereas 3/1 must be shifted 3 times
+ * (3/8) to be less than 1/2, so its scale should be 29.
+ *
+ * The principal high end is when we can no longer keep 1/HZ worth of arch
+ * time (TSC counts) in an integer.  This will happen somewhere between 40GHz and
+ * 50GHz with HZ set to 100.  For now we are cool and the scale of 24 works for 
+ * the nano second to arch from 2MHz to 40+GHz.  
+ */
+#define HR_TIME_SCALE_NSEC 22
+#define HR_TIME_SCALE_USEC 14
+extern inline int arch_cycles_to_usec(unsigned long update) 
+{
+	return (mpy_sc32(update ,arch_to_usec));
+}
+/*
+ * We use the same scale for both the pit and the APIC
+ */
+extern inline int arch_cycles_to_latch(unsigned long update)
+{
+        return (mpy_sc32(update ,arch_to_latch));
+}
+#define compute_latch(APIC_clocks_jiffie) arch_to_latch = \
+                                             div_sc32(APIC_clocks_jiffie, \
+				                      cycles_per_jiffies);
+
+extern inline int arch_cycles_to_nsec(long update)
+{
+        return mpy_sc_n(HR_TIME_SCALE_NSEC,  update, arch_to_nsec);
+}
+/* 
+ * And the other way...
+ */
+extern inline int usec_to_arch_cycles(unsigned long usec)
+{
+        return mpy_sc_n(HR_TIME_SCALE_USEC,usec,usec_to_arch);
+}
+extern inline int nsec_to_arch_cycles(unsigned long nsec)
+{
+        return mpy_sc_n(HR_TIME_SCALE_NSEC,nsec,nsec_to_arch);
+}
+
+EXTERN int pit_pgm_correction;
+
+#ifdef _INCLUDED_FROM_TIME_C
+
+#include <asm/io.h>
+
+
+#ifndef USEC_PER_SEC
+#define USEC_PER_SEC 1000000
+#endif
+        /*
+         * Code for runtime calibration of high res timers
+         * Watch out, cycles_per_sec will overflow when we
+         * get a ~ 2.14 GHz machine...
+         * We are starting with tsc_cycles_per_5_jiffies set to 
+         * 5 times the actual value (as set by 
+         * calibrate_tsc() ).
+	 */
+#define init_hrtimers() \
+        arch_to_usec = fast_gettimeoffset_quotient; \
+ \
+        arch_to_latch = div_ll_X_l(mpy_l_X_l_ll(fast_gettimeoffset_quotient, \
+                                                CLOCK_TICK_RATE),           \
+                                   (USEC_PER_SEC));          \
+\
+        arch_to_nsec = div_sc_n(HR_TIME_SCALE_NSEC, \
+                               CALIBRATE_TIME * NSEC_PER_USEC, \
+                               tsc_cycles_per_5_jiffies); \
+ \
+        nsec_to_arch = div_sc_n(HR_TIME_SCALE_NSEC, \
+                                tsc_cycles_per_5_jiffies, \
+                                CALIBRATE_TIME * NSEC_PER_USEC); \
+        usec_to_arch = div_sc_n(HR_TIME_SCALE_USEC, \
+                                tsc_cycles_per_5_jiffies, \
+                                CALIBRATE_TIME ); \
+        cycles_per_jiffies = tsc_cycles_per_5_jiffies / CAL_JIFS;  
+
+
+#endif   /* _INCLUDED_FROM_TIME_C */
+#endif				/* __KERNEL__ */
+#endif				/* _ASM_HRTIME_M586_H */
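The conversion factors set up by init_hrtimers() above rely on scaled integer math from asm/sc_math.h, which is not shown in this patch. The sketch below assumes the usual meaning of such helpers: a divide that pre-shifts the numerator left by n bits, and a multiply that shifts the 64-bit product right by n bits. sketch_div_sc_n() and sketch_mpy_sc_n() are hypothetical stand-ins written for illustration, not the kernel routines.

```c
#include <stdint.h>

/* Assumed semantics: factor = (a << n) / b, kept in 32 bits. */
static uint32_t sketch_div_sc_n(unsigned n, uint64_t a, uint64_t b)
{
	return (uint32_t)((a << n) / b);
}

/* Assumed semantics: result = (x * factor) >> n, using a 64-bit product. */
static uint32_t sketch_mpy_sc_n(unsigned n, uint32_t x, uint32_t factor)
{
	return (uint32_t)(((uint64_t)x * factor) >> n);
}

#define SKETCH_SCALE_NSEC 22	/* same value as HR_TIME_SCALE_NSEC */

/* Build a nsec -> cycles factor from a calibration interval. */
static uint32_t sketch_nsec_to_arch_factor(uint64_t cycles, uint64_t nsec)
{
	return sketch_div_sc_n(SKETCH_SCALE_NSEC, cycles, nsec);
}

/* Build a cycles -> nsec factor from the same calibration interval. */
static uint32_t sketch_arch_to_nsec_factor(uint64_t cycles, uint64_t nsec)
{
	return sketch_div_sc_n(SKETCH_SCALE_NSEC, nsec, cycles);
}
```

For a hypothetical 2 GHz TSC measured over 5 jiffies at HZ=100 (100,000,000 cycles in 50,000,000 ns), the nsec-to-cycles factor is 2^23 and the cycles-to-nsec factor is 2^21, so 1000 ns converts to 2000 cycles and back to 1000 ns with a single multiply and shift each way.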
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/hrtime-Macpi.h linux/include/asm-i386/hrtime-Macpi.h
--- linux-2.5.41-bk2-core/include/asm-i386/hrtime-Macpi.h	Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/hrtime-Macpi.h	Wed Oct  9 14:08:47 2002
@@ -0,0 +1,214 @@
+/*
+ *
+ * File: include/asm-i386/hrtime-Macpi.h 
+ * Copyright (C) 2001 by MontaVista Software,
+
+ * This software may be used and distributed according to the terms of
+ * the GNU Public License, incorporated herein by reference.
+
+ */
+#include <asm/msr.h>
+#include <asm/io.h>
+#ifndef _ASM_HRTIME_Macpi_H
+#define _ASM_HRTIME_Macpi_H
+
+#ifdef __KERNEL__
+
+/*
+ * This define avoids an ugly ifdef in time.c
+ */
+#define timer_latch_reset(s)
+
+/* NOTE: When trying to port this to other architectures define
+ * this to be (void)(1) (ie. #define set_last_timer_cc() (void)(1))
+ * otherwise sched.c would give an undefined reference
+ */
+
+extern void set_last_timer_cc(void);
+/*
+ * These are specific to the ACPI pm counter
+ * The spec says the counter can be either 32 or 24 bits wide.  We treat them
+ * both as 24 bits.  It's faster than doing the test.
+ */
+#define SIZE_MASK 0xffffff
+
+extern int acpi_pm_tmr_address;
+
+extern inline unsigned long get_cpuctr(void)
+{
+        static long old;
+
+        old = last_update;
+        last_update = inl(acpi_pm_tmr_address);
+        return (last_update - old) & SIZE_MASK;
+}
+extern inline unsigned long quick_get_cpuctr(void)
+{
+        return (inl(acpi_pm_tmr_address) - last_update) & SIZE_MASK;
+}
+#define arch_hrtime_init(x) (x)
+
+
+/* 
+ * We use various scaling.  The sc32 scales by 2**32, sc_n by the first parm.
+ * When working with constants, choose a scale such that x/n->(32-scale)< 1/2.
+ * So 1/3 is < 1/2, giving a scale of 32, whereas 3/1 must be shifted 3 times
+ * (3/8) to be less than 1/2, so its scale should be 29.
+ *
+ */
+#define HR_SCALE_ARCH_NSEC 22
+#define HR_SCALE_ARCH_USEC 32
+#define HR_SCALE_NSEC_ARCH 32
+#define HR_SCALE_USEC_ARCH 29
+
+#ifndef  PM_TIMER_FREQUENCY 
+#define PM_TIMER_FREQUENCY  3579545/*45   counts per second */
+#endif
+#define PM_TIMER_FREQUENCY_x_100  357954545  /* counts per second * 100*/
+
+#define cf_arch_to_usec (SC_32(100000000)/(long long)PM_TIMER_FREQUENCY_x_100)
+extern inline int arch_cycles_to_usec(unsigned long update) 
+{
+	return (mpy_sc32(update ,arch_to_usec));
+}
+#ifndef CONFIG_
+/*
+ * We need to take 1/3 of the presented value (or more exactly)
+ * CLOCK_TICK_RATE /PM_TIMER_FREQUENCY.  Note that these two timers
+ * are on the same crystal so will be EXACTLY 1/3.
+ */
+#define cf_arch_to_latch SC_32(CLOCK_TICK_RATE)/(long long)(CLOCK_TICK_RATE * 3)
+extern inline int arch_cycles_to_latch(unsigned long update)
+{
+        return (mpy_sc32(update ,arch_to_latch));
+}
+#else
+/*
+ * APIC clocks run from a low of 33 MHz to, say, 200 MHz.  The PM timer
+ * runs at about 3.5 MHz.  We want to scale so that ( APIC << scale )/PM
+ * is less than 2 ^ 32.  Let's use 2 ^ 19, which leaves plenty of room.
+ */
+#define HR_SCALE_ARCH_LATCH 19
+
+#define compute_latch(APIC_clocks_jiffie) arch_to_latch = div_sc_n(   \
+                                                    HR_SCALE_ARCH_LATCH,   \
+				                    APIC_clocks_jiffie,   \
+				                    cycles_per_jiffies);
+extern inline int arch_cycles_to_latch(unsigned long update)
+{
+        return (mpy_sc_n(HR_SCALE_ARCH_LATCH, update ,arch_to_latch));
+}
+	
+#endif
+
+#define cf_arch_to_nsec (SC_n(HR_SCALE_ARCH_NSEC,100000000000LL)/ \
+                           (long long)PM_TIMER_FREQUENCY_x_100)
+
+extern inline int arch_cycles_to_nsec(long update)
+{
+        return mpy_sc_n(HR_SCALE_ARCH_NSEC,  update, arch_to_nsec);
+}
+/* 
+ * And the other way...
+ */
+#define cf_usec_to_arch (SC_n( HR_SCALE_USEC_ARCH,PM_TIMER_FREQUENCY_x_100)/ \
+                                            (long long)100000000)
+extern inline int usec_to_arch_cycles(unsigned long usec)
+{
+        return mpy_sc_n(HR_SCALE_USEC_ARCH,usec,usec_to_arch);
+}
+#define cf_nsec_to_arch (SC_n( HR_SCALE_NSEC_ARCH,PM_TIMER_FREQUENCY)/ \
+                                            (long long)1000000000)
+extern inline int nsec_to_arch_cycles(unsigned long nsec)
+{
+        return mpy_sc32(nsec,nsec_to_arch);
+}
+
+//EXTERN int pit_pgm_correction;
+
+#ifdef _INCLUDED_FROM_TIME_C
+
+#include <asm/io.h>
+struct timer_conversion_bits timer_conversion_bits = {
+        _cycles_per_jiffies: ((PM_TIMER_FREQUENCY + HZ/2) / HZ),
+        _nsec_to_arch:       cf_nsec_to_arch,
+        _usec_to_arch:       cf_usec_to_arch,
+        _arch_to_nsec:       cf_arch_to_nsec,
+        _arch_to_usec:       cf_arch_to_usec,
+        _arch_to_latch:      cf_arch_to_latch
+};
+int acpi_pm_tmr_address;
+
+
+/*
+ * No run time conversion factors need to be set up as the pm timer has a fixed
+ * speed.
+ */
+/*
+ * Here we have a local udelay for our init use only.  The system delay has
+ * not yet been calibrated when we use this, however, we do know
+ * tsc_cycles_per_5_jiffies...
+ */
+extern unsigned long tsc_cycles_per_5_jiffies;
+
+static inline __init void hrt_udelay(int usec)
+{
+        long now,end;
+        rdtscl(end);
+        end += (usec * tsc_cycles_per_5_jiffies) / (USEC_PER_JIFFIES * 5);
+        do {rdtscl(now);} while((end - now) > 0);
+
+}
+extern int hrt_get_acpi_pm_ptr(void);
+
+#if defined( CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD) && CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD > 0
+#define default_pm_add CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD
+#define message "High-res-timers: ACPI pm timer not found.  Trying specified address %d\n"
+#else
+#define default_pm_add 0
+#define message \
+        "High-res-timers: ACPI pm timer not found(%d) and no backup."\
+        "\nCheck BIOS settings or supply a backup.  See configure documentation.\n"
+#endif
+#define fail_message \
+"High-res-timers: >-<--><-->-<-->-<-->-<--><-->-<-->-<-->-<-->-<-->-<-->-<-->-<\n"\
+"High-res-timers: >Failed to find the ACPI pm timer                           <\n"\
+"High-res-timers: >-<--><-->-<-->-<-->-<-->Boot will fail in Calibrate Delay  <\n"\
+"High-res-timers: >Supply a valid default pm timer address                    <\n"\
+"High-res-timers: >or get your BIOS to turn on ACPI support.                  <\n"\
+"High-res-timers: >See CONFIGURE help for more information.                   <\n"\
+"High-res-timers: >-<--><-->-<-->-<-->-<--><-->-<-->-<-->-<-->-<-->-<-->-<-->-<\n"
+/*
+ * After we get the address, we set last_update to the current timer value
+ */
+static inline __init void  init_hrtimers(void)
+{
+        acpi_pm_tmr_address = hrt_get_acpi_pm_ptr(); 
+        if (!acpi_pm_tmr_address){                    
+                printk(message,default_pm_add);
+                if ( (acpi_pm_tmr_address = default_pm_add)){
+                        last_update +=  quick_get_cpuctr();
+                        hrt_udelay(4);
+                       if (!quick_get_cpuctr()){
+                                printk("High-res-timers: No ACPI pm timer found at %d.\n",
+                                       acpi_pm_tmr_address);
+                                acpi_pm_tmr_address = 0;
+                        } 
+                } 
+        }else{
+                if (default_pm_add != acpi_pm_tmr_address) {
+                        printk("High-res-timers: Ignoring supplied default ACPI pm timer address.\n"); 
+                }
+                last_update +=  quick_get_cpuctr();
+        }
+        if (!acpi_pm_tmr_address){
+                printk(fail_message);
+        }else{
+                printk("High-res-timers: Found ACPI pm timer at %d\n",
+                       acpi_pm_tmr_address);
+        }
+}
+
+#endif   /* _INCLUDED_FROM_TIME_C */
+#endif				/* __KERNEL__ */
+#endif				/* _ASM_HRTIME_Macpi_H */
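The SIZE_MASK trick used by get_cpuctr() and quick_get_cpuctr() in hrtime-Macpi.h above can be checked in user space. This is a small sketch, assuming the counter is treated as 24 bits wide as the header's comment describes; pm_delta() and SKETCH_PM_MASK are hypothetical names, and the port read (inl) is replaced by plain arguments.

```c
#include <stdint.h>

#define SKETCH_PM_MASK 0xffffffu	/* same value as SIZE_MASK */

/*
 * Counts elapsed between two readings of a free-running 24-bit
 * up-counter.  Unsigned subtraction followed by the mask gives the
 * right answer even when the counter wrapped between the readings,
 * provided it wrapped at most once.
 */
static uint32_t pm_delta(uint32_t now, uint32_t last)
{
	return (now - last) & SKETCH_PM_MASK;
}
```

For example, a previous reading of 0xfffff0 followed by a reading of 0x000010 yields 0x20 counts. At roughly 3.58 MHz a 24-bit counter wraps about every 4.7 seconds, so the readings must be taken more often than that for the delta to be meaningful.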
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/hrtime.h linux/include/asm-i386/hrtime.h
--- linux-2.5.41-bk2-core/include/asm-i386/hrtime.h	Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/hrtime.h	Wed Oct  9 14:08:47 2002
@@ -0,0 +1,482 @@
+/*
+ *
+ * File: include/asm-i386/hrtime.h
+ * Copyright (C) 1999 by the University of Kansas Center for Research, Inc.  
+ * Copyright (C) 2001 by MontaVista Software.
+ *
+ * This software was developed by the Information and
+ * Telecommunication Technology Center (ITTC) at the University of
+ * Kansas.  Partial funding for this project was provided by Sprint. This
+ * software may be used and distributed according to the terms of the GNU
+ * Public License, incorporated herein by reference.  Neither ITTC nor
+ * Sprint accept any liability whatsoever for this product.
+ *
+ * This project was developed under the direction of Dr. Douglas Niehaus.
+ * 
+ * Authors: Balaji S., Raghavan Menon
+ *	    Furquan Ansari, Jason Keimig, Apurva Sheth
+ *
+ * Please send bug-reports/suggestions/comments to utime@ittc.ukans.edu
+ * 
+ * Further details about this project can be obtained at
+ *    http://hegel.ittc.ukans.edu/projects/utime/ 
+ *    or in the file Documentation/high-res-timers/
+ */
+/*
+ * This code purloined from the utime project for high res timers.
+ * Principal modifier George Anzinger george@mvista.com
+ */
+#ifndef _I386_HRTIME_H
+#define _I386_HRTIME_H
+#ifdef __KERNEL__
+
+#include <linux/config.h>	/* for CONFIG_APM etc... */
+#include <asm/types.h>		/* for u16s */
+#include <asm/io.h>
+#include <asm/sc_math.h>        /* scaling math routines */
+#include <asm/delay.h>
+/*
+ * What "IF_ALL_PERIODIC" does is to set up the PIT so that it always,
+ * if we don't touch it again, will tick at a 1/HZ rate.  This is done
+ * by programming the interrupt we want and, once it is loaded, dropping
+ * a 1/HZ program on top of it.  The PIT will give us the desired interrupt
+ * and, at interrupt time, load the 1/HZ program.  So...
+
+ * If no sub 1/HZ ticks are needed AND we are aligned with the 1/HZ 
+ * boundary, we don't need to touch the PIT.  Otherwise we do the above.
+
+ * In theory you could turn this off, but it has been so long....
+
+ * There are two reasons to keep this:
+ * 1. The NMI watchdog uses the timer interrupt to generate the NMI interrupts.
+ * 2. We don't have to touch the PIT unless we have a sub jiffie event in
+ *    the next 1/HZ interval (unless we drift away from the 1/HZ boundary).
+ */
+#if 1
+#define IF_ALL_PERIODIC(a) a
+#else
+#define IF_ALL_PERIODIC(a)
+#endif
+
+
+/*
+ * The high-res-timers option is set up to self configure with different 
+ * platforms.  It is up to the platform to provide certain macros which
+ * override the default macros defined in systems without (or with disabled)
+ * high-res-timers.
+ *
+ * To do high-res-timers at some fundamental level the timer interrupt must
+ * be separated from the time keeping tick.  A tick can still be generated
+ * by the timer interrupt, but it may be surrounded by non-tick interrupts.
+ * It is up to the platform to determine if a particular interrupt is a tick,
+ * and up to the timer code (in timer.c) to determine what time events have
+ * expired.
+ *
+ * Macros:
+ * update_jiffies()  This macro is to compute the new value of jiffie and 
+ *                   sub_jiffie.  If high-res-timers are not available it
+ *                   may be assumed that this macro will be called once
+ *                   every 1/HZ and so should reduce to:
+ *
+ * 	(*(u64 *)&jiffies_64)++;
+ *
+ * sub_jiffie, in this case will always be zero, and need not be addressed.
+ * It is assumed that the sub_jiffie is in platform defined units and runs
+ * from 0 to a value which represents 1/HZ on that platform.  (See conversion
+ * macro requirements below.)
+ * If high-res-timers are available, this macro will be called each timer
+ * interrupt which may be more often than 1/HZ.  It is up to the code to 
+ * determine if a new jiffie has just started and pass this info to:
+ *
+ * new_jiffie() which should return true if the last call to update_jiffie()
+ *              moved the jiffie count (as opposed to just the sub_jiffie).
+ *              For systems without high-res-timers the kernel will predefine
+ *              this to be 0 which will allow the compiler to optimize the code
+ *              for this case.  In SMP systems this should be set to all 1's
+ *              as it is used in a per cpu fashion to indicate that a particular
+ *              cpu needs to run the accounting code.  It should result
+ *              in a variable that can be cast to a volatile long and of
+ *              which the address can be taken.
+ *
+ * schedule_next_int(jiffie_f,sub_jiffie_v,always) is a macro that the 
+ *                                 platform should 
+ *                                 provide that will set up the timer interrupt 
+ *                                 hardware to interrupt at the absolute time
+ *                                 defined by jiffie_f,sub_jiffie_v where the 
+ *                                 units are 1/HZ and the platform defined 
+ *                                 sub_jiffie unit.  This function must 
+ *                                 determine the actual current time and the 
+ *                                 requested offset and act accordingly.  A 
+ *                                 sub_jiffie_v value of -1 should be 
+ *                                 understood to mean the next even jiffie 
+ *                                 regardless of the jiffie_f value.  If 
+ *                                 the current jiffie is not jiffie_f, it 
+ *                                 may be assumed that the requested time 
+ *                                 has passed and an immediate interrupt
+ *                                 should be taken.  If high-res-timers are 
+ *                                 not available, this macro should evaluate 
+ *                                 to nil.  This macro may return 1 if always
+ *                                 is false AND the requested time has passed.
+ *                                 "Always" indicates that an interrupt is
+ *                                 required even if the time has already passed.
+ */
+
+
+/*
+ * no of usecs less than which events cannot be scheduled
+ */
+#define TIMER_DELTA  5
+
+#ifdef _INCLUDED_FROM_TIME_C
+#define EXTERN
+int timer_delta = TIMER_DELTA;
+#else 
+#define EXTERN  extern 
+extern int timer_delta;
+#endif
+
+#define CONFIG_HIGH_RES_RESOLUTION 1000    // nano second resolution 
+                                           // we will use for high res.
+
+#define USEC_PER_JIFFIES  (1000000/HZ)
+/*
+ * This is really: x*(CLOCK_TICK_RATE+HZ/2)/1000000
+ * Note that we can not figure the constant part at
+ * compile time because we would lose precision.
+ */
+#define PIT0_LATCH_STATUS 0xc2
+#define PIT0 0x40
+#define PIT1 0x41
+#define PIT_COMMAND 0x43
+#define PIT0_ONE_SHOT 0x38
+#define PIT0_PERIODIC 0x34
+#define PIT0_LATCH_COUNT 0xd2
+#define PIT01_LATCH_COUNT 0xd6
+#define PIT_NULL_COUNT 0x40
+#define READ_CNT0(varr) {varr = inb(PIT0);varr += (inb(PIT0))<<8;}
+#define READ_CNT1(var) { var = inb(PIT1); }
+#define LATCH_CNT0() { outb(PIT0_LATCH_COUNT,PIT_COMMAND); }
+#define LATCH_CNT0_AND_CNT1() { outb(PIT01_LATCH_COUNT,PIT_COMMAND); }
+
+#define TO_LATCH(x) (((x)*LATCH)/USEC_PER_JIFFIES)
+
+#define sub_jiffie() _sub_jiffie
+#define schedule_next_int(a,b,c)  _schedule_next_int(a,b,c)
+
+#define update_jiffies() update_jiffies_sub()
+#define new_jiffie() _new_jiffie
+#define high_res_test() high_res_test_val = -  cycles_per_jiffies;
+#define high_res_end_test() high_res_test_val = 0;
+
+extern unsigned long next_intr;
+extern spinlock_t i8253_lock;
+extern rwlock_t xtime_lock;
+
+extern int _schedule_next_int(unsigned long jiffie_f,long sub_jiffie_in, int always);
+
+extern unsigned int volatile latch_reload;
+
+EXTERN int jiffies_intr;
+EXTERN long volatile _new_jiffie;
+EXTERN int _sub_jiffie;
+EXTERN unsigned long volatile last_update;
+EXTERN int high_res_test_val;
+
+#ifndef CONFIG_HIGH_RES_TIMER_PIT 
+IF_ALL_PERIODIC(
+        EXTERN  int min_hz_sub_jiffie;
+        EXTERN  int max_hz_sub_jiffie;
+        EXTERN int _last_was_long[NR_CPUS];
+        )
+#endif
+
+extern inline void start_PIT(void)
+{
+	spin_lock(&i8253_lock);
+	outb_p(PIT0_PERIODIC, PIT_COMMAND);
+	outb_p(LATCH & 0xff, PIT0);	/* LSB */
+	outb(LATCH >> 8, PIT0);	/* MSB */
+	spin_unlock(&i8253_lock);
+}
+/*
+ * Now go ahead and include the clock specific file 586/386/acpi
+ * These asm files have extern inline functions to do a lot of
+ * stuff as well as the conversion routines.
+ */
+#ifdef CONFIG_HIGH_RES_TIMER_ACPI_PM
+#include <asm/hrtime-Macpi.h>
+#elif defined(CONFIG_HIGH_RES_TIMER_PIT)
+#include <asm/hrtime-M386.h>
+#elif defined(CONFIG_HIGH_RES_TIMER_TSC)
+#include <asm/hrtime-M586.h>
+#else
+#error "Need one of: CONFIG_HIGH_RES_TIMER_ACPI_PM CONFIG_HIGH_RES_TIMER_TSC CONFIG_HIGH_RES_TIMER_PIT"
+#endif
+
+extern unsigned long long jiffiesll;
+
+/*
+ * We stole this routine from the Utime code, but there it
+ * calculated microseconds and here we calculate sub_jiffies
+ * which have (in this case) units of TSC count.  (If there
+ * is no TSC, see hrtime-M386.h where a different unit
+ * is used.)  This allows the more expensive math (to get
+ * standard units) to be done only when needed.  Also this
+ * makes it as easy (and as efficient) to calculate nano
+ * as well as micro seconds.
+ */
+
+extern inline void arch_update_jiffies (unsigned long update) 
+{
+        /*
+         * update is the delta in sub_jiffies
+         */
+        _sub_jiffie += update;
+        while ((unsigned long)_sub_jiffie > cycles_per_jiffies){
+                _sub_jiffie -= cycles_per_jiffies; 
+                _new_jiffie = ~0;
+		jiffies_intr++;
+		jiffies_64++;
+        }
+}
+#define SC_32_TO_USEC (SC_32(1000000)/ (long long)CLOCK_TICK_RATE)
+
+
+
+/*
+ * This routine is always called under the write_lockirq(xtime_lock)
+ */
+extern inline void update_jiffies_sub(void)
+{
+	unsigned long cycles_update;
+
+	cycles_update = get_cpuctr();
+
+
+	arch_update_jiffies(cycles_update);
+        /*
+         * In the ALL_PERIODIC mode we program the PIT to give periodic
+         * interrupts and, if no sub_jiffie timers are due, leave it alone.
+         * This means that it can drift WRT the clock (TSC or pm timer).
+         * What we are trying to do is to program the next interrupt to
+         * occur at exactly the requested time.  If we are not doing
+         * sub HZ interrupts we expect to find a small excess of time
+         * beyond the 1/HZ, i.e. _sub_jiffie will have some small value. 
+         * This value will drift AND may jump upward from time to time. 
+         * The drift is due to not having precise tracking between the 
+         * two timers (the PIT and either the TSC or the PM timer) and
+         * the jump is caused by interrupt delays, cache misses etc. 
+         * We need to correct for the drift.  To correct all we need to 
+         * do is to set "last_was_long" to zero and a new timer program 
+         * will be started to "do the right thing".
+ 
+         * Detecting the need to do this correction is another issue. 
+         * Here is what we do:
+         * Each interrupt where last_was_long is !=0 (indicates the
+         * interrupt should be on a 1/HZ boundary) we check the resulting
+         * _sub_jiffie.  If it is smaller than some MIN value, we do
+         * the correction.  (Note that drift that makes the value  
+         * smaller is the easy one.)  We also require that
+         * _sub_jiffie <= some max at least once over a period of 1 second. 
+         * I.e.  with HZ = 100, we will allow up to 99 "late" interrupts
+         * before we do a correction.
+
+         * The values we use for min_hz_sub_jiffie and max_hz_sub_jiffie 
+         * depend on the units and we will start by, during boot,
+         * observing what MIN appears to be.  We will set max_hz_sub_jiffie
+         * to be about 100 machine cycles more than this.
+
+         * Note that with  min_hz_sub_jiffie and max_hz_sub_jiffie
+         * set to 0, this code will reset the PIT every HZ.
+         */         
+#ifndef CONFIG_HIGH_RES_TIMER_PIT 
+	IF_ALL_PERIODIC(
+	{
+		int *last_was_long = &_last_was_long[smp_processor_id()];
+		if ( ! *last_was_long )
+			return;
+		if ( _sub_jiffie < min_hz_sub_jiffie ){
+			*last_was_long = 0;
+                        return;
+                }
+                if (_sub_jiffie <=  max_hz_sub_jiffie) {
+                        *last_was_long = 1;
+                        return;
+                }
+                if ( ++*last_was_long > HZ ){
+                        *last_was_long = 0;
+                        return;
+                }
+	}
+                )
+#endif
+}
+
+/*
+ * quick_update_jiffies_sub returns the sub_jiffie offset of 
+ * current time from the "ref_jiff" jiffie value.  We do this
+ * without updating any memory values and thus do not need to
+ * take any locks, if we are careful.
+ *
+ * I don't know how to eliminate the lock in the SMP case, so..
+ * Oh, and also the PIT case requires a lock anyway, so..
+ */
+#if defined (CONFIG_SMP) || defined(CONFIG_HIGH_RES_TIMER_PIT)
+static inline void get_rat_jiffies(unsigned long *jiffies_f,long * _sub_jiffie_f,unsigned long *update)
+{
+	unsigned long flags;
+
+        read_lock_irqsave(&xtime_lock, flags);
+        *jiffies_f = jiffies;
+        *_sub_jiffie_f = _sub_jiffie;
+        *update = quick_get_cpuctr();
+        read_unlock_irqrestore(&xtime_lock, flags);
+}
+#else
+static inline void get_rat_jiffies(unsigned long *jiffies_f,long *_sub_jiffie_f,unsigned long *update)
+{
+        unsigned long last_update_f;
+        do {
+                *jiffies_f = jiffies;
+                last_update_f = last_update;
+                barrier();
+                *_sub_jiffie_f = _sub_jiffie;
+                *update = quick_get_cpuctr();
+                barrier();
+        }while (*jiffies_f != jiffies || last_update_f != last_update);
+}
+#endif /* CONFIG_SMP || CONFIG_HIGH_RES_TIMER_PIT */
+
+/*
+ * If smp, this must be called with the read_lockirq(&xtime_lock) held.
+ * No lock is needed if not SMP.
+ */
+
+extern inline long quick_update_jiffies_sub(unsigned long ref_jiff)
+{
+	unsigned long update;
+	unsigned long rtn;
+        unsigned long jiffies_f;
+        long  _sub_jiffie_f;
+
+
+        get_rat_jiffies( &jiffies_f,&_sub_jiffie_f,&update);
+
+        rtn = _sub_jiffie_f + (unsigned long) update;
+        rtn += (jiffies_f - ref_jiff) * cycles_per_jiffies;
+        return rtn;
+
+}
+#ifdef CONFIG_X86_LOCAL_APIC
+/*
+ * If we have a local APIC, we will use its counter to get the needed 
+ * interrupts.  Here is where we program it.
+ */
+
+extern void  __setup_APIC_LVTT( unsigned int );
+
+extern inline void reload_timer_chip( int new_latch_value)
+{
+	int new_latch = arch_cycles_to_latch( new_latch_value );
+	/*
+	 * We may want to do more in line code for speed here.
+         * For now, however...
+
+	 * Note: The interrupt routine presets the counter for 1/HZ
+	 * each interrupt so we only deal with requested shorter times
+	 * either due to timer requests or drift.
+         */
+	if ( new_latch < timer_delta) new_latch = timer_delta;
+	__setup_APIC_LVTT(new_latch);
+}
+
+#endif
+#ifndef CONFIG_HIGH_RES_TIMER_PIT
+#ifndef CONFIG_X86_LOCAL_APIC
+extern inline void reload_timer_chip( int new_latch_value)
+{
+        IF_ALL_PERIODIC( unsigned char pit_status);
+	/*
+         * The input value is in arch cycles
+         * We must be called with irq disabled.
+	 */
+
+	new_latch_value = arch_cycles_to_latch( new_latch_value );
+        if (new_latch_value < TIMER_DELTA){
+                new_latch_value = TIMER_DELTA;
+        }
+	spin_lock(&i8253_lock);
+        IF_ALL_PERIODIC(outb_p(PIT0_PERIODIC, PIT_COMMAND););
+	outb_p(new_latch_value & 0xff, PIT0);	/* LSB */
+	outb(new_latch_value >> 8, PIT0);	/* MSB */
+        IF_ALL_PERIODIC(
+                do {
+                        outb_p(PIT0_LATCH_STATUS,PIT_COMMAND);
+                        pit_status = inb(PIT0);
+                }while (pit_status & PIT_NULL_COUNT);
+                outb_p(LATCH & 0xff, PIT0);	/* LSB */
+                outb(LATCH >> 8, PIT0);	        /* MSB */
+                )
+	spin_unlock(&i8253_lock);
+	return;
+}
+#endif //  ! CONFIG_X86_LOCAL_APIC
+/*
+ * Time out for a discussion.  Because the PIT and TSC (or the PIT and
+ * pm timer) may drift WRT each other, we need a way to get the jiffie
+ * interrupt to happen as near to the jiffie roll as possible.  This
+ * ensures that we will get the interrupt when the timer is to be
+ * delivered, not before (we would not deliver) or later, making the
+ * jiffie timers different from the sub_jiffie deliveries.  We would
+ * also like any latency between a "requested" interrupt and the
+ * automatic jiffie interrupts from the PIT to be the same.  Since it
+ * takes some time to set up the PIT, we assume that requested
+ * interrupts may be a bit late when compared to the automatic
+ * interrupts.  When we request a jiffie interrupt, we want the
+ * interrupt to happen at the requested time, which will be a bit before
+ * we get to the jiffies update code. 
+ *
+ * What we want to determine here is a.) how long it takes (min) to get
+ * from a requested interrupt to the jiffies update code and b.) how
+ * long it takes when the interrupt is automatic (i.e. from the PIT
+ * reset logic).  When we set "last_was_long" to zero, the next tick
+ * setup code will "request" a jiffies interrupt (as long as we do not
+ * have any sub jiffie timers pending).  The interrupt after the
+ * requested one will be automatic.  Ignoring drift over this 2/HZ time
+ * we then get two latency values, the requested latency and the
+ * automatic latency.  We set up the difference to correct the requested
+ * time and the second one as the center of a window which we will use
+ * to detect the need to resync the PIT.  We do this for HZ ticks and
+ * take the min.
+ */
+#define NANOSEC_SYNC_LIMIT 2000  // Try for 2 usec. max drift
+#define final_clock_init() \
+        { unsigned long end = jiffies + HZ + HZ; \
+          int min_a =  cycles_per_jiffies, min_b =  cycles_per_jiffies;  \
+          long flags;                         \
+          int * last_was_long = &_last_was_long[smp_processor_id()];   \
+          while (time_before(jiffies,end)){ \
+               unsigned long f_jiffies = jiffies;     \
+               while (jiffies == f_jiffies); \
+               *last_was_long = 0;            \
+               while (jiffies == f_jiffies + 1); \
+               read_lock_irqsave(&xtime_lock, flags); \
+               if ( _sub_jiffie < min_a) \
+                     min_a =  _sub_jiffie; \
+               read_unlock_irqrestore(&xtime_lock, flags); \
+               while (jiffies == f_jiffies + 2); \
+               read_lock_irqsave(&xtime_lock, flags); \
+               if ( _sub_jiffie < min_b) \
+                     min_b =  _sub_jiffie; \
+               read_unlock_irqrestore(&xtime_lock, flags); \
+          }                             \
+         min_hz_sub_jiffie = min_b -  nsec_to_arch_cycles(NANOSEC_SYNC_LIMIT);\
+          if( min_hz_sub_jiffie < 0)  min_hz_sub_jiffie = 0; \
+          max_hz_sub_jiffie = min_b +  nsec_to_arch_cycles(NANOSEC_SYNC_LIMIT);\
+       timer_delta = arch_cycles_to_latch(usec_to_arch_cycles(TIMER_DELTA)); \
+       }
+
+
+#endif                          /* not CONFIG_HIGH_RES_TIMER_PIT */
+#endif				/* __KERNEL__ */
+#endif				/* _I386_HRTIME_H */
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/sc_math.h linux/include/asm-i386/sc_math.h
--- linux-2.5.41-bk2-core/include/asm-i386/sc_math.h	Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/sc_math.h	Wed Oct  9 14:08:47 2002
@@ -0,0 +1,143 @@
+#ifndef SC_MATH
+#define SC_MATH
+#define MATH_STR(X) #X
+#define MATH_NAME(X) X
+
+/*
+ * Pre scaling defines
+ */
+#define SC_32(x) ((long long)(x)<<32)
+#define SC_n(n,x) (((long long)x)<<n)
+/*
+ * This routine performs the following calculation:
+ *
+ * X = (a*b)>>32
+ * we could, (but don't) also get the part shifted out.
+ */
+extern inline long mpy_sc32(long a,long b)
+{
+        long edx;
+	__asm__("imull %2"
+		:"=a" (a), "=d" (edx)
+		:"rm" (b),
+		 "0" (a));
+        return edx;
+}
+/*
+ * X = (a/b)<<32 or more precisely x = (a<<32)/b
+ */
+
+extern inline long div_sc32(long a, long b)
+{
+        long dum;
+        __asm__("divl %2"
+                :"=a" (b), "=d" (dum)
+                :"r" (b), "0" (0), "1" (a));
+        
+        return b;
+}
+/*
+ * X = (a*b)>>24
+ * we could, (but don't) also get the part shifted out.
+ */
+
+#define mpy_ex24(a,b) mpy_sc_n(24,a,b)
+/*
+ * X = (a/b)<<24 or more precisely x = (a<<24)/b
+ */
+#define div_ex24(a,b) div_sc_n(24,a,b)
+
+/*
+ * The routines allow you to do x = (a/b) << N and
+ * x=(a*b)>>N for values of N from 1 to 32.
+ *
+ * These are handy to have to do scaled math.
+ * Scaled math has two nice features:
+ * A.) A great deal more precision can be maintained by
+ *     keeping more significant bits.
+ * B.) Often an in line div can be replaced with a mpy
+ *     which is a LOT faster.
+ */
+
+#define mpy_sc_n(N,aa,bb) ({long edx,a=aa,b=bb; \
+	__asm__("imull %2\n\t" \
+                "shldl $(32-"MATH_STR(N)"),%0,%1"    \
+		:"=a" (a), "=d" (edx)\
+		:"rm" (b),            \
+		 "0" (a)); edx;})
+
+
+#define div_sc_n(N,aa,bb) ({long dum=aa,dum2,b=bb; \
+        __asm__("shrdl $(32-"MATH_STR(N)"),%4,%3\n\t"  \
+                "sarl $(32-"MATH_STR(N)"),%4\n\t"      \
+                "divl %2"              \
+                :"=a" (dum2), "=d" (dum)      \
+                :"rm" (b), "0" (0), "1" (dum)); dum2;})  
+
+  
+/*
+ * (long)X = ((long long)divs) / (long)div
+ * (long)rem = ((long long)divs) % (long)div
+ *
+ * Warning, this will do an exception if X overflows.
+ */
+#define div_long_long_rem(a,b,c) div_ll_X_l_rem(a,b,c)
+
+extern inline long div_ll_X_l_rem(long long divs, long div,long * rem)
+{
+        long dum2;
+        __asm__( "divl %2"
+                :"=a" (dum2), "=d" (*rem)
+                :"rm" (div), "A" (divs));
+        
+        return dum2;
+
+}
+/*
+ * same as above, but no remainder
+ */
+extern inline long div_ll_X_l(long long divs, long div)
+{
+        long dum;
+        return div_ll_X_l_rem(divs,div,&dum);
+}
+/*
+ * (long)X = (((long)divh<<32) | (long)divl) / (long)div
+ * (long)rem = (((long)divh<<32) | (long)divl) % (long)div
+ *
+ * Warning, this will do an exception if X overflows.
+ */
+extern inline long div_h_or_l_X_l_rem(long divh,long divl, long div,long* rem)
+{
+        long dum2;
+        __asm__( "divl %2"
+                :"=a" (dum2), "=d" (*rem)
+                :"rm" (div), "0" (divl),"1" (divh));
+        
+        return dum2;
+
+}
+extern inline long long mpy_l_X_l_ll(long mpy1,long mpy2)
+{
+        long long eax;
+	__asm__("imull %1\n\t"
+		:"=A" (eax)
+		:"rm" (mpy2),
+		 "a" (mpy1));
+        
+        return eax;
+
+}
+extern inline long  mpy_1_X_1_h(long mpy1,long mpy2,long *hi)
+{
+        long eax;
+	__asm__("imull %2\n\t"
+		:"=a" (eax),"=d" (*hi)
+		:"rm" (mpy2),
+		 "0" (mpy1));
+        
+        return eax;
+
+}
+
+#endif
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/signal.h linux/include/asm-i386/signal.h
--- linux-2.5.41-bk2-core/include/asm-i386/signal.h	Mon Sep  9 10:35:04 2002
+++ linux/include/asm-i386/signal.h	Wed Oct  9 14:08:47 2002
@@ -3,6 +3,8 @@
 
 #include <linux/types.h>
 #include <linux/linkage.h>
+#include <linux/time.h>
+#include <asm/ptrace.h>
 
 /* Avoid too many header ordering problems.  */
 struct siginfo;
@@ -216,9 +218,82 @@
 	__asm__("bsfl %1,%0" : "=r"(word) : "rm"(word) : "cc");
 	return word;
 }
+/*
+ * These macros are used by nanosleep() and clock_nanosleep().
+ * The issue is that these functions need the *regs pointer which is 
+ * passed in different ways by the differing archs.
+
+ * Below we do things in two differing ways.  In the long run we would
+ * like to see nano_sleep() go away (glibc should call clock_nanosleep
+ * much as we do).  When that happens and the nano_sleep() system
+ * call entry is retired, there will no longer be any real need for
+ * sys_nanosleep() so the FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP macro
+ * could be undefined, resulting in not needing to stack all the 
+ * parms over again, i.e. better (faster AND smaller) code.
+
+ * And while we're at it, there needs to be a way to set the return code
+ * on the way to do_signal().  It (i.e. do_signal()) saves the regs on 
+ * the caller's stack to call the user handler and then the return is
+ * done using those registers.  This means that the error code MUST be
+ * set in the register PRIOR to calling do_signal().  See our answer 
+ * below...thanks to  Jim Houston <jim.houston@attbi.com>
+ */
+#define FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP_NOT
+
+
+#ifdef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
+extern long do_clock_nanosleep(struct pt_regs *regs, 
+			clockid_t which_clock, 
+			int flags, 
+			const struct timespec *rqtp, 
+			struct timespec *rmtp);
+
+#define NANOSLEEP_ENTRY(a) \
+  asmlinkage long sys_nanosleep( struct timespec* rqtp, \
+                                 struct timespec * rmtp) \
+{       struct pt_regs *regs = (struct pt_regs *)&rqtp; \
+        return do_clock_nanosleep(regs, CLOCK_REALTIME, 0, rqtp, rmtp); \
+} 
+
+#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
+                               clockid_t which_clock,      \
+                               int flags,                  \
+                               const struct timespec *rqtp, \
+                               struct timespec *rmtp)       \
+{       struct pt_regs *regs = (struct pt_regs *)&which_clock; \
+        return do_clock_nanosleep(regs, which_clock, flags, rqtp, rmtp); \
+} \
+long do_clock_nanosleep(struct pt_regs *regs, \
+                    clockid_t which_clock,      \
+                    int flags,                  \
+                    const struct timespec *rqtp, \
+                    struct timespec *rmtp)       \
+{        a
+
+#else
+#define NANOSLEEP_ENTRY(a) \
+      asmlinkage long sys_nanosleep( struct timespec* rqtp, \
+                                     struct timespec * rmtp) \
+{       struct pt_regs *regs = (struct pt_regs *)&rqtp; \
+        a
+#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
+                               clockid_t which_clock,      \
+                               int flags,                  \
+                               const struct timespec *rqtp, \
+                               struct timespec *rmtp)       \
+{       struct pt_regs *regs = (struct pt_regs *)&which_clock; \
+        a
+#endif
 
 struct pt_regs;
 extern int FASTCALL(do_signal(struct pt_regs *regs, sigset_t *oldset));
+#define PT_REGS_ENTRY(type,name,p1_type,p1, p2_type,p2) \
+type name(p1_type p1,p2_type p2)\
+{       struct pt_regs *regs = (struct pt_regs *)&p1;
+
+#define _do_signal() (regs->eax = -EINTR, do_signal(regs, NULL))
+
+
 
 #endif /* __KERNEL__ */
 

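The scaled math in sc_math.h above can be hard to read through the
inline asm.  What follows is a rough portable sketch of the same idea,
assuming 64-bit intermediates stand in for the imull/divl pairs.  The
sketch_ names are hypothetical and not part of the patch, the types are
unsigned where the patch uses long, and the 1193180 Hz PIT rate is an
assumption taken from the usual i386 CLOCK_TICK_RATE.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical portable stand-ins for div_sc32()/mpy_sc32(); these are
 * illustrations only and are not part of the patch.
 */

/* x = (a << 32) / b.  Requires a < b so the quotient fits in 32 bits. */
static inline uint32_t sketch_div_sc32(uint32_t a, uint32_t b)
{
	assert(a < b);
	return (uint32_t)(((uint64_t)a << 32) / b);
}

/* x = (a * b) >> 32, keeping only the high half of the 64-bit product. */
static inline uint32_t sketch_mpy_sc32(uint32_t a, uint32_t b)
{
	return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Convert PIT counts to microseconds with one multiply per call. */
static inline uint32_t sketch_pit_to_usec(uint32_t pit_counts)
{
	/* 1193180 Hz is the usual i386 CLOCK_TICK_RATE (assumed here). */
	const uint32_t scale = sketch_div_sc32(1000000u, 1193180u);

	return sketch_mpy_sc32(pit_counts, scale);
}
```

The divide happens once to build the scale factor; every later
conversion is a multiply and a shift, which is the "div replaced with a
mpy" benefit the header comment describes.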

* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-09 22:47 [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1 george anzinger
@ 2002-10-09 23:14 ` Linus Torvalds
  2002-10-09 23:42   ` george anzinger
                     ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Linus Torvalds @ 2002-10-09 23:14 UTC (permalink / raw)
  To: george anzinger; +Cc: linux-kernel


On Wed, 9 Oct 2002, george anzinger wrote:
> 
> This patch, in conjunction with the "core" high-res-timers
> patch implements high resolution timers on the i386
> platforms.

I really don't get the notion of partial ticks, and quite frankly, this 
isn't going into my tree until some major distribution kicks me in the 
head and explains to me why the hell we have partial ticks instead of just 
making the ticks shorter.

		Linus



* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-09 23:14 ` Linus Torvalds
@ 2002-10-09 23:42   ` george anzinger
  2002-10-10 15:03     ` Eric W. Biederman
  2002-10-10 15:54     ` Oliver Xymoron
  2002-10-13 10:46   ` Ingo Adlung
  2002-10-17 21:54   ` Randy.Dunlap
  2 siblings, 2 replies; 26+ messages in thread
From: george anzinger @ 2002-10-09 23:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> 
> On Wed, 9 Oct 2002, george anzinger wrote:
> >
> > This patch, in conjunction with the "core" high-res-timers
> > patch implements high resolution timers on the i386
> > platforms.
> 
> I really don't get the notion of partial ticks, and quite frankly, this
> isn't going into my tree until some major distribution kicks me in the
> head and explains to me why the hell we have partial ticks instead of just
> making the ticks shorter.
> 
Well, the notion is to provide timers that have resolution
down into the micro seconds.  Since this takes a bit more
overhead, we just set up an interrupt on an as needed
basis.  This is why we define both a high res and a low res
clock.  Timers on the low res clock will always use the 1/HZ
tick to drive them and thus do not introduce any additional
overhead.  If this is all that is needed the configure
option can be left off and only these timers will be
available.

On the other hand, if a user requires better resolution,
s/he just turns on the high-res option and incurs the
overhead only when it is used and then only at timer expire
time.  Note that the only way to access a high-res timer is
via the POSIX clocks and timers API.  They are not available
to select or any other system call.
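
From user space, that access path could look roughly like the sketch
below.  It uses only standard POSIX calls (clock_getres() and
clock_nanosleep()) on CLOCK_REALTIME; the clock ID that selects the
high-res clock is not shown in this message, so that choice is an
assumption, and the sketch_ names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <time.h>

/* Report the resolution of CLOCK_REALTIME in nanoseconds, or -1 on error. */
long sketch_clock_res_ns(void)
{
	struct timespec res;

	if (clock_getres(CLOCK_REALTIME, &res) != 0)
		return -1;
	return (long)res.tv_sec * 1000000000L + res.tv_nsec;
}

/*
 * Sleep for a relative interval of ns nanoseconds.  clock_nanosleep()
 * returns the error number directly; on EINTR it fills in the time
 * remaining, so we simply retry with that.
 */
int sketch_sleep_ns(long ns)
{
	struct timespec req = { ns / 1000000000L, ns % 1000000000L };
	struct timespec rem;
	int err;

	while ((err = clock_nanosleep(CLOCK_REALTIME, 0, &req, &rem)) == EINTR)
		req = rem;
	return err;
}
```

A caller would check sketch_clock_res_ns() first to see what resolution
the kernel actually reports before relying on short sleeps.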

Making ticks shorter causes extra overhead ALL the time,
even when it is not needed.  Higher resolution is not free
in any case, but it is much closer to free with this patch
than by increasing HZ (which, of course, can still be
done).  Overhead wise and resolution wise, for timers, we
would be better off with a 1/HZ tick and the "on demand"
high-res interrupts this patch introduces.
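
As a back-of-the-envelope illustration of that trade-off, here is a
small sketch.  It assumes each high-res timer expiry costs at most one
extra interrupt, which is a simplification (an expiry that lands on a
1/HZ boundary needs none), and the sketch_ names are hypothetical.

```c
/*
 * Interrupts per second if the periodic tick alone must provide a
 * resolution of res_usec microseconds.  This cost is paid all the time.
 */
unsigned long sketch_irqs_short_tick(unsigned long res_usec)
{
	return 1000000UL / res_usec;
}

/*
 * Interrupts per second with a plain 1/HZ tick plus one on-demand
 * interrupt per high-res timer expiry.  The extra cost scales with use.
 */
unsigned long sketch_irqs_on_demand(unsigned long hz,
				    unsigned long expiries_per_sec)
{
	return hz + expiries_per_sec;
}
```

With HZ = 100 and 50 high-res expiries per second, the on-demand scheme
takes 150 interrupts per second, while a tick short enough to give
10 usec resolution would take 100000 whether or not any timer needs it.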

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-09 23:42   ` george anzinger
@ 2002-10-10 15:03     ` Eric W. Biederman
  2002-10-10 15:45       ` george anzinger
  2002-10-10 15:54     ` Oliver Xymoron
  1 sibling, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2002-10-10 15:03 UTC (permalink / raw)
  To: george anzinger; +Cc: Linus Torvalds, linux-kernel

george anzinger <george@mvista.com> writes:

> Linus Torvalds wrote:
> > 
> > On Wed, 9 Oct 2002, george anzinger wrote:
> > >
> > > This patch, in conjunction with the "core" high-res-timers
> > > patch implements high resolution timers on the i386
> > > platforms.
> > 
> > I really don't get the notion of partial ticks, and quite frankly, this
> > isn't going into my tree until some major distribution kicks me in the
> > head and explains to me why the hell we have partial ticks instead of just
> > making the ticks shorter.
> > 
> Well, the notion is to provide timers that have resolution
> down into the micro seconds.  Since this takes a bit more
> overhead, we just set up an interrupt on an as needed
> basis.  This is why we define both a high res and a low res
> clock.  Timers on the low res clock will always use the 1/HZ
> tick to drive them and thus do not introduce any additional
> overhead.  If this is all that is needed the configure
> option can be left off and only these timers will be
> available.
> 
> On the other hand, if a user requires better resolution,
> s/he just turns on the high-res option and incurs the
> overhead only when it is used and then only at timer expire
> time.  Note that the only way to access a high-res timer is
> via the POSIX clocks and timers API.  They are not available
> to select or any other system call.
> 
> Making ticks shorter causes extra overhead ALL the time,
> even when it is not needed.  Higher resolution is not free
> in any case, but it is much closer to free with this patch
> than by increasing HZ (which, of course, can still be
> done).  Overhead wise and resolution wise, for timers, we
> would be better off with a 1/HZ tick and the "on demand"
> high-res interrupts this patch introduces.

???  The issue of ticks is separate from the issue of how often
timer interrupts fire.  Ticks just become the maximum resolution
you can support/express.

If it makes sense to have two maximum tick resolutions, the normal
application maximum tick rate and the special task maximum tick
rate, it is probably worth making the latter available only via a
capability or an rlimit.

Eric



* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-10 15:03     ` Eric W. Biederman
@ 2002-10-10 15:45       ` george anzinger
  0 siblings, 0 replies; 26+ messages in thread
From: george anzinger @ 2002-10-10 15:45 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-kernel

"Eric W. Biederman" wrote:
> 
> george anzinger <george@mvista.com> writes:
> 
> > Linus Torvalds wrote:
> > >
> > > On Wed, 9 Oct 2002, george anzinger wrote:
> > > >
> > > > This patch, in conjunction with the "core" high-res-timers
> > > > patch implements high resolution timers on the i386
> > > > platforms.
> > >
> > > I really don't get the notion of partial ticks, and quite frankly, this
> > > isn't going into my tree until some major distribution kicks me in the
> > > head and explains to me why the hell we have partial ticks instead of just
> > > making the ticks shorter.
> > >
> > Well, the notion is to provide timers that have resolution
> > down into the micro seconds.  Since this takes a bit more
> > overhead, we just set up an interrupt on an as needed
> > basis.  This is why we define both a high res and a low res
> > clock.  Timers on the low res clock will always use the 1/HZ
> > tick to drive them and thus do not introduce any additional
> > overhead.  If this is all that is needed the configure
> > option can be left off and only these timers will be
> > available.
> >
> > On the other hand, if a user requires better resolution,
> > s/he just turns on the high-res option and incurs the
> > overhead only when it is used and then only at timer expire
> > time.  Note that the only way to access a high-res timer is
> > via the POSIX clocks and timers API.  They are not available
> > to select or any other system call.
> >
> > Making ticks shorter causes extra overhead ALL the time,
> > even when it is not needed.  Higher resolution is not free
> > in any case, but it is much closer to free with this patch
> > than by increasing HZ (which, of course, can still be
> > done).  Overhead wise and resolution wise, for timers, we
> > would be better off with a 1/HZ tick and the "on demand"
> > high-res interrupts this patch introduces.
> 
> ???  The issue of ticks is separate from the issue of how often
> timer interrupts fire.  Ticks just become the maximum resolution
> you can support/express.
> 
> If it makes sense to have two maximum tick resolutions, the normal
> application maximum tick rate and the special task maximum tick
> rate, it is probably worth making the latter available only via a
> capability or an rlimit.
> 
I could support a notion that to use the high-res clock for
a timer the user would need a particular capability.  After
all we do the same for the real time priority.  

Does this get us any closer to acceptance in 2.5?
-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-09 23:42   ` george anzinger
  2002-10-10 15:03     ` Eric W. Biederman
@ 2002-10-10 15:54     ` Oliver Xymoron
  2002-10-10 16:24       ` george anzinger
  1 sibling, 1 reply; 26+ messages in thread
From: Oliver Xymoron @ 2002-10-10 15:54 UTC (permalink / raw)
  To: george anzinger; +Cc: Linus Torvalds, linux-kernel

On Wed, Oct 09, 2002 at 04:42:03PM -0700, george anzinger wrote:
> Linus Torvalds wrote:
> > 
> > On Wed, 9 Oct 2002, george anzinger wrote:
> > >
> > > This patch, in conjunction with the "core" high-res-timers
> > > patch implements high resolution timers on the i386
> > > platforms.
> > 
> > I really don't get the notion of partial ticks, and quite frankly, this
> > isn't going into my tree until some major distribution kicks me in the
> > head and explains to me why the hell we have partial ticks instead of just
> > making the ticks shorter.
> > 
> Well, the notion is to provide timers that have resolution
> down into the micro seconds.  Since this takes a bit more
> overhead, we just set up an interrupt on an as needed
> basis.  This is why we define both a high res and a low res
> clock.  Timers on the low res clock will always use the 1/HZ
> tick to drive them and thus do not introduce any additional
> overhead.  If this is all that is needed the configure
> option can be left off and only these timers will be
> available.
> 
> On the other hand, if a user requires better resolution,
> s/he just turns on the high-res option and incurs the
> overhead only when it is used and then only at timer expire
> time.  Note that the only way to access a high-res timer is
> via the POSIX clocks and timers API.  They are not available
> to select or any other system call.
> 
> Making ticks shorter causes extra overhead ALL the time,
> even when it is not needed.  Higher resolution is not free
> in any case, but it is much closer to free with this patch
> than by increasing HZ (which, of course, can still be
> done).  Overhead wise and resolution wise, for timers, we
> would be better off with a 1/HZ tick and the "on demand"
> high-res interrupts this patch introduces.

I think what Linus is getting at is: why not make the units of jiffies
microseconds and give it larger increments on clock ticks? Now you
don't need any special logic to go to better than HZ resolution.
Unfortunately, this means identifying all the things that use HZ as a
measure of how often we check for rescheduling. 

There's also an issue of dynamic range - if we some day soon decide we
want internal timestamps with nanosecond resolution (because units of
.1us are annoying, not because we'll actually have ns accuracy),
then we're seeing timer wraps every couple seconds on 32bit machines
and we're pretty much forced to break into seconds and nanoseconds.
This is arguably saner than jiffies and subjiffies, but it forces
people who are using long timeouts today to use a new interface.
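
To put rough numbers on the dynamic range point, a small sketch follows.
It assumes a plain unsigned 32-bit counter, and sketch_wrap_seconds is a
hypothetical helper, not anything in the kernel.

```c
/*
 * Seconds until an unsigned 32-bit counter wraps when it advances by
 * units_per_sec counts every second (2^32 counts in total).
 */
double sketch_wrap_seconds(double units_per_sec)
{
	return 4294967296.0 / units_per_sec;
}
```

At nanosecond units (1e9 counts per second) the counter wraps after
about 4.3 seconds, at microsecond units after about 71.6 minutes, and
at HZ = 100 jiffies after roughly 497 days.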

I don't think he can seriously mean cranking HZ up to match whatever
timing requirements we might have - that obviously doesn't scale.

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-10 15:54     ` Oliver Xymoron
@ 2002-10-10 16:24       ` george anzinger
  2002-10-10 17:04         ` Oliver Xymoron
  0 siblings, 1 reply; 26+ messages in thread
From: george anzinger @ 2002-10-10 16:24 UTC (permalink / raw)
  To: Oliver Xymoron; +Cc: Linus Torvalds, linux-kernel

Oliver Xymoron wrote:
> 
> On Wed, Oct 09, 2002 at 04:42:03PM -0700, george anzinger wrote:
> > Linus Torvalds wrote:
> > >
> > > On Wed, 9 Oct 2002, george anzinger wrote:
> > > >
> > > > This patch, in conjunction with the "core" high-res-timers
> > > > patch implements high resolution timers on the i386
> > > > platforms.
> > >
> > > I really don't get the notion of partial ticks, and quite frankly, this
> > > isn't going into my tree until some major distribution kicks me in the
> > > head and explains to me why the hell we have partial ticks instead of just
> > > making the ticks shorter.
> > >
> > Well, the notion is to provide timers that have resolution
> > down into the micro seconds.  Since this take a bit more
> > overhead, we just set up an interrupt on an as needed
> > basis.  This is why we define both a high res and a low res
> > clock.  Timers on the low res clock will always use the 1/HZ
> > tick to drive them and thus do not introduce any additional
> > overhead.  If this is all that is needed the configure
> > option can be left off and only these timers will be
> > available.
> >
> > On the other hand, if a user requires better resolution,
> > s/he just turns on the high-res option and incures the
> > overhead only when it is used and then only at timer expire
> > time.  Note that the only way to access a high-res timer is
> > via the POSIX clocks and timers API.  They are not available
> > to select or any other system call.
> >
> > Making ticks shorter causes extra overhead ALL the time,
> > even when it is not needed.  Higher resolution is not free
> > in any case, but it is much closer to free with this patch
> > than by increasing HZ (which, of course, can still be
> > done).  Overhead wise and resolution wise, for timers, we
> > would be better off with a 1/HZ tick and the "on demand"
> > high-res interrupts this patch introduces.
> 
> I think what Linus is getting at is: why not make the units of jiffies
> microseconds and give it larger increments on clock ticks? Now you
> don't need any special logic to go to better than HZ resolution.
> Unfortunately, this means identifying all the things that use HZ as a
> measure of how often we check for rescheduling.

Well then you are still dealing with two measures, the HZ
and the tick rate.  One might also argue that the subjiffie
should be some "normal" thing like nanosecond or micro
second.  I went round and round with this in the beginning. 
What it comes down to is that the conversion back and forth is
much easier and faster (less overhead) when using the
natural units of the underlying clock.  This way the
interrupt code, for example, does not have to even do a
conversion.
> 
> There's also an issue of dynamic range - if we some day soon decide we
> want internal timestamps with nanosecond resolution (because units of
> .1us are annoying, not because we'll actually have ns accuracy),
> then we're seeing timer wraps every couple seconds on 32bit machines
> and we're pretty much forced to break into seconds and nanoseconds.
> This is arguably saner than jiffies and subjiffies, but it forces
> people who are using long timeouts today to use a new interface.
> 
> I don't think he can seriously mean cranking HZ up to match whatever
> timing requirements we might have - that obviously doesn't scale.

This is at least the third "take" on what he means, each of
which sends me in a very different direction.  Sure would
like to know what he really means.

I KNOW there is demand for the high-res timers, else I would
not have spent the last year + being funded to do it.  It
also would not be in the OSDL Carrier Grade system if there
was no demand. 

What I would really like to do is address the real issue.
> 
> --
>  "Love the dolphins," she advised him. "Write by W.A.S.T.E.."
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-10 16:24       ` george anzinger
@ 2002-10-10 17:04         ` Oliver Xymoron
  2002-10-10 17:47           ` george anzinger
  0 siblings, 1 reply; 26+ messages in thread
From: Oliver Xymoron @ 2002-10-10 17:04 UTC (permalink / raw)
  To: george anzinger; +Cc: Linus Torvalds, linux-kernel

On Thu, Oct 10, 2002 at 09:24:54AM -0700, george anzinger wrote:
> Oliver Xymoron wrote:
> > 
> > On Wed, Oct 09, 2002 at 04:42:03PM -0700, george anzinger wrote:
> > > Linus Torvalds wrote:
> > > >
> > > > On Wed, 9 Oct 2002, george anzinger wrote:
> > > > >
> > > > > This patch, in conjunction with the "core" high-res-timers
> > > > > patch implements high resolution timers on the i386
> > > > > platforms.
> > > >
> > > > I really don't get the notion of partial ticks, and quite frankly, this
> > > > isn't going into my tree until some major distribution kicks me in the
> > > > head and explains to me why the hell we have partial ticks instead of just
> > > > making the ticks shorter.
> > > >
> > > Well, the notion is to provide timers that have resolution
> > > down into the micro seconds.  Since this take a bit more
> > > overhead, we just set up an interrupt on an as needed
> > > basis.  This is why we define both a high res and a low res
> > > clock.  Timers on the low res clock will always use the 1/HZ
> > > tick to drive them and thus do not introduce any additional
> > > overhead.  If this is all that is needed the configure
> > > option can be left off and only these timers will be
> > > available.
> > >
> > > On the other hand, if a user requires better resolution,
> > > s/he just turns on the high-res option and incures the
> > > overhead only when it is used and then only at timer expire
> > > time.  Note that the only way to access a high-res timer is
> > > via the POSIX clocks and timers API.  They are not available
> > > to select or any other system call.
> > >
> > > Making ticks shorter causes extra overhead ALL the time,
> > > even when it is not needed.  Higher resolution is not free
> > > in any case, but it is much closer to free with this patch
> > > than by increasing HZ (which, of course, can still be
> > > done).  Overhead wise and resolution wise, for timers, we
> > > would be better off with a 1/HZ tick and the "on demand"
> > > high-res interrupts this patch introduces.
> > 
> > I think what Linus is getting at is: why not make the units of jiffies
> > microseconds and give it larger increments on clock ticks? Now you
> > don't need any special logic to go to better than HZ resolution.
> > Unfortunately, this means identifying all the things that use HZ as a
> > measure of how often we check for rescheduling.
> 
> Well then you are still dealing with two measures, the HZ
> and the tick rate.

Yep, and separating the two breaks a few things. Granted.

> One might also argue that the subjiffie
> should be some "normal" thing like nanosecond or micro
> second.  I went round and round with this in the beginning. 
> What it comes down to it the conversion back and forth is
> much easier and faster (less overhead) when using the
> natural units of the underlying clock.  This way the
> interrupt code, for example, does not have to even do a
> conversion.

Then the argument becomes move jiffies to the most convenient unit
that encompasses what you want to do with subjiffies. Microseconds was
just an example. Most code doesn't really care when ticks happen,
except to the extent that they currently trigger timers, so
jiffies=tick HZ stops being a meaningful measure once timers are
untied from ticks, see?

> > I don't think he can seriously mean cranking HZ up to match whatever
> > timing requirements we might have - that obviously doesn't scale.
> 
> This is at least the third "take" on what he means, each of
> which sends me in a very different direction.  Sure would
> like to know what he really means.

Perhaps if you pose it as a multiple-choice question? I suppose he's
almost sure to answer with "none of the above".

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-10 17:04         ` Oliver Xymoron
@ 2002-10-10 17:47           ` george anzinger
  0 siblings, 0 replies; 26+ messages in thread
From: george anzinger @ 2002-10-10 17:47 UTC (permalink / raw)
  To: Oliver Xymoron; +Cc: Linus Torvalds, linux-kernel

Oliver Xymoron wrote:
> 
> On Thu, Oct 10, 2002 at 09:24:54AM -0700, george anzinger wrote:
> > Oliver Xymoron wrote:
> > >
> > > On Wed, Oct 09, 2002 at 04:42:03PM -0700, george anzinger wrote:
> > > > Linus Torvalds wrote:
> > > > >
> > > > > On Wed, 9 Oct 2002, george anzinger wrote:
> > > > > >
> > > > > > This patch, in conjunction with the "core" high-res-timers
> > > > > > patch implements high resolution timers on the i386
> > > > > > platforms.
> > > > >
> > > > > I really don't get the notion of partial ticks, and quite frankly, this
> > > > > isn't going into my tree until some major distribution kicks me in the
> > > > > head and explains to me why the hell we have partial ticks instead of just
> > > > > making the ticks shorter.
> > > > >
> > > > Well, the notion is to provide timers that have resolution
> > > > down into the micro seconds.  Since this take a bit more
> > > > overhead, we just set up an interrupt on an as needed
> > > > basis.  This is why we define both a high res and a low res
> > > > clock.  Timers on the low res clock will always use the 1/HZ
> > > > tick to drive them and thus do not introduce any additional
> > > > overhead.  If this is all that is needed the configure
> > > > option can be left off and only these timers will be
> > > > available.
> > > >
> > > > On the other hand, if a user requires better resolution,
> > > > s/he just turns on the high-res option and incures the
> > > > overhead only when it is used and then only at timer expire
> > > > time.  Note that the only way to access a high-res timer is
> > > > via the POSIX clocks and timers API.  They are not available
> > > > to select or any other system call.
> > > >
> > > > Making ticks shorter causes extra overhead ALL the time,
> > > > even when it is not needed.  Higher resolution is not free
> > > > in any case, but it is much closer to free with this patch
> > > > than by increasing HZ (which, of course, can still be
> > > > done).  Overhead wise and resolution wise, for timers, we
> > > > would be better off with a 1/HZ tick and the "on demand"
> > > > high-res interrupts this patch introduces.
> > >
> > > I think what Linus is getting at is: why not make the units of jiffies
> > > microseconds and give it larger increments on clock ticks? Now you
> > > don't need any special logic to go to better than HZ resolution.
> > > Unfortunately, this means identifying all the things that use HZ as a
> > > measure of how often we check for rescheduling.
> >
> > Well then you are still dealing with two measures, the HZ
> > and the tick rate.
> 
> Yep, and separating the two breaks a few things. Granted.
> 
> > One might also argue that the subjiffie
> > should be some "normal" thing like nanosecond or micro
> > second.  I went round and round with this in the beginning.
> > What it comes down to it the conversion back and forth is
> > much easier and faster (less overhead) when using the
> > natural units of the underlying clock.  This way the
> > interrupt code, for example, does not have to even do a
> > conversion.
> 
> Then the argument becomes move jiffies to the most convenient unit
> that encompasses what you want to do with subjiffies. Microseconds was
> just an example. Most code doesn't really care when ticks happen,
> except to the extent that they currently trigger timers, so
> jiffies=tick HZ stops being a meaningful measure once timers are
> untied from ticks, see?

Hm?  Not really sure what this leads to.  Right now the
timers are organized by "tick".  I think this is VERY
useful.  It makes the timer insert VERY fast and the tick
processing equally fast.  A regular "tick" also makes the
accounting overhead flat WRT load, also a GOOD thing.  

One thought I had was to separate out the sub tick events
into a different list and come up with a different interrupt
source for them.  Problem is they MUST stay in sync.  This
is most easily done when they are in the same list.
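
A minimal sketch of the "same list" idea, assuming each timer
carries a tick count plus a sub-tick offset.  The struct and function
names here are hypothetical and are not the ones used in the patch,
and jiffies wrap-around is ignored to keep it short.

```c
#include <stddef.h>

/* Hypothetical timer record: expiry is (tick, offset within tick). */
struct timer_sketch {
	unsigned long expires;		/* tick at which the timer expires */
	unsigned long sub_jiffie;	/* offset inside that tick, in clock
					 * units; 0 means "on the tick" */
	struct timer_sketch *next;
};

/* Order by tick first, then by sub-tick offset: <0, 0 or >0. */
static int timer_cmp(const struct timer_sketch *a,
		     const struct timer_sketch *b)
{
	if (a->expires != b->expires)
		return a->expires < b->expires ? -1 : 1;
	if (a->sub_jiffie != b->sub_jiffie)
		return a->sub_jiffie < b->sub_jiffie ? -1 : 1;
	return 0;
}

/*
 * Insert into one sorted list shared by low res and high res timers.
 * Timers with equal expiry keep their insertion order.
 */
static void timer_insert(struct timer_sketch **head, struct timer_sketch *t)
{
	while (*head && timer_cmp(*head, t) <= 0)
		head = &(*head)->next;
	t->next = *head;
	*head = t;
}
```

Because a low res timer is just an entry with a zero offset, walking
this one list always yields expirations in the right order, which is
the property that is hard to keep with two separate lists.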

What you haven't touched on is the separation of the "tick"
from the clock or time.  The patch implements a separation
here.  Time is taken from a reliable source (in this patch
either TSC or the ACPI pm timer, but others are possible)
and the "tick" is just a reminder to look at the clock and
update accordingly.  This eliminates the issue of choosing a
HZ value that is so many PPM close to real time and the NTP
issues that causes, such as the current early expiration of
timers.  Try this:

time sleep 60

on a 2.5.40 system.  It will come back with 59.xxx seconds. 
Clearly the sleep was for less than 60.
  
> 
> > > I don't think he can seriously mean cranking HZ up to match whatever
> > > timing requirements we might have - that obviously doesn't scale.
> >
> > This is at least the third "take" on what he means, each of
> > which sends me in a very different direction.  Sure would
> > like to know what he really means.
> 
> Perhaps if you pose it as a multiple-choice question? I suppose he's
> almost sure to answer with "none of the above".
> 
> --
>  "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-09 23:14 ` Linus Torvalds
  2002-10-09 23:42   ` george anzinger
@ 2002-10-13 10:46   ` Ingo Adlung
  2002-10-14  7:18     ` Vojtech Pavlik
  2002-10-17 21:54   ` Randy.Dunlap
  2 siblings, 1 reply; 26+ messages in thread
From: Ingo Adlung @ 2002-10-13 10:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds



Linus Torvalds wrote:
> On Wed, 9 Oct 2002, george anzinger wrote:
> 
>>This patch, in conjunction with the "core" high-res-timers
>>patch implements high resolution timers on the i386
>>platforms.
> 
> 
> I really don't get the notion of partial ticks, and quite frankly, this 
> isn't going into my tree until some major distribution kicks me in the 
> head and explains to me why the hell we have partial ticks instead of just 
> making the ticks shorter.
> 
> 		Linus

In any kind of virtual environment you would prefer a completely
tickless system altogether to increased tick rates. In an S/390
virtual machine running many hundreds of virtual Linux servers, the
100Hz timer pops are already considerably painful, and going to a higher
tick rate to achieve higher timer resolution is completely prohibitive.
The same is true of many embedded systems because of the power
consumption of high-frequency ticks.

However, George has shown that a completely tickless system is
expensive on Intel in terms of overhead, so partial ticks seem to be
one way of addressing the needs of embedded and virtual environments,
providing decent timer resolution as needed.

Ingo Adlung



* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-13 10:46   ` Ingo Adlung
@ 2002-10-14  7:18     ` Vojtech Pavlik
  2002-10-14 22:17       ` Pavel Machek
  0 siblings, 1 reply; 26+ messages in thread
From: Vojtech Pavlik @ 2002-10-14  7:18 UTC (permalink / raw)
  To: Ingo Adlung; +Cc: linux-kernel, Linus Torvalds

On Sun, Oct 13, 2002 at 12:46:31PM +0200, Ingo Adlung wrote:

> Linus Torvalds wrote:
> > On Wed, 9 Oct 2002, george anzinger wrote:
> > 
> >>This patch, in conjunction with the "core" high-res-timers
> >>patch implements high resolution timers on the i386
> >>platforms.
> > 
> > 
> > I really don't get the notion of partial ticks, and quite frankly, this 
> > isn't going into my tree until some major distribution kicks me in the 
> > head and explains to me why the hell we have partial ticks instead of just 
> > making the ticks shorter.

Not speaking for a major distro, just for me writing HPET (high
performance event timer ...) support for x86-64 (and it happens to exist
on ia64 as well, and possibly might be in new Intel P4 chipsets, too).

It's a very nice piece of hardware that allows very fine granularity
aperiodic interrupts (in each interrupt you set when the next one will
happen), without much overhead.

It'd be a shame to just set this timer to 1kHz periodic and just use
that as a base timer, when you can do much better both resolution- and
latency-wise. HPET has a base clock > 10 MHz.

> > 		Linus
> 
> In any kind of virtual environment you would rather prefer a completely 
> tickless system alltogether than increased tick rates. In a S/390 
> virtual machine, running many hundreds of virtual Linux servers the 
> 100Hz timer pops are already considerably painful, and going to a higher 
> tick rate achieving higher timer resolution is completely prohibitive. 
> Similar is true in many embedded systems related to power consumption of 
> high frequency ticks.
> 
> However, George has shown that introducing the notion of a completely 
> tickless system is expensive on Intel overhead wise, thus partial ticks 
> seem to be a possibility addressing the needs for embedded and virtual 
> environments, getting decent timer resolution as needed.

When HPET becomes a standard (yes, it's a MS requirement for new PCs),
it won't be expensive on i386 anymore.

-- 
Vojtech Pavlik
SuSE Labs


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-14  7:18     ` Vojtech Pavlik
@ 2002-10-14 22:17       ` Pavel Machek
  2002-10-15  7:13         ` Vojtech Pavlik
  0 siblings, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2002-10-14 22:17 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Ingo Adlung, linux-kernel, Linus Torvalds

Hi!

> > >>This patch, in conjunction with the "core" high-res-timers
> > >>patch implements high resolution timers on the i386
> > >>platforms.
> > > 
> > > 
> > > I really don't get the notion of partial ticks, and quite frankly, this 
> > > isn't going into my tree until some major distribution kicks me in the 
> > > head and explains to me why the hell we have partial ticks instead of just 
> > > making the ticks shorter.
> 
> Not speaking for a major distro, just for me writing HPET (high
> performance event timer ...) support for x86-64 (and it happens to exist
> on ia64 as well, and possibly might be in new Intel P4 chipsets, too).
> 
> It's a very nice piece of hardware that allows very fine granularity
> aperiodic interrupts (in each interrupt you set when the next one will
> happen), without much overhead.

I believe the problem is like this: assume you have three timers,
10msec polling of mouse, 30msec keyboard autorepeat and 50msec cursor
blinking. With current approach, you get

10msec userland runs
<enter kernel>
<process mouse>
<process keyboard>
<process cursor>
<exit kernel>

With hires timers, you get:

3msec userland runs
<enter kernel>
<process mouse>
<exit kernel>
2msec userland runs
<enter kernel>
<process keyboard>
<exit kernel>
...

which is not so efficient. I guess rounding could be implemented to
preserve this "do-all-together" ability?
								Pavel
-- 
When do you have heart between your knees?


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-14 22:17       ` Pavel Machek
@ 2002-10-15  7:13         ` Vojtech Pavlik
  2002-10-15 21:45           ` george anzinger
  0 siblings, 1 reply; 26+ messages in thread
From: Vojtech Pavlik @ 2002-10-15  7:13 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Vojtech Pavlik, Ingo Adlung, linux-kernel, Linus Torvalds

On Tue, Oct 15, 2002 at 12:17:47AM +0200, Pavel Machek wrote:
> Hi!
> 
> > > >>This patch, in conjunction with the "core" high-res-timers
> > > >>patch implements high resolution timers on the i386
> > > >>platforms.
> > > > 
> > > > 
> > > > I really don't get the notion of partial ticks, and quite frankly, this 
> > > > isn't going into my tree until some major distribution kicks me in the 
> > > > head and explains to me why the hell we have partial ticks instead of just 
> > > > making the ticks shorter.
> > 
> > Not speaking for a major distro, just for me writing HPET (high
> > performance event timer ...) support for x86-64 (and it happens to exist
> > on ia64 as well, and possibly might be in new Intel P4 chipsets, too).
> > 
> > It's a very nice piece of hardware that allows very fine granularity
> > aperiodic interrupts (in each interrupt you set when the next one will
> > happen), without much overhead.
> 
> I believe the problem is like this: assume you have three timers,
> 10msec polling of mouse, 30msec keyboard autorepeat and 50msec cursor
> blinking. With current approach, you get
> 
> 10msec userland runs
> <enter kernel>
> <process mouse>
> <process keyboard>
> <process cursor>
> <exit kernel>
> 
> With hires timers, you get:
> 
> 3msec userland runs
> <enter kernel>
> <process mouse>
> <exit kernel>
> 2msec userland runs
> <enter kernel>
> <process keyboard>
> <exit kernel>
> ...
> 
> which is not so efficient. I guess rounding could be implemented to
> preserve this "do-all-together" ability?

Actually that's exactly why you'd want sub-tick timing. For timers where
you don't care too much about the timing ;) you could do the rounding,
and for those where you need exact timing (sound, video, ...) you could
call a different add_timer() which would disable the coalescing.
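
To put rough numbers on the example quoted above, here is a small
simulation sketch.  It assumes three periodic timers (10, 30 and 50
msec) whose first expiries fall at 3, 5 and 0 msec, and counts how
often the kernel is entered to do timer work, with and without
rounding expiries up to a 10 msec tick.  The function names and the
phases are assumptions made for illustration only.

```c
#include <stddef.h>

/* Does a periodic timer with this period and first expiry fire at t? */
static int fires_at(unsigned t, unsigned period, unsigned phase)
{
	return t >= phase && (t - phase) % period == 0;
}

/*
 * Count the instants in [0, span) at which the kernel is entered to
 * run at least one timer.  All times are in msec.
 * tick == 0: every timer fires exactly when it expires.
 * tick  > 0: expiries are rounded up to the next tick boundary, so
 *            each boundary handles everything that expired since
 *            the previous one.
 */
static unsigned count_entries(const unsigned *period, const unsigned *phase,
			      size_t n, unsigned span, unsigned tick)
{
	unsigned entries = 0;

	for (unsigned t = 0; t < span; t++) {
		unsigned window = 1;
		int work = 0;

		if (tick) {
			if (t % tick != 0)
				continue;	/* only tick boundaries do work */
			window = tick;
		}
		for (unsigned back = 0; back < window && back <= t && !work; back++)
			for (size_t i = 0; i < n && !work; i++)
				work = fires_at(t - back, period[i], phase[i]);
		entries += work;
	}
	return entries;
}
```

Over 300 msec this gives 46 separate kernel entries with exact
expiry and 30 with rounding to the tick, so the rounded timers cost
nothing beyond what the mouse poll already needs.  Only the timers
that ask for exact timing would add entries on top of that.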

-- 
Vojtech Pavlik
SuSE Labs


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-15  7:13         ` Vojtech Pavlik
@ 2002-10-15 21:45           ` george anzinger
  0 siblings, 0 replies; 26+ messages in thread
From: george anzinger @ 2002-10-15 21:45 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Pavel Machek, Ingo Adlung, linux-kernel, Linus Torvalds

Vojtech Pavlik wrote:
> 
> On Tue, Oct 15, 2002 at 12:17:47AM +0200, Pavel Machek wrote:
> > Hi!
> >
> > > > >>This patch, in conjunction with the "core" high-res-timers
> > > > >>patch implements high resolution timers on the i386
> > > > >>platforms.
> > > > >
> > > > >
> > > > > I really don't get the notion of partial ticks, and quite frankly, this
> > > > > isn't going into my tree until some major distribution kicks me in the
> > > > > head and explains to me why the hell we have partial ticks instead of just
> > > > > making the ticks shorter.
> > >
> > > Not speaking for a major distro, just for me writing HPET (high
> > > performance event timer ...) support for x86-64 (and it happens to exist
> > > on ia64 as well, and possibly might be in new Intel P4 chipsets, too).
> > >
> > > It's a very nice piece of hardware that allows very fine granularity
> > > aperiodic interrupts (in each interrupt you set when the next one will
> > > happen), without much overhead.
> >
> > I believe the problem is like this: assume you have three timers,
> > 10msec polling of mouse, 30msec keyboard autorepeat and 50msec cursor
> > blinking. With current approach, you get
> >
> > 10msec userland runs
> > <enter kernel>
> > <process mouse>
> > <process keyboard>
> > <process cursor>
> > <exit kernel>
> >
> > With hires timers, you get:
> >
> > 3msec userland runs
> > <enter kernel>
> > <process mouse>
> > <exit kernel>
> > 2msec userland runs
> > <enter kernel>
> > <process keyboard>
> > <exit kernel>
> > ...
> >
> > which is not so efficient. I guess rounding could be implemented to
> > preserve this "do-all-together" ability?
> 
> Actually that's exactly why you'd want sub-tick timing. For timers where
> you don't care too much about the timing ;) you could do the rounding,
> and for those where you need exact timing (sound, video, ...) you could
> call a different add_timer() which would disable the coalescing.

The way you do this with the POSIX interface is to use the
low res CLOCKs.  Internally one would just set the
sub_jiffie in the struct timer_list to zero (as the
initialize code does).  This way the timer would always be
handled on the tick interrupt and would never cause a
"special" sub tick interrupt.

As the patch is currently written, it takes extra effort to
force a sub tick event (as it should) so one has to
"request" it.

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-09 23:14 ` Linus Torvalds
  2002-10-09 23:42   ` george anzinger
  2002-10-13 10:46   ` Ingo Adlung
@ 2002-10-17 21:54   ` Randy.Dunlap
  2002-10-17 22:11     ` Robert Love
  2002-10-18 13:11     ` mbs
  2 siblings, 2 replies; 26+ messages in thread
From: Randy.Dunlap @ 2002-10-17 21:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: george anzinger, linux-kernel

On Wed, 9 Oct 2002, Linus Torvalds wrote:

| On Wed, 9 Oct 2002, george anzinger wrote:
| >
| > This patch, in conjunction with the "core" high-res-timers
| > patch implements high resolution timers on the i386
| > platforms.
|
| I really don't get the notion of partial ticks, and quite frankly, this
| isn't going into my tree until some major distribution kicks me in the
| head and explains to me why the hell we have partial ticks instead of just
| making the ticks shorter.
| -

Carrier Grade Linux is not a distro, but we do integrate these
patches into the CGL patches and will continue to use them.

Please consider adding it to 2.5.

Thanks,
-- 
~Randy



* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-17 21:54   ` Randy.Dunlap
@ 2002-10-17 22:11     ` Robert Love
  2002-10-18 13:11     ` mbs
  1 sibling, 0 replies; 26+ messages in thread
From: Robert Love @ 2002-10-17 22:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Randy.Dunlap, george anzinger, linux-kernel

On Thu, 2002-10-17 at 17:54, Randy.Dunlap wrote:

> Carrier Grade Linux is not a distro, but we do integrate these
> patches into the CGL patches and will continue to use it.
> 
> Please consider adding it to 2.5.

Indeed.  Linus, please consider merging at least George's latest patch
set which provides just the new system calls to support POSIX clocks and
timers.  There is no dependence on the high-resolution bits, so at least
Linux can provide the missing POSIX.4 system calls.

George can then provide the high resolution code separately which can be
debated and optionally merged.

	Robert Love



* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-17 21:54   ` Randy.Dunlap
  2002-10-17 22:11     ` Robert Love
@ 2002-10-18 13:11     ` mbs
  1 sibling, 0 replies; 26+ messages in thread
From: mbs @ 2002-10-18 13:11 UTC (permalink / raw)
  To: Randy.Dunlap, Linus Torvalds; +Cc: george anzinger, linux-kernel

On Thursday 17 October 2002 17:54, Randy.Dunlap wrote:
> On Wed, 9 Oct 2002, Linus Torvalds wrote:
> | On Wed, 9 Oct 2002, george anzinger wrote:
> | > This patch, in conjunction with the "core" high-res-timers
> | > patch implements high resolution timers on the i386
> | > platforms.
> |
> | I really don't get the notion of partial ticks, and quite frankly, this
> | isn't going into my tree until some major distribution kicks me in the
> | head and explains to me why the hell we have partial ticks instead of
> | just making the ticks shorter.
> | -

because just making ticks shorter/more frequent just increases timer overhead 
all the time whether you are actually doing anything requiring it or not. 
this is a big waste of cpu cycles.

using the partial tick method put forward by george, you only pay the price 
for higher resolution timers WHEN YOU WANT TO.

most things that want, say, 1 usec precision don't want to do something EVERY
usec, just something every now and then with 1 usec precision.  things like
programs that want to block for 350 usec, where waiting 10 or even 1 msec
would be too long.

the timer overhead using fixed interval timers (as you suggest) to support
that occasional 350 usec block would eat too much cpu to be practical.

increasing timer frequency penalizes ALL users/processes with increased timer 
overhead all the time for the benefit of the small number of tasks that need 
better resolution.  the sub-jiffie/partial tick model only pays that price 
when there is an actual timed event that needs to occur at that higher 
resolution and the rest of the time the timer overhead remains as it is today 
(which to my mind is 10 times what it needs to be, but that is an argument 
for another day)

embedded systems in particular need higher resolution and these types of 
systems are precisely the systems that can't afford to multiply their timer 
overhead by a factor of 10 or more (as increasing HZ to 1000 does).
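
a rough sketch of that arithmetic, assuming one timer interrupt per
tick and at most one extra interrupt per high-res expiry.  the
function names are made up for illustration.

```c
/* Smallest periodic tick rate (in Hz) whose period is no longer than
 * resolution_us microseconds. (Hypothetical helper.) */
static unsigned long hz_for_resolution(unsigned long resolution_us)
{
	return (1000000ul + resolution_us - 1) / resolution_us;
}

/*
 * Timer interrupts per second with a fixed base tick plus one extra
 * interrupt per high-res expiry.  This is an upper bound: an expiry
 * that happens to land on a tick needs no extra interrupt.
 */
static unsigned long on_demand_irqs(unsigned long base_hz,
				    unsigned long hires_expiries_per_sec)
{
	return base_hz + hires_expiries_per_sec;
}
```

a periodic tick fine enough for a 350 usec block needs at least 2858
interrupts every second, all the time.  with a 100 HZ base tick and,
say, 20 such blocks per second, the on-demand model needs at most 120.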




* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
@ 2002-10-19  1:02 Brad Bozarth
  0 siblings, 0 replies; 26+ messages in thread
From: Brad Bozarth @ 2002-10-19  1:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds

I would like to add my vote to including George's high-res patches in
2.5.  The advantages have been expounded by others, along with their few
downsides (compared to just bumping up HZ).  They make sense in general,
and especially for embedded systems.  I'm nearly done with an
initial mips implementation, which we'll be using in my group at Cisco.

Thanks,
Brad



* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-14  6:50 ` Ulrich Windl
@ 2002-10-15 22:03   ` george anzinger
  0 siblings, 0 replies; 26+ messages in thread
From: george anzinger @ 2002-10-15 22:03 UTC (permalink / raw)
  To: Ulrich Windl; +Cc: jim.houston, linux-kernel

Ulrich Windl wrote:
> 
> On 12 Oct 2002, at 18:03, Jim Houston wrote:
> 
> >
> > >> This patch, in conjunction with the "core" high-res-timers
> > >> patch implements high resolution timers on the i386
> > >> platforms.
> > >
> > > I really don't get the notion of partial ticks, and quite frankly, this
> > > isn't going into my tree until some major distribution kicks me in the
> > > head and explains to me why the hell we have partial ticks instead of just
> > > making the ticks shorter.
> > >
> > >                Linus
> >
> > Hi Linus,
> >
> > Concurrent has been using previous versions of the Posix timers patch
> > in our 2.4.18 based kernel.  I like this interface and would like to
> > see it included in your kernel.
> 
> Hi,
> 
> I think nobody objects seeing the interface implemented. Maybe just how
> it's implemented. I did not have a close look, but the concept seems
> odd at first sight.
> 
> Using an individual timer as interrupt source may be a different idea
> (if available for the particular hardware), but then there must be a
> balance between busy looping in the kernel and setting up of such an
> individual interrupt.
> 
> The other thing is how to correlate it with the wall clock.

Ah, yes, that is it in a nutshell.  If you try to put the
high res timers in a "special" list and not in the same list
as the low res stuff, you have ordering issues.  It becomes
real easy to have the timers expire in the incorrect order. 

As to interrupt source and time, the biggest issue is that
we don't really have timers that interrupt in "nice" units
of time.  The PIT, for example, has a tick time (i.e. each
count) of 0.838095239 microseconds.  So how are we to
figure time from such a tick if we want to use an integer
value for HZ?

What my patch suggests is that we use the higher resolution
TSC or pm timer (or whatever is available) and just use the
PIT to remind us to look at the clock, AND that we keep time
in units of that clock.  In some ways we already do this,
but we are not consistent.  For example we advance the time
by less than 1 ms each tick, but we still assume that a tick
is 1 ms when we set up timers.  This leads to standards
failures such as that illustrated by:

time sleep 60

which on a 2.5 system will sleep for less than 60 seconds
because of this.  

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-12 22:03 Jim Houston
@ 2002-10-14  6:50 ` Ulrich Windl
  2002-10-15 22:03   ` george anzinger
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Windl @ 2002-10-14  6:50 UTC (permalink / raw)
  To: jim.houston; +Cc: linux-kernel

On 12 Oct 2002, at 18:03, Jim Houston wrote:

> 
> >> This patch, in conjunction with the "core" high-res-timers 
> >> patch implements high resolution timers on the i386 
> >> platforms. 
> >
> > I really don't get the notion of partial ticks, and quite frankly, this 
> > isn't going into my tree until some major distribution kicks me in the 
> > head and explains to me why the hell we have partial ticks instead of just 
> > making the ticks shorter. 
> > 
> >                Linus 
> 
> Hi Linus,
> 
> Concurrent has been using previous versions of the Posix timers patch
> in our 2.4.18 based kernel.  I like this interface and would like to 
> see it included in your kernel.

Hi,

I think nobody objects seeing the interface implemented. Maybe just how 
it's implemented. I did not have a close look, but the concept seems 
odd at first sight.

Using an individual timer as interrupt source may be a different idea 
(if available for the particular hardware), but then there must be a 
balance between busy looping in the kernel and setting up of such an 
individual interrupt.

The other thing is how to correlate it with the wall clock.

Sorry for not giving answers, I just know the problems...

Regards,
Ulrich



* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
@ 2002-10-12 22:03 Jim Houston
  2002-10-14  6:50 ` Ulrich Windl
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Houston @ 2002-10-12 22:03 UTC (permalink / raw)
  To: torvalds; +Cc: george, high-res-timers-discourse, linux-kernel


>> This patch, in conjunction with the "core" high-res-timers 
>> patch implements high resolution timers on the i386 
>> platforms. 
>
> I really don't get the notion of partial ticks, and quite frankly, this 
> isn't going into my tree until some major distribution kicks me in the 
> head and explains to me why the hell we have partial ticks instead of just 
> making the ticks shorter. 
> 
>                Linus 

Hi Linus,

Concurrent has been using previous versions of the Posix timers patch
in our 2.4.18 based kernel.  I like this interface and would like to 
see it included in your kernel.

What would make the patch more acceptable?  Would it be acceptable
if it used a separate queue for the Posix timers and minimized changes
to timer.c?

To answer the partial tick question, it's a trade off.  If all you need
is 1 milli-second resolution, it might not be worth splitting the tick.
It's a question of how the overhead to set up a timer compares to the 
overhead of the higher frequency tick interrupts.  If you want
micro-second resolution, you need to split the tick.

This is important to folks doing control systems.  They get excited
about timing jitter and resolution.  It is also interesting to folks
doing games.  It's nice to be able to do short delays by blocking rather
than having to spin in a delay loop.

I'd feel better about this being used for critical applications if
the games folks beat it up first.

Jim Houston - Concurrent Computer Corp.


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-10  0:50 Dan Kegel
  2002-10-10  1:33 ` Ben Greear
  2002-10-10  3:55 ` Jeff Dike
@ 2002-10-10 12:34 ` mbs
  2 siblings, 0 replies; 26+ messages in thread
From: mbs @ 2002-10-10 12:34 UTC (permalink / raw)
  To: Dan Kegel, linux-kernel

On Wednesday 09 October 2002 20:50, Dan Kegel wrote:
> line rate.   Now, I'm way far from the code, but I suspect that
> the interrupt overhead needed to get the precision the customer
> is calling for would be totally prohibitive.  I dunno if we'll

only in a fixed interval tick system.  Early in george's design process I 
argued for a tickless system (which I had implemented in my company's 
proprietary real-time OS), which has _no_ extra overhead and does away with 
the 10ms tick entirely.  the precision attained is whatever the highest 
resolution interrupting counter on the system is capable of.

george did extensive benchmarking of candidate implementations of both 
designs and came to the conclusion that the 10ms jiffie fixed interval tick 
plus on demand higher resolution ticks was more suitable for general purpose 
uses than the tickless system, particularly under high load when there are 
many low resolution timed events in the system (as in a server situation).

it turned out that the tickless system was appropriate for embedded systems 
(my focus) which tend to have small numbers of well coordinated tasks running 
and not so good in environments with a lot of things going on, such as a 
multimedia desktop or big honkin server.

with the hybrid system that george developed, you get the batching benefits 
of low resolution fixed interval timers, which provides all the capability 
most timer services customers need while at the same time, and for minimal 
overhead providing the high resolution timers that the embedded world needs.




-- 
/**************************************************
**   Mark Salisbury       ||      mbs@mc.com     **
**************************************************/


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-10  0:50 Dan Kegel
  2002-10-10  1:33 ` Ben Greear
@ 2002-10-10  3:55 ` Jeff Dike
  2002-10-10  3:32   ` Dan Kegel
  2002-10-10 12:34 ` mbs
  2 siblings, 1 reply; 26+ messages in thread
From: Jeff Dike @ 2002-10-10  3:55 UTC (permalink / raw)
  To: Dan Kegel; +Cc: linux-kernel

dank@kegel.com said:
> George's approach would work a lot better when doing lots of UML VM's
> on a single box, too, wouldn't it? 

My thinking on this is that I'll have UML do the on-demand ticks.  So, on
a host with n UMLs, we will no longer have n * HZ timer deliveries/sec.

I haven't thought a lot about it, but this seems largely unconnected to how 
the host does its timers.

The one connection I can think of is that any generic support for on-demand
ticks would be re-used by UML.  And if UML required generic changes for this,
then that would obviously affect the other ports somehow.

				Jeff



* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-10  3:55 ` Jeff Dike
@ 2002-10-10  3:32   ` Dan Kegel
  0 siblings, 0 replies; 26+ messages in thread
From: Dan Kegel @ 2002-10-10  3:32 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

Jeff Dike wrote:
> 
> dank@kegel.com said:
> > George's approach would work a lot better when doing lots of UML VM's
> > on a single box, too, wouldn't it?
> 
> My thinking on this is that I'll have UML do the on-demand ticks. ...
> any generic support for on-demand
> ticks would be re-used by UML.  And if UML required generic changes for this,
> then that would obviously affect the other ports somehow.

Yes, exactly.  UML wants on-demand ticks, which is exactly what George's
patch uses, too.  I'm too far from the code to say, but there ought to be
some commonality there.
- Dan


* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
  2002-10-10  0:50 Dan Kegel
@ 2002-10-10  1:33 ` Ben Greear
  2002-10-10  3:55 ` Jeff Dike
  2002-10-10 12:34 ` mbs
  2 siblings, 0 replies; 26+ messages in thread
From: Ben Greear @ 2002-10-10  1:33 UTC (permalink / raw)
  Cc: linux-kernel


> george anzinger wrote:
> 
>>Linus Torvalds wrote:
>>
>>>I really don't get the notion of partial ticks, and quite frankly, this
>>>isn't going into my tree until some major distribution kicks me in the
>>>head and explains to me why the hell we have partial ticks instead of just
>>>making the ticks shorter.
>>
>>...
>>
>>Making ticks shorter causes extra overhead ALL the time,
>>even when it is not needed.  Higher resolution is not free
>>in any case, but it is much closer to free with this patch
>>than by increasing HZ (which, of course, can still be
>>done).  Overhead wise and resolution wise, for timers, we
>>would be better off with a 1/HZ tick and the "on demand"
>>high-res interrupts this patch introduces.

I would like to add my small vote for including the timers too.

I have not looked at the code, but the idea seems sound (let
those who need the timers pay the price at that time, don't make
the rest of the machine suffer otherwise)....

Enjoy,
Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear




* Re: [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1
@ 2002-10-10  0:50 Dan Kegel
  2002-10-10  1:33 ` Ben Greear
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Dan Kegel @ 2002-10-10  0:50 UTC (permalink / raw)
  To: linux-kernel

george anzinger wrote:
> Linus Torvalds wrote:
> > I really don't get the notion of partial ticks, and quite frankly, this
> > isn't going into my tree until some major distribution kicks me in the
> > head and explains to me why the hell we have partial ticks instead of just
> > making the ticks shorter.
> ...
> 
> Making ticks shorter causes extra overhead ALL the time,
> even when it is not needed.  Higher resolution is not free
> in any case, but it is much closer to free with this patch
> than by increasing HZ (which, of course, can still be
> done).  Overhead wise and resolution wise, for timers, we
> would be better off with a 1/HZ tick and the "on demand"
> high-res interrupts this patch introduces.

Seems reasonable to me.  Increasing HZ adds overhead -
it makes sense to incur the interrupt overhead only when it's
needed.  In my case, we want to provide fairly precise
network delays (we're doing a WAN simulator), and still hit
line rate.   Now, I'm way far from the code, but I suspect that
the interrupt overhead needed to get the precision the customer
is calling for would be totally prohibitive.  I dunno if we'll
get the precision the customer wants with George's approach,
but we'll get a lot closer than we would setting HZ to 10000
on our wimpy little embedded platform.

George's approach would work a lot better when doing lots of UML VM's
on a single box, too, wouldn't it?
- Dan


end of thread, other threads:[~2002-10-19  0:56 UTC | newest]

Thread overview: 26+ messages
2002-10-09 22:47 [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 5.1 george anzinger
2002-10-09 23:14 ` Linus Torvalds
2002-10-09 23:42   ` george anzinger
2002-10-10 15:03     ` Eric W. Biederman
2002-10-10 15:45       ` george anzinger
2002-10-10 15:54     ` Oliver Xymoron
2002-10-10 16:24       ` george anzinger
2002-10-10 17:04         ` Oliver Xymoron
2002-10-10 17:47           ` george anzinger
2002-10-13 10:46   ` Ingo Adlung
2002-10-14  7:18     ` Vojtech Pavlik
2002-10-14 22:17       ` Pavel Machek
2002-10-15  7:13         ` Vojtech Pavlik
2002-10-15 21:45           ` george anzinger
2002-10-17 21:54   ` Randy.Dunlap
2002-10-17 22:11     ` Robert Love
2002-10-18 13:11     ` mbs
2002-10-10  0:50 Dan Kegel
2002-10-10  1:33 ` Ben Greear
2002-10-10  3:55 ` Jeff Dike
2002-10-10  3:32   ` Dan Kegel
2002-10-10 12:34 ` mbs
2002-10-12 22:03 Jim Houston
2002-10-14  6:50 ` Ulrich Windl
2002-10-15 22:03   ` george anzinger
2002-10-19  1:02 Brad Bozarth
