linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] preempt-cleanup.patch, 2.6.9-rc2
@ 2004-09-14  9:15 Ingo Molnar
  2004-09-14  9:34 ` [printk] make console_conditional_schedule() __sched and use cond_resched() William Lee Irwin III
  2004-09-14  9:38 ` [patch] preempt-lock-need-resched.patch, 2.6.9-rc2 Ingo Molnar
  0 siblings, 2 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14  9:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1022 bytes --]


the attached patch is ontop of preempt-smp.patch. This is another
generic fallout from the voluntary-preempt patchset: a cleanup of the
cond_resched() infrastructure, in preparation of the latency reduction
patches. The changes:

 - uninline cond_resched() - this makes the footprint smaller,
   especially once the number of cond_resched() points increase.

 - add a 'was rescheduled' return value to cond_resched. This makes it
   symmetric to cond_resched_lock() and later latency reduction patches
   rely on the ability to tell whether there was any preemption.

 - make cond_resched() more robust by using the same mechanism as
   preempt_kernel(): by using PREEMPT_ACTIVE. This preserves the task's
   state - e.g. if the task is in TASK_ZOMBIE but gets preempted via
   cond_resched() just prior scheduling off then this approach preserves
   TASK_ZOMBIE.

 - the patch also adds need_lockbreak() which critical sections can use 
   to detect lock-break requests.

i've tested the patch on x86 SMP and UP.

	Ingo

[-- Attachment #2: preempt-cleanup.patch --]
[-- Type: text/plain, Size: 2972 bytes --]


the attached patch is ontop of preempt-smp.patch. This is another
generic fallout from the voluntary-preempt patchset: a cleanup of the
cond_resched() infrastructure, in preparation of the latency reduction
patches. The changes:

 - uninline cond_resched() - this makes the footprint smaller,
   especially once the number of cond_resched() points increase.

 - add a 'was rescheduled' return value to cond_resched. This makes it
   symmetric to cond_resched_lock() and later latency reduction patches
   rely on the ability to tell whether there was any preemption.

 - make cond_resched() more robust by using the same mechanism as
   preempt_kernel(): by using PREEMPT_ACTIVE. This preserves the task's
   state - e.g. if the task is in TASK_ZOMBIE but gets preempted via
   cond_resched() just prior scheduling off then this approach preserves
   TASK_ZOMBIE.

 - the patch also adds need_lockbreak() which critical sections can use
   to detect lock-break requests.

i've tested the patch on x86 SMP and UP.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/include/linux/sched.h.orig	
+++ linux/include/linux/sched.h	
@@ -951,15 +951,24 @@ static inline int need_resched(void)
 	return unlikely(test_thread_flag(TIF_NEED_RESCHED));
 }
 
-extern void __cond_resched(void);
-static inline void cond_resched(void)
-{
-	if (need_resched())
-		__cond_resched();
-}
-
+/*
+ * cond_resched() and cond_resched_lock(): latency reduction via
+ * explicit rescheduling in places that are safe. The return
+ * value indicates whether a reschedule was done in fact.
+ */
+extern int cond_resched(void);
 extern int cond_resched_lock(spinlock_t * lock);
 
+/*
+ * Does a critical section need to be broken due to another
+ * task waiting?:
+ */
+#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP)
+# define need_lockbreak(lock) ((lock)->break_lock)
+#else
+# define need_lockbreak(lock) 0
+#endif
+
 /* Reevaluate whether the task has signals pending delivery.
    This is required every time the blocked sigset_t changes.
    callers must hold sighand->siglock.  */
--- linux/kernel/sched.c.orig	
+++ linux/kernel/sched.c	
@@ -3539,13 +3539,25 @@ asmlinkage long sys_sched_yield(void)
 	return 0;
 }
 
-void __sched __cond_resched(void)
+static inline void __cond_resched(void)
 {
-	set_current_state(TASK_RUNNING);
-	schedule();
+	do {
+		preempt_count() += PREEMPT_ACTIVE;
+		schedule();
+		preempt_count() -= PREEMPT_ACTIVE;
+	} while (need_resched());
 }
 
-EXPORT_SYMBOL(__cond_resched);
+int __sched cond_resched(void)
+{
+	if (need_resched()) {
+		__cond_resched();
+		return 1;
+	}
+	return 0;
+}
+
+EXPORT_SYMBOL(cond_resched);
 
 /*
  * cond_resched_lock() - if a reschedule is pending, drop the given lock,
@@ -3568,8 +3580,7 @@ int cond_resched_lock(spinlock_t * lock)
 	if (need_resched()) {
 		_raw_spin_unlock(lock);
 		preempt_enable_no_resched();
-		set_current_state(TASK_RUNNING);
-		schedule();
+		__cond_resched();
 		spin_lock(lock);
 		return 1;
 	}

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [printk] make console_conditional_schedule() __sched and use cond_resched()
  2004-09-14  9:15 [patch] preempt-cleanup.patch, 2.6.9-rc2 Ingo Molnar
@ 2004-09-14  9:34 ` William Lee Irwin III
  2004-09-14  9:38 ` [patch] preempt-lock-need-resched.patch, 2.6.9-rc2 Ingo Molnar
  1 sibling, 0 replies; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-14  9:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel

On Tue, Sep 14, 2004 at 11:15:29AM +0200, Ingo Molnar wrote:
> the attached patch is ontop of preempt-smp.patch. This is another
> generic fallout from the voluntary-preempt patchset: a cleanup of the
> cond_resched() infrastructure, in preparation of the latency reduction
> patches. The changes:

Relatively minor add-on (not necessarily tied to it or required to be
taken or a fix for any bug). Since cond_resched() is using
PREEMPT_ACTIVE now, it may be useful to update the open-coded instance
of cond_resched() to use the generic call. Also, it should probably be
__sched so the caller shows up in wchan.


-- wli

Index: mm5-2.6.9-rc1/kernel/printk.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/printk.c	2004-08-24 01:13:48.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/printk.c	2004-09-14 02:23:41.222204160 -0700
@@ -658,12 +658,10 @@
  *
  * Must be called within acquire_console_sem().
  */
-void console_conditional_schedule(void)
+void __sched console_conditional_schedule(void)
 {
-	if (console_may_schedule && need_resched()) {
-		set_current_state(TASK_RUNNING);
-		schedule();
-	}
+	if (console_may_schedule)
+		cond_resched();
 }
 EXPORT_SYMBOL(console_conditional_schedule);
 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] preempt-lock-need-resched.patch, 2.6.9-rc2
  2004-09-14  9:15 [patch] preempt-cleanup.patch, 2.6.9-rc2 Ingo Molnar
  2004-09-14  9:34 ` [printk] make console_conditional_schedule() __sched and use cond_resched() William Lee Irwin III
@ 2004-09-14  9:38 ` Ingo Molnar
  2004-09-14  9:51   ` [patch] sched: add cond_resched_softirq() Ingo Molnar
  1 sibling, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14  9:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 299 bytes --]


the attached preempt-lock-need-resched.patch is ontop of the preempt-smp
and preempt-cleanups patches. It adds lock_need_resched() which is to
check for the necessity of lock-break in a critical section. Used by
later latency-break patches.

tested on x86, should work on all architectures.

	Ingo

[-- Attachment #2: preempt-lock-need-resched.patch --]
[-- Type: text/plain, Size: 933 bytes --]


the attached preempt-lock-need-resched.patch is ontop of the preempt-smp
and preempt-cleanups patches. It adds lock_need_resched() which is to
check for the necessity of lock-break in a critical section. Used by
later latency-break patches.

tested on x86, should work on all architectures.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/include/linux/sched.h.orig	
+++ linux/include/linux/sched.h	
@@ -969,6 +972,17 @@ extern int cond_resched_lock(spinlock_t 
 # define need_lockbreak(lock) 0
 #endif
 
+/*
+ * Does a critical section need to be broken due to another
+ * task waiting or preemption being signalled:
+ */
+static inline int lock_need_resched(spinlock_t *lock)
+{
+	if (need_lockbreak(lock) || need_resched())
+		return 1;
+	return 0;
+}
+
 /* Reevaluate whether the task has signals pending delivery.
    This is required every time the blocked sigset_t changes.
    callers must hold sighand->siglock.  */

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched: add cond_resched_softirq()
  2004-09-14  9:38 ` [patch] preempt-lock-need-resched.patch, 2.6.9-rc2 Ingo Molnar
@ 2004-09-14  9:51   ` Ingo Molnar
  2004-09-14  9:57     ` [patch] sched: fix latency in random driver Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14  9:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 290 bytes --]


the attached patch comes after preempt-lock-need-resched.patch.

it adds cond_resched_softirq() which can be used by _process context_
softirqs-disabled codepaths to preempt if necessary. The function will
enable softirqs before scheduling. (Later patches will use this
primitive.)

	Ingo

[-- Attachment #2: sched-add-cond_resched_softirq.patch --]
[-- Type: text/plain, Size: 1431 bytes --]


the attached patch comes after preempt-lock-need-resched.patch.

it adds cond_resched_softirq() which can be used by _process context_
softirqs-disabled codepaths to preempt if necessary. The function will
enable softirqs before scheduling. (Later patches will use this
primitive.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/include/linux/sched.h.orig	
+++ linux/include/linux/sched.h	
@@ -955,9 +955,12 @@ static inline int need_resched(void)
  * cond_resched() and cond_resched_lock(): latency reduction via
  * explicit rescheduling in places that are safe. The return
  * value indicates whether a reschedule was done in fact.
+ * cond_resched_lock() will drop the spinlock before scheduling,
+ * cond_resched_softirq() will enable bhs before scheduling.
  */
 extern int cond_resched(void);
 extern int cond_resched_lock(spinlock_t * lock);
+extern int cond_resched_softirq(void);
 
 /*
  * Does a critical section need to be broken due to another
--- linux/kernel/sched.c.orig	
+++ linux/kernel/sched.c	
@@ -3589,6 +3589,22 @@ int cond_resched_lock(spinlock_t * lock)
 
 EXPORT_SYMBOL(cond_resched_lock);
 
+int __sched cond_resched_softirq(void)
+{
+	BUG_ON(!in_softirq());
+
+	if (need_resched()) {
+		__local_bh_enable();
+		__cond_resched();
+		local_bh_disable();
+		return 1;
+	}
+	return 0;
+}
+
+EXPORT_SYMBOL(cond_resched_softirq);
+
+
 /**
  * yield - yield the current processor to other threads.
  *

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched: fix latency in random driver
  2004-09-14  9:51   ` [patch] sched: add cond_resched_softirq() Ingo Molnar
@ 2004-09-14  9:57     ` Ingo Molnar
  2004-09-14 10:06       ` [patch] sched, ext3: fix scheduling latencies in ext3 Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14  9:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, tytso

[-- Attachment #1: Type: text/plain, Size: 255 bytes --]


the attached patch fixes possibly long scheduling latencies in the
/dev/random driver's rekey_seq_generator() function, by moving the
expensive get_random_bytes() function out from under ip_lock.

has been in the -VP patchset for quite some time.

	Ingo

[-- Attachment #2: fix-latency-random.patch --]
[-- Type: text/plain, Size: 955 bytes --]


the attached patch fixes possibly long scheduling latencies in the
/dev/random driver's rekey_seq_generator() function, by moving the
expensive get_random_bytes() function out from under ip_lock.

has been in the -VP patchset for quite some time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/drivers/char/random.c.orig	
+++ linux/drivers/char/random.c	
@@ -2220,17 +2220,18 @@ static unsigned int ip_cnt;
 
 static void rekey_seq_generator(void *private_)
 {
-	struct keydata *keyptr;
+	struct keydata *keyptr, tmp;
 	struct timeval 	tv;
 
 	do_gettimeofday(&tv);
+	get_random_bytes(tmp.secret, sizeof(tmp.secret));
 
 	spin_lock_bh(&ip_lock);
 	keyptr = &ip_keydata[ip_cnt&1];
 
 	keyptr = &ip_keydata[1^(ip_cnt&1)];
 	keyptr->rekey_time = tv.tv_sec;
-	get_random_bytes(keyptr->secret, sizeof(keyptr->secret));
+	memcpy(keyptr->secret, tmp.secret, sizeof(keyptr->secret));
 	keyptr->count = (ip_cnt&COUNT_MASK)<<HASH_BITS;
 	mb();
 	ip_cnt++;

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, ext3: fix scheduling latencies in ext3
  2004-09-14  9:57     ` [patch] sched: fix latency in random driver Ingo Molnar
@ 2004-09-14 10:06       ` Ingo Molnar
  2004-09-14 10:13         ` [patch] sched, vfs: fix scheduling latencies in invalidate_inodes() Ingo Molnar
  2004-09-14 10:19         ` [patch] sched, vfs: fix scheduling latencies in prune_dcache() and select_parent() Ingo Molnar
  0 siblings, 2 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 10:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Stephen C. Tweedie, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 226 bytes --]


the attached patch fixes long scheduling latencies in the ext3 code, and
it also cleans up the existing lock-break functionality to use the new
primitives.

This patch has been in the -VP patchset for quite some time.

	Ingo

[-- Attachment #2: fix-latency-ext3.patch --]
[-- Type: text/plain, Size: 5447 bytes --]


the attached patch fixes long scheduling latencies in the ext3 code, and
it also cleans up the existing lock-break functionality to use the new
primitives.

This patch has been in the -VP patchset for quite some time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/jbd/checkpoint.c.orig	
+++ linux/fs/jbd/checkpoint.c	
@@ -333,6 +333,10 @@ int log_do_checkpoint(journal_t *journal
 				break;
 			}
 			retry = __flush_buffer(journal, jh, bhs, &batch_count, &drop_count);
+			if (cond_resched_lock(&journal->j_list_lock)) {
+				retry = 1;
+				break;
+			}
 		} while (jh != last_jh && !retry);
 
 		if (batch_count)
@@ -487,6 +491,14 @@ int __journal_clean_checkpoint_list(jour
 				/* Use trylock because of the ranknig */
 				if (jbd_trylock_bh_state(jh2bh(jh)))
 					ret += __try_to_free_cp_buf(jh);
+				/*
+				 * This function only frees up some memory
+				 * if possible so we dont have an obligation
+				 * to finish processing. Bail out if preemption
+				 * requested:
+				 */
+				if (need_resched())
+					goto out;
 			} while (jh != last_jh);
 		}
 	} while (transaction != last_transaction);
--- linux/fs/jbd/commit.c.orig	
+++ linux/fs/jbd/commit.c	
@@ -262,7 +262,7 @@ write_out_data:
 			__journal_file_buffer(jh, commit_transaction,
 						BJ_Locked);
 			jbd_unlock_bh_state(bh);
-			if (need_resched()) {
+			if (lock_need_resched(&journal->j_list_lock)) {
 				spin_unlock(&journal->j_list_lock);
 				goto write_out_data;
 			}
@@ -288,7 +288,7 @@ write_out_data:
 				jbd_unlock_bh_state(bh);
 				journal_remove_journal_head(bh);
 				put_bh(bh);
-				if (need_resched()) {
+				if (lock_need_resched(&journal->j_list_lock)) {
 					spin_unlock(&journal->j_list_lock);
 					goto write_out_data;
 				}
@@ -333,11 +333,7 @@ write_out_data:
 			jbd_unlock_bh_state(bh);
 		}
 		put_bh(bh);
-		if (need_resched()) {
-			spin_unlock(&journal->j_list_lock);
-			cond_resched();
-			spin_lock(&journal->j_list_lock);
-		}
+		cond_resched_lock(&journal->j_list_lock);
 	}
 	spin_unlock(&journal->j_list_lock);
 
@@ -545,6 +541,8 @@ wait_for_iobuf:
 			wait_on_buffer(bh);
 			goto wait_for_iobuf;
 		}
+		if (cond_resched())
+			goto wait_for_iobuf;
 
 		if (unlikely(!buffer_uptodate(bh)))
 			err = -EIO;
@@ -599,6 +597,8 @@ wait_for_iobuf:
 			wait_on_buffer(bh);
 			goto wait_for_ctlbuf;
 		}
+		if (cond_resched())
+			goto wait_for_ctlbuf;
 
 		if (unlikely(!buffer_uptodate(bh)))
 			err = -EIO;
@@ -719,6 +719,7 @@ skip_commit: /* The journal should be un
 	J_ASSERT(commit_transaction->t_shadow_list == NULL);
 	J_ASSERT(commit_transaction->t_log_list == NULL);
 
+restart_loop:
 	while (commit_transaction->t_forget) {
 		transaction_t *cp_transaction;
 		struct buffer_head *bh;
@@ -792,6 +793,8 @@ skip_commit: /* The journal should be un
 			release_buffer_page(bh);
 		}
 		spin_unlock(&journal->j_list_lock);
+		if (cond_resched())
+			goto restart_loop;
 	}
 
 	/* Done with this transaction! */
--- linux/fs/jbd/recovery.c.orig	
+++ linux/fs/jbd/recovery.c	
@@ -354,6 +354,8 @@ static int do_one_pass(journal_t *journa
 		struct buffer_head *	obh;
 		struct buffer_head *	nbh;
 
+		cond_resched();		/* We're under lock_kernel() */
+
 		/* If we already know where to stop the log traversal,
 		 * check right now that we haven't gone past the end of
 		 * the log. */
--- linux/fs/ext3/ialloc.c.orig	
+++ linux/fs/ext3/ialloc.c	
@@ -733,6 +733,7 @@ unsigned long ext3_count_free_inodes (st
 		if (!gdp)
 			continue;
 		desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
+		cond_resched();
 	}
 	return desc_count;
 #endif
--- linux/fs/ext3/namei.c.orig	
+++ linux/fs/ext3/namei.c	
@@ -674,6 +674,7 @@ static int dx_make_map (struct ext3_dir_
 			map_tail->hash = h.hash;
 			map_tail->offs = (u32) ((char *) de - base);
 			count++;
+			cond_resched();
 		}
 		/* XXX: do we need to check rec_len == 0 case? -Chris */
 		de = (struct ext3_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
--- linux/fs/ext3/super.c.orig	
+++ linux/fs/ext3/super.c	
@@ -1231,6 +1231,8 @@ static int ext3_fill_super (struct super
 	sbi->s_resuid = EXT3_DEF_RESUID;
 	sbi->s_resgid = EXT3_DEF_RESGID;
 
+	unlock_kernel();
+
 	blocksize = sb_min_blocksize(sb, EXT3_MIN_BLOCK_SIZE);
 	if (!blocksize) {
 		printk(KERN_ERR "EXT3-fs: unable to set blocksize\n");
@@ -1576,6 +1578,7 @@ static int ext3_fill_super (struct super
 	percpu_counter_mod(&sbi->s_dirs_counter,
 		ext3_count_dirs(sb));
 
+	lock_kernel();
 	return 0;
 
 failed_mount3:
@@ -1597,6 +1600,7 @@ failed_mount:
 out_fail:
 	sb->s_fs_info = NULL;
 	kfree(sbi);
+	lock_kernel();
 	return -EINVAL;
 }
 
@@ -2113,9 +2117,11 @@ int ext3_statfs (struct super_block * sb
 		 * block group descriptors.  If the sparse superblocks
 		 * feature is turned on, then not all groups have this.
 		 */
-		for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++)
+		for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
 			overhead += ext3_bg_has_super(sb, i) +
 				ext3_bg_num_gdb(sb, i);
+			cond_resched();
+		}
 
 		/*
 		 * Every block group has an inode bitmap, a block
--- linux/fs/ext3/balloc.c.orig	
+++ linux/fs/ext3/balloc.c	
@@ -207,6 +207,11 @@ do_more:
 		}
 		jbd_lock_bh_state(bitmap_bh);
 #endif
+		if (need_resched()) {
+			jbd_unlock_bh_state(bitmap_bh);
+			cond_resched();
+			jbd_lock_bh_state(bitmap_bh);
+		}
 		/* @@@ This prevents newly-allocated data from being
 		 * freed and then reallocated within the same
 		 * transaction. 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, vfs: fix scheduling latencies in invalidate_inodes()
  2004-09-14 10:06       ` [patch] sched, ext3: fix scheduling latencies in ext3 Ingo Molnar
@ 2004-09-14 10:13         ` Ingo Molnar
  2004-09-14 10:19         ` [patch] sched, vfs: fix scheduling latencies in prune_dcache() and select_parent() Ingo Molnar
  1 sibling, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 10:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 315 bytes --]


the attached patch fixes long scheduling latencies in
invalidate_inodes(). The lock-break is a bit tricky to not get into a
livelock scenario: we use a dummy inode as a marker at which point we
can continue the scanning after the schedule.

this patch has been tested as part of the -VP patchset for weeks.

	Ingo

[-- Attachment #2: fix-latency-invalidate-inodes.patch --]
[-- Type: text/plain, Size: 2629 bytes --]


the attached patch fixes long scheduling latencies in
invalidate_inodes(). The lock-break is a bit tricky to not get into a
livelock scenario: we use a dummy inode as a marker at which point we
can continue the scanning after the schedule.

this patch has been tested as part of the -VP patchset for weeks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/inode.c.orig	
+++ linux/fs/inode.c	
@@ -296,7 +296,7 @@ static void dispose_list(struct list_hea
 /*
  * Invalidate all inodes for a device.
  */
-static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose)
+static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose, struct list_head *mark)
 {
 	struct list_head *next;
 	int busy = 0, count = 0;
@@ -306,6 +306,20 @@ static int invalidate_list(struct list_h
 		struct list_head * tmp = next;
 		struct inode * inode;
 
+		/*
+		 * Preempt if necessary. To make this safe we use a dummy
+		 * inode as a marker - we can continue off that point.
+		 * We rely on this sb's inodes (including the marker) not
+		 * getting reordered within the list during umount. Other
+		 * inodes might get reordered.
+		 */
+		if (lock_need_resched(&inode_lock)) {
+			list_add_tail(mark, next);
+			BUG_ON(mark->next != next);
+			cond_resched_lock(&inode_lock);
+			tmp = next = mark->next;
+			list_del(mark);
+		}
 		next = next->next;
 		if (tmp == head)
 			break;
@@ -347,18 +361,26 @@ int invalidate_inodes(struct super_block
 {
 	int busy;
 	LIST_HEAD(throw_away);
+	struct inode *marker;
+	struct list_head *mark;
+
+	marker = kmalloc(sizeof(*marker), SLAB_KERNEL|__GFP_REPEAT);
+	memset(marker, 0, sizeof(*marker));
+	mark = &marker->i_list;
 
 	down(&iprune_sem);
 	spin_lock(&inode_lock);
-	busy = invalidate_list(&inode_in_use, sb, &throw_away);
-	busy |= invalidate_list(&inode_unused, sb, &throw_away);
-	busy |= invalidate_list(&sb->s_dirty, sb, &throw_away);
-	busy |= invalidate_list(&sb->s_io, sb, &throw_away);
+	busy = invalidate_list(&inode_in_use, sb, &throw_away, mark);
+	busy |= invalidate_list(&inode_unused, sb, &throw_away, mark);
+	busy |= invalidate_list(&sb->s_dirty, sb, &throw_away, mark);
+	busy |= invalidate_list(&sb->s_io, sb, &throw_away, mark);
 	spin_unlock(&inode_lock);
 
 	dispose_list(&throw_away);
 	up(&iprune_sem);
 
+	kfree(marker);
+
 	return busy;
 }
 
@@ -429,6 +451,8 @@ static void prune_icache(int nr_to_scan)
 	for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) {
 		struct inode *inode;
 
+		cond_resched_lock(&inode_lock);
+
 		if (list_empty(&inode_unused))
 			break;
 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, vfs: fix scheduling latencies in prune_dcache() and select_parent()
  2004-09-14 10:06       ` [patch] sched, ext3: fix scheduling latencies in ext3 Ingo Molnar
  2004-09-14 10:13         ` [patch] sched, vfs: fix scheduling latencies in invalidate_inodes() Ingo Molnar
@ 2004-09-14 10:19         ` Ingo Molnar
  2004-09-14 10:25           ` [patch] sched, net: fix scheduling latencies in netstat Ingo Molnar
  1 sibling, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 10:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 390 bytes --]


the attached patch fixes long scheduling latencies in select_parent()
and prune_dcache(). The prune_dcache() lock-break is easy, but for
select_parent() the only viable solution i found was to break out if
there's a resched necessary - the reordering is not necessary and the
dcache scanning/shrinking will later on do it anyway.

This patch has been in the -VP patchset for weeks.

	Ingo

[-- Attachment #2: fix-latency-dcache.patch --]
[-- Type: text/plain, Size: 1249 bytes --]


the attached patch fixes long scheduling latencies in select_parent()
and prune_dcache(). The prune_dcache() lock-break is easy, but for
select_parent() the only viable solution i found was to break out if
there's a resched necessary - the reordering is not necessary and the
dcache scanning/shrinking will later on do it anyway.

This patch has been in the -VP patchset for weeks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/dcache.c.orig	
+++ linux/fs/dcache.c	
@@ -381,6 +381,8 @@ static void prune_dcache(int count)
 		struct dentry *dentry;
 		struct list_head *tmp;
 
+		cond_resched_lock(&dcache_lock);
+
 		tmp = dentry_unused.prev;
 		if (tmp == &dentry_unused)
 			break;
@@ -553,6 +555,14 @@ resume:
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
 		next = tmp->next;
 
+		/*
+		 * select_parent() is a performance optimization, it is
+		 * not necessary to complete it. Abort if a reschedule is
+		 * pending:
+		 */
+		if (need_resched())
+			goto out;
+
 		if (!list_empty(&dentry->d_lru)) {
 			dentry_stat.nr_unused--;
 			list_del_init(&dentry->d_lru);
@@ -590,6 +600,7 @@ this_parent->d_parent->d_name.name, this
 #endif
 		goto resume;
 	}
+out:
 	spin_unlock(&dcache_lock);
 	return found;
 }

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, net: fix scheduling latencies in netstat
  2004-09-14 10:19         ` [patch] sched, vfs: fix scheduling latencies in prune_dcache() and select_parent() Ingo Molnar
@ 2004-09-14 10:25           ` Ingo Molnar
  2004-09-14 10:44             ` [patch] sched, net: fix scheduling latencies in __release_sock Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 10:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 701 bytes --]


the attached patch fixes long scheduling latencies caused by access to
the /proc/net/tcp file. The seqfile functions keep softirqs disabled for
a very long time (i've seen reports of 20+ msecs, if there are enough
sockets in the system). With the attached patch it's below 100 usecs.

the cond_resched_softirq() relies on the implicit knowledge that this
code executes in process context and runs with softirqs disabled.

potentially enabling softirqs means that the socket list might change
between buckets - but this is not an issue since seqfiles have a 4K
iteration granularity anyway and /proc/net/tcp is often (much) larger
than that.

This patch has been in the -VP patchset for weeks.

	Ingo

[-- Attachment #2: fix-latency-netstat.patch --]
[-- Type: text/plain, Size: 1519 bytes --]


the attached patch fixes long scheduling latencies caused by access to
the /proc/net/tcp file. The seqfile functions keep softirqs disabled for
a very long time (i've seen reports of 20+ msecs, if there are enough
sockets in the system). With the attached patch it's below 100 usecs.

the cond_resched_softirq() relies on the implicit knowledge that this
code executes in process context and runs with softirqs disabled.

potentially enabling softirqs means that the socket list might change
between buckets - but this is not an issue since seqfiles have a 4K
iteration granularity anyway and /proc/net/tcp is often (much) larger
than that.

This patch has been in the -VP patchset for weeks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/net/ipv4/tcp_ipv4.c.orig	
+++ linux/net/ipv4/tcp_ipv4.c	
@@ -2227,7 +2227,10 @@ static void *established_get_first(struc
 		struct sock *sk;
 		struct hlist_node *node;
 		struct tcp_tw_bucket *tw;
-	       
+
+		/* We can reschedule _before_ having picked the target: */
+		cond_resched_softirq();
+
 		read_lock(&tcp_ehash[st->bucket].lock);
 		sk_for_each(sk, node, &tcp_ehash[st->bucket].chain) {
 			if (sk->sk_family != st->family) {
@@ -2274,6 +2277,10 @@ get_tw:
 		}
 		read_unlock(&tcp_ehash[st->bucket].lock);
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
+
+		/* We can reschedule between buckets: */
+		cond_resched_softirq();
+
 		if (++st->bucket < tcp_ehash_size) {
 			read_lock(&tcp_ehash[st->bucket].lock);
 			sk = sk_head(&tcp_ehash[st->bucket].chain);

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, net: fix scheduling latencies in __release_sock
  2004-09-14 10:25           ` [patch] sched, net: fix scheduling latencies in netstat Ingo Molnar
@ 2004-09-14 10:44             ` Ingo Molnar
  2004-09-14 10:50               ` [patch] sched, mm: fix scheduling latencies in copy_page_range() Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 10:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, David S. Miller, Alexey Kuznetsov

[-- Attachment #1: Type: text/plain, Size: 317 bytes --]


the attached patch fixes long scheduling latencies caused by backlog
triggered by __release_sock(). That code only executes in process
context, and we've made the backlog queue private already at this point
so it is safe to do a cond_resched_softirq().

This patch has been in the -VP patchset for some time.

	Ingo

[-- Attachment #2: fix-latency-release_sock.patch --]
[-- Type: text/plain, Size: 808 bytes --]


the attached patch fixes long scheduling latencies caused by backlog
triggered by __release_sock(). That code only executes in process
context, and we've made the backlog queue private already at this point
so it is safe to do a cond_resched_softirq().

This patch has been in the -VP patchset for some time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/net/core/sock.c.orig	
+++ linux/net/core/sock.c	
@@ -933,6 +933,15 @@ void __release_sock(struct sock *sk)
 
 			skb->next = NULL;
 			sk->sk_backlog_rcv(sk, skb);
+
+			/*
+			 * We are in process context here with softirqs
+			 * disabled, use cond_resched_softirq() to preempt.
+			 * This is safe to do because we've taken the backlog
+			 * queue private:
+			 */
+			cond_resched_softirq();
+
 			skb = next;
 		} while (skb != NULL);
 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, mm: fix scheduling latencies in copy_page_range()
  2004-09-14 10:44             ` [patch] sched, net: fix scheduling latencies in __release_sock Ingo Molnar
@ 2004-09-14 10:50               ` Ingo Molnar
  2004-09-14 10:56                 ` [patch] sched, mm: fix scheduling latencies in unmap_vmas() Ingo Molnar
  2004-09-14 10:59                 ` [patch] sched, mm: fix scheduling latencies in get_user_pages() Ingo Molnar
  0 siblings, 2 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 10:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 154 bytes --]

The attached patch does a lock-break in copy_page_range() if needed, to
reduce scheduling latencies.

has been tested as part of the -VP patchset.

	Ingo

[-- Attachment #2: fix-latency-copy_page_range.patch --]
[-- Type: text/plain, Size: 1081 bytes --]


The attached patch does a lock-break in copy_page_range() if needed, to
reduce scheduling latencies.

has been tested as part of the -VP patchset.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/mm/memory.c.orig	
+++ linux/mm/memory.c	
@@ -337,6 +337,25 @@ cont_copy_pte_range_noset:
 				}
 				src_pte++;
 				dst_pte++;
+				/*
+				 * We are holding two locks at this point -
+				 * any of them could generate latencies in
+				 * another task on another CPU:
+				 */
+				if (need_resched() ||
+					need_lockbreak(&src->page_table_lock) ||
+					need_lockbreak(&dst->page_table_lock))
+				{
+					pte_unmap_nested(src_pte);
+					pte_unmap(dst_pte);
+					spin_unlock(&src->page_table_lock);
+					spin_unlock(&dst->page_table_lock);
+					cond_resched();
+					spin_lock(&dst->page_table_lock);
+					spin_lock(&src->page_table_lock);
+					dst_pte = pte_offset_map(dst_pmd, address);
+					src_pte = pte_offset_map_nested(src_pmd, address);
+				}
 			} while ((unsigned long)src_pte & PTE_TABLE_MASK);
 			pte_unmap_nested(src_pte-1);
 			pte_unmap(dst_pte-1);

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched, tty: fix scheduling latencies in tty_io.c
  2004-09-14 11:06                     ` [patch] sched, tty: fix scheduling latencies in tty_io.c Ingo Molnar
@ 2004-09-14 10:53                       ` Alan Cox
  2004-09-14 12:00                         ` Ingo Molnar
  2004-09-14 11:08                       ` [patch] sched, pty: fix scheduling latencies in pty.c Ingo Molnar
  2004-09-14 11:28                       ` [patch] sched: fix scheduling latencies in mttr.c Ingo Molnar
  2 siblings, 1 reply; 76+ messages in thread
From: Alan Cox @ 2004-09-14 10:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel Mailing List, Alexander Viro

On Maw, 2004-09-14 at 12:06, Ingo Molnar wrote:
> the attached patch fixes long scheduling latencies in tty_read() and 
> tty_write() caused by the BKL.

Have you verified that none of the ldisc methods rely on the big kernel
lock. In doing the tty audit I found several cases that tty drivers
still rely on this lock so I am curious why you make this change.

Would it not be better to fix the tty layer locking rather than
introduces new random memory corruptors ?


Alan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, mm: fix scheduling latencies in unmap_vmas()
  2004-09-14 10:50               ` [patch] sched, mm: fix scheduling latencies in copy_page_range() Ingo Molnar
@ 2004-09-14 10:56                 ` Ingo Molnar
  2004-09-14 10:59                 ` [patch] sched, mm: fix scheduling latencies in get_user_pages() Ingo Molnar
  1 sibling, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 10:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 222 bytes --]


The attached patch fixes long latencies in unmap_vmas(). We had
lockbreak code in that function already but it did not take delayed
effects of TLB-gather into account.

has been tested as part of the -VP patchset.

	Ingo

[-- Attachment #2: fix-latency-unmap_vmas.patch --]
[-- Type: text/plain, Size: 1659 bytes --]


The attached patch fixes long latencies in unmap_vmas(). We had
lockbreak code in that function already but it did not take delayed
effects of TLB-gather into account.

has been tested as part of the -VP patchset.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/mm/memory.c.orig	
+++ linux/mm/memory.c	
@@ -497,19 +497,15 @@ static void unmap_page_range(struct mmu_
 	tlb_end_vma(tlb, vma);
 }
 
-/* Dispose of an entire struct mmu_gather per rescheduling point */
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT)
-#define ZAP_BLOCK_SIZE	(FREE_PTE_NR * PAGE_SIZE)
-#endif
-
-/* For UP, 256 pages at a time gives nice low latency */
-#if !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT)
-#define ZAP_BLOCK_SIZE	(256 * PAGE_SIZE)
-#endif
-
+#ifdef CONFIG_PREEMPT
+/*
+ * It's not an issue to have a small zap block size - TLB flushes
+ * only happen once normally, due to the tlb->need_flush optimization.
+ */
+# define ZAP_BLOCK_SIZE	(8 * PAGE_SIZE)
+#else
 /* No preempt: go for improved straight-line efficiency */
-#if !defined(CONFIG_PREEMPT)
-#define ZAP_BLOCK_SIZE	(1024 * PAGE_SIZE)
+# define ZAP_BLOCK_SIZE	(1024 * PAGE_SIZE)
 #endif
 
 /**
@@ -584,15 +580,15 @@ int unmap_vmas(struct mmu_gather **tlbp,
 
 			start += block;
 			zap_bytes -= block;
-			if ((long)zap_bytes > 0)
-				continue;
-			if (!atomic && need_resched()) {
+			if (!atomic) {
 				int fullmm = tlb_is_full_mm(*tlbp);
 				tlb_finish_mmu(*tlbp, tlb_start, start);
 				cond_resched_lock(&mm->page_table_lock);
 				*tlbp = tlb_gather_mmu(mm, fullmm);
 				tlb_start_valid = 0;
 			}
+			if ((long)zap_bytes > 0)
+				continue;
 			zap_bytes = ZAP_BLOCK_SIZE;
 		}
 	}

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, mm: fix scheduling latencies in get_user_pages()
  2004-09-14 10:50               ` [patch] sched, mm: fix scheduling latencies in copy_page_range() Ingo Molnar
  2004-09-14 10:56                 ` [patch] sched, mm: fix scheduling latencies in unmap_vmas() Ingo Molnar
@ 2004-09-14 10:59                 ` Ingo Molnar
  2004-09-14 11:02                   ` [patch] sched, mm: fix scheduling latencies in filemap_sync() Ingo Molnar
  1 sibling, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 10:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 126 bytes --]


The attached patch fixes long scheduling latencies in get_user_pages().

has been tested as part of the -VP patchset.

	Ingo

[-- Attachment #2: fix-latency-get_user_pages.patch --]
[-- Type: text/plain, Size: 503 bytes --]


The attached patch fixes long scheduling latencies in get_user_pages().

has been tested as part of the -VP patchset.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/mm/memory.c.orig	
+++ linux/mm/memory.c	
@@ -781,6 +781,8 @@ int get_user_pages(struct task_struct *t
 		do {
 			struct page *map;
 			int lookup_write = write;
+
+			cond_resched_lock(&mm->page_table_lock);
 			while (!(map = follow_page(mm, start, lookup_write))) {
 				/*
 				 * Shortcut for anonymous pages. We don't want

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, mm: fix scheduling latencies in filemap_sync()
  2004-09-14 10:59                 ` [patch] sched, mm: fix scheduling latencies in get_user_pages() Ingo Molnar
@ 2004-09-14 11:02                   ` Ingo Molnar
  2004-09-14 11:06                     ` [patch] sched, tty: fix scheduling latencies in tty_io.c Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 151 bytes --]


The attached patch, written by Andrew Morton, fixes long scheduling
latencies in filemap_sync().

has been tested as part of the -VP patchset.

	Ingo

[-- Attachment #2: fix-latency-filemap_sync.patch --]
[-- Type: text/plain, Size: 1320 bytes --]


The attached patch, written by Andrew Morton, fixes long scheduling
latencies in filemap_sync().

has been tested as part of the -VP patchset.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/mm/msync.c.orig	
+++ linux/mm/msync.c	
@@ -92,8 +92,8 @@ static inline int filemap_sync_pmd_range
 	return error;
 }
 
-static int filemap_sync(struct vm_area_struct * vma, unsigned long address,
-	size_t size, unsigned int flags)
+static int __filemap_sync(struct vm_area_struct *vma, unsigned long address,
+			size_t size, unsigned int flags)
 {
 	pgd_t * dir;
 	unsigned long end = address + size;
@@ -131,6 +131,31 @@ static int filemap_sync(struct vm_area_s
 	return error;
 }
 
+#ifdef CONFIG_PREEMPT
+static int filemap_sync(struct vm_area_struct *vma, unsigned long address,
+			size_t size, unsigned int flags)
+{
+	const size_t chunk = 64 * 1024;	/* bytes */
+	int error = 0;
+
+	while (size) {
+		size_t sz = min(size, chunk);
+
+		error |= __filemap_sync(vma, address, sz, flags);
+		cond_resched();
+		address += sz;
+		size -= sz;
+	}
+	return error;
+}
+#else
+static int filemap_sync(struct vm_area_struct *vma, unsigned long address,
+			size_t size, unsigned int flags)
+{
+	return __filemap_sync(vma, address, size, flags);
+}
+#endif
+
 /*
  * MS_SYNC syncs the entire file - including mappings.
  *

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, tty: fix scheduling latencies in tty_io.c
  2004-09-14 11:02                   ` [patch] sched, mm: fix scheduling latencies in filemap_sync() Ingo Molnar
@ 2004-09-14 11:06                     ` Ingo Molnar
  2004-09-14 10:53                       ` Alan Cox
                                         ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 222 bytes --]


the attached patch fixes long scheduling latencies in tty_read() and 
tty_write() caused by the BKL.

has been tested as part of the -VP patchset and the tty_read() side 
has also been tested in earlier -mm trees.

	Ingo

[-- Attachment #2: fix-latency-tty_io.patch --]
[-- Type: text/plain, Size: 1129 bytes --]


the attached patch fixes long scheduling latencies in tty_read() and
tty_write() caused by the BKL.

has been tested as part of the -VP patchset and the tty_read() side
has also been tested in earlier -mm trees.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/drivers/char/tty_io.c.orig	
+++ linux/drivers/char/tty_io.c	
@@ -660,12 +660,10 @@ static ssize_t tty_read(struct file * fi
 	if (!tty || (test_bit(TTY_IO_ERROR, &tty->flags)))
 		return -EIO;
 
-	lock_kernel();
 	if (tty->ldisc.read)
 		i = (tty->ldisc.read)(tty,file,buf,count);
 	else
 		i = -EIO;
-	unlock_kernel();
 	if (i > 0)
 		inode->i_atime = CURRENT_TIME;
 	return i;
@@ -688,17 +686,13 @@ static inline ssize_t do_tty_write(
 		return -ERESTARTSYS;
 	}
 	if ( test_bit(TTY_NO_WRITE_SPLIT, &tty->flags) ) {
-		lock_kernel();
 		written = write(tty, file, buf, count);
-		unlock_kernel();
 	} else {
 		for (;;) {
 			unsigned long size = max((unsigned long)PAGE_SIZE*2, 16384UL);
 			if (size > count)
 				size = count;
-			lock_kernel();
 			ret = write(tty, file, buf, size);
-			unlock_kernel();
 			if (ret <= 0)
 				break;
 			written += ret;

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched, pty: fix scheduling latencies in pty.c
  2004-09-14 11:06                     ` [patch] sched, tty: fix scheduling latencies in tty_io.c Ingo Molnar
  2004-09-14 10:53                       ` Alan Cox
@ 2004-09-14 11:08                       ` Ingo Molnar
  2004-09-14 11:12                         ` [patch] might_sleep() additions to fs-writeback.c Ingo Molnar
  2004-09-14 11:25                         ` [patch] fix keventd execution dependency Ingo Molnar
  2004-09-14 11:28                       ` [patch] sched: fix scheduling latencies in mttr.c Ingo Molnar
  2 siblings, 2 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 162 bytes --]


the attached patch fixes long scheduling latencies in pty_read()caused
by the BKL.

has been tested as part of the -VP patchset and in earlier -mm trees.

	Ingo

[-- Attachment #2: fix-latency-pty.patch --]
[-- Type: text/plain, Size: 474 bytes --]


the attached patch fixes long scheduling latencies in pty_read()caused
by the BKL.

has been tested as part of the -VP patchset and in earlier -mm trees.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/drivers/char/pty.c.orig	
+++ linux/drivers/char/pty.c	
@@ -139,6 +139,7 @@ static int pty_write(struct tty_struct *
 			c     += n;
 			count -= n;
 			to->ldisc.receive_buf(to, temp_buffer, NULL, n);
+			cond_resched();
 		}
 		up(&tty->flip.pty_sem);
 	} else {

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] might_sleep() additions to fs-writeback.c
  2004-09-14 11:08                       ` [patch] sched, pty: fix scheduling latencies in pty.c Ingo Molnar
@ 2004-09-14 11:12                         ` Ingo Molnar
  2004-09-14 11:25                         ` [patch] fix keventd execution dependency Ingo Molnar
  1 sibling, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 631 bytes --]


add two more might_sleep() checks: to sync_inode() and
generic_osync_inode().

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/fs-writeback.c.orig	
+++ linux/fs/fs-writeback.c	
@@ -578,6 +578,7 @@ int sync_inode(struct inode *inode, stru
 {
 	int ret;
 
+	might_sleep();
 	spin_lock(&inode_lock);
 	ret = __writeback_single_inode(inode, wbc);
 	spin_unlock(&inode_lock);
@@ -622,6 +623,7 @@ int generic_osync_inode(struct inode *in
 	}
 	current->flags &= ~PF_SYNCWRITE;
 
+	might_sleep();
 	spin_lock(&inode_lock);
 	if ((inode->i_state & I_DIRTY) &&
 	    ((what & OSYNC_INODE) || (inode->i_state & I_DIRTY_DATASYNC)))

[-- Attachment #2: might_sleep-fs-writeback.patch --]
[-- Type: text/plain, Size: 631 bytes --]


add two more might_sleep() checks: to sync_inode() and
generic_osync_inode().

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/fs-writeback.c.orig	
+++ linux/fs/fs-writeback.c	
@@ -578,6 +578,7 @@ int sync_inode(struct inode *inode, stru
 {
 	int ret;
 
+	might_sleep();
 	spin_lock(&inode_lock);
 	ret = __writeback_single_inode(inode, wbc);
 	spin_unlock(&inode_lock);
@@ -622,6 +623,7 @@ int generic_osync_inode(struct inode *in
 	}
 	current->flags &= ~PF_SYNCWRITE;
 
+	might_sleep();
 	spin_lock(&inode_lock);
 	if ((inode->i_state & I_DIRTY) &&
 	    ((what & OSYNC_INODE) || (inode->i_state & I_DIRTY_DATASYNC)))

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched, tty: fix scheduling latencies in tty_io.c
  2004-09-14 12:00                         ` Ingo Molnar
@ 2004-09-14 11:18                           ` Alan Cox
  2004-09-14 12:27                             ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Alan Cox @ 2004-09-14 11:18 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel Mailing List, Alexander Viro

On Maw, 2004-09-14 at 13:00, Ingo Molnar wrote:
> > Would it not be better to fix the tty layer locking rather than
> > introduces new random memory corruptors ?
> 
> sure ... any volunteers?

Not that I've seen. I'm fixing some locking but not stuff that depends
on lock_kernel(). In the meantime therefore it seems inappropriate to
add random corruptors and bugs to the kernel in pursuit of low latency
other than as private "add this but beware" patches.

The tty I/O patches should not be applied until the tty layer doesn't
rely on the lock_kernel locking. So if people want the low latency stuff
covering the tty code I suggest that rather than making other peoples
machines crash more, they volunteer. The console driver might be a good
starting point.

Alan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] fix keventd execution dependency
  2004-09-14 11:08                       ` [patch] sched, pty: fix scheduling latencies in pty.c Ingo Molnar
  2004-09-14 11:12                         ` [patch] might_sleep() additions to fs-writeback.c Ingo Molnar
@ 2004-09-14 11:25                         ` Ingo Molnar
  2004-09-15 22:18                           ` Rusty Russell
  1 sibling, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Rusty Russell

[-- Attachment #1: Type: text/plain, Size: 255 bytes --]


We dont want to execute off keventd since it might hold a semaphore our
callers hold too. This can happen when kthread_create() is called from
within keventd. This happened due to the IRQ threading patches but it
could happen with other code too.

	Ingo

[-- Attachment #2: fix-keventd.patch --]
[-- Type: text/plain, Size: 1425 bytes --]


We dont want to execute off keventd since it might hold a semaphore our
callers hold too. This can happen when kthread_create() is called from
within keventd. This happened due to the IRQ threading patches but it
could happen with other code too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/kernel/kthread.c.orig	
+++ linux/kernel/kthread.c	
@@ -14,6 +14,12 @@
 #include <linux/module.h>
 #include <asm/semaphore.h>
 
+/*
+ * We dont want to execute off keventd since it might
+ * hold a semaphore our callers hold too:
+ */
+static struct workqueue_struct *helper_wq;
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from keventd. */
@@ -126,12 +132,13 @@ struct task_struct *kthread_create(int (
 	init_completion(&create.started);
 	init_completion(&create.done);
 
-	/* If we're being called to start the first workqueue, we
-	 * can't use keventd. */
-	if (!keventd_up())
+	/*
+	 * The workqueue needs to start up first:
+	 */
+	if (!helper_wq)
 		work.func(work.data);
 	else {
-		schedule_work(&work);
+		queue_work(helper_wq, &work);
 		wait_for_completion(&create.done);
 	}
 	if (!IS_ERR(create.result)) {
@@ -183,3 +190,13 @@ int kthread_stop(struct task_struct *k)
 	return ret;
 }
 EXPORT_SYMBOL(kthread_stop);
+
+static __init int helper_init(void)
+{
+	helper_wq = create_singlethread_workqueue("kthread");
+	BUG_ON(!helper_wq);
+
+	return 0;
+}
+core_initcall(helper_init);
+

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched: fix scheduling latencies in mttr.c
  2004-09-14 11:06                     ` [patch] sched, tty: fix scheduling latencies in tty_io.c Ingo Molnar
  2004-09-14 10:53                       ` Alan Cox
  2004-09-14 11:08                       ` [patch] sched, pty: fix scheduling latencies in pty.c Ingo Molnar
@ 2004-09-14 11:28                       ` Ingo Molnar
  2004-09-14 11:32                         ` [patch] sched: fix scheduling latencies in vgacon.c Ingo Molnar
                                           ` (3 more replies)
  2 siblings, 4 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel


fix scheduling latencies in the MTRR-setting codepath.
Also, fix bad bug: MTRR's _must_ be set with interrupts disabled!

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched: fix scheduling latencies in vgacon.c
  2004-09-14 11:28                       ` [patch] sched: fix scheduling latencies in mttr.c Ingo Molnar
@ 2004-09-14 11:32                         ` Ingo Molnar
  2004-09-14 11:35                         ` [patch] sched: fix scheduling latencies in NTFS mount Ingo Molnar
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 197 bytes --]


this patch fixes scheduling latencies in vgacon_do_font_op(). The code
is protected by vga_lock already so it's safe to drop (and re-acquire)
the BKL.

has been tested in the -VP patchset.

	Ingo

[-- Attachment #2: fix-latency-vgacon.patch --]
[-- Type: text/plain, Size: 906 bytes --]


this patch fixes scheduling latencies in vgacon_do_font_op(). The code
is protected by vga_lock already so it's safe to drop (and re-acquire)
the BKL.

has been tested in the -VP patchset.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/drivers/video/console/vgacon.c.orig	
+++ linux/drivers/video/console/vgacon.c	
@@ -49,6 +49,7 @@
 #include <linux/spinlock.h>
 #include <linux/ioport.h>
 #include <linux/init.h>
+#include <linux/smp_lock.h>
 #include <video/vga.h>
 #include <asm/io.h>
 
@@ -763,6 +764,7 @@ static int vgacon_do_font_op(struct vgas
 		charmap += 4 * cmapsz;
 #endif
 
+	unlock_kernel();
 	spin_lock_irq(&vga_lock);
 	/* First, the Sequencer */
 	vga_wseq(state->vgabase, VGA_SEQ_RESET, 0x1);
@@ -848,6 +850,7 @@ static int vgacon_do_font_op(struct vgas
 		vga_wattr(state->vgabase, VGA_AR_ENABLE_DISPLAY, 0);	
 	}
 	spin_unlock_irq(&vga_lock);
+	lock_kernel();
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched: fix scheduling latencies in NTFS mount
  2004-09-14 11:28                       ` [patch] sched: fix scheduling latencies in mttr.c Ingo Molnar
  2004-09-14 11:32                         ` [patch] sched: fix scheduling latencies in vgacon.c Ingo Molnar
@ 2004-09-14 11:35                         ` Ingo Molnar
  2004-09-14 13:31                           ` Anton Altaparmakov
  2004-09-14 11:42                         ` [patch] sched: fix scheduling latencies for !PREEMPT kernels Ingo Molnar
  2004-09-14 13:25                         ` [patch] sched: fix scheduling latencies in mtrr.c Ingo Molnar
  3 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 226 bytes --]


this patch fixes scheduling latencies in the NTFS mount path. We are
dropping the BKL because the code itself is using the ntfs_lock
semaphore which provides safe locking.

has been tested as part of the -VP patchset.

	Ingo

[-- Attachment #2: fix-latency-ntfs-mount.patch --]
[-- Type: text/plain, Size: 1174 bytes --]


this patch fixes scheduling latencies in the NTFS mount path. We are
dropping the BKL because the code itself is using the ntfs_lock
semaphore which provides safe locking.

has been tested as part of the -VP patchset.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/ntfs/super.c.orig	
+++ linux/fs/ntfs/super.c	
@@ -29,6 +29,7 @@
 #include <linux/buffer_head.h>
 #include <linux/vfs.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 
 #include "ntfs.h"
 #include "sysctl.h"
@@ -2288,6 +2289,8 @@ static int ntfs_fill_super(struct super_
 	vol->fmask = 0177;
 	vol->dmask = 0077;
 
+	unlock_kernel();
+
 	/* Important to get the mount options dealt with now. */
 	if (!parse_options(vol, (char*)opt))
 		goto err_out_now;
@@ -2424,6 +2427,7 @@ static int ntfs_fill_super(struct super_
 		}
 		up(&ntfs_lock);
 		sb->s_export_op = &ntfs_export_ops;
+		lock_kernel();
 		return 0;
 	}
 	ntfs_error(sb, "Failed to allocate root directory.");
@@ -2527,6 +2531,7 @@ iput_tmp_ino_err_out_now:
 	}
 	/* Errors at this stage are irrelevant. */
 err_out_now:
+	lock_kernel();
 	sb->s_fs_info = NULL;
 	kfree(vol);
 	ntfs_debug("Failed, returning -EINVAL.");

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 11:28                       ` [patch] sched: fix scheduling latencies in mttr.c Ingo Molnar
  2004-09-14 11:32                         ` [patch] sched: fix scheduling latencies in vgacon.c Ingo Molnar
  2004-09-14 11:35                         ` [patch] sched: fix scheduling latencies in NTFS mount Ingo Molnar
@ 2004-09-14 11:42                         ` Ingo Molnar
  2004-09-14 12:55                           ` Nick Piggin
  2004-09-14 13:25                         ` [patch] sched: fix scheduling latencies in mtrr.c Ingo Molnar
  3 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 11:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 412 bytes --]


this patch adds a handful of cond_resched() points to a number of
key, scheduling-latency related non-inlined functions.

this reduces preemption latency for !PREEMPT kernels. These are
scheduling points complementary to PREEMPT_VOLUNTARY scheduling
points (might_sleep() places) - i.e. these are all points where
an explicit cond_resched() had to be added.

has been tested as part of the -VP patchset.

	Ingo

[-- Attachment #2: fix-latency-nopreempt.patch --]
[-- Type: text/plain, Size: 2843 bytes --]


this patch adds a handful of cond_resched() points to a number of
key, scheduling-latency related non-inlined functions.

this reduces preemption latency for !PREEMPT kernels. These are
scheduling points complementary to PREEMPT_VOLUNTARY scheduling
points (might_sleep() places) - i.e. these are all points where
an explicit cond_resched() had to be added.

has been tested as part of the -VP patchset.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/exec.c.orig	
+++ linux/fs/exec.c	
@@ -184,6 +184,7 @@ static int count(char __user * __user * 
 			argv++;
 			if(++i > max)
 				return -E2BIG;
+			cond_resched();
 		}
 	}
 	return i;
--- linux/fs/fs-writeback.c.orig	
+++ linux/fs/fs-writeback.c	
@@ -366,6 +366,7 @@ sync_sb_inodes(struct super_block *sb, s
 			list_move(&inode->i_list, &sb->s_dirty);
 		}
 		spin_unlock(&inode_lock);
+		cond_resched();
 		iput(inode);
 		spin_lock(&inode_lock);
 		if (wbc->nr_to_write <= 0)
--- linux/fs/select.c.orig	
+++ linux/fs/select.c	
@@ -239,6 +239,7 @@ int do_select(int n, fd_set_bits *fds, l
 						retval++;
 					}
 				}
+				cond_resched();
 			}
 			if (res_in)
 				*rinp = res_in;
--- linux/kernel/printk.c.orig	
+++ linux/kernel/printk.c	
@@ -280,6 +280,7 @@ int do_syslog(int type, char __user * bu
 			error = __put_user(c,buf);
 			buf++;
 			i++;
+			cond_resched();
 			spin_lock_irq(&logbuf_lock);
 		}
 		spin_unlock_irq(&logbuf_lock);
@@ -321,6 +322,7 @@ int do_syslog(int type, char __user * bu
 			c = LOG_BUF(j);
 			spin_unlock_irq(&logbuf_lock);
 			error = __put_user(c,&buf[count-1-i]);
+			cond_resched();
 			spin_lock_irq(&logbuf_lock);
 		}
 		spin_unlock_irq(&logbuf_lock);
@@ -336,6 +338,7 @@ int do_syslog(int type, char __user * bu
 					error = -EFAULT;
 					break;
 				}
+				cond_resched();
 			}
 		}
 		break;
--- linux/mm/memory.c.orig	
+++ linux/mm/memory.c	
@@ -1516,6 +1516,7 @@ do_no_page(struct mm_struct *mm, struct 
 	}
 	smp_rmb();  /* Prevent CPU from reordering lock-free ->nopage() */
 retry:
+	cond_resched();
 	new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret);
 
 	/* no page was available -- either SIGBUS or OOM */
--- linux/mm/slab.c.orig	
+++ linux/mm/slab.c	
@@ -2793,7 +2793,7 @@ static void cache_reap(void *unused)
 next_unlock:
 		spin_unlock_irq(&searchp->spinlock);
 next:
-		;
+		cond_resched();
 	}
 	check_irq_on();
 	up(&cache_chain_sem);
--- linux/mm/vmscan.c.orig	
+++ linux/mm/vmscan.c	
@@ -361,6 +361,8 @@ static int shrink_list(struct list_head 
 		int may_enter_fs;
 		int referenced;
 
+		cond_resched();
+
 		page = lru_to_page(page_list);
 		list_del(&page->lru);
 
@@ -710,6 +712,7 @@ refill_inactive_zone(struct zone *zone, 
 		reclaim_mapped = 1;
 
 	while (!list_empty(&l_hold)) {
+		cond_resched();
 		page = lru_to_page(&l_hold);
 		list_del(&page->lru);
 		if (page_mapped(page)) {

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched, tty: fix scheduling latencies in tty_io.c
  2004-09-14 10:53                       ` Alan Cox
@ 2004-09-14 12:00                         ` Ingo Molnar
  2004-09-14 11:18                           ` Alan Cox
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 12:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, Linux Kernel Mailing List, Alexander Viro


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> On Maw, 2004-09-14 at 12:06, Ingo Molnar wrote:
> > the attached patch fixes long scheduling latencies in tty_read() and 
> > tty_write() caused by the BKL.
> 
> Have you verified that none of the ldisc methods rely on the big
> kernel lock. In doing the tty audit I found several cases that tty
> drivers still rely on this lock so I am curious why you make this
> change.
> 
> Would it not be better to fix the tty layer locking rather than
> introduces new random memory corruptors ?

sure ... any volunteers?

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched, tty: fix scheduling latencies in tty_io.c
  2004-09-14 12:27                             ` Ingo Molnar
@ 2004-09-14 12:11                               ` Alan Cox
  0 siblings, 0 replies; 76+ messages in thread
From: Alan Cox @ 2004-09-14 12:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel Mailing List, Alexander Viro

On Maw, 2004-09-14 at 13:27, Ingo Molnar wrote:
> would it be a sufficient fix to grab some sort of tty_sem mutex in the
> places where the patch drops the BKL - or are there other entry paths to
> this code?

There are a whole load of entry points. I've been trying to document all
the ldisc layer stuff as I fix the basic bugs in it. A semaphore isnt
sufficient because some of the entry points have to be multi-entry so
you'd need to go modify all the ldisc internals (I think that will have
to happen eventually). It also relies on it for read v write locking
still.

So far all I've fixed is the vfs/ldisc locking to ensure that open/close
and other events are properly sequenced. I've not yet finished tackling
the ldisc/driver cases where driver->close() completes and the ldisc
calls the driver still.

At that point its at least then down to the drivers to get fixed.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched, tty: fix scheduling latencies in tty_io.c
  2004-09-14 11:18                           ` Alan Cox
@ 2004-09-14 12:27                             ` Ingo Molnar
  2004-09-14 12:11                               ` Alan Cox
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 12:27 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, Linux Kernel Mailing List, Alexander Viro


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> On Maw, 2004-09-14 at 13:00, Ingo Molnar wrote:
> > > Would it not be better to fix the tty layer locking rather than
> > > introduces new random memory corruptors ?
> > 
> > sure ... any volunteers?
> 
> Not that I've seen. I'm fixing some locking but not stuff that depends
> on lock_kernel(). [...]

would it be a sufficient fix to grab some sort of tty_sem mutex in the
places where the patch drops the BKL - or are there other entry paths to
this code?

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 11:42                         ` [patch] sched: fix scheduling latencies for !PREEMPT kernels Ingo Molnar
@ 2004-09-14 12:55                           ` Nick Piggin
  2004-09-14 13:22                             ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Nick Piggin @ 2004-09-14 12:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel

Ingo Molnar wrote:

> --- linux/mm/vmscan.c.orig	
> +++ linux/mm/vmscan.c	
> @@ -361,6 +361,8 @@ static int shrink_list(struct list_head 
>  		int may_enter_fs;
>  		int referenced;
>  
> +		cond_resched();
> +
>  		page = lru_to_page(page_list);
>  		list_del(&page->lru);
>  
> @@ -710,6 +712,7 @@ refill_inactive_zone(struct zone *zone, 
>  		reclaim_mapped = 1;
>  
>  	while (!list_empty(&l_hold)) {
> +		cond_resched();
>  		page = lru_to_page(&l_hold);
>  		list_del(&page->lru);
>  		if (page_mapped(page)) {


Could these ones go up a level? We break down scanning into 32 page
chunks, so I don't think it needs to be checked every page.

in shrink_zone(), maybe stick them:

         while (nr_active || nr_inactive) {
                 if (nr_active) {
---> here
                         sc->nr_to_scan = min(nr_active,
                                         (unsigned long)SWAP_CLUSTER_MAX);
                         nr_active -= sc->nr_to_scan;
                         refill_inactive_zone(zone, sc);
                 }

                 if (nr_inactive) {
---> and here
                         sc->nr_to_scan = min(nr_inactive,
                                         (unsigned long)SWAP_CLUSTER_MAX);
                         nr_inactive -= sc->nr_to_scan;
                         shrink_cache(zone, sc);
                         if (sc->nr_to_reclaim <= 0)
                                 break;
                 }
         }

Does that work for you?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies in mtrr.c
  2004-09-14 13:25                         ` [patch] sched: fix scheduling latencies in mtrr.c Ingo Molnar
@ 2004-09-14 13:15                           ` Alan Cox
  2004-09-14 15:00                             ` Ingo Molnar
  2004-09-14 18:22                           ` Zwane Mwaikambo
  1 sibling, 1 reply; 76+ messages in thread
From: Alan Cox @ 2004-09-14 13:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel Mailing List

On Maw, 2004-09-14 at 14:25, Ingo Molnar wrote:
> caveat: one of the wbinvd() removals is correct i believe, but the other
> one might be problematic. It doesnt seem logical to do a wbinvd() at
> that point though ...

See the intel ppro manual volume 3 page 11-25. Its quite specific about
the sequence, so unless anything changes with AMD or later processors
the change seems to match the description.

IRQs are required to be off far longer than this sequence according to
the docs however, and PGE is supposed to be cleared too.

Step 11: set CD flag
Step 12: wbinvd
Step 13: set pge in CR4 if previously cleared



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 12:55                           ` Nick Piggin
@ 2004-09-14 13:22                             ` Ingo Molnar
  2004-09-14 13:33                               ` Nick Piggin
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 13:22 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Could these ones go up a level? We break down scanning into 32 page
> chunks, so I don't think it needs to be checked every page.

not really - we can occasionally get into high latencies even with a
single page - if a single page is mapped by alot of processes.

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies in mtrr.c
  2004-09-14 11:28                       ` [patch] sched: fix scheduling latencies in mttr.c Ingo Molnar
                                           ` (2 preceding siblings ...)
  2004-09-14 11:42                         ` [patch] sched: fix scheduling latencies for !PREEMPT kernels Ingo Molnar
@ 2004-09-14 13:25                         ` Ingo Molnar
  2004-09-14 13:15                           ` Alan Cox
  2004-09-14 18:22                           ` Zwane Mwaikambo
  3 siblings, 2 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 13:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 318 bytes --]


> fix scheduling latencies in the MTRR-setting codepath.
> Also, fix bad bug: MTRR's _must_ be set with interrupts disabled!

patch attached ...

caveat: one of the wbinvd() removals is correct i believe, but the other
one might be problematic. It doesnt seem logical to do a wbinvd() at
that point though ...

	Ingo

[-- Attachment #2: fix-latency-mtrr.patch --]
[-- Type: text/plain, Size: 1392 bytes --]


fix scheduling latencies in the MTRR-setting codepath.
Also, fix bad bug: MTRR's _must_ be set with interrupts disabled!

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/kernel/cpu/mtrr/generic.c.orig	
+++ linux/arch/i386/kernel/cpu/mtrr/generic.c	
@@ -240,11 +240,14 @@ static void prepare_set(void)
 	/*  Note that this is not ideal, since the cache is only flushed/disabled
 	   for this CPU while the MTRRs are changed, but changing this requires
 	   more invasive changes to the way the kernel boots  */
-	spin_lock(&set_atomicity_lock);
+	/*
+	 * Since we are disabling the cache dont allow any interrupts - they
+	 * would run extremely slow and would only increase the pain:
+	 */
+	spin_lock_irq(&set_atomicity_lock);
 
 	/*  Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
 	cr0 = read_cr0() | 0x40000000;	/* set CD flag */
-	wbinvd();
 	write_cr0(cr0);
 	wbinvd();
 
@@ -266,8 +269,7 @@ static void prepare_set(void)
 
 static void post_set(void)
 {
-	/*  Flush caches and TLBs  */
-	wbinvd();
+	/*  Flush TLBs (no need to flush caches - they are disabled)  */
 	__flush_tlb();
 
 	/* Intel (P6) standard MTRRs */
@@ -279,7 +281,7 @@ static void post_set(void)
 	/*  Restore value of CR4  */
 	if ( cpu_has_pge )
 		write_cr4(cr4);
-	spin_unlock(&set_atomicity_lock);
+	spin_unlock_irq(&set_atomicity_lock);
 }
 
 static void generic_set_all(void)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies in NTFS mount
  2004-09-14 11:35                         ` [patch] sched: fix scheduling latencies in NTFS mount Ingo Molnar
@ 2004-09-14 13:31                           ` Anton Altaparmakov
  0 siblings, 0 replies; 76+ messages in thread
From: Anton Altaparmakov @ 2004-09-14 13:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, lkml, Alexander Viro

On Tue, 2004-09-14 at 12:35, Ingo Molnar wrote:
> this patch fixes scheduling latencies in the NTFS mount path. We are
> dropping the BKL because the code itself is using the ntfs_lock
> semaphore which provides safe locking.
> 
> has been tested as part of the -VP patchset.

Looks good.  I designed the NTFS driver to not rely on the BKL at all so
you should be able to drop it anywhere you like.  (-:

I have applied your patch to my ntfs-2.6-devel bk tree so it doesn't get
lost.

Thanks,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 13:22                             ` Ingo Molnar
@ 2004-09-14 13:33                               ` Nick Piggin
  2004-09-14 14:09                                 ` Andrea Arcangeli
  2004-09-14 14:54                                 ` Ingo Molnar
  0 siblings, 2 replies; 76+ messages in thread
From: Nick Piggin @ 2004-09-14 13:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>Could these ones go up a level? We break down scanning into 32 page
>>chunks, so I don't think it needs to be checked every page.
> 
> 
> not really - we can occasionally get into high latencies even with a
> single page - if a single page is mapped by alot of processes.
> 

So doing it in the loop doesn't really give you a deterministic
maximum latency if somebody is out to cause trouble, does it?

OTOH, I guess libc or some shared memory segment may be mapped
into a lot of processes even on RT applictions.

Another thing, I don't mean this to sound like a rhetorical question,
but if we have a preemptible kernel, why is it a good idea to sprinkle
cond_rescheds everywhere? Isn't this now the worst of both worlds?
Why would someone who really cares about latency not enable preempt?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 13:33                               ` Nick Piggin
@ 2004-09-14 14:09                                 ` Andrea Arcangeli
  2004-09-14 14:28                                   ` Nick Piggin
  2004-09-14 16:31                                   ` William Lee Irwin III
  2004-09-14 14:54                                 ` Ingo Molnar
  1 sibling, 2 replies; 76+ messages in thread
From: Andrea Arcangeli @ 2004-09-14 14:09 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Tue, Sep 14, 2004 at 11:33:48PM +1000, Nick Piggin wrote:
> cond_rescheds everywhere? Isn't this now the worst of both worlds?

1) cond_resched should become a noop if CONFIG_PREEMPT=y
   (cond_resched_lock of course should still unlock/relock if
    need_resched() is set, but not __cond_resched).
2) all Ingo's new and old might_sleep should be converted to
   cond_resched (or optionally to cond_resched_costly, see point 5).
3) might_sleep should return a debug statement.
4) cond_resched should call might_sleep if need_resched is not set if
   CONFIG_PREEMPT=n is disabled, and it should _only_ call might_sleep
   if CONFIG_PREEMPT=y after we implement point 1.
5) no further config option should exist (if we really add an option
   it should be called CONFIG_COND_RESCHED_COSTLY of similar to 
   differentiate scheduling points in fast paths (like spinlock places
   with CONFIG_PREEMPT=n) (so you can choose between cond_resched() and
   cond_resched_costly())

I recommended point 2,3,4,5 already (a few of them twice), point 1 (your
point) looks lower prio (CONFIG_PREEMPT=y already does an overkill of
implicit need_resched() checks anyways).

> Why would someone who really cares about latency not enable preempt?

to avoid lots of worthless cond_resched in all spin_unlock and to avoid
kernel crashes if some driver is not preempt complaint?

I've a better question for you, why would someone ever disable
CONFIG_PREEMPT_VOLUNTARY? That config option is a nosense as far as I
can tell. If something it should be renamed to
"CONFIG_I_DON_T_WANT_TO_RUN_THE_OLD_KERNEL_CODE" ;)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 14:09                                 ` Andrea Arcangeli
@ 2004-09-14 14:28                                   ` Nick Piggin
  2004-09-14 15:03                                     ` Andrea Arcangeli
  2004-09-14 16:31                                   ` William Lee Irwin III
  1 sibling, 1 reply; 76+ messages in thread
From: Nick Piggin @ 2004-09-14 14:28 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:
> On Tue, Sep 14, 2004 at 11:33:48PM +1000, Nick Piggin wrote:
> 
>>cond_rescheds everywhere? Isn't this now the worst of both worlds?
> 
> 
> 1) cond_resched should become a noop if CONFIG_PREEMPT=y
>    (cond_resched_lock of course should still unlock/relock if
>     need_resched() is set, but not __cond_resched).

Unfortunately we need to keep the cond_rescheds that are called under
the bkl. Otherwise yes, this would be nice to be able to do.

> 2) all Ingo's new and old might_sleep should be converted to
>    cond_resched (or optionally to cond_resched_costly, see point 5).
> 3) might_sleep should return a debug statement.
> 4) cond_resched should call might_sleep if need_resched is not set if
>    CONFIG_PREEMPT=n is disabled, and it should _only_ call might_sleep
>    if CONFIG_PREEMPT=y after we implement point 1.
> 5) no further config option should exist (if we really add an option
>    it should be called CONFIG_COND_RESCHED_COSTLY of similar to 
>    differentiate scheduling points in fast paths (like spinlock places
>    with CONFIG_PREEMPT=n) (so you can choose between cond_resched() and
>    cond_resched_costly())
> 
> I recommended point 2,3,4,5 already (a few of them twice), point 1 (your
> point) looks lower prio (CONFIG_PREEMPT=y already does an overkill of
> implicit need_resched() checks anyways).
> 

Which is why we don't need more of them ;)

> 
>>Why would someone who really cares about latency not enable preempt?
> 
> 
> to avoid lots of worthless cond_resched in all spin_unlock and to avoid
> kernel crashes if some driver is not preempt complaint?
> 

Well I don't know how good an argument the crashes one is these days,
but generally (as far as I know) those who really care about latency
shouldn't mind about some extra overheads.

Now I don't disagree with some cond_rescheds for places where !PREEMPT
latency would otherwise be massive, but cases like doing cond_resched
for every page in the scanner is something that you could say is imposing
overhead on people who *don't* want it (ie !PREEMPT).

> I've a better question for you, why would someone ever disable
> CONFIG_PREEMPT_VOLUNTARY? That config option is a nosense as far as I
> can tell. If something it should be renamed to
> "CONFIG_I_DON_T_WANT_TO_RUN_THE_OLD_KERNEL_CODE" ;)
> 

:) I don't think Ingo intended this for merging as is. Maybe it is to
test how much progress he has made.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 13:33                               ` Nick Piggin
  2004-09-14 14:09                                 ` Andrea Arcangeli
@ 2004-09-14 14:54                                 ` Ingo Molnar
  2004-09-14 22:55                                   ` Nick Piggin
  2004-09-15  0:35                                   ` Lee Revell
  1 sibling, 2 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 14:54 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Another thing, I don't mean this to sound like a rhetorical question,
> but if we have a preemptible kernel, why is it a good idea to sprinkle
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> cond_rescheds everywhere? Isn't this now the worst of both worlds? Why
> would someone who really cares about latency not enable preempt?

two things:

1) none of the big distros enables CONFIG_PREEMPT in their kernels - not
even SuSE. This is pretty telling.

2) 10 new cond_resched()'s are not precisely 'sprinkle everywhere'.

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies in mtrr.c
  2004-09-14 13:15                           ` Alan Cox
@ 2004-09-14 15:00                             ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-14 15:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, Linux Kernel Mailing List


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> On Maw, 2004-09-14 at 14:25, Ingo Molnar wrote:
> > caveat: one of the wbinvd() removals is correct i believe, but the other
> > one might be problematic. It doesnt seem logical to do a wbinvd() at
> > that point though ...
> 
> See the intel ppro manual volume 3 page 11-25. Its quite specific
> about the sequence, so unless anything changes with AMD or later
> processors the change seems to match the description.
> 
> IRQs are required to be off far longer than this sequence according to
> the docs however, and PGE is supposed to be cleared too.

the problem is, we've had IRQs on in all of the 2.6 kernels so any
argument about extra wbinvd brings the obvious question: 'why did it
work so far'?

the only wbinvd that makes sense is the one you quote:

> Step 11: set CD flag
> Step 12: wbinvd
> Step 13: set pge in CR4 if previously cleared

and i agree that the pge thing should be fixed too.

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 14:28                                   ` Nick Piggin
@ 2004-09-14 15:03                                     ` Andrea Arcangeli
  2004-09-14 18:05                                       ` Robert Love
  2004-09-15  1:02                                       ` Lee Revell
  0 siblings, 2 replies; 76+ messages in thread
From: Andrea Arcangeli @ 2004-09-14 15:03 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Wed, Sep 15, 2004 at 12:28:49AM +1000, Nick Piggin wrote:
> Andrea Arcangeli wrote:
> >On Tue, Sep 14, 2004 at 11:33:48PM +1000, Nick Piggin wrote:
> >
> >>cond_rescheds everywhere? Isn't this now the worst of both worlds?
> >
> >
> >1) cond_resched should become a noop if CONFIG_PREEMPT=y
> >   (cond_resched_lock of course should still unlock/relock if
> >    need_resched() is set, but not __cond_resched).
> 
> Unfortunately we need to keep the cond_rescheds that are called under
> the bkl. Otherwise yes, this would be nice to be able to do.

we simply need a cond_resched_bkl() for that, no? Very few places are
still serialized with the BKL, so I don't think it would be a big issue
to convert those few places to use cond_resched_bkl.

> Which is why we don't need more of them ;)

eheh ;)

> Well I don't know how good an argument the crashes one is these days,
> but generally (as far as I know) those who really care about latency
> shouldn't mind about some extra overheads.

sure, that's especially true for the hardirq and softirq total scheduler
offloading. The real question is where a generic desktop positions. I
doubt on a generic desktop a latency over 1msec matters much,
top performance of repetitive tasks that sums up like hardirqs for a NIC
sounds more sensible to me.

And for the other usages RTAI or any other hard realtime sounds safer
anyways.

> Now I don't disagree with some cond_rescheds for places where !PREEMPT
> latency would otherwise be massive, but cases like doing cond_resched
> for every page in the scanner is something that you could say is imposing
> overhead on people who *don't* want it (ie !PREEMPT).

definitely. Note that this could be simply fixed by having a
CONFIG_PREEMPT around it, but the real fix is definitely to make
cond_resched a noop with PREEMPT=y and secondly to add a
cond_resched_bkl defined as the current cond_resched as suggested above.

> :) I don't think Ingo intended this for merging as is. Maybe it is to
> test how much progress he has made.

I hope so, he said the latest patches were cleaned up and he removed the
debugging/performance cruft (the most explicit message was
20040719115952.GA13564@elte.hu) but it's still not clear to me if he
left the CONFIG_VOLOUNTARY_PREEMPT config option because he intends to
merge it or just temporarily. Comments?

Thanks.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 14:09                                 ` Andrea Arcangeli
  2004-09-14 14:28                                   ` Nick Piggin
@ 2004-09-14 16:31                                   ` William Lee Irwin III
  2004-09-14 16:39                                     ` Andrea Arcangeli
  1 sibling, 1 reply; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-14 16:31 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, Sep 14, 2004 at 04:09:05PM +0200, Andrea Arcangeli wrote:
> 1) cond_resched should become a noop if CONFIG_PREEMPT=y
>    (cond_resched_lock of course should still unlock/relock if
>     need_resched() is set, but not __cond_resched).
> 2) all Ingo's new and old might_sleep should be converted to
>    cond_resched (or optionally to cond_resched_costly, see point 5).
> 3) might_sleep should return a debug statement.
> 4) cond_resched should call might_sleep if need_resched is not set if
>    CONFIG_PREEMPT=n is disabled, and it should _only_ call might_sleep
>    if CONFIG_PREEMPT=y after we implement point 1.
> 5) no further config option should exist (if we really add an option
>    it should be called CONFIG_COND_RESCHED_COSTLY of similar to 
>    differentiate scheduling points in fast paths (like spinlock places
>    with CONFIG_PREEMPT=n) (so you can choose between cond_resched() and
>    cond_resched_costly())
> I recommended point 2,3,4,5 already (a few of them twice), point 1 (your
> point) looks lower prio (CONFIG_PREEMPT=y already does an overkill of
> implicit need_resched() checks anyways).

The might_sleep() in cond_resched() sounds particularly useful to pick
up misapplications of cond_resched().


On Tue, Sep 14, 2004 at 11:33:48PM +1000, Nick Piggin wrote:
>> Why would someone who really cares about latency not enable preempt?
>> cond_rescheds everywhere? Isn't this now the worst of both worlds?

On Tue, Sep 14, 2004 at 04:09:05PM +0200, Andrea Arcangeli wrote:
> to avoid lots of worthless cond_resched in all spin_unlock and to avoid
> kernel crashes if some driver is not preempt complaint?
> I've a better question for you, why would someone ever disable
> CONFIG_PREEMPT_VOLUNTARY? That config option is a nosense as far as I
> can tell. If something it should be renamed to
> "CONFIG_I_DON_T_WANT_TO_RUN_THE_OLD_KERNEL_CODE" ;)

Well, thankfully we've taken the whole of the preempt-related code in
spin_unlock() and all the other locking primitives out of line for the
CONFIG_PREEMPT case (and potentially more, though that would be at the
expense of x86(-64)).

preempt_schedule() is actually what's used in preempt_check_resched(),
not cond_resched(), and this inspired me to take a look at that. What
on earth are people smoking with all these loops done up with gotos?
I suppose this isn't even the half of it, but it's what I looked at.


-- wli

Nuke some superfluous gotos in preempt_schedule().

Index: mm5-2.6.9-rc1/kernel/sched.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/sched.c	2004-09-13 16:27:46.998672328 -0700
+++ mm5-2.6.9-rc1/kernel/sched.c	2004-09-14 09:01:46.514087864 -0700
@@ -2464,18 +2464,19 @@
 	 * If there is a non-zero preempt_count or interrupts are disabled,
 	 * we do not want to preempt the current task.  Just return..
 	 */
-	if (unlikely(ti->preempt_count || irqs_disabled()))
-		return;
-
-need_resched:
-	ti->preempt_count = PREEMPT_ACTIVE;
-	schedule();
-	ti->preempt_count = 0;
+	if (likely(!ti->preempt_count && !irqs_disabled())) {
+		do {
+			ti->preempt_count = PREEMPT_ACTIVE;
+			schedule();
+			ti->preempt_count = 0;
 
-	/* we could miss a preemption opportunity between schedule and now */
-	barrier();
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
-		goto need_resched;
+			/*
+			 * Without this barrier, we could miss a
+			 * preemption opportunity between schedule and now
+			 */
+			barrier();
+		} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
+	}
 }
 
 EXPORT_SYMBOL(preempt_schedule);

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 16:31                                   ` William Lee Irwin III
@ 2004-09-14 16:39                                     ` Andrea Arcangeli
  0 siblings, 0 replies; 76+ messages in thread
From: Andrea Arcangeli @ 2004-09-14 16:39 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, Sep 14, 2004 at 09:31:06AM -0700, William Lee Irwin III wrote:
> The might_sleep() in cond_resched() sounds particularly useful to pick
> up misapplications of cond_resched().

agreed ;)

> I suppose this isn't even the half of it, but it's what I looked at.

looks much nicer indeed (the previous code was basic-like ;).

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 15:03                                     ` Andrea Arcangeli
@ 2004-09-14 18:05                                       ` Robert Love
  2004-09-14 18:52                                         ` William Lee Irwin III
  2004-09-15  1:02                                       ` Lee Revell
  1 sibling, 1 reply; 76+ messages in thread
From: Robert Love @ 2004-09-14 18:05 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2004-09-14 at 17:03 +0200, Andrea Arcangeli wrote:

> we simply need a cond_resched_bkl() for that, no? Very few places are
> still serialized with the BKL, so I don't think it would be a big issue
> to convert those few places to use cond_resched_bkl.

Yes, this is all we need to do.

cond_resched() goes away under PREEMPT.

cond_resched_bkl() does not.

I did this a looong time ago, but did not get much interest.

Explicitly marking places that use BKL's "I can always call schedule()"
assumption help make it easier to phase out that assumption, too.  Or at
least better mark it.

	Robert Love



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies in mtrr.c
  2004-09-14 13:25                         ` [patch] sched: fix scheduling latencies in mtrr.c Ingo Molnar
  2004-09-14 13:15                           ` Alan Cox
@ 2004-09-14 18:22                           ` Zwane Mwaikambo
  1 sibling, 0 replies; 76+ messages in thread
From: Zwane Mwaikambo @ 2004-09-14 18:22 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Alan Cox

On Tue, 14 Sep 2004, Ingo Molnar wrote:

> 
> > fix scheduling latencies in the MTRR-setting codepath.
> > Also, fix bad bug: MTRR's _must_ be set with interrupts disabled!
> 
> patch attached ...
> 
> caveat: one of the wbinvd() removals is correct i believe, but the other
> one might be problematic. It doesnt seem logical to do a wbinvd() at
> that point though ...

Just a data point, Volume 3 "MTRR considerations in MP systems" states 
that the second wbinvd() isn't required for P6 and Pentium 4 but might be 
necessary for future systems.

	Zwane


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 18:05                                       ` Robert Love
@ 2004-09-14 18:52                                         ` William Lee Irwin III
  2004-09-14 19:02                                           ` Robert Love
  0 siblings, 1 reply; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-14 18:52 UTC (permalink / raw)
  To: Robert Love
  Cc: Andrea Arcangeli, Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2004-09-14 at 17:03 +0200, Andrea Arcangeli wrote:
>> we simply need a cond_resched_bkl() for that, no? Very few places are
>> still serialized with the BKL, so I don't think it would be a big issue
>> to convert those few places to use cond_resched_bkl.

On Tue, Sep 14, 2004 at 02:05:03PM -0400, Robert Love wrote:
> Yes, this is all we need to do.
> cond_resched() goes away under PREEMPT.
> cond_resched_bkl() does not.
> I did this a looong time ago, but did not get much interest.
> Explicitly marking places that use BKL's "I can always call schedule()"
> assumption help make it easier to phase out that assumption, too.  Or at
> least better mark it.

I'd vaguely prefer to clean up the BKL (ab)users... of course, this
involves working with some of the dirtiest code in the kernel that's
already discouraged most/all of those who work on reducing/eliminating
BKL use from touching it... maybe the latency trend is the final nail
in the coffin of resistance to cleaning that up, though I agree with
Alan that we have to be very careful about it, particularly since all
prior attempts failed to be sufficiently so.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 18:52                                         ` William Lee Irwin III
@ 2004-09-14 19:02                                           ` Robert Love
  2004-09-14 19:21                                             ` William Lee Irwin III
  2004-09-14 19:25                                             ` Andrea Arcangeli
  0 siblings, 2 replies; 76+ messages in thread
From: Robert Love @ 2004-09-14 19:02 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andrea Arcangeli, Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2004-09-14 at 11:52 -0700, William Lee Irwin III wrote:

> I'd vaguely prefer to clean up the BKL (ab)users... of course, this
> involves working with some of the dirtiest code in the kernel that's
> already discouraged most/all of those who work on reducing/eliminating
> BKL use from touching it... maybe the latency trend is the final nail
> in the coffin of resistance to cleaning that up, though I agree with
> Alan that we have to be very careful about it, particularly since all
> prior attempts failed to be sufficiently so.

I'd love to just throw away all the BKL users, too, William.  But
pragmatism and my cautious sense of reality tells me that the BKL is not
going anywhere anytime soon.  We might get it down to 1% of its previous
usage, but it is awful intertwined in some places.  It will take some
time.

What I think we can do is start looking at removing the special
properties of the BKL, or at least better containing and documenting
them.  The BKL has a few oddities over other spin locks:

	- you can safely call schedule() while holding it
	- you can grab it recursively
	- you cannot use it in interrupt handlers

Getting rid of these, or at least better delineating them, will move the
BKL closer to being just a very granular lock.

cond_resched_bkl() is a step toward that.

I want the BKL gone, too; I think reducing its specialness is a good way
to achieve that.  If you can also s/BKL/some other lock/ at the same
time, rock that.

Best,

	Robert Love



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:21                                             ` William Lee Irwin III
@ 2004-09-14 19:19                                               ` Alan Cox
  2004-09-15  0:22                                                 ` Lee Revell
  2004-09-15  1:18                                                 ` William Lee Irwin III
  2004-09-14 19:26                                               ` Robert Love
  2004-09-14 21:06                                               ` William Lee Irwin III
  2 siblings, 2 replies; 76+ messages in thread
From: Alan Cox @ 2004-09-14 19:19 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Robert Love, Andrea Arcangeli, Nick Piggin, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List

On Maw, 2004-09-14 at 20:21, William Lee Irwin III wrote:
> The "safely call schedule() while holding it" needs quite a bit of
> qualification; it's implicitly dropped during voluntary context
> switches and reacquired when rescheduled, but it's not valid to force
> such a task into an involuntary context switch and calling schedule()
> implies dropping the lock, so it has to be done at the proper times.
> This is a complex semantic that likely trips up numerous callers (as
> well as attempts to explain it, though surely you know these things,
> and merely wanted a shorter line for the bullet point).

The problem is people think of the BKL as a lock. It isn't a lock it has
never been a lock and it's all DaveM's fault 8) for naming it that when
he moved my asm entry point stuff into C.

The BKL turns on old style unix non-pre-emptive sematics between all
code that is within lock_kernel sections, that is it. That also makes it
hard to clean up because lock_kernel is delimiting code properties (its
essentially almost a function attribute) and spin_lock/down/up and
friends are real locks and lock data.

I've seen very few cases where there is a magic transform from one to
the other because of this. So if you want to kill the BKL add proper
locking to data structures covered by BKL users until the lock_kernel
simply does nothing.

> or sweeps others care to devolve to me, so I'll largely be using
> whatever tactic whoever cares to drive all this (probably Alan) prefers.

Fix the data structure locking starting at the lowest level is how I've
always tackled these messes. When the low level locking is right the
rest just works (usually 8)).

	"Lock data not code"

Alan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:02                                           ` Robert Love
@ 2004-09-14 19:21                                             ` William Lee Irwin III
  2004-09-14 19:19                                               ` Alan Cox
                                                                 ` (2 more replies)
  2004-09-14 19:25                                             ` Andrea Arcangeli
  1 sibling, 3 replies; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-14 19:21 UTC (permalink / raw)
  To: Robert Love
  Cc: Andrea Arcangeli, Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2004-09-14 at 11:52 -0700, William Lee Irwin III wrote:
>> I'd vaguely prefer to clean up the BKL (ab)users... of course, this
>> involves working with some of the dirtiest code in the kernel that's
>> already discouraged most/all of those who work on reducing/eliminating
>> BKL use from touching it... maybe the latency trend is the final nail
>> in the coffin of resistance to cleaning that up, though I agree with
>> Alan that we have to be very careful about it, particularly since all
>> prior attempts failed to be sufficiently so.

On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> I'd love to just throw away all the BKL users, too, William.  But
> pragmatism and my cautious sense of reality tells me that the BKL is not
> going anywhere anytime soon.  We might get it down to 1% of its previous
> usage, but it is awful intertwined in some places.  It will take some
> time.

Far from "just throw away" -- this is hard work! Very hard work, and a
number of people have already tried and failed.


On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> What I think we can do is start looking at removing the special
> properties of the BKL, or at least better containing and documenting
> them.  The BKL has a few oddities over other spin locks:
> 	- you can safely call schedule() while holding it
> 	- you can grab it recursively
> 	- you cannot use it in interrupt handlers

The "safely call schedule() while holding it" needs quite a bit of
qualification; it's implicitly dropped during voluntary context
switches and reacquired when rescheduled, but it's not valid to force
such a task into an involuntary context switch and calling schedule()
implies dropping the lock, so it has to be done at the proper times.
This is a complex semantic that likely trips up numerous callers (as
well as attempts to explain it, though surely you know these things,
and merely wanted a shorter line for the bullet point).


On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> Getting rid of these, or at least better delineating them, will move the
> BKL closer to being just a very granular lock.
> cond_resched_bkl() is a step toward that.
> I want the BKL gone, too; I think reducing its specialness is a good way
> to achieve that.  If you can also s/BKL/some other lock/ at the same
> time, rock that.

I'd actually like to go the opposite direction from Ingo: I'd like to
remove uses of the sleeping characteristic first, as in my mind that is
the most pernicious and causes the most subtleties. Afterward, the
recursion. Except it's unclear that I have the time/etc. resources to
address it apart from taking on small pieces of whatever auditing work
or sweeps others care to devolve to me, so I'll largely be using
whatever tactic whoever cares to drive all this (probably Alan) prefers.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:02                                           ` Robert Love
  2004-09-14 19:21                                             ` William Lee Irwin III
@ 2004-09-14 19:25                                             ` Andrea Arcangeli
  2004-09-14 19:29                                               ` Robert Love
  2004-09-14 19:34                                               ` William Lee Irwin III
  1 sibling, 2 replies; 76+ messages in thread
From: Andrea Arcangeli @ 2004-09-14 19:25 UTC (permalink / raw)
  To: Robert Love
  Cc: William Lee Irwin III, Nick Piggin, Ingo Molnar, Andrew Morton,
	linux-kernel

Hi Robert,

On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> 	- you can safely call schedule() while holding it
> 	- you can grab it recursively
> 	- you cannot use it in interrupt handlers

the latter won't make it harder to get rid of at least ;)

> 
> Getting rid of these, or at least better delineating them, will move the
> BKL closer to being just a very granular lock.
> 
> cond_resched_bkl() is a step toward that.

yes, I don't think it will make thing worse in respect of dropping the
bkl, if something it should help.

probably the bkl is still there because removing it won't bring much
further value to the kernel at runtime, it'd probably only make the
kernel a bit cleaner and simpler.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:21                                             ` William Lee Irwin III
  2004-09-14 19:19                                               ` Alan Cox
@ 2004-09-14 19:26                                               ` Robert Love
  2004-09-14 21:06                                               ` William Lee Irwin III
  2 siblings, 0 replies; 76+ messages in thread
From: Robert Love @ 2004-09-14 19:26 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andrea Arcangeli, Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2004-09-14 at 12:21 -0700, William Lee Irwin III wrote:

> Far from "just throw away" -- this is hard work! Very hard work, and a
> number of people have already tried and failed.

That is my point.

> The "safely call schedule() while holding it" needs quite a bit of
> qualification; it's implicitly dropped during voluntary context
> switches and reacquired when rescheduled, but it's not valid to force
> such a task into an involuntary context switch and calling schedule()
> implies dropping the lock, so it has to be done at the proper times.
> This is a complex semantic that likely trips up numerous callers (as
> well as attempts to explain it, though surely you know these things,
> and merely wanted a shorter line for the bullet point).

Right.  I meant safe against deadlocks.  It obviously is not "safe" to
reschedule in the middle of your critical region.

> I'd actually like to go the opposite direction from Ingo: I'd like to
> remove uses of the sleeping characteristic first, as in my mind that is
> the most pernicious and causes the most subtleties. Afterward, the
> recursion. Except it's unclear that I have the time/etc. resources to
> address it apart from taking on small pieces of whatever auditing work
> or sweeps others care to devolve to me, so I'll largely be using
> whatever tactic whoever cares to drive all this (probably Alan) prefers.

This, too, is exactly what I am saying.

I want to remove the sleep characteristic.

One way to do that is start special casing it.  Document it with the
code, e.g. make an explicit cond_resched_bkl().

	Robert Love



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:25                                             ` Andrea Arcangeli
@ 2004-09-14 19:29                                               ` Robert Love
  2004-09-14 19:34                                               ` William Lee Irwin III
  1 sibling, 0 replies; 76+ messages in thread
From: Robert Love @ 2004-09-14 19:29 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: William Lee Irwin III, Nick Piggin, Ingo Molnar, Andrew Morton,
	linux-kernel

On Tue, 2004-09-14 at 21:25 +0200, Andrea Arcangeli wrote:

Hi, Andrea.

> On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> > 	- you can safely call schedule() while holding it
> > 	- you can grab it recursively
> > 	- you cannot use it in interrupt handlers
> 
> the latter won't make it harder to get rid of at least ;)

Indeed.  I should not of lumped the last property in with the "things to
get rid of", although it is one of the implicit rules of the BKL.

We probably don't want to actually start disabling interrupts for no
reason when we grab the BKL. ;-)

Although the locks that replace the BKL can certainly be BKL-safe locks.

> yes, I don't think it will make thing worse in respect of dropping the
> bkl, if something it should help.
> 
> probably the bkl is still there because removing it won't bring much
> further value to the kernel at runtime, it'd probably only make the
> kernel a bit cleaner and simpler.

I agree.  Barring a few worst-case examples, I think the only reason
going forward to reduce the BKL's use is cleanliness and simplicity.  It
is rather hard at times to find just what the BKL is locking.

	Robert Love



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:25                                             ` Andrea Arcangeli
  2004-09-14 19:29                                               ` Robert Love
@ 2004-09-14 19:34                                               ` William Lee Irwin III
  1 sibling, 0 replies; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-14 19:34 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Robert Love, Nick Piggin, Ingo Molnar, Andrew Morton, Alan Cox,
	linux-kernel

On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
>> Getting rid of these, or at least better delineating them, will move the
>> BKL closer to being just a very granular lock.
>> cond_resched_bkl() is a step toward that.

On Tue, Sep 14, 2004 at 09:25:13PM +0200, Andrea Arcangeli wrote:
> yes, I don't think it will make thing worse in respect of dropping the
> bkl, if something it should help.
> probably the bkl is still there because removing it won't bring much
> further value to the kernel at runtime, it'd probably only make the
> kernel a bit cleaner and simpler.

I think the real trouble is that the locking being so hard to analyze,
especially  when it's intermixed with normal locking, causes real bugs.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:21                                             ` William Lee Irwin III
  2004-09-14 19:19                                               ` Alan Cox
  2004-09-14 19:26                                               ` Robert Love
@ 2004-09-14 21:06                                               ` William Lee Irwin III
  2 siblings, 0 replies; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-14 21:06 UTC (permalink / raw)
  To: Robert Love
  Cc: Andrea Arcangeli, Nick Piggin, Ingo Molnar, Andrew Morton,
	Arjan van de Ven, linux-kernel

On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
>> I'd love to just throw away all the BKL users, too, William.  But
>> pragmatism and my cautious sense of reality tells me that the BKL is not
>> going anywhere anytime soon.  We might get it down to 1% of its previous
>> usage, but it is awful intertwined in some places.  It will take some
>> time.

On Tue, Sep 14, 2004 at 12:21:04PM -0700, William Lee Irwin III wrote:
> Far from "just throw away" -- this is hard work! Very hard work, and a
> number of people have already tried and failed.

Well, since sleep_on() and relatives require the BKL for safety and
otherwise are unsafe, here's a patch to mark them deprecated, suggested
by Arjan van de Ven and others. vs. 2.6.9-rc1-mm5. Compiletested on x86-64.


-- wli


Index: mm5-2.6.9-rc1/include/linux/wait.h
===================================================================
--- mm5-2.6.9-rc1.orig/include/linux/wait.h	2004-09-13 16:27:50.000000000 -0700
+++ mm5-2.6.9-rc1/include/linux/wait.h	2004-09-14 13:36:00.766337272 -0700
@@ -23,6 +23,7 @@
 #include <linux/list.h>
 #include <linux/stddef.h>
 #include <linux/spinlock.h>
+#include <linux/compiler.h>
 #include <asm/system.h>
 #include <asm/current.h>
 
@@ -281,12 +282,33 @@
  * They are racy.  DO NOT use them, use the wait_event* interfaces above.  
  * We plan to remove these interfaces during 2.7.
  */
-extern void FASTCALL(sleep_on(wait_queue_head_t *q));
-extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q,
-				      signed long timeout));
-extern void FASTCALL(interruptible_sleep_on(wait_queue_head_t *q));
-extern long FASTCALL(interruptible_sleep_on_timeout(wait_queue_head_t *q,
-						    signed long timeout));
+
+void FASTCALL(__sleep_on(wait_queue_head_t *));
+long FASTCALL(__sleep_on_timeout(wait_queue_head_t *, signed long));
+void FASTCALL(__interruptible_sleep_on(wait_queue_head_t *));
+long FASTCALL(__interruptible_sleep_on_timeout(wait_queue_head_t *, signed long));
+
+static inline void __deprecated sleep_on(wait_queue_head_t *q)
+{
+	__sleep_on(q);
+}
+
+static inline long __deprecated
+sleep_on_timeout(wait_queue_head_t *q, signed long timeout)
+{
+	__sleep_on_timeout(q, timeout);
+}
+
+static inline __deprecated void interruptible_sleep_on(wait_queue_head_t *q)
+{
+	__interruptible_sleep_on(q);
+}
+
+static inline long __deprecated
+interruptible_sleep_on_timeout(wait_queue_head_t *q, signed long timeout)
+{
+	__interruptible_sleep_on_timeout(q, timeout);
+}
 
 /*
  * Waitqueues which are removed from the waitqueue_head at wakeup time
Index: mm5-2.6.9-rc1/kernel/sched.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/sched.c	2004-09-14 09:01:46.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/sched.c	2004-09-14 13:36:22.743996160 -0700
@@ -2633,7 +2633,7 @@
 	__remove_wait_queue(q, &wait);			\
 	spin_unlock_irqrestore(&q->lock, flags);
 
-void fastcall __sched interruptible_sleep_on(wait_queue_head_t *q)
+void fastcall __sched __interruptible_sleep_on(wait_queue_head_t *q)
 {
 	SLEEP_ON_VAR
 
@@ -2644,9 +2644,9 @@
 	SLEEP_ON_TAIL
 }
 
-EXPORT_SYMBOL(interruptible_sleep_on);
+EXPORT_SYMBOL(__interruptible_sleep_on);
 
-long fastcall __sched interruptible_sleep_on_timeout(wait_queue_head_t *q, long timeout)
+long fastcall __sched __interruptible_sleep_on_timeout(wait_queue_head_t *q, long timeout)
 {
 	SLEEP_ON_VAR
 
@@ -2659,9 +2659,9 @@
 	return timeout;
 }
 
-EXPORT_SYMBOL(interruptible_sleep_on_timeout);
+EXPORT_SYMBOL(__interruptible_sleep_on_timeout);
 
-void fastcall __sched sleep_on(wait_queue_head_t *q)
+void fastcall __sched __sleep_on(wait_queue_head_t *q)
 {
 	SLEEP_ON_VAR
 
@@ -2672,9 +2672,9 @@
 	SLEEP_ON_TAIL
 }
 
-EXPORT_SYMBOL(sleep_on);
+EXPORT_SYMBOL(__sleep_on);
 
-long fastcall __sched sleep_on_timeout(wait_queue_head_t *q, long timeout)
+long fastcall __sched __sleep_on_timeout(wait_queue_head_t *q, long timeout)
 {
 	SLEEP_ON_VAR
 
@@ -2687,7 +2687,7 @@
 	return timeout;
 }
 
-EXPORT_SYMBOL(sleep_on_timeout);
+EXPORT_SYMBOL(__sleep_on_timeout);
 
 void set_user_nice(task_t *p, long nice)
 {

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 14:54                                 ` Ingo Molnar
@ 2004-09-14 22:55                                   ` Nick Piggin
  2004-09-15  6:19                                     ` Ingo Molnar
  2004-09-15  0:35                                   ` Lee Revell
  1 sibling, 1 reply; 76+ messages in thread
From: Nick Piggin @ 2004-09-14 22:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>Another thing, I don't mean this to sound like a rhetorical question,
>>but if we have a preemptible kernel, why is it a good idea to sprinkle
> 
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
>>cond_rescheds everywhere? Isn't this now the worst of both worlds? Why
>>would someone who really cares about latency not enable preempt?
> 
> 
> two things:
> 
> 1) none of the big distros enables CONFIG_PREEMPT in their kernels - not
> even SuSE. This is pretty telling.
> 
> 2) 10 new cond_resched()'s are not precisely 'sprinkle everywhere'.
> 

No, but I mean putting them right down into fastpaths like the vmscan
one, for example.

And if I remember correctly, you resorted to putting them into
might_sleep as well (but I haven't read the code for a while, maybe
you're now getting decent results without doing that).

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:19                                               ` Alan Cox
@ 2004-09-15  0:22                                                 ` Lee Revell
  2004-09-15  1:46                                                   ` William Lee Irwin III
  2004-09-15  1:18                                                 ` William Lee Irwin III
  1 sibling, 1 reply; 76+ messages in thread
From: Lee Revell @ 2004-09-15  0:22 UTC (permalink / raw)
  To: Alan Cox
  Cc: William Lee Irwin III, Robert Love, Andrea Arcangeli,
	Nick Piggin, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List

On Tue, 2004-09-14 at 15:19, Alan Cox wrote:
> Fix the data structure locking starting at the lowest level is how I've
> always tackled these messes. When the low level locking is right the
> rest just works (usually 8)).
> 
> 	"Lock data not code"
> 

Although, there is at least one case (reiser3) where we know which data
structures the BKL is supposed to be protecting, because the code does
something like reiserfs_write_lock(foo_data_structure) which gets
define'd away to lock_kernel().  And apparently some of the best and
brightest on LKML have tried and failed to fix it, and even Hans says
"it's HARD, the fix is reiser4".

So, maybe some of the current uses should be tagged as WONTFIX.

Lee


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 14:54                                 ` Ingo Molnar
  2004-09-14 22:55                                   ` Nick Piggin
@ 2004-09-15  0:35                                   ` Lee Revell
  1 sibling, 0 replies; 76+ messages in thread
From: Lee Revell @ 2004-09-15  0:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Nick Piggin, Andrew Morton, linux-kernel

On Tue, 2004-09-14 at 10:54, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> > Another thing, I don't mean this to sound like a rhetorical question,
> > but if we have a preemptible kernel, why is it a good idea to sprinkle
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > cond_rescheds everywhere? Isn't this now the worst of both worlds? Why
> > would someone who really cares about latency not enable preempt?
> 
> two things:
> 
> 1) none of the big distros enables CONFIG_PREEMPT in their kernels - not
> even SuSE. This is pretty telling.
> 

I am not sure this means preemption is a bad idea, it just means there's
no point in enabling CONFIG_PREEMPT with the current kernel because it's
not enough of an improvement to make a difference for low latency
applications.

Lee


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 15:03                                     ` Andrea Arcangeli
  2004-09-14 18:05                                       ` Robert Love
@ 2004-09-15  1:02                                       ` Lee Revell
  2004-09-15  1:39                                         ` William Lee Irwin III
  1 sibling, 1 reply; 76+ messages in thread
From: Lee Revell @ 2004-09-15  1:02 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2004-09-14 at 11:03, Andrea Arcangeli wrote:
> On Wed, Sep 15, 2004 at 12:28:49AM +1000, Nick Piggin wrote:
> > Well I don't know how good an argument the crashes one is these days,
> > but generally (as far as I know) those who really care about latency
> > shouldn't mind about some extra overheads.
> 
> sure, that's especially true for the hardirq and softirq total scheduler
> offloading. The real question is where a generic desktop positions. I
> doubt on a generic desktop a latency over 1msec matters much,
> top performance of repetitive tasks that sums up like hardirqs for a NIC
> sounds more sensible to me.
> 
> And for the other usages RTAI or any other hard realtime sounds safer
> anyways.
> 

For a generic desktop I don't think any of this makes much of a
difference; AFAIK none of the VP testers have reported a perceptible
difference in system responsiveness.  A good point of comparison here is
what Microsoft OS'es can do.  My Windows XP setup works pretty well with
a latency of 2.66ms or 128 frames at 48KHZ, and is rock solid at 256
frames or 5.33ms.

However for low latency audio Mac OS X is our real competition.  OS X
can deliver audio latencies of probably 0.5ms.  There is not much point
in going much lower than this because the difference becomes
imperceptible and the more frequent cache thrashing becomes an issue;
this is close enough to the limits of what sound hardware is capable of
anyway.

With Ingo's patches the worst case latency on the same machine as my XP
example is about 150 usecs.  So, it seems to me that Ingo's patches can
achieve results as good or better than OSX even without the one or two
"dangerous" changes, like the removal of lock_kernel around
do_tty_write.

Lee


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 19:19                                               ` Alan Cox
  2004-09-15  0:22                                                 ` Lee Revell
@ 2004-09-15  1:18                                                 ` William Lee Irwin III
  1 sibling, 0 replies; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-15  1:18 UTC (permalink / raw)
  To: Alan Cox
  Cc: Robert Love, Andrea Arcangeli, Nick Piggin, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List

On Tue, Sep 14, 2004 at 08:19:57PM +0100, Alan Cox wrote:
> The problem is people think of the BKL as a lock. It isn't a lock it has
> never been a lock and it's all DaveM's fault 8) for naming it that when
> he moved my asm entry point stuff into C.
> The BKL turns on old style unix non-pre-emptive sematics between all
> code that is within lock_kernel sections, that is it. That also makes it
> hard to clean up because lock_kernel is delimiting code properties (its
> essentially almost a function attribute) and spin_lock/down/up and
> friends are real locks and lock data.
> I've seen very few cases where there is a magic transform from one to
> the other because of this. So if you want to kill the BKL add proper
> locking to data structures covered by BKL users until the lock_kernel
> simply does nothing.

The perfect critic has no method, save one.


On Maw, 2004-09-14 at 20:21, William Lee Irwin III wrote:
>> or sweeps others care to devolve to me, so I'll largely be using
>> whatever tactic whoever cares to drive all this (probably Alan) prefers.

On Tue, Sep 14, 2004 at 08:19:57PM +0100, Alan Cox wrote:
> Fix the data structure locking starting at the lowest level is how I've
> always tackled these messes. When the low level locking is right the
> rest just works (usually 8)).

This is the opposite of what some parties endorse, which is pushing the
BKL downward safely as audits determine, adding locking to enable it to
be done safely, and so on. So ad hoc it is.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  1:02                                       ` Lee Revell
@ 2004-09-15  1:39                                         ` William Lee Irwin III
  2004-09-15  2:11                                           ` Lee Revell
  2004-09-15  9:56                                           ` Ingo Molnar
  0 siblings, 2 replies; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-15  1:39 UTC (permalink / raw)
  To: Lee Revell
  Cc: Andrea Arcangeli, Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, Sep 14, 2004 at 09:02:07PM -0400, Lee Revell wrote:
> For a generic desktop I don't think any of this makes much of a
> difference; AFAIK none of the VP testers have reported a perceptible
> difference in system responsiveness.  A good point of comparison here is
> what Microsoft OS'es can do.  My Windows XP setup works pretty well with
> a latency of 2.66ms or 128 frames at 48KHZ, and is rock solid at 256
> frames or 5.33ms.
> However for low latency audio Mac OS X is our real competition.  OS X
> can deliver audio latencies of probably 0.5ms.  There is not much point
> in going much lower than this because the difference becomes
> imperceptible and the more frequent cache thrashing becomes an issue;
> this is close enough to the limits of what sound hardware is capable of
> anyway.
> With Ingo's patches the worst case latency on the same machine as my XP
> example is about 150 usecs.  So, it seems to me that Ingo's patches can
> achieve results as good or better than OSX even without the one or two
> "dangerous" changes, like the removal of lock_kernel around
> do_tty_write.

The code we're most worried is buggy, not just nonperformant.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  0:22                                                 ` Lee Revell
@ 2004-09-15  1:46                                                   ` William Lee Irwin III
  2004-09-15  2:00                                                     ` Lee Revell
  0 siblings, 1 reply; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-15  1:46 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Robert Love, Andrea Arcangeli, Nick Piggin,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

On Tue, 2004-09-14 at 15:19, Alan Cox wrote:
>> Fix the data structure locking starting at the lowest level is how I've
>> always tackled these messes. When the low level locking is right the
>> rest just works (usually 8)).
>> 	"Lock data not code"

On Tue, Sep 14, 2004 at 08:22:29PM -0400, Lee Revell wrote:
> Although, there is at least one case (reiser3) where we know which data
> structures the BKL is supposed to be protecting, because the code does
> something like reiserfs_write_lock(foo_data_structure) which gets
> define'd away to lock_kernel().  And apparently some of the best and
> brightest on LKML have tried and failed to fix it, and even Hans says
> "it's HARD, the fix is reiser4".
> So, maybe some of the current uses should be tagged as WONTFIX.

I've not heard a peep about anyone trying to fix this. It should be
killed off along with the rest, of course, but like I said before, it's
the messiest, dirtiest, and ugliest code that's left to go through,
which is why it's been left for last. e.g. driver ->ioctl() methods.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  1:46                                                   ` William Lee Irwin III
@ 2004-09-15  2:00                                                     ` Lee Revell
  2004-09-15  2:36                                                       ` William Lee Irwin III
  0 siblings, 1 reply; 76+ messages in thread
From: Lee Revell @ 2004-09-15  2:00 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Alan Cox, Robert Love, Andrea Arcangeli, Nick Piggin,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

On Tue, 2004-09-14 at 21:46, William Lee Irwin III wrote:
> On Tue, Sep 14, 2004 at 08:22:29PM -0400, Lee Revell wrote:
> > Although, there is at least one case (reiser3) where we know which data
> > structures the BKL is supposed to be protecting, because the code does
> > something like reiserfs_write_lock(foo_data_structure) which gets
> > define'd away to lock_kernel().  And apparently some of the best and
> > brightest on LKML have tried and failed to fix it, and even Hans says
> > "it's HARD, the fix is reiser4".
> > So, maybe some of the current uses should be tagged as WONTFIX.
> 
> I've not heard a peep about anyone trying to fix this. It should be
> killed off along with the rest, of course, but like I said before, it's
> the messiest, dirtiest, and ugliest code that's left to go through,
> which is why it's been left for last. e.g. driver ->ioctl() methods.
> 

Andrew tried to fix this a few times in 2.4 and it broke the FS in
subtle ways.  Don't have an archive link but the message is
<20040712163141.31ef1ad6.akpm@osdl.org>.  I asked Hans directly about it
and he said "balancing makes it hard, the fix is reiser4", see
<411925FA.2000303@namesys.com>.

Lee 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  1:39                                         ` William Lee Irwin III
@ 2004-09-15  2:11                                           ` Lee Revell
  2004-09-15 11:17                                             ` Ingo Molnar
  2004-09-15  9:56                                           ` Ingo Molnar
  1 sibling, 1 reply; 76+ messages in thread
From: Lee Revell @ 2004-09-15  2:11 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andrea Arcangeli, Nick Piggin, Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2004-09-14 at 21:39, William Lee Irwin III wrote:
> > With Ingo's patches the worst case latency on the same machine as my XP
> > example is about 150 usecs.  So, it seems to me that Ingo's patches can
> > achieve results as good or better than OSX even without the one or two
> > "dangerous" changes, like the removal of lock_kernel around
> > do_tty_write.
> 
> The code we're most worried is buggy, not just nonperformant.
> 

Understood.  The only dangerous change I know of in the VP patches is
the one Alan took issue with, the aforementioned removal of lock_kernel
in the tty code.  IIRC this only produced a latency of about 300 usecs,
and only when switching VT's from a console to X and back.  My point was
that it's quite possible that we can get OSX-like results without the
more dangerous changes.

Ingo, if you want to send me a patch set without the more controversial
changes, I can compare the performance.  A diff against the latest VP
patch would be OK.

Lee




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  2:00                                                     ` Lee Revell
@ 2004-09-15  2:36                                                       ` William Lee Irwin III
  2004-09-15  2:59                                                         ` Lee Revell
  2004-09-15 13:36                                                         ` Hans Reiser
  0 siblings, 2 replies; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-15  2:36 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Robert Love, Andrea Arcangeli, Nick Piggin,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

On Tue, 2004-09-14 at 21:46, William Lee Irwin III wrote:
>> I've not heard a peep about anyone trying to fix this. It should be
>> killed off along with the rest, of course, but like I said before, it's
>> the messiest, dirtiest, and ugliest code that's left to go through,
>> which is why it's been left for last. e.g. driver ->ioctl() methods.

On Tue, Sep 14, 2004 at 10:00:44PM -0400, Lee Revell wrote:
> Andrew tried to fix this a few times in 2.4 and it broke the FS in
> subtle ways.  Don't have an archive link but the message is
> <20040712163141.31ef1ad6.akpm@osdl.org>.  I asked Hans directly about it
> and he said "balancing makes it hard, the fix is reiser4", see
> <411925FA.2000303@namesys.com>.

I have neither of these locally. I suspect someone needs to care enough
about the code for anything to happen soon. I suppose there are things
that probably weren't tried, e.g. auditing to make sure dependencies on
external synchronization are taken care of, removing implicit sleeping
with the BKL held, then punt a private recursive spinlock in reiser3's
direction. Not sure what went on, or if I want to get involved in this
particular case.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  2:36                                                       ` William Lee Irwin III
@ 2004-09-15  2:59                                                         ` Lee Revell
  2004-09-15 13:36                                                         ` Hans Reiser
  1 sibling, 0 replies; 76+ messages in thread
From: Lee Revell @ 2004-09-15  2:59 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Alan Cox, Robert Love, Andrea Arcangeli, Nick Piggin,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

On Tue, 2004-09-14 at 22:36, William Lee Irwin III wrote:
> I have neither of these locally. I suspect someone needs to care enough
> about the code for anything to happen soon. I suppose there are things
> that probably weren't tried, e.g. auditing to make sure dependencies on
> external synchronization are taken care of, removing implicit sleeping
> with the BKL held, then punt a private recursive spinlock in reiser3's
> direction. Not sure what went on, or if I want to get involved in this
> particular case.
> 

There isn't really any information in the archives about what was
tried.  Here's Andrew's message:

http://lkml.org/lkml/2004/7/12/266

And Hans':

http://lkml.org/lkml/2004/8/10/320

I suspect that "Use reiser4 (or ext3) if you care about latency" is a
good enough answer for most people.

Lee


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-14 22:55                                   ` Nick Piggin
@ 2004-09-15  6:19                                     ` Ingo Molnar
  2004-09-15  8:23                                       ` Nick Piggin
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-15  6:19 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> No, but I mean putting them right down into fastpaths like the vmscan
> one, for example.

it is a very simple no-parameters call to a function that reads a
likely-cached word and returns. The cost is in the 2-3 cycles range - a
_single_ cachemiss can be 10-100 times more expensive, and cachemisses
happen very frequently in every iteration of the VM _scanning_ path
since it (naturally and inevitably) deals with lots of sparsely
scattered data structures that havent been referenced for quite some
time.

The function (cond_resched()) triggers scheduling only very rarely, you
should not be worried about that aspect either.

> And if I remember correctly, you resorted to putting them into
> might_sleep as well (but I haven't read the code for a while, maybe
> you're now getting decent results without doing that).

i'm not arguing that now at all, that preemption model clearly has to be
an optional thing - at least initially.

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  6:19                                     ` Ingo Molnar
@ 2004-09-15  8:23                                       ` Nick Piggin
  2004-09-15  8:43                                         ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Nick Piggin @ 2004-09-15  8:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>No, but I mean putting them right down into fastpaths like the vmscan
>>one, for example.
> 
> 
> it is a very simple no-parameters call to a function that reads a
> likely-cached word and returns. The cost is in the 2-3 cycles range - a
> _single_ cachemiss can be 10-100 times more expensive, and cachemisses
> happen very frequently in every iteration of the VM _scanning_ path
> since it (naturally and inevitably) deals with lots of sparsely
> scattered data structures that havent been referenced for quite some
> time.
> 

OK, this one thing isn't going to be noticable. But why you really
must have the check for every page, and not in the logical place
where we batch them up? You're obviously aiming for the lowest
latencies possible *without* CONFIG_PREEMPT.

But I'm thinking, why add overhead for people who don't care about
sub-ms latency (ie. most of us)? And at the same time, why would
anyone in a latency critical environment not enable preempt?


> The function (cond_resched()) triggers scheduling only very rarely, you
> should not be worried about that aspect either.
> 

No, I'm not worried about that.

> 
>>And if I remember correctly, you resorted to putting them into
>>might_sleep as well (but I haven't read the code for a while, maybe
>>you're now getting decent results without doing that).
> 
> 
> i'm not arguing that now at all, that preemption model clearly has to be
> an optional thing - at least initially.
> 

OK.

Alternatively, I'd say tell everyone who wants really low latency to
enable CONFIG_PREEMPT, which automatically gives the minimum possible
preempt latency, delimited (and defined) by critical sections, instead
of the more ad-hoc "sprinkling" ;)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  8:23                                       ` Nick Piggin
@ 2004-09-15  8:43                                         ` Ingo Molnar
  2004-09-15 10:09                                           ` William Lee Irwin III
  2004-09-16  1:03                                           ` Nick Piggin
  0 siblings, 2 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-15  8:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> OK.
> 
> Alternatively, I'd say tell everyone who wants really low latency to
> enable CONFIG_PREEMPT, which automatically gives the minimum possible
> preempt latency, delimited (and defined) by critical sections, instead
> of the more ad-hoc "sprinkling" ;)

it's not ad-hoc. These are the 10 remaining points for which there is no
natural might_sleep() point nearby (according to measurements). That's
why i called them 'complementary'. They cause zero problems for the
normal kernel (we already have another 70 cond_resched() points), but
they _are_ the ones needed in addition if might_sleep() also does
cond_resched().

the 'reliability' of latency break-up depends on the basic preemption
model. Believe me, even with CONFIG_PREEMPT there were a boatload of
critical sections that had insanely long latencies that nobody fixed
until the VP patchset came along. Without CONFIG_PREEMPT the number of
possibly latency-paths increases, but the situation is the same as with
CONFIG_PREEMPT: you need tools, people that test stuff and lots of
manual work to break them up reliably. You will never be 'done' but you
can do a reasonably good job for workloads that people care about.

the 'final' preemption model [for hard-RT purposes] that i believe will
make it into the Linux kernel one nice day is total preemptability of
everything but the core preemption code (i.e. the scheduler and
interrupt controllers). _That_ might be something that has provable
latencies. Note that such a 'total preemption' model has prerequisites
too, like the deterministic execution of hardirqs/softirqs.

note that the current lock-break-up activities still make alot of sense
even under the total-preemption model: it decreases the latency of
kernel-using hard-RT applications. (raw total preemption only guarantees
quick scheduling of the hard-RT task - it doesnt guarantee that the task
can complete any useful kernel/syscall work.)

since we already see at least 4 different viable preemption models
placed on different points in the 'latency reliability' spectrum, it
makes little sense to settle for any of them. So i'm aiming to keep the
core code flexible to have them all without much fuss, and usage will
decide which ones are needed. Maybe CONFIG_PREEMPT will merge into
CONFIG_TOTAL_PREEMPT. Maybe CONFIG_NO_PREEMPT will merge into
CONFIG_PREEMPT_VOLUNTARY. Maybe CONFIG_PREEMPT_VOLUNTARY will go away
altogether. We cannot know at this point, it all depends on how usage
(and consequently, hardware) evolves.

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  1:39                                         ` William Lee Irwin III
  2004-09-15  2:11                                           ` Lee Revell
@ 2004-09-15  9:56                                           ` Ingo Molnar
  2004-09-15  9:57                                             ` William Lee Irwin III
  1 sibling, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2004-09-15  9:56 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Lee Revell, Andrea Arcangeli, Nick Piggin, Andrew Morton, linux-kernel


* William Lee Irwin III <wli@holomorphy.com> wrote:

> > With Ingo's patches the worst case latency on the same machine as my XP
> > example is about 150 usecs.  So, it seems to me that Ingo's patches can
> > achieve results as good or better than OSX even without the one or two
> > "dangerous" changes, like the removal of lock_kernel around
> > do_tty_write.
> 
> The code we're most worried is buggy, not just nonperformant.

what code do you mean? The one i know about and which Lee is referring
to above is the 6-lines tty unlocking change - the one which Alan
objected to rightfully. I've zapped it from my tree.

(nobody objected to the original thread on lkml weeks ago where the tty
unlocking change was proposed, implemented, alpha-tested and beta-tested
in -mm for several releases - that's why it showed up in my 20+
latency-reduction patches.)

No latency changes to the tty layer until someone fixes its locking. End
of story.

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  9:56                                           ` Ingo Molnar
@ 2004-09-15  9:57                                             ` William Lee Irwin III
  2004-09-15 10:12                                               ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-15  9:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lee Revell, Andrea Arcangeli, Nick Piggin, Andrew Morton, linux-kernel

* William Lee Irwin III <wli@holomorphy.com> wrote:
>> The code we're most worried is buggy, not just nonperformant.

On Wed, Sep 15, 2004 at 11:56:14AM +0200, Ingo Molnar wrote:
> what code do you mean? The one i know about and which Lee is referring
> to above is the 6-lines tty unlocking change - the one which Alan
> objected to rightfully. I've zapped it from my tree.
> (nobody objected to the original thread on lkml weeks ago where the tty
> unlocking change was proposed, implemented, alpha-tested and beta-tested
> in -mm for several releases - that's why it showed up in my 20+
> latency-reduction patches.)
> No latency changes to the tty layer until someone fixes its locking. End
> of story.

I had the buggy code being associated with BKL use in mind as a motive
for the BKL sweeps etc., and wasn't referring to any pending changes.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  8:43                                         ` Ingo Molnar
@ 2004-09-15 10:09                                           ` William Lee Irwin III
  2004-09-15 10:21                                             ` Ingo Molnar
  2004-09-16  1:03                                           ` Nick Piggin
  1 sibling, 1 reply; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-15 10:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Nick Piggin, Andrew Morton, linux-kernel

On Wed, Sep 15, 2004 at 10:43:55AM +0200, Ingo Molnar wrote:
> the 'final' preemption model [for hard-RT purposes] that i believe will
> make it into the Linux kernel one nice day is total preemptability of
> everything but the core preemption code (i.e. the scheduler and
> interrupt controllers). _That_ might be something that has provable
> latencies. Note that such a 'total preemption' model has prerequisites
> too, like the deterministic execution of hardirqs/softirqs.

I don't see deterministic execution time of hardirqs/softirqs happening
on stock hardware and without serious driver work, and I don't see much
hard RT ever happening on SMP due to lock contention. But maybe that
just means it's difficult.


On Wed, Sep 15, 2004 at 10:43:55AM +0200, Ingo Molnar wrote:
> note that the current lock-break-up activities still make alot of sense
> even under the total-preemption model: it decreases the latency of
> kernel-using hard-RT applications. (raw total preemption only guarantees
> quick scheduling of the hard-RT task - it doesnt guarantee that the task
> can complete any useful kernel/syscall work.)

One reason I'm not complaining is because voluntary switching is
required to reschedule in otherwise non-preemptible critical sections.


On Wed, Sep 15, 2004 at 10:43:55AM +0200, Ingo Molnar wrote:
> since we already see at least 4 different viable preemption models
> placed on different points in the 'latency reliability' spectrum, it
> makes little sense to settle for any of them. So i'm aiming to keep the
> core code flexible to have them all without much fuss, and usage will
> decide which ones are needed. Maybe CONFIG_PREEMPT will merge into
> CONFIG_TOTAL_PREEMPT. Maybe CONFIG_NO_PREEMPT will merge into
> CONFIG_PREEMPT_VOLUNTARY. Maybe CONFIG_PREEMPT_VOLUNTARY will go away
> altogether. We cannot know at this point, it all depends on how usage
> (and consequently, hardware) evolves.

Well, one thing that completely voluntary context switch -based
critical section management tells us that reliance on implicit kernel
preemption doesn't is which codepaths matter: whatever codepaths would
need to be preempted implicitly then explicitly show up as latency
blips on the latency instrumentation radar screen, and by so doing at
least get annotated with "this codepath matters for latency" and
whoever may inadvertently try to extend a nonpreemptible critical
section over the scheduling point will have a big "stop" sign in front
of them.  So, even if CONFIG_PREEMPT is the way and the annotations are
nops there, the annotation of latency-critical scheduling points, even
where they would be preemptible with CONFIG_PREEMPT=y, has value.

And, of course, the preemption reenablement required for voluntary
yielding from non-preemptible critical sections also has a positive
operational effect with CONFIG_PREEMPT=y, so there is nothing in VP
that doesn't also benefit CONFIG_PREEMPT.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  9:57                                             ` William Lee Irwin III
@ 2004-09-15 10:12                                               ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-15 10:12 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Lee Revell, Andrea Arcangeli, Nick Piggin, Andrew Morton, linux-kernel


* William Lee Irwin III <wli@holomorphy.com> wrote:

> I had the buggy code being associated with BKL use in mind as a motive
> for the BKL sweeps etc., and wasn't referring to any pending changes.

ok, fair enough.

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15 10:09                                           ` William Lee Irwin III
@ 2004-09-15 10:21                                             ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-15 10:21 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Nick Piggin, Andrew Morton, linux-kernel


* William Lee Irwin III <wli@holomorphy.com> wrote:

> I don't see deterministic execution time of hardirqs/softirqs
> happening on stock hardware and without serious driver work, and I
> don't see much hard RT ever happening on SMP due to lock contention.
> But maybe that just means it's difficult.

actually, check out what i did to SMP latencies via the
preempt-smp.patch. This patch brings SMP worst-case latencies to UP
levels. E.g. on a dual system the worst-case should be roughly twice the
UP worst-case latency: if both CPUs are trying to execute the same
critical section. But there is no nondeterministic spinning anymore. You
obviously cannot get better than that (other than increasing parallelism
and reducing the number of critical sections).

(not all cases are covered - if some code is not using spinlocks or
rwlocks but some static flag and is emulating spinlocks then it could
spin uncontrollably. Those places show up in the tracer quite easily - i
had to fix only one so far.)

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  2:11                                           ` Lee Revell
@ 2004-09-15 11:17                                             ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-15 11:17 UTC (permalink / raw)
  To: Lee Revell
  Cc: William Lee Irwin III, Andrea Arcangeli, Nick Piggin,
	Andrew Morton, linux-kernel


* Lee Revell <rlrevell@joe-job.com> wrote:

> Ingo, if you want to send me a patch set without the more
> controversial changes, I can compare the performance.  A diff against
> the latest VP patch would be OK.

just undo the tty.c changes. (i.e. manually remove the tty.c chunks from
the -S0 patch and apply the result.)

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  2:36                                                       ` William Lee Irwin III
  2004-09-15  2:59                                                         ` Lee Revell
@ 2004-09-15 13:36                                                         ` Hans Reiser
  2004-09-15 20:40                                                           ` William Lee Irwin III
  1 sibling, 1 reply; 76+ messages in thread
From: Hans Reiser @ 2004-09-15 13:36 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Lee Revell, Alan Cox, Robert Love, Andrea Arcangeli, Nick Piggin,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

William Lee Irwin III wrote:

>On Tue, 2004-09-14 at 21:46, William Lee Irwin III wrote:
>  
>
>>>I've not heard a peep about anyone trying to fix this. It should be
>>>killed off along with the rest, of course, but like I said before, it's
>>>the messiest, dirtiest, and ugliest code that's left to go through,
>>>which is why it's been left for last. e.g. driver ->ioctl() methods.
>>>      
>>>
>
>On Tue, Sep 14, 2004 at 10:00:44PM -0400, Lee Revell wrote:
>  
>
>>Andrew tried to fix this a few times in 2.4 and it broke the FS in
>>subtle ways.  Don't have an archive link but the message is
>><20040712163141.31ef1ad6.akpm@osdl.org>.  I asked Hans directly about it
>>and he said "balancing makes it hard, the fix is reiser4", see
>><411925FA.2000303@namesys.com>.
>>    
>>
>
>I have neither of these locally. I suspect someone needs to care enough
>about the code for anything to happen soon. I suppose there are things
>that probably weren't tried, e.g. auditing to make sure dependencies on
>external synchronization are taken care of, removing implicit sleeping
>with the BKL held, then punt a private recursive spinlock in reiser3's
>direction. Not sure what went on, or if I want to get involved in this
>particular case.
>
>
>-- wli
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/
>
>
>  
>
Why bother?  It is V3, it should be left undisturbed except for 
bugfixes.  Please, spend your efforts on reducing V4 latency and 
measuring whether it fails to scale to multiple processors.  That would 
be very useful to me if someone helped with that.  V4 has the 
architecture for doing such things well, but there are always accidental 
bottlenecks that testing can discover, and I am sure we will have a 
handful of things preventing us from scaling well that are not hard to 
fix.  It would be nice to fix those......

The hard stuff for scalability, the locking of the tree, we did that.  
We just haven't tested and evaluated and refined like we need to in V4.



Hans

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15 13:36                                                         ` Hans Reiser
@ 2004-09-15 20:40                                                           ` William Lee Irwin III
  0 siblings, 0 replies; 76+ messages in thread
From: William Lee Irwin III @ 2004-09-15 20:40 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Lee Revell, Alan Cox, Robert Love, Andrea Arcangeli, Nick Piggin,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

On Wed, Sep 15, 2004 at 06:36:24AM -0700, Hans Reiser wrote:
> Why bother?  It is V3, it should be left undisturbed except for 
> bugfixes.  Please, spend your efforts on reducing V4 latency and 
> measuring whether it fails to scale to multiple processors.  That would 
> be very useful to me if someone helped with that.  V4 has the 
> architecture for doing such things well, but there are always accidental 
> bottlenecks that testing can discover, and I am sure we will have a 
> handful of things preventing us from scaling well that are not hard to 
> fix.  It would be nice to fix those......
> The hard stuff for scalability, the locking of the tree, we did that.  
> We just haven't tested and evaluated and refined like we need to in V4.

It's not for scalability; it's for cleaning up the users, which are
universally buggy. My suggestion above would not, in fact, make reiser3
any more scalable; it would merely isolate the locking semantics it
couldn't live without into its own internals.


-- wli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] fix keventd execution dependency
  2004-09-14 11:25                         ` [patch] fix keventd execution dependency Ingo Molnar
@ 2004-09-15 22:18                           ` Rusty Russell
  0 siblings, 0 replies; 76+ messages in thread
From: Rusty Russell @ 2004-09-15 22:18 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, lkml - Kernel Mailing List

On Tue, 2004-09-14 at 21:25, Ingo Molnar wrote:
> We dont want to execute off keventd since it might hold a semaphore our
> callers hold too. This can happen when kthread_create() is called from
> within keventd. This happened due to the IRQ threading patches but it
> could happen with other code too.

Ackl, thanks Ingo, looks fine.

Rusty.
-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-15  8:43                                         ` Ingo Molnar
  2004-09-15 10:09                                           ` William Lee Irwin III
@ 2004-09-16  1:03                                           ` Nick Piggin
  2004-09-16  6:14                                             ` Ingo Molnar
  1 sibling, 1 reply; 76+ messages in thread
From: Nick Piggin @ 2004-09-16  1:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>OK.
>>
>>Alternatively, I'd say tell everyone who wants really low latency to
>>enable CONFIG_PREEMPT, which automatically gives the minimum possible
>>preempt latency, delimited (and defined) by critical sections, instead
>>of the more ad-hoc "sprinkling" ;)
> 
> 
> it's not ad-hoc. These are the 10 remaining points for which there is no
> natural might_sleep() point nearby (according to measurements). That's
> why i called them 'complementary'. They cause zero problems for the
> normal kernel (we already have another 70 cond_resched() points), but
> they _are_ the ones needed in addition if might_sleep() also does
> cond_resched().
> 

No, I mean putting cond_resched in might_sleep is ad hoc. But that
doesn't mean it doesn't work - it obviously does ;)

> the 'reliability' of latency break-up depends on the basic preemption
> model. Believe me, even with CONFIG_PREEMPT there were a boatload of
> critical sections that had insanely long latencies that nobody fixed
> until the VP patchset came along. Without CONFIG_PREEMPT the number of
> possibly latency-paths increases, but the situation is the same as with
> CONFIG_PREEMPT: you need tools, people that test stuff and lots of
> manual work to break them up reliably. You will never be 'done' but you
> can do a reasonably good job for workloads that people care about.
> 

All the other stuff in your patches are obviously very important with
or without "full preempt", and make up the bulk of the *hard* work.
You have no arguments from me about that.

> the 'final' preemption model [for hard-RT purposes] that i believe will
> make it into the Linux kernel one nice day is total preemptability of
> everything but the core preemption code (i.e. the scheduler and
> interrupt controllers). _That_ might be something that has provable
> latencies. Note that such a 'total preemption' model has prerequisites
> too, like the deterministic execution of hardirqs/softirqs.
> 
> note that the current lock-break-up activities still make alot of sense
> even under the total-preemption model: it decreases the latency of
> kernel-using hard-RT applications. (raw total preemption only guarantees
> quick scheduling of the hard-RT task - it doesnt guarantee that the task
> can complete any useful kernel/syscall work.)
> 
> since we already see at least 4 different viable preemption models
> placed on different points in the 'latency reliability' spectrum, it
> makes little sense to settle for any of them. So i'm aiming to keep the
> core code flexible to have them all without much fuss, and usage will
> decide which ones are needed. Maybe CONFIG_PREEMPT will merge into
> CONFIG_TOTAL_PREEMPT. Maybe CONFIG_NO_PREEMPT will merge into
> CONFIG_PREEMPT_VOLUNTARY. Maybe CONFIG_PREEMPT_VOLUNTARY will go away
> altogether. We cannot know at this point, it all depends on how usage
> (and consequently, hardware) evolves.
> 

OK I'll leave it at that. We'll see what happens.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels
  2004-09-16  1:03                                           ` Nick Piggin
@ 2004-09-16  6:14                                             ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2004-09-16  6:14 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> No, I mean putting cond_resched in might_sleep is ad hoc. But that
> doesn't mean it doesn't work - it obviously does ;)

ah, ok, i understand your point now. No, i'm not submitting the
CONFIG_PREEMPT_VOLUNTARY .config switch (and the kernel.h 2-liner) at
this point. All the latency breakers so far will mainly benefit
CONFIG_PREEMPT - which is also the primary preempt model used by
bleeding-edge testers.

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2004-09-16  6:13 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-14  9:15 [patch] preempt-cleanup.patch, 2.6.9-rc2 Ingo Molnar
2004-09-14  9:34 ` [printk] make console_conditional_schedule() __sched and use cond_resched() William Lee Irwin III
2004-09-14  9:38 ` [patch] preempt-lock-need-resched.patch, 2.6.9-rc2 Ingo Molnar
2004-09-14  9:51   ` [patch] sched: add cond_resched_softirq() Ingo Molnar
2004-09-14  9:57     ` [patch] sched: fix latency in random driver Ingo Molnar
2004-09-14 10:06       ` [patch] sched, ext3: fix scheduling latencies in ext3 Ingo Molnar
2004-09-14 10:13         ` [patch] sched, vfs: fix scheduling latencies in invalidate_inodes() Ingo Molnar
2004-09-14 10:19         ` [patch] sched, vfs: fix scheduling latencies in prune_dcache() and select_parent() Ingo Molnar
2004-09-14 10:25           ` [patch] sched, net: fix scheduling latencies in netstat Ingo Molnar
2004-09-14 10:44             ` [patch] sched, net: fix scheduling latencies in __release_sock Ingo Molnar
2004-09-14 10:50               ` [patch] sched, mm: fix scheduling latencies in copy_page_range() Ingo Molnar
2004-09-14 10:56                 ` [patch] sched, mm: fix scheduling latencies in unmap_vmas() Ingo Molnar
2004-09-14 10:59                 ` [patch] sched, mm: fix scheduling latencies in get_user_pages() Ingo Molnar
2004-09-14 11:02                   ` [patch] sched, mm: fix scheduling latencies in filemap_sync() Ingo Molnar
2004-09-14 11:06                     ` [patch] sched, tty: fix scheduling latencies in tty_io.c Ingo Molnar
2004-09-14 10:53                       ` Alan Cox
2004-09-14 12:00                         ` Ingo Molnar
2004-09-14 11:18                           ` Alan Cox
2004-09-14 12:27                             ` Ingo Molnar
2004-09-14 12:11                               ` Alan Cox
2004-09-14 11:08                       ` [patch] sched, pty: fix scheduling latencies in pty.c Ingo Molnar
2004-09-14 11:12                         ` [patch] might_sleep() additions to fs-writeback.c Ingo Molnar
2004-09-14 11:25                         ` [patch] fix keventd execution dependency Ingo Molnar
2004-09-15 22:18                           ` Rusty Russell
2004-09-14 11:28                       ` [patch] sched: fix scheduling latencies in mttr.c Ingo Molnar
2004-09-14 11:32                         ` [patch] sched: fix scheduling latencies in vgacon.c Ingo Molnar
2004-09-14 11:35                         ` [patch] sched: fix scheduling latencies in NTFS mount Ingo Molnar
2004-09-14 13:31                           ` Anton Altaparmakov
2004-09-14 11:42                         ` [patch] sched: fix scheduling latencies for !PREEMPT kernels Ingo Molnar
2004-09-14 12:55                           ` Nick Piggin
2004-09-14 13:22                             ` Ingo Molnar
2004-09-14 13:33                               ` Nick Piggin
2004-09-14 14:09                                 ` Andrea Arcangeli
2004-09-14 14:28                                   ` Nick Piggin
2004-09-14 15:03                                     ` Andrea Arcangeli
2004-09-14 18:05                                       ` Robert Love
2004-09-14 18:52                                         ` William Lee Irwin III
2004-09-14 19:02                                           ` Robert Love
2004-09-14 19:21                                             ` William Lee Irwin III
2004-09-14 19:19                                               ` Alan Cox
2004-09-15  0:22                                                 ` Lee Revell
2004-09-15  1:46                                                   ` William Lee Irwin III
2004-09-15  2:00                                                     ` Lee Revell
2004-09-15  2:36                                                       ` William Lee Irwin III
2004-09-15  2:59                                                         ` Lee Revell
2004-09-15 13:36                                                         ` Hans Reiser
2004-09-15 20:40                                                           ` William Lee Irwin III
2004-09-15  1:18                                                 ` William Lee Irwin III
2004-09-14 19:26                                               ` Robert Love
2004-09-14 21:06                                               ` William Lee Irwin III
2004-09-14 19:25                                             ` Andrea Arcangeli
2004-09-14 19:29                                               ` Robert Love
2004-09-14 19:34                                               ` William Lee Irwin III
2004-09-15  1:02                                       ` Lee Revell
2004-09-15  1:39                                         ` William Lee Irwin III
2004-09-15  2:11                                           ` Lee Revell
2004-09-15 11:17                                             ` Ingo Molnar
2004-09-15  9:56                                           ` Ingo Molnar
2004-09-15  9:57                                             ` William Lee Irwin III
2004-09-15 10:12                                               ` Ingo Molnar
2004-09-14 16:31                                   ` William Lee Irwin III
2004-09-14 16:39                                     ` Andrea Arcangeli
2004-09-14 14:54                                 ` Ingo Molnar
2004-09-14 22:55                                   ` Nick Piggin
2004-09-15  6:19                                     ` Ingo Molnar
2004-09-15  8:23                                       ` Nick Piggin
2004-09-15  8:43                                         ` Ingo Molnar
2004-09-15 10:09                                           ` William Lee Irwin III
2004-09-15 10:21                                             ` Ingo Molnar
2004-09-16  1:03                                           ` Nick Piggin
2004-09-16  6:14                                             ` Ingo Molnar
2004-09-15  0:35                                   ` Lee Revell
2004-09-14 13:25                         ` [patch] sched: fix scheduling latencies in mtrr.c Ingo Molnar
2004-09-14 13:15                           ` Alan Cox
2004-09-14 15:00                             ` Ingo Molnar
2004-09-14 18:22                           ` Zwane Mwaikambo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).