linux-kernel.vger.kernel.org archive mirror
* Re: 2.4.10pre7aa1
@ 2001-09-12 11:04 Dipankar Sarma
  2001-09-12 14:03 ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Dipankar Sarma @ 2001-09-12 11:04 UTC (permalink / raw)
  To: rusty; +Cc: Andrea Arcangeli, linux-kernel, Paul Mckenney

In article <20010912182440.3975719b.rusty@rustcorp.com.au> you wrote:
> On Mon, 10 Sep 2001 17:54:17 +0200
> Andrea Arcangeli <andrea@suse.de> wrote:
>> Only in 2.4.10pre7aa1: 00_rcu-1
>> 
>> 	wait_for_rcu and call_rcu implementation (from IBM). I did some
>> 	modifications with respect to the original version from IBM.
>> 	In particular I dropped the vmalloc_rcu/kmalloc_rcu, the
>> 	rcu_head must always be allocated in the data structures, it has
>> 	to be a field of a class, rather than hiding it in the allocation
>> 	and playing dirty and risky with casts on a bigger allocation.

> Hi Andrea, 

> 	Like the kernel threads approach, but AFAICT it won't work for the case of two CPUs running wait_for_rcu at the same time (on a 4-way or above).

The patch I submitted to Andrea had logic to make sure that
two CPUs don't execute wait_for_rcu() at the same time.
Somehow it seems to have got lost in Andrea's modifications.

I will look at that and submit a new patch to Andrea, if necessary.

As for wrappers, I am agnostic. However, I think sooner or later
people will start asking for them, if we go by our past experience.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-12 11:04 2.4.10pre7aa1 Dipankar Sarma
@ 2001-09-12 14:03 ` Andrea Arcangeli
  2001-09-12 14:42   ` 2.4.10pre7aa1 Dipankar Sarma
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-12 14:03 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: rusty, linux-kernel, Paul Mckenney

On Wed, Sep 12, 2001 at 04:34:26PM +0530, Dipankar Sarma wrote:
> In article <20010912182440.3975719b.rusty@rustcorp.com.au> you wrote:
> > On Mon, 10 Sep 2001 17:54:17 +0200
> > Andrea Arcangeli <andrea@suse.de> wrote:
> >> Only in 2.4.10pre7aa1: 00_rcu-1
> >> 
> >> 	wait_for_rcu and call_rcu implementation (from IBM). I did some
> >> 	modifications with respect to the original version from IBM.
> >> 	In particular I dropped the vmalloc_rcu/kmalloc_rcu, the
> >> 	rcu_head must always be allocated in the data structures, it has
> >> 	to be a field of a class, rather than hiding it in the allocation
> >> 	and playing dirty and risky with casts on a bigger allocation.
> 
> > Hi Andrea, 
> 
> > 	Like the kernel threads approach, but AFAICT it won't work for the case of two CPUs running wait_for_rcu at the same time (on a 4-way or above).

Good catch!

As for your alternate approach patch I have a few comments:

1) there may be an RT task running, shrinking RAM without reschedules in
between (ignore the current page_alloc, which does a bogus schedule before
starting memory balancing), so schedule() may never run and the RT task
can go OOM; you should at least set need_resched on the relevant CPUs and
send the reschedule IPI before returning from call_rcu, to avoid being
starved

2) the real design issue here is whether we should pay 8k per CPU and zero
cpu cost in the fast paths, or pay with a new branch in the schedule()
fast path; I preferred the krcud approach for that reason:

+       if (atomic_read(&rcu_pending))
+               goto rcu_process;
+rcu_process_back:

> The patch I submitted to Andrea had logic to make sure that
> two CPUs don't execute wait_for_rcu() at the same time.
> Somehow it seems to have got lost in Andrea's modifications.

I think the bug was in your original patch too; I'm pretty sure I didn't
break anything while changing the API a little.

> I will look at that and submit a new patch to Andrea, if necessary.

I prefer to allow all CPUs to enter wait_for_rcu at the same time rather
than putting a serializing semaphore around wait_for_rcu (it should
scale pretty well if we don't serialize around it).

The way I prefer to fix it is to replace the rcu_sema with a per-CPU
semaphore and have wait_for_rcu do a down() on the per-CPU semaphore of
the relevant CPU; it should be a few-line patch (we have free space for
it in the per-CPU rcu_data cacheline).

> As for wrappers, I am agnostic. However, I think sooner or later
> people will start asking for them, if we go by our past experience.

Maybe I'm missing something, but what's the problem with allocating the
struct rcu_head in the data structure? I don't think it's much more
complicated than the cast magic, and in general I prefer to avoid casts
on larger buffers to take advantage of C's compile-time sanity checking ;).

Andrea


* Re: 2.4.10pre7aa1
  2001-09-12 14:03 ` 2.4.10pre7aa1 Andrea Arcangeli
@ 2001-09-12 14:42   ` Dipankar Sarma
  2001-09-12 14:53     ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Dipankar Sarma @ 2001-09-12 14:42 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: rusty, linux-kernel, Paul Mckenney

On Wed, Sep 12, 2001 at 04:03:13PM +0200, Andrea Arcangeli wrote:
> > > 	Like the kernel threads approach, but AFAICT it won't work for the case of two CPUs running wait_for_rcu at the same time (on a 4-way or above).
> 
> Good catch!

It barfs on our 4way with the FD management patch and chat benchmark :-)

> > The patch I submitted to Andrea had logic to make sure that
> > two CPUs don't execute wait_for_rcu() at the same time.
> > Somehow it seems to have got lost in Andrea's modifications.
> 
> I think the bug was in your original patch too; I'm pretty sure I didn't
> break anything while changing the API a little.

You changed the way I maintained the wait_list and current_list.
The basic logic was that new callbacks are always added to the
wait list. The wait_for_rcu() is started only if current_list
was empty and we just moved the wait_list to current_list. The
key step was moving the wait_list to current_list *after* doing
a wait_for_rcu(). This prevents another CPU from doing a wait_for_rcu().
Either that or I missed something big time :-)


> 
> > I will look at that and submit a new patch to Andrea, if necessary.
> 
> I prefer to allow all CPUs to enter wait_for_rcu at the same time rather
> than putting a serializing semaphore around wait_for_rcu (it should
> scale pretty well if we don't serialize around it).

Serializing is not what I want to do either. Instead the other
CPUs just add to the wait_list and return if there is a wait_for_rcu()
going on. What we have seen is that relatively larger batches around
a single recurring wait_for_rcu() will do reasonably well in terms
of performance.

> 
> The way I prefer to fix it is to replace the rcu_sema with a per-CPU
> semaphore and have wait_for_rcu do a down() on the per-CPU semaphore of
> the relevant CPU; it should be a few-line patch (we have free space for
> it in the per-CPU rcu_data cacheline).

It should be possible to do this. However, I am not sure we would
really benefit significantly from allowing multiple wait_for_rcu()s
to run in parallel. I would much rather see per-CPU lists implemented
and avoid keventd eventually.


> 
> > As for wrappers, I am agnostic. However, I think sooner or later
> > people will start asking for them, if we go by our past experience.
> 
> Maybe I'm missing something, but what's the problem with allocating the
> struct rcu_head in the data structure? I don't think it's much more
> complicated than the cast magic, and in general I prefer to avoid casts
> on larger buffers to take advantage of C's compile-time sanity checking ;).

One disadvantage of the wrappers is that we would be wasting most of
the L1 cache line for rcu_head and that could be relatively significant for 
a small frequently allocated structure. And no, I don't see any problem asking
people to allocate the rcu_head in the data structure.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.


* Re: 2.4.10pre7aa1
  2001-09-12 14:42   ` 2.4.10pre7aa1 Dipankar Sarma
@ 2001-09-12 14:53     ` Andrea Arcangeli
  2001-09-16 12:23       ` 2.4.10pre7aa1 Rusty Russell
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-12 14:53 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: rusty, linux-kernel, Paul Mckenney

On Wed, Sep 12, 2001 at 08:12:29PM +0530, Dipankar Sarma wrote:
> You changed the way I maintained the wait_list and current_list.
> The basic logic was that new callbacks are always added to the
> wait list. The wait_for_rcu() is started only if current_list
> was empty and we just moved the wait_list to current_list. The
> key step was moving the wait_list to current_list *after* doing
> a wait_for_rcu(). This prevents another CPU from doing a wait_for_rcu().
> Either that or I missed something big time :-)

Really, when Rusty said "multiple cpus calling wait_for_rcu" I was thinking
of common code calling wait_for_rcu directly (in which case you would
have a problem too); I thought it was exported just like call_rcu.

If you mean races with call_rcu, they cannot be explained by wait_for_rcu
being called from different CPUs with my approach either, because there's
only one keventd, so only one wait_for_rcu can run at once with my current
code (obviously, only keventd will ever call wait_for_rcu).

The problem should be elsewhere.

Also, we still don't address the case of keventd being starved by RT
tasks. Maybe we should just make keventd RT, but then it would hang if
somebody reinserts itself for a long time :(. Maybe Russell's approach is
the cleaner one after all: it just adds a branch in the schedule() fast
path, but (once fixed properly with the IPI and need_resched, and dropping
the unused irq checks that we don't want anyway, to avoid further slowdown
of the slow paths) the other issues go away, as well as the memory
consumption.

> One disadvantage of the wrappers is that we would be wasting most of
> the L1 cache line for rcu_head and that could be relatively significant for 
> a small frequently allocated structure. And no, I don't see any problem asking
> people to allocate the rcu_head in the data structure.

Ok. As usual people should take care to order the fields in a
cacheline-optimized manner, so for example they should put the rcu_head
at the very end if they want to reserve the cacheline for the "hot"
fields. This can in fact save a cacheline if the data structure is very
small. It's something we cannot choose in rcu_kmalloc etc... only the
user can.

Andrea


* Re: 2.4.10pre7aa1
  2001-09-12 14:53     ` 2.4.10pre7aa1 Andrea Arcangeli
@ 2001-09-16 12:23       ` Rusty Russell
  0 siblings, 0 replies; 32+ messages in thread
From: Rusty Russell @ 2001-09-16 12:23 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Dipankar Sarma, linux-kernel, Paul Mckenney

In message <20010912165335.F695@athlon.random> you write:
> somebody reinserts itself for a long time :(. Maybe Russell's approach is
> the cleaner one after all: it just adds a branch in the schedule() fast
> path, but (once fixed properly with the IPI and need_resched, and dropping
> the unused irq checks that we don't want anyway, to avoid further slowdown
> of the slow paths) the other issues go away, as well as the memory
> consumption.

IPI and need_resched?  That's incredibly heavy: why do you want to add
this?  Real-time tasks still call schedule() every so often, so I
don't understand your aim here...

> Ok. As usual people should take care to order the fields in a
> cacheline-optimized manner, so for example they should put the rcu_head
> at the very end if they want to reserve the cacheline for the "hot" fields.

Agreed.  Added a new field to rcu_head, and hence a new arg to
call_rcu(), so now you do things like:

	call_rcu(&myfoo->rcu, kfree, myfoo);

Other changes:

      Added synchronize_kernel(), which is useful for module unload.
      Architectures which don't enter schedule() in idle loop check rcu.
      Optimizations for uniprocessor.
      RCU callbacks are allowed to sleep.

Cheers,
Rusty.
--
Premature optmztion is rt of all evl. --DK

diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/include/linux/rcupdate.h working-2.4.9-rcu/include/linux/rcupdate.h
--- linux-2.4.9-official/include/linux/rcupdate.h	Thu Jan  1 10:00:00 1970
+++ working-2.4.9-rcu/include/linux/rcupdate.h	Sun Sep 16 22:14:44 2001
@@ -0,0 +1,51 @@
+#ifndef _LINUX_RCUPDATE_H
+#define _LINUX_RCUPDATE_H
+/* Read-Copy-Update For Linux. */
+#include <linux/config.h>
+#include <asm/atomic.h>
+
+#ifdef CONFIG_SMP
+struct rcu_head
+{
+	struct rcu_head *next;
+	void (*func)(void *data);
+	void *data;
+};
+
+/* Queue a future request; may sleep if the caller can sleep (on UP the
+   callback is invoked immediately). */
+void call_rcu(struct rcu_head *head, void (*func)(void *data), void *data);
+
+/* Count of pending requests: for optimization in schedule() */
+extern atomic_t rcu_pending;
+static inline int is_rcu_pending(void)
+{
+	return atomic_read(&rcu_pending) != 0;
+}
+
+/* Wait for every CPU to have moved on.  Sleeps. */
+void synchronize_kernel(void);
+
+#else /* !SMP */
+
+/* Remember the good old days when we didn't have to worry about more
+   than one processor? */
+struct rcu_head { };
+
+#define is_rcu_pending() 0
+
+static inline void call_rcu(struct rcu_head *head,
+			    void (*func)(void *data),
+			    void *data)
+{
+	func(data);
+}
+
+static inline void synchronize_kernel(void)
+{
+}
+#endif /*CONFIG_SMP*/
+
+/* Called by schedule() when batch reference count drops to zero. */
+void rcu_batch_done(void);
+#endif
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/kernel/Makefile working-2.4.9-rcu/kernel/Makefile
--- linux-2.4.9-official/kernel/Makefile	Sat Dec 30 09:07:24 2000
+++ working-2.4.9-rcu/kernel/Makefile	Wed Sep 12 18:19:24 2001
@@ -9,12 +9,12 @@
 
 O_TARGET := kernel.o
 
-export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o
+export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o rcupdate.o
 
 obj-y     = sched.o dma.o fork.o exec_domain.o panic.o printk.o \
 	    module.o exit.o itimer.o info.o time.o softirq.o resource.o \
 	    sysctl.o acct.o capability.o ptrace.o timer.o user.o \
-	    signal.o sys.o kmod.o context.o
+	    signal.o sys.o kmod.o context.o rcupdate.o
 
 obj-$(CONFIG_UID16) += uid16.o
 obj-$(CONFIG_MODULES) += ksyms.o
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/kernel/rcupdate.c working-2.4.9-rcu/kernel/rcupdate.c
--- linux-2.4.9-official/kernel/rcupdate.c	Thu Jan  1 10:00:00 1970
+++ working-2.4.9-rcu/kernel/rcupdate.c	Sun Sep 16 17:31:13 2001
@@ -0,0 +1,99 @@
+/* Read-Copy-Update For Linux.
+   (C)2001 Rusty Russell.
+
+      Concept stolen from original Read Copy Update paper:
+
+   http://www.rdrop.com/users/paulmck/rclock/intro/rclock_intro.html and
+   http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
+*/
+#include <linux/rcupdate.h>
+#include <linux/module.h>
+#include <linux/threads.h>
+#include <asm/system.h>
+#include <linux/interrupt.h>
+
+#ifdef CONFIG_SMP
+/* Count of pending requests: for optimization in schedule() */
+atomic_t rcu_pending = ATOMIC_INIT(0);
+
+/* Two batches per CPU: one is queueing, being prepended to as new
+   requests come in.  The other is waiting for completion of
+   schedule()s on all CPUs. */
+struct rcu_batch
+{
+	/* Two sets of queues: one queueing, one waiting quiescent finish */
+	int queueing;
+	/* Three queues: hard interrupt, soft interrupt, neither */
+	struct rcu_head *head[2][3];
+} __attribute__((__aligned__(SMP_CACHE_BYTES)));
+
+static struct rcu_batch rcu_batch[NR_CPUS];
+
+void call_rcu(struct rcu_head *head, void (*func)(void *data), void *data)
+{
+	unsigned cpu = smp_processor_id();
+	unsigned state;
+	struct rcu_head **headp;
+
+	head->func = func;
+	head->data = data;
+	if (in_interrupt()) {
+		if (in_irq()) state = 2;
+		else state = 1;
+	} else state = 0;
+
+	/* Figure out which queue we're on. */
+	headp = &rcu_batch[cpu].head[rcu_batch[cpu].queueing][state];
+
+	atomic_inc(&rcu_pending);
+	/* Prepend to this CPU's list: no locks needed. */
+	head->next = *headp;
+	*headp = head;
+}
+
+/* Calls every callback in the waiting rcu batch. */
+void rcu_batch_done(void)
+{
+	struct rcu_head *i, *next;
+	struct rcu_batch *mybatch;
+	unsigned int q;
+
+	mybatch = &rcu_batch[smp_processor_id()];
+	/* Call callbacks: probably delete themselves, may schedule. */
+	for (q = 0; q < 3; q++) {
+		for (i = mybatch->head[!mybatch->queueing][q]; i; i = next) {
+			next = i->next;
+			i->func(i->data);
+			atomic_dec(&rcu_pending);
+		}
+		mybatch->head[!mybatch->queueing][q] = NULL;
+	}
+
+	/* Start queueing on this batch. */
+	mybatch->queueing = !mybatch->queueing;
+}
+
+/* Because complete() is declared FASTCALL, we use this wrapper */
+static void wakeme_after_rcu(void *completion)
+{
+	complete(completion);
+}
+
+void synchronize_kernel(void)
+{
+	struct rcu_head rcu;
+	struct completion completion;
+
+	/* Will wake me after RCU finished */
+	call_rcu(&rcu, wakeme_after_rcu, &completion);
+
+	/* Wait for it */
+	wait_for_completion(&completion);
+}
+
+EXPORT_SYMBOL(call_rcu);
+EXPORT_SYMBOL(synchronize_kernel);
+#endif /*CONFIG_SMP*/
+
+/* Uniprocessor doesn't need an rcu_batch_done, since that gets
+   dead-code-eliminated in schedule() */
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/kernel/sched.c working-2.4.9-rcu/kernel/sched.c
--- linux-2.4.9-official/kernel/sched.c	Sat Jul 21 08:12:21 2001
+++ working-2.4.9-rcu/kernel/sched.c	Sun Sep 16 17:11:07 2001
@@ -26,6 +26,7 @@
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 #include <linux/completion.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -99,12 +100,15 @@
 	struct schedule_data {
 		struct task_struct * curr;
 		cycles_t last_schedule;
+		int ring_count, finished_count;
 	} schedule_data;
 	char __pad [SMP_CACHE_BYTES];
-} aligned_data [NR_CPUS] __cacheline_aligned = { {{&init_task,0}}};
+} aligned_data [NR_CPUS] __cacheline_aligned = { {{&init_task,0,0,0}}};
 
 #define cpu_curr(cpu) aligned_data[(cpu)].schedule_data.curr
 #define last_schedule(cpu) aligned_data[(cpu)].schedule_data.last_schedule
+#define ring_count(cpu) aligned_data[(cpu)].schedule_data.ring_count
+#define finished_count(cpu) aligned_data[(cpu)].schedule_data.finished_count
 
 struct kernel_stat kstat;
 
@@ -544,6 +548,10 @@
 
 	release_kernel_lock(prev, this_cpu);
 
+	if (is_rcu_pending())
+		goto rcu_process;
+rcu_process_back:
+
 	/*
 	 * 'sched_data' is protected by the fact that we can run
 	 * only one process per CPU.
@@ -693,6 +701,22 @@
 	c = goodness(prev, this_cpu, prev->active_mm);
 	next = prev;
 	goto still_running_back;
+
+rcu_process:
+	/* Avoid cache line effects if value hasn't changed */
+	c = ring_count((this_cpu + 1) % smp_num_cpus) + 1;
+	if (c != ring_count(this_cpu)) {
+		/* Do subtraction to avoid int wrap corner case */
+		if (c - finished_count(this_cpu) >= 0) {
+			/* Avoid reentry: temporarily set finished_count
+                           far in the future */
+			finished_count(this_cpu) = c + INT_MAX;
+			rcu_batch_done();
+			finished_count(this_cpu) = c + smp_num_cpus;
+		}
+		ring_count(this_cpu) = c;
+	}
+	goto rcu_process_back;
 
 move_rr_last:
 	if (!prev->counter) {
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/alpha/kernel/process.c working-2.4.9-rcu/arch/alpha/kernel/process.c
--- linux-2.4.9-official/arch/alpha/kernel/process.c	Wed Jul  4 11:15:07 2001
+++ working-2.4.9-rcu/arch/alpha/kernel/process.c	Sun Sep 16 18:26:35 2001
@@ -30,6 +30,7 @@
 #include <linux/reboot.h>
 #include <linux/tty.h>
 #include <linux/console.h>
+#include <linux/rcupdate.h>
 
 #include <asm/reg.h>
 #include <asm/uaccess.h>
@@ -86,7 +87,8 @@
 		   get into the scheduler unnecessarily.  */
 		long oldval = xchg(&current->need_resched, -1UL);
 		if (!oldval)
-			while (current->need_resched < 0);
+			while (current->need_resched < 0
+				&& !is_rcu_pending());
 		schedule();
 		check_pgt_cache();
 	}
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/arm/kernel/process.c working-2.4.9-rcu/arch/arm/kernel/process.c
--- linux-2.4.9-official/arch/arm/kernel/process.c	Fri Aug 17 05:12:35 2001
+++ working-2.4.9-rcu/arch/arm/kernel/process.c	Sun Sep 16 21:55:40 2001
@@ -23,6 +23,7 @@
 #include <linux/reboot.h>
 #include <linux/interrupt.h>
 #include <linux/init.h>
+#include <linux/rcupdate.h>
 
 #include <asm/system.h>
 #include <asm/io.h>
@@ -92,7 +93,7 @@
 		if (!idle)
 			idle = arch_idle;
 		leds_event(led_idle_start);
-		while (!current->need_resched)
+		while (!current->need_resched && !is_rcu_pending())
 			idle();
 		leds_event(led_idle_end);
 		schedule();
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/i386/kernel/process.c working-2.4.9-rcu/arch/i386/kernel/process.c
--- linux-2.4.9-official/arch/i386/kernel/process.c	Sat Aug 11 15:12:23 2001
+++ working-2.4.9-rcu/arch/i386/kernel/process.c	Sun Sep 16 18:26:57 2001
@@ -33,6 +33,7 @@
 #include <linux/reboot.h>
 #include <linux/init.h>
 #include <linux/mc146818rtc.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -131,7 +132,7 @@
 		void (*idle)(void) = pm_idle;
 		if (!idle)
 			idle = default_idle;
-		while (!current->need_resched)
+		while (!current->need_resched && !is_rcu_pending())
 			idle();
 		schedule();
 		check_pgt_cache();
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/ia64/kernel/process.c working-2.4.9-rcu/arch/ia64/kernel/process.c
--- linux-2.4.9-official/arch/ia64/kernel/process.c	Sat Aug 11 15:12:23 2001
+++ working-2.4.9-rcu/arch/ia64/kernel/process.c	Sun Sep 16 21:56:37 2001
@@ -17,6 +17,7 @@
 #include <linux/smp_lock.h>
 #include <linux/stddef.h>
 #include <linux/unistd.h>
+#include <linux/rcupdate.h>
 
 #include <asm/delay.h>
 #include <asm/efi.h>
@@ -121,7 +122,7 @@
 		if (!current->need_resched)
 			min_xtp();
 #endif
-		while (!current->need_resched)
+		while (!current->need_resched && !is_rcu_pending())
 			continue;
 #ifdef CONFIG_SMP
 		normal_xtp();
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/mips/kernel/process.c working-2.4.9-rcu/arch/mips/kernel/process.c
--- linux-2.4.9-official/arch/mips/kernel/process.c	Wed Jul  4 11:15:08 2001
+++ working-2.4.9-rcu/arch/mips/kernel/process.c	Sun Sep 16 18:27:32 2001
@@ -19,6 +19,7 @@
 #include <linux/sys.h>
 #include <linux/user.h>
 #include <linux/a.out.h>
+#include <linux/rcupdate.h>
 
 #include <asm/bootinfo.h>
 #include <asm/cpu.h>
@@ -40,7 +41,7 @@
 	init_idle();
 
 	while (1) {
-		while (!current->need_resched)
+		while (!current->need_resched && !is_rcu_pending())
 			if (cpu_wait)
 				(*cpu_wait)();
 		schedule();
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/mips64/kernel/process.c working-2.4.9-rcu/arch/mips64/kernel/process.c
--- linux-2.4.9-official/arch/mips64/kernel/process.c	Thu Feb 22 14:24:54 2001
+++ working-2.4.9-rcu/arch/mips64/kernel/process.c	Sun Sep 16 18:27:42 2001
@@ -18,6 +18,7 @@
 #include <linux/sys.h>
 #include <linux/user.h>
 #include <linux/a.out.h>
+#include <linux/rcupdate.h>
 
 #include <asm/bootinfo.h>
 #include <asm/pgtable.h>
@@ -36,7 +37,7 @@
 	current->nice = 20;
 	current->counter = -100;
 	while (1) {
-		while (!current->need_resched)
+		while (!current->need_resched && !is_rcu_pending())
 			if (wait_available)
 				__asm__("wait");
 		schedule();
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/parisc/kernel/process.c working-2.4.9-rcu/arch/parisc/kernel/process.c
--- linux-2.4.9-official/arch/parisc/kernel/process.c	Thu Feb 22 14:24:55 2001
+++ working-2.4.9-rcu/arch/parisc/kernel/process.c	Sun Sep 16 18:27:51 2001
@@ -26,6 +26,7 @@
 #include <linux/init.h>
 #include <linux/version.h>
 #include <linux/elf.h>
+#include <linux/rcupdate.h>
 
 #include <asm/machdep.h>
 #include <asm/offset.h>
@@ -74,7 +75,7 @@
 	current->counter = -100;
 
 	while (1) {
-		while (!current->need_resched) {
+		while (!current->need_resched && !is_rcu_pending()) {
 		}
 		schedule();
 		check_pgt_cache();
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/ppc/kernel/idle.c working-2.4.9-rcu/arch/ppc/kernel/idle.c
--- linux-2.4.9-official/arch/ppc/kernel/idle.c	Mon May 28 12:42:20 2001
+++ working-2.4.9-rcu/arch/ppc/kernel/idle.c	Sun Sep 16 18:19:32 2001
@@ -23,6 +23,7 @@
 #include <linux/unistd.h>
 #include <linux/ptrace.h>
 #include <linux/slab.h>
+#include <linux/rcupdate.h>
 
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
@@ -74,7 +75,7 @@
 		if (do_power_save && !current->need_resched)
 			power_save();
 
-		if (current->need_resched) {
+		if (current->need_resched || is_rcu_pending()) {
 			schedule();
 			check_pgt_cache();
 		}
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/s390/kernel/process.c working-2.4.9-rcu/arch/s390/kernel/process.c
--- linux-2.4.9-official/arch/s390/kernel/process.c	Fri Aug 17 05:12:35 2001
+++ working-2.4.9-rcu/arch/s390/kernel/process.c	Sun Sep 16 18:20:57 2001
@@ -36,6 +36,7 @@
 #include <linux/delay.h>
 #include <linux/reboot.h>
 #include <linux/init.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -63,7 +64,7 @@
 	wait_psw.mask = _WAIT_PSW_MASK;
 	wait_psw.addr = (unsigned long) &&idle_wakeup | 0x80000000L;
 	while(1) {
-                if (current->need_resched) {
+                if (current->need_resched || is_rcu_pending()) {
                         schedule();
                         check_pgt_cache();
                         continue;
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/s390x/kernel/process.c working-2.4.9-rcu/arch/s390x/kernel/process.c
--- linux-2.4.9-official/arch/s390x/kernel/process.c	Fri Aug 17 05:12:35 2001
+++ working-2.4.9-rcu/arch/s390x/kernel/process.c	Sun Sep 16 18:20:49 2001
@@ -36,6 +36,7 @@
 #include <linux/delay.h>
 #include <linux/reboot.h>
 #include <linux/init.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -63,7 +64,7 @@
 	wait_psw.mask = _WAIT_PSW_MASK;
 	wait_psw.addr = (unsigned long) &&idle_wakeup;
 	while(1) {
-                if (current->need_resched) {
+                if (current->need_resched || is_rcu_pending()) {
                         schedule();
                         check_pgt_cache();
                         continue;
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/sh/kernel/process.c working-2.4.9-rcu/arch/sh/kernel/process.c
--- linux-2.4.9-official/arch/sh/kernel/process.c	Sun Apr 29 06:16:37 2001
+++ working-2.4.9-rcu/arch/sh/kernel/process.c	Sun Sep 16 18:21:53 2001
@@ -34,6 +34,7 @@
 #include <linux/reboot.h>
 #include <linux/init.h>
 #include <linux/irq.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -71,7 +72,7 @@
 	current->counter = -100;
 
 	while (1) {
-		while (!current->need_resched) {
+		while (!current->need_resched && !is_rcu_pending()) {
 			if (hlt_counter)
 				continue;
 			__sti();
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/sparc/kernel/process.c working-2.4.9-rcu/arch/sparc/kernel/process.c
--- linux-2.4.9-official/arch/sparc/kernel/process.c	Thu Feb 22 14:24:56 2001
+++ working-2.4.9-rcu/arch/sparc/kernel/process.c	Sun Sep 16 18:24:35 2001
@@ -27,6 +27,7 @@
 #include <linux/smp_lock.h>
 #include <linux/reboot.h>
 #include <linux/delay.h>
+#include <linux/rcupdate.h>
 
 #include <asm/auxio.h>
 #include <asm/oplib.h>
@@ -114,7 +115,7 @@
 	init_idle();
 
 	while(1) {
-		if(current->need_resched) {
+		if(current->need_resched || is_rcu_pending()) {
 			schedule();
 			check_pgt_cache();
 		}
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.9-official/arch/sparc64/kernel/process.c working-2.4.9-rcu/arch/sparc64/kernel/process.c
--- linux-2.4.9-official/arch/sparc64/kernel/process.c	Wed Jul  4 11:15:08 2001
+++ working-2.4.9-rcu/arch/sparc64/kernel/process.c	Sun Sep 16 18:25:34 2001
@@ -28,6 +28,7 @@
 #include <linux/config.h>
 #include <linux/reboot.h>
 #include <linux/delay.h>
+#include <linux/rcupdate.h>
 
 #include <asm/oplib.h>
 #include <asm/uaccess.h>
@@ -88,7 +89,7 @@
 	init_idle();
 
 	while(1) {
-		if (current->need_resched != 0) {
+		if (current->need_resched != 0 || is_rcu_pending()) {
 			unidle_me();
 			schedule();
 			check_pgt_cache();


* Re: 2.4.10pre7aa1
@ 2001-09-17  9:13 Dipankar Sarma
  0 siblings, 0 replies; 32+ messages in thread
From: Dipankar Sarma @ 2001-09-17  9:13 UTC (permalink / raw)
  To: riel; +Cc: Andrea Arcangeli, linux-kernel

In article <Pine.LNX.4.33L.0109161433530.9536-100000@imladris.rielhome.conectiva> you wrote:
> On Sun, 16 Sep 2001, Andrea Arcangeli wrote:

>> However the issue with keventd and the fact we can get away with a
>> single per-cpu counter increase in the scheduler fast path made us
>> think it's cleaner to just spend that cycle on each schedule rather
>> than having yet another 8k per cpu wasted and longer task lists (a
>> local cpu increase is cheaper than a conditional jump).

> So why don't we put the test+branch inside keventd ?

> wakeup_krcud(void)
> {
> 	krcud_wanted = 1;
> 	wakeup(&keventd);
> }

> cheers,

> Rik
> -- 

keventd is not suitable for RCU at all. It can get starved out by RT
threads and that can result in either memory pressure or performance
degradation depending on how RCU is being used.

I have a patch that uses a per-cpu quiescent-state counter. The cost of
this on the schedule() path is one per-cpu counter increment. I will
mail out the patch as soon as I can finish testing Andrea's review
comments on a bigger SMP box.
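A minimal userspace model of such a counter scheme (all names invented here; this sketches the idea, not the pending patch): each CPU bumps its own counter in schedule(), and a grace period that began at a snapshot is over once every counter has moved past its snapshotted value.

```c
#define NR_CPUS 4

static unsigned long quiescent_count[NR_CPUS];

/* The only cost added to the schedule() fast path: one per-cpu
   counter increment. */
void rcu_note_schedule(int cpu)
{
	quiescent_count[cpu]++;
}

/* Record where every CPU stood when the grace period began. */
void rcu_snapshot(unsigned long snap[NR_CPUS])
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		snap[cpu] = quiescent_count[cpu];
}

/* Done once every CPU has gone through schedule() at least once
   since the snapshot: all readers from before have finished. */
int rcu_grace_period_done(const unsigned long snap[NR_CPUS])
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (quiescent_count[cpu] == snap[cpu])
			return 0;
	return 1;
}
```

No thread is needed: whoever wants to retire a batch takes a snapshot and polls (or is told) when every CPU has scheduled past it.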

Most importantly :-) it doesn't use kernel threads.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.


* Re: 2.4.10pre7aa1
  2001-09-16 17:23           ` 2.4.10pre7aa1 Andrea Arcangeli
  2001-09-16 17:34             ` 2.4.10pre7aa1 Rik van Riel
@ 2001-09-16 19:04             ` Christoph Hellwig
  1 sibling, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2001-09-16 19:04 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, linux-kernel

On Sun, Sep 16, 2001 at 07:23:16PM +0200, Andrea Arcangeli wrote:
> > I can't quite remember if it was Linus or Larry who said:
> > 
> > "Threads are for people who don't understand state machines"
> > 
> > 
> > If you cannot make your code clean without adding another
> > thread, it's probably a bad sign ;)
> 
> Ask yourself why libaio in glibc uses threads.

Because glibc always uses the more bloated approach if there is a choice?

/me runs

> When there's no async-io
> hook you have no choice. Adding the hook is an advantage if you're going
> to use it during production, much better than
> rescheduling/creating/destroying various threads during production, but
> if you only need to register the hook once per day you'd waste time all
> the production time checking if somebody is registered in the hook.

I'd really like to see Ben's worktodo's in 2.4 - they are useful even
without his whole asynchio framework and don't need big kernel changes.

	Christoph

-- 
Whip me.  Beat me.  Make me maintain AIX.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-16 17:34             ` 2.4.10pre7aa1 Rik van Riel
@ 2001-09-16 18:16               ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-16 18:16 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Christoph Hellwig, linux-kernel

On Sun, Sep 16, 2001 at 02:34:55PM -0300, Rik van Riel wrote:
> On Sun, 16 Sep 2001, Andrea Arcangeli wrote:
> 
> > However, the issue with keventd and the fact that we can get away
> > with a single per-cpu counter increase in the scheduler fast path
> > made us think it's cleaner to just spend that cycle on each schedule
> > rather than having yet another 8k per cpu wasted and longer task
> > lists (a local per-cpu increment is cheaper than a conditional jump).
> 
> So why don't we put the test+branch inside keventd ?

first, keventd runs non-RT; second, it slows down keventd, but I agree
that would be a minor issue. The best approach to me seems the one I
outlined in the last email (a per-cpu sequence counter as the only
additional cost in schedule).

Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-16 17:23           ` 2.4.10pre7aa1 Andrea Arcangeli
@ 2001-09-16 17:34             ` Rik van Riel
  2001-09-16 18:16               ` 2.4.10pre7aa1 Andrea Arcangeli
  2001-09-16 19:04             ` 2.4.10pre7aa1 Christoph Hellwig
  1 sibling, 1 reply; 32+ messages in thread
From: Rik van Riel @ 2001-09-16 17:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Christoph Hellwig, linux-kernel

On Sun, 16 Sep 2001, Andrea Arcangeli wrote:

> However, the issue with keventd and the fact that we can get away
> with a single per-cpu counter increase in the scheduler fast path
> made us think it's cleaner to just spend that cycle on each schedule
> rather than having yet another 8k per cpu wasted and longer task
> lists (a local per-cpu increment is cheaper than a conditional jump).

So why don't we put the test+branch inside keventd ?

wakeup_krcud(void)
{
	krcud_wanted = 1;
	wakeup(&keventd);
}

cheers,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-16 17:00         ` 2.4.10pre7aa1 Rik van Riel
@ 2001-09-16 17:23           ` Andrea Arcangeli
  2001-09-16 17:34             ` 2.4.10pre7aa1 Rik van Riel
  2001-09-16 19:04             ` 2.4.10pre7aa1 Christoph Hellwig
  0 siblings, 2 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-16 17:23 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Christoph Hellwig, linux-kernel

On Sun, Sep 16, 2001 at 02:00:10PM -0300, Rik van Riel wrote:
> On Mon, 10 Sep 2001, Andrea Arcangeli wrote:
> 
> > > My problem with this approach is just that we use kernel threads for
> > > more and more stuff - always creating new ones.  I think at some point
> > > they will sum up badly.
> >
> > They almost only cost memory. I also don't like unnecessary kernel
> > threads but I can see usefulness for this one, OTOH as you said the
> > latency of the wait_for_rcu isn't very critical but usually I prefer to
> > save cycles with memory where I can and where it's even cleaner to do so.
> 
> I can't quite remember if it was Linus or Larry who said:
> 
> "Threads are for people who don't understand state machines"
> 
> 
> If you cannot make your code clean without adding another
> thread, it's probably a bad sign ;)

Ask yourself why libaio in glibc uses threads. When there's no async-io
hook you have no choice. Adding the hook is an advantage if you're going
to use it during production, much better than
rescheduling/creating/destroying various threads during production, but
if you only need to register the hook once per day, you'd waste time for
the whole production run checking whether anybody is registered in the hook.

So while the "Threads are for people who don't understand state
machines" argument works for the userspace fileservers, it really
doesn't apply to the rcu slow path, where we don't want to hurt the
schedule fast path with a hook.

However, the issue with keventd and the fact that we can get away
with a single per-cpu counter increase in the scheduler fast path
made us think it's cleaner to just spend that cycle on each schedule
rather than having yet another 8k per cpu wasted and longer task
lists (a local per-cpu increment is cheaper than a conditional jump).

Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-10 19:06       ` 2.4.10pre7aa1 Andrea Arcangeli
@ 2001-09-16 17:00         ` Rik van Riel
  2001-09-16 17:23           ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Rik van Riel @ 2001-09-16 17:00 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Christoph Hellwig, linux-kernel

On Mon, 10 Sep 2001, Andrea Arcangeli wrote:

> > My problem with this approach is just that we use kernel threads for
> > more and more stuff - always creating new ones.  I think at some point
> > they will sum up badly.
>
> They almost only cost memory. I also don't like unnecessary kernel
> threads but I can see usefulness for this one, OTOH as you said the
> latency of the wait_for_rcu isn't very critical but usually I prefer to
> save cycles with memory where I can and where it's even cleaner to do so.

I can't quite remember if it was Linus or Larry who said:

"Threads are for people who don't understand state machines"


If you cannot make your code clean without adding another
thread, it's probably a bad sign ;)

cheers,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
       [not found] <20010910175416.A714@athlon.random>
  2001-09-10 17:41 ` 2.4.10pre7aa1 Christoph Hellwig
@ 2001-09-12  8:24 ` Rusty Russell
  1 sibling, 0 replies; 32+ messages in thread
From: Rusty Russell @ 2001-09-12  8:24 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

On Mon, 10 Sep 2001 17:54:17 +0200
Andrea Arcangeli <andrea@suse.de> wrote:
> Only in 2.4.10pre7aa1: 00_rcu-1
> 
> 	wait_for_rcu and call_rcu implementation (from IBM). I did some
> 	modifications with respect to the original version from IBM.
> 	In particular I dropped the vmalloc_rcu/kmalloc_rcu, the
> 	rcu_head must always be allocated in the data structures, it has
> 	to be a field of a class, rather than hiding it in the allocation
> 	and playing dirty and risky with casts on a bigger allocation.

Hi Andrea, 

	I like the kernel threads approach, but AFAICT it won't work when two CPUs run wait_for_rcu at the same time (on a 4-way or above).

	Please try actually *using* the RCU code before you complain about the wrappers: you'll end up writing your own wrappers.  I look forward to seeing what you come up with (handling the case of the rcu structure in an arbitrary offset within the structure is possible, but my solutions were all less neat).

Preferred patch below,
Rusty.

diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.4.7-module/include/linux/rcupdate.h working-2.4.7-rcu/include/linux/rcupdate.h
--- working-2.4.7-module/include/linux/rcupdate.h	Thu Jan  1 10:00:00 1970
+++ working-2.4.7-rcu/include/linux/rcupdate.h	Wed Aug 29 10:19:13 2001
@@ -0,0 +1,58 @@
+#ifndef _LINUX_RCUPDATE_H
+#define _LINUX_RCUPDATE_H
+/* Read-Copy-Update For Linux. */
+#include <linux/malloc.h>
+#include <linux/cache.h>
+#include <linux/vmalloc.h>
+#include <asm/atomic.h>
+
+struct rcu_head
+{
+	struct rcu_head *next;
+	void (*func)(void *obj);
+};
+
+/* Count of pending requests: for optimization in schedule() */
+extern atomic_t rcu_pending;
+
+/* Queues future request. */
+void call_rcu(struct rcu_head *head, void (*func)(void *head));
+
+/* Convenience wrappers: */
+static inline void *kmalloc_rcu(size_t size, int flags)
+{
+	void *ret;
+
+	size += L1_CACHE_ALIGN(sizeof(struct rcu_head));
+	ret = kmalloc(size, flags);
+	if (!ret)
+		return NULL;
+	return ret + L1_CACHE_ALIGN(sizeof(struct rcu_head));
+}
+
+static inline void kfree_rcu(void *obj)
+{
+	call_rcu(obj - L1_CACHE_ALIGN(sizeof(struct rcu_head)),
+		 (void (*)(void *))kfree);
+}
+
+static inline void *vmalloc_rcu(size_t size)
+{
+	void *ret;
+
+	size += L1_CACHE_ALIGN(sizeof(struct rcu_head));
+	ret = vmalloc(size);
+	if (!ret)
+		return NULL;
+	return ret + L1_CACHE_ALIGN(sizeof(struct rcu_head));
+}
+
+static inline void vfree_rcu(void *obj)
+{
+	call_rcu(obj - L1_CACHE_ALIGN(sizeof(struct rcu_head)),
+		 (void (*)(void *))vfree);
+}
+
+/* Called by schedule() when batch reference count drops to zero. */
+void rcu_batch_done(void);
+#endif
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.4.7-module/kernel/Makefile working-2.4.7-rcu/kernel/Makefile
--- working-2.4.7-module/kernel/Makefile	Sat Dec 30 09:07:24 2000
+++ working-2.4.7-rcu/kernel/Makefile	Wed Aug 29 10:12:08 2001
@@ -9,12 +9,12 @@
 
 O_TARGET := kernel.o
 
-export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o
+export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o rcupdate.o
 
 obj-y     = sched.o dma.o fork.o exec_domain.o panic.o printk.o \
 	    module.o exit.o itimer.o info.o time.o softirq.o resource.o \
 	    sysctl.o acct.o capability.o ptrace.o timer.o user.o \
-	    signal.o sys.o kmod.o context.o
+	    signal.o sys.o kmod.o context.o rcupdate.o
 
 obj-$(CONFIG_UID16) += uid16.o
 obj-$(CONFIG_MODULES) += ksyms.o
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.4.7-module/kernel/rcupdate.c working-2.4.7-rcu/kernel/rcupdate.c
--- working-2.4.7-module/kernel/rcupdate.c	Thu Jan  1 10:00:00 1970
+++ working-2.4.7-rcu/kernel/rcupdate.c	Wed Aug 29 10:20:31 2001
@@ -0,0 +1,65 @@
+/* Read-Copy-Update For Linux. */
+#include <linux/rcupdate.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+/* Count of pending requests: for optimization in schedule() */
+atomic_t rcu_pending = ATOMIC_INIT(0);
+
+/* Two batches per CPU : one (pending) is an internal queue of waiting
+   requests, being prepended to as new requests come in.  The other
+   (rcu_waiting) is waiting completion of schedule()s on all CPUs. */
+struct rcu_batch
+{
+	/* Two sets of queues: one queueing, one waiting quiescent finish */
+	int queueing;
+	/* Three queues: hard interrupt, soft interrupt, neither */
+	struct rcu_head *head[2][3];
+} __attribute__((__aligned__(SMP_CACHE_BYTES)));
+
+static struct rcu_batch rcu_batch[NR_CPUS];
+
+void call_rcu(struct rcu_head *head, void (*func)(void *head))
+{
+	unsigned cpu = smp_processor_id();
+	unsigned state;
+	struct rcu_head **headp;
+
+	head->func = func;
+	if (in_interrupt()) {
+		if (in_irq()) state = 2;
+		else state = 1;
+	} else state = 0;
+
+	/* Figure out which queue we're on. */
+	headp = &rcu_batch[cpu].head[rcu_batch[cpu].queueing][state];
+
+	atomic_inc(&rcu_pending);
+	/* Prepend to this CPU's list: no locks needed. */
+	head->next = *headp;
+	*headp = head;
+}
+
+/* Calls every callback in the waiting rcu batch. */
+void rcu_batch_done(void)
+{
+	struct rcu_head *i, *next;
+	struct rcu_batch *mybatch;
+	unsigned int q;
+
+	mybatch = &rcu_batch[smp_processor_id()];
+	/* Call callbacks: probably delete themselves, must not schedule. */
+	for (q = 0; q < 3; q++) {
+		for (i = mybatch->head[!mybatch->queueing][q]; i; i = next) {
+			next = i->next;
+			i->func(i);
+			atomic_dec(&rcu_pending);
+		}
+		mybatch->head[!mybatch->queueing][q] = NULL;
+	}
+
+	/* Start queueing on this batch. */
+	mybatch->queueing = !mybatch->queueing;
+}
+
+EXPORT_SYMBOL(call_rcu);
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.4.7-module/kernel/sched.c working-2.4.7-rcu/kernel/sched.c
--- working-2.4.7-module/kernel/sched.c	Sun Jul 22 13:13:25 2001
+++ working-2.4.7-rcu/kernel/sched.c	Wed Aug 29 10:23:02 2001
@@ -26,6 +26,7 @@
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 #include <linux/completion.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -99,12 +100,15 @@
 	struct schedule_data {
 		struct task_struct * curr;
 		cycles_t last_schedule;
+		int ring_count, finished_count;
 	} schedule_data;
 	char __pad [SMP_CACHE_BYTES];
-} aligned_data [NR_CPUS] __cacheline_aligned = { {{&init_task,0}}};
+} aligned_data [NR_CPUS] __cacheline_aligned = { {{&init_task,0,0,0}}};
 
 #define cpu_curr(cpu) aligned_data[(cpu)].schedule_data.curr
 #define last_schedule(cpu) aligned_data[(cpu)].schedule_data.last_schedule
+#define ring_count(cpu) aligned_data[(cpu)].schedule_data.ring_count
+#define finished_count(cpu) aligned_data[(cpu)].schedule_data.finished_count
 
 struct kernel_stat kstat;
 
@@ -544,6 +548,10 @@
 
 	release_kernel_lock(prev, this_cpu);
 
+	if (atomic_read(&rcu_pending))
+		goto rcu_process;
+rcu_process_back:
+
 	/*
 	 * 'sched_data' is protected by the fact that we can run
 	 * only one process per CPU.
@@ -693,6 +701,19 @@
 	c = goodness(prev, this_cpu, prev->active_mm);
 	next = prev;
 	goto still_running_back;
+
+rcu_process:
+	/* Avoid cache line effects if value hasn't changed */
+	c = ring_count((this_cpu + 1) % smp_num_cpus) + 1;
+	if (c != ring_count(this_cpu)) {
+		/* Do subtraction to avoid int wrap corner case */
+		if (c - finished_count(this_cpu) >= 0) {
+			rcu_batch_done();
+			finished_count(this_cpu) = c + smp_num_cpus;
+		}
+		ring_count(this_cpu) = c;
+	}
+	goto rcu_process_back;
 
 move_rr_last:
 	if (!prev->counter) {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-11 13:56 ` 2.4.10pre7aa1 Andrea Arcangeli
@ 2001-09-11 14:27   ` Dipankar Sarma
  0 siblings, 0 replies; 32+ messages in thread
From: Dipankar Sarma @ 2001-09-11 14:27 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: alan, Paul Mckenney, linux-kernel

On Tue, Sep 11, 2001 at 03:56:52PM +0200, Andrea Arcangeli wrote:
> Nothing is needed, but without changes we would prefer not to include
> the rcu patch in the kernel. AFAIK (and I'm far from being an expert
> here) I can upload source protected by a patent to kernel.org, and
> everybody but US citizens can safely run the patented code without
> having to pay the patent holder. So, in short, the problem is that it
> wouldn't be nice if you could download the linux kernel freely but
> couldn't use it freely in the US without first dropping the rcu patch.
> 
> In short, AFAIK, on your part you should just make a modification to
> the patent (or whatever other legal paperwork) saying that use of the
> rcu technology in GPL code is allowed free of charge.

I don't think the patent issue is going to be a problem since the work has
long been approved at IBM, legal-wise. It is just that none of us were
aware of what is required for grant of use.

I have already started the follow up process in Beaverton and will get back to
you ASAP. Sorry about the confusion.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-11 13:05 2.4.10pre7aa1 Dipankar Sarma
@ 2001-09-11 13:56 ` Andrea Arcangeli
  2001-09-11 14:27   ` 2.4.10pre7aa1 Dipankar Sarma
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-11 13:56 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: alan, Paul Mckenney, linux-kernel

On Tue, Sep 11, 2001 at 06:35:34PM +0530, Dipankar Sarma wrote:
> In article <E15gmqD-0002YK-00@the-village.bc.nu> you wrote:
> >> BTW, I fixed a few more issues in the rcu patch (grep for
> >> down_interruptible for instance), here an updated patch (will be
> >> included in 2.4.10pre8aa1 [or later -aa]) with the name of rcu-2.
> 
> > I've been made aware of one other isue with the RCU patch
> > US Patent #05442758
> 
> > In the absence of an actual real signed header paper patent use grant for GPL 
> > usage from the Sequent folks that seems to be rather hard to fix.
> 
> > Alan
> 
> Hi Alan,
> 
> IBM bought us a couple of years ago, and linux RCU is an IBM-approved
> project. I am not sure I understand what exactly is needed for a patent
> use grant for GPL, but whatever it is, I see absolutely no problem
> getting it done. I would appreciate it if you let me know what is
> needed for GPL.

Nothing is needed, but without changes we would prefer not to include
the rcu patch in the kernel. AFAIK (and I'm far from being an expert
here) I can upload source protected by a patent to kernel.org, and
everybody but US citizens can safely run the patented code without
having to pay the patent holder. So, in short, the problem is that it
wouldn't be nice if you could download the linux kernel freely but
couldn't use it freely in the US without first dropping the rcu patch.

In short, AFAIK, on your part you should just make a modification to
the patent (or whatever other legal paperwork) saying that use of the
rcu technology in GPL code is allowed free of charge.

Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-11 12:40   ` 2.4.10pre7aa1 Alan Cox
@ 2001-09-11 13:49     ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-11 13:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: Dipankar Sarma, hch, linux-kernel, Paul Mckenney

On Tue, Sep 11, 2001 at 01:40:37PM +0100, Alan Cox wrote:
> > BTW, I fixed a few more issues in the rcu patch (grep for
> > down_interruptible for instance), here an updated patch (will be
> > included in 2.4.10pre8aa1 [or later -aa]) with the name of rcu-2.
> 
> I've been made aware of one other issue with the RCU patch
> US Patent #05442758
> 
> In the absence of an actual real signed header paper patent use grant for GPL 
> usage from the Sequent folks that seems to be rather hard to fix.

many thanks for the info. Since I live in Italy I should be safe using
it and contributing to its development until Sequent/IBM fixes the
legal problem. As far as I can tell it's quite obviously just a
theoretical problem, but it was well worth noticing.

Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
@ 2001-09-11 13:05 Dipankar Sarma
  2001-09-11 13:56 ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Dipankar Sarma @ 2001-09-11 13:05 UTC (permalink / raw)
  To: alan; +Cc: Paul Mckenney, Andrea Arcangeli, linux-kernel

In article <E15gmqD-0002YK-00@the-village.bc.nu> you wrote:
>> BTW, I fixed a few more issues in the rcu patch (grep for
>> down_interruptible for instance), here an updated patch (will be
>> included in 2.4.10pre8aa1 [or later -aa]) with the name of rcu-2.

> I've been made aware of one other issue with the RCU patch
> US Patent #05442758

> In the absence of an actual real signed header paper patent use grant for GPL 
> usage from the Sequent folks that seems to be rather hard to fix.

> Alan

Hi Alan,

IBM bought us a couple of years ago, and linux RCU is an IBM-approved
project. I am not sure I understand what exactly is needed for a patent
use grant for GPL, but whatever it is, I see absolutely no problem
getting it done. I would appreciate it if you let me know what is
needed for GPL.

Thanks
Dipankar 
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-11 11:04 ` 2.4.10pre7aa1 Andrea Arcangeli
@ 2001-09-11 12:40   ` Alan Cox
  2001-09-11 13:49     ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Alan Cox @ 2001-09-11 12:40 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Dipankar Sarma, hch, linux-kernel, Paul Mckenney

> BTW, I fixed a few more issues in the rcu patch (grep for
> down_interruptible for instance), here an updated patch (will be
> included in 2.4.10pre8aa1 [or later -aa]) with the name of rcu-2.

I've been made aware of one other issue with the RCU patch
US Patent #05442758

In the absence of an actual real signed header paper patent use grant for GPL 
usage from the Sequent folks that seems to be rather hard to fix.

Alan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
@ 2001-09-11 12:22 Dipankar Sarma
  0 siblings, 0 replies; 32+ messages in thread
From: Dipankar Sarma @ 2001-09-11 12:22 UTC (permalink / raw)
  To: andrea; +Cc: linux-kernel, Paul Mckenney

In article <20010911135735.T715@athlon.random> you wrote:
> On Tue, Sep 11, 2001 at 05:23:01PM +0530, Dipankar Sarma wrote:
>> In article <20010911131238.N715@athlon.random> you wrote:
>> > many thanks. At the moment my biggest concern is that call_rcu must
>> > not be starved by RT threads (keventd can be starved, so it won't
>> > matter if krcud is RT because we won't start using it).
>> 
>> > Andrea
>> 
>> I think we can avoid keventd altogether by using a periodic timer (say 10ms)
>> to check for completion of an RCU update. The timer may be active
>> only if there is any RCU going on in the system - that way
>> we still don't have any impact on the rest of the kernel.

> the timer can have a bigger latency than keventd calling wait_for_rcu,
> so it should be a loss in a straight benchmark under light load, but
> OTOH we only care about getting those callbacks executed eventually,
> and the advantage I can see is that the timer cannot get starved.

> Andrea

What kind of timer latencies are we talking about? I would not be
too concerned if the RCU timers executed in 40ms instead of the
requested 10ms. The question is: are there situations where they can
get delayed by minutes?

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-11 11:53 2.4.10pre7aa1 Dipankar Sarma
@ 2001-09-11 11:57 ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-11 11:57 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: linux-kernel, Paul Mckenney

On Tue, Sep 11, 2001 at 05:23:01PM +0530, Dipankar Sarma wrote:
> In article <20010911131238.N715@athlon.random> you wrote:
> > many thanks. At the moment my biggest concern is that call_rcu must
> > not be starved by RT threads (keventd can be starved, so it won't
> > matter if krcud is RT because we won't start using it).
> 
> > Andrea
> 
> I think we can avoid keventd altogether by using a periodic timer (say 10ms)
> to check for completion of an RCU update. The timer may be active
> only if there is any RCU going on in the system - that way
> we still don't have any impact on the rest of the kernel.

the timer can have a bigger latency than keventd calling wait_for_rcu,
so it should be a loss in a straight benchmark under light load, but
OTOH we only care about getting those callbacks executed eventually,
and the advantage I can see is that the timer cannot get starved.

Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
@ 2001-09-11 11:53 Dipankar Sarma
  2001-09-11 11:57 ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Dipankar Sarma @ 2001-09-11 11:53 UTC (permalink / raw)
  To: andrea; +Cc: linux-kernel, Paul Mckenney

In article <20010911131238.N715@athlon.random> you wrote:
> many thanks. At the moment my biggest concern is that call_rcu must
> not be starved by RT threads (keventd can be starved, so it won't
> matter if krcud is RT because we won't start using it).

> Andrea

I think we can avoid keventd altogether by using a periodic timer (say 10ms)
to check for completion of an RCU update. The timer may be active
only if there is any RCU going on in the system - that way
we still don't have any impact on the rest of the kernel.

I am working on such a thing - but it will take me a little bit
of time to figure out how to do this in linux.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-11  9:39 2.4.10pre7aa1 Maneesh Soni
@ 2001-09-11 11:12 ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-11 11:12 UTC (permalink / raw)
  To: Maneesh Soni; +Cc: LKML

On Tue, Sep 11, 2001 at 03:09:46PM +0530, Maneesh Soni wrote:
> 
> In article <20010910200344.C714@athlon.random> you wrote:
> > Long term of course, but with my further changes before the inclusion
> > the plain current patches shouldn't apply any longer, I'd like if the
> > developers of the current rcu fd patches could check my changes and
> > adapt them (if they agree with my changes of course ;).
> 
> Hello Andrea,
> 
> I have noted your changes and I am modifying the FD patch accordingly.
> In fact, in the first version of the FD patch I used the rc_callback()
> interface, which is equivalent to call_rcu().

many thanks. At the moment my biggest concern is that call_rcu must
not be starved by RT threads (keventd can be starved, so it won't
matter if krcud is RT because we won't start using it). But I don't
have concerns about the API, so those issues will be transparent to
the FD patch.

Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
  2001-09-11  8:51 2.4.10pre7aa1 Dipankar Sarma
@ 2001-09-11 11:04 ` Andrea Arcangeli
  2001-09-11 12:40   ` 2.4.10pre7aa1 Alan Cox
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-11 11:04 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: hch, linux-kernel, Paul Mckenney

On Tue, Sep 11, 2001 at 02:21:58PM +0530, Dipankar Sarma wrote:
> Hi Christoph,
> 
> In article <20010910205250.B22889@caldera.de> you wrote:
> 
> > Hmm, I don't see why latency is important for rcu - we only want to
> > free datastructures.. (mm load?).
> 
> While it is not important for RCU to do the updates quickly, it is
> still necessary that updates are not completely starved out by
> high-priority tasks. So, the idea behind using high-priority
> krcuds is to ensure that they don't get starved, thereby delaying
> updates for unreasonably long periods of time, which could lead
> to memory pressure or other performance problems depending on
> how RCU is being used.

good point.

> I agree that it is not always a good idea to use kernel threads for
> everything, but in this case this seems to be the safest and
> most reasonable option.

pretty much agreed.

BTW, I fixed a few more issues in the rcu patch (grep for
down_interruptible for instance), here an updated patch (will be
included in 2.4.10pre8aa1 [or later -aa]) with the name of rcu-2.

diff -urN 2.4.10pre8/include/linux/rcupdate.h rcu/include/linux/rcupdate.h
--- 2.4.10pre8/include/linux/rcupdate.h	Thu Jan  1 01:00:00 1970
+++ rcu/include/linux/rcupdate.h	Tue Sep 11 06:14:17 2001
@@ -0,0 +1,48 @@
+/*
+ * Read-Copy Update mechanism for Linux
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ *              http://lse.sourceforge.net/locking/rcupdate.html
+ *
+ */
+
+#ifndef _LINUX_RCUPDATE_H
+#define _LINUX_RCUPDATE_H
+
+#include <linux/malloc.h>
+#include <linux/vmalloc.h>
+#include <linux/cache.h>
+#include <asm/semaphore.h>
+
+struct rcu_data {
+	struct task_struct *krcud_task;
+	struct semaphore krcud_sema;
+} ____cacheline_aligned_in_smp;
+
+#define krcud_task(cpu) rcu_data[(cpu)].krcud_task
+#define krcud_sema(cpu) rcu_data[(cpu)].krcud_sema
+
+struct rcu_head
+{
+	struct list_head list;
+	void (*func)(void * arg);
+	void * arg;
+};
+
+extern void call_rcu(struct rcu_head * head, void (*func)(void * arg), void * arg);
+
+#endif
diff -urN 2.4.10pre8/kernel/Makefile rcu/kernel/Makefile
--- 2.4.10pre8/kernel/Makefile	Tue Sep 11 04:10:03 2001
+++ rcu/kernel/Makefile	Tue Sep 11 06:14:17 2001
@@ -9,12 +9,12 @@
 
 O_TARGET := kernel.o
 
-export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o exec_domain.o
+export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o exec_domain.o rcupdate.o
 
 obj-y     = sched.o dma.o fork.o exec_domain.o panic.o printk.o \
 	    module.o exit.o itimer.o info.o time.o softirq.o resource.o \
 	    sysctl.o acct.o capability.o ptrace.o timer.o user.o \
-	    signal.o sys.o kmod.o context.o
+	    signal.o sys.o kmod.o context.o rcupdate.o
 
 obj-$(CONFIG_UID16) += uid16.o
 obj-$(CONFIG_MODULES) += ksyms.o
diff -urN 2.4.10pre8/kernel/rcupdate.c rcu/kernel/rcupdate.c
--- 2.4.10pre8/kernel/rcupdate.c	Thu Jan  1 01:00:00 1970
+++ rcu/kernel/rcupdate.c	Tue Sep 11 06:16:39 2001
@@ -0,0 +1,165 @@
+/*
+ * Read-Copy Update mechanism for Linux
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ *              http://lse.sourceforge.net/locking/rcupdate.html
+ *
+ */
+
+#include <linux/rcupdate.h>
+#include <linux/spinlock.h>
+#include <linux/tqueue.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/init.h>
+#include <linux/tqueue.h>
+
+asmlinkage long sys_sched_get_priority_max(int policy);
+
+static spinlock_t rcu_lock = SPIN_LOCK_UNLOCKED;
+static struct list_head rcu_wait_list;
+static struct tq_struct rcu_task;
+static struct semaphore rcu_sema;
+static struct rcu_data rcu_data[NR_CPUS];
+
+/*
+ * Wait for all the CPUs to go through a quiescent state. It assumes
+ * that the current CPU doesn't have any reference to RCU-protected
+ * data and thus has already undergone a quiescent state since the update.
+ */
+static void wait_for_rcu(void)
+{
+	int cpu;
+	int count;
+
+        for (cpu = 0; cpu < smp_num_cpus; cpu++) {
+                if (cpu == smp_processor_id())
+                        continue;
+                up(&krcud_sema(cpu));
+	}
+	count = 0;
+	while (count++ < smp_num_cpus - 1)
+		down(&rcu_sema);
+}
+
+/*
+ * Process a batch of RCU callbacks (the batch can be empty).
+ * There can be only one batch processed at any point of time.
+ */
+static void process_pending_rcus(void *arg)
+{
+	LIST_HEAD(rcu_current_list);
+	struct list_head * entry;
+
+	spin_lock_irq(&rcu_lock);
+	list_splice(&rcu_wait_list, rcu_current_list.prev);
+	INIT_LIST_HEAD(&rcu_wait_list);
+	spin_unlock_irq(&rcu_lock);
+
+	wait_for_rcu();
+
+	while ((entry = rcu_current_list.prev) != &rcu_current_list) {
+		struct rcu_head * head;
+
+		list_del(entry);
+		head = list_entry(entry, struct rcu_head, list);
+		head->func(head->arg);
+	}
+}
+
+/*
+ * Register an RCU callback to be invoked after all CPUs have
+ * gone through a quiescent state.
+ */
+void call_rcu(struct rcu_head * head, void (*func)(void * arg), void * arg)
+{
+	unsigned long flags;
+	int start = 0;
+
+	head->func = func;
+	head->arg = arg;
+
+	spin_lock_irqsave(&rcu_lock, flags);
+	if (list_empty(&rcu_wait_list))
+		start = 1;
+	list_add(&head->list, &rcu_wait_list);
+	spin_unlock_irqrestore(&rcu_lock, flags);
+
+	if (start)
+		schedule_task(&rcu_task);
+}
+
+/*
+ * Per-CPU RCU daemon. It runs at an absurdly high priority so
+ * that it is not starved out by the scheduler, thereby holding
+ * up RCU updates.
+ */
+static int krcud(void * __bind_cpu)
+{
+	int bind_cpu = *(int *) __bind_cpu;
+	int cpu = cpu_logical_map(bind_cpu);
+
+	daemonize();
+        current->policy = SCHED_FIFO;
+        current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);
+
+	sigfillset(&current->blocked);
+
+	/* Migrate to the right CPU */
+	current->cpus_allowed = 1UL << cpu;
+	while (smp_processor_id() != cpu)
+		schedule();
+
+	sprintf(current->comm, "krcud_CPU%d", bind_cpu);
+	sema_init(&krcud_sema(cpu), 0);
+
+	krcud_task(cpu) = current;
+
+	for (;;) {
+		while (down_interruptible(&krcud_sema(cpu)));
+		up(&rcu_sema);
+	}
+}
+
+static void spawn_krcud(void)
+{
+	int cpu;
+
+	for (cpu = 0; cpu < smp_num_cpus; cpu++) {
+		if (kernel_thread(krcud, (void *) &cpu,
+				  CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0)
+			printk("spawn_krcud() failed for cpu %d\n", cpu);
+		else {
+			while (!krcud_task(cpu_logical_map(cpu))) {
+				current->policy |= SCHED_YIELD;
+				schedule();
+			}
+		}
+	}
+}
+
+static __init int rcu_init(void)
+{
+	sema_init(&rcu_sema, 0);
+	rcu_task.routine = process_pending_rcus;
+	spawn_krcud();
+	return 0;
+}
+
+__initcall(rcu_init);
+
+EXPORT_SYMBOL(call_rcu);

Andrea
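The interface the patch adds can be modeled in a few lines of userspace C. This is a toy, single-threaded sketch of the call_rcu()/process_pending_rcus() contract only: the names mirror the patch, but the explicit rcu_drain() step stands in for the real wait_for_rcu() grace-period machinery, so it illustrates the interface, not the synchronization.

```c
#include <stddef.h>

/* Toy userspace model of the call_rcu() contract from the patch above.
 * As in the patch, the caller embeds struct rcu_head in its own object
 * (no hidden allocation) and the callback is deferred; here the grace
 * period is reduced to an explicit rcu_drain() call. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(void *arg);
	void *arg;
};

static struct rcu_head *rcu_wait_list;
static int rcu_freed_count;	/* observable effect for the example */

void call_rcu(struct rcu_head *head, void (*func)(void *arg), void *arg)
{
	head->func = func;
	head->arg = arg;
	head->next = rcu_wait_list;	/* queue it; do not run yet */
	rcu_wait_list = head;
}

/* Stands in for process_pending_rcus() after wait_for_rcu() returned. */
void rcu_drain(void)
{
	struct rcu_head *head = rcu_wait_list;

	rcu_wait_list = NULL;
	while (head != NULL) {
		struct rcu_head *next = head->next;

		head->func(head->arg);
		head = next;
	}
}

/* Example callback: bump a counter instead of freeing memory. */
static void count_free(void *arg)
{
	(void)arg;
	rcu_freed_count++;
}
```

The key property preserved from the patch is that the callback never runs inside call_rcu() itself, only after the drain step.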

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: 2.4.10pre7aa1
@ 2001-09-11  9:39 Maneesh Soni
  2001-09-11 11:12 ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Maneesh Soni @ 2001-09-11  9:39 UTC (permalink / raw)
  To: andrea; +Cc: LKML


In article <20010910200344.C714@athlon.random> you wrote:
> Long term of course, but with my further changes before the inclusion
> the plain current patches shouldn't apply any longer, I'd like if the
> developers of the current rcu fd patches could check my changes and
> adapt them (if they agree with my changes of course ;).

Hello Andrea,

I have noted your changes and I am modifying the FD patch accordingly. In fact,
in the first version of the FD patch I used the rc_callback() interface, which
is equivalent to call_rcu().

Regards,
Maneesh

-- 
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5262355 Extn. 3999
email: smaneesh@sequent.com
http://lse.sourceforge.net/locking/rclock.html

* Re: 2.4.10pre7aa1
@ 2001-09-11  8:51 Dipankar Sarma
  2001-09-11 11:04 ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Dipankar Sarma @ 2001-09-11  8:51 UTC (permalink / raw)
  To: hch; +Cc: linux-kernel, Paul Mckenney, Andrea Arcangeli

Hi Christoph,

In article <20010910205250.B22889@caldera.de> you wrote:

> Hmm, I don't see why latency is important for rcu - we only want to
> free datastructures.. (mm load?).

While it is not important for RCU to do the updates quickly, it is
still necessary that updates are not completely starved out by
high-priority tasks. So the idea behind using high-priority
krcuds is to ensure that they don't get starved, which would delay
updates for unreasonably long periods of time and could lead
to memory pressure or other performance problems, depending on
how RCU is being used. 


> On the other hands they are the experts on RCU, not I so I'll believe them.

>> So in short if you really are in pain for 8k per cpu to get the best
>> runtime behaviour and cleaner code I'd at least suggest to use the
>> ksoftirqd way that should be the next best step.

> My problem with this approach is just that we use kernel threads for
> more and more stuff - always creating new ones.  I think at some point
> they will sum up badly.

> 	Christoph

I agree that it is not always a good idea to use kernel threads for
everything, but in this case this seems to be the safest and
most reasonable option.

FYI, there are a couple of other implementations, but they all affect
code in fast paths even if there is no RCU going on in the system.
One of them, from Rusty, keeps track of CPUs going through a
quiescent state from the scheduler context and also executes the
callbacks from the scheduler context. The other patch is based
on our old DYNIX/ptx implementation - it uses one per-CPU context
switch counter to detect quiescent states and checks for completion
of RCU in the local timer interrupt handler. Once all the CPUs have
gone through a quiescent state, the callbacks are processed using
a tasklet. 
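The counter-based detection described in the last paragraph can be sketched in userspace C. Everything here is illustrative (NCPU, the counter array, and the function names are assumptions for the example, not the DYNIX/ptx or Linux code): snapshot the per-CPU context-switch counters when the update is queued, then poll until every counter has moved.

```c
/* Sketch of quiescent-state detection via per-CPU context-switch
 * counters: a grace period has elapsed once every CPU has
 * context-switched at least once since the snapshot was taken. */
#define NCPU 4

/* In the real scheme each CPU's scheduler bumps its own counter. */
static unsigned long ctxt_switches[NCPU];

/* Record the state of the world when an updater queues a callback. */
void rcu_snapshot(unsigned long snap[])
{
	int cpu;

	for (cpu = 0; cpu < NCPU; cpu++)
		snap[cpu] = ctxt_switches[cpu];
}

/* Non-blocking check, suitable for a local-timer-interrupt poll. */
int rcu_grace_period_done(const unsigned long snap[])
{
	int cpu;

	for (cpu = 0; cpu < NCPU; cpu++)
		if (ctxt_switches[cpu] == snap[cpu])
			return 0;	/* this CPU hasn't switched yet */
	return 1;
}
```

The attraction of this scheme is that the fast-path cost is a single per-CPU counter increment the scheduler performs anyway, at the price of a periodic poll from the timer tick.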

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

* Re: 2.4.10pre7aa1
  2001-09-10 19:03         ` 2.4.10pre7aa1 Christoph Hellwig
@ 2001-09-10 19:08           ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-10 19:08 UTC (permalink / raw)
  To: Christoph Hellwig, linux-kernel, Linus Torvalds

On Mon, Sep 10, 2001 at 09:03:15PM +0200, Christoph Hellwig wrote:
> I said max_sectors fixes because I meant both.  The paride fixes are
> needed, the sd ones nice - I'd like to see both merged :)

agreed ;)

Andrea

* Re: 2.4.10pre7aa1
  2001-09-10 18:52     ` 2.4.10pre7aa1 Christoph Hellwig
@ 2001-09-10 19:06       ` Andrea Arcangeli
  2001-09-16 17:00         ` 2.4.10pre7aa1 Rik van Riel
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-10 19:06 UTC (permalink / raw)
  To: Christoph Hellwig, linux-kernel

On Mon, Sep 10, 2001 at 08:52:50PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 10, 2001 at 08:03:44PM +0200, Andrea Arcangeli wrote:
> > > Do we really need yet-another per-CPU thread for this?  I'd prefer to have
> > > the context thread per-CPU instead (like in Ben's asynchio patch) and do
> > > this as well.
> > 
> > The first design solution I proposed to Paul and Dipankar was just to
> > use ksoftirqd for that (in short set need_resched and wait it to be
> > cleared), it worked out nicely and it was a sensible improvement with
> > respect to their previous patches. (also it was reliable, we cannot
> > afford allocations in the wait_for_rcu path to avoid having to introduce
> > fail paths) it was also a noop to the ksoftirqd paths.
> > 
> > However they remarked ksoftirqd wasn't an RT thread so under very high
> > load it could introduce a higher latency to the wait_for_rcu calls.
> 
> Hmm, I don't see why latency is important for rcu - we only want to
> free datastructures.. (mm load?).

latency isn't critical; in fact the point of rcu is not to care about the
performance of the writer, so it wouldn't be a showstopper if it takes
more time. Still, this doesn't change the fact that with RT threads the
writer will be faster.

> My problem with this approach is just that we use kernel threads for
> more and more stuff - always creating new ones.  I think at some point
> they will sum up badly.

They mostly just cost memory. I also don't like unnecessary kernel
threads, but I can see the usefulness of this one. OTOH, as you said, the
latency of wait_for_rcu isn't very critical, but usually I prefer to
save cycles with memory where I can and where it's even cleaner to do so.

Andrea

* Re: 2.4.10pre7aa1
  2001-09-10 19:01       ` 2.4.10pre7aa1 Andrea Arcangeli
@ 2001-09-10 19:03         ` Christoph Hellwig
  2001-09-10 19:08           ` 2.4.10pre7aa1 Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2001-09-10 19:03 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Linus Torvalds

On Mon, Sep 10, 2001 at 09:01:16PM +0200, Andrea Arcangeli wrote:
> > I think the sd part is much more interesting for most users..
> 
> more interesting for most users certainly ;), but it's not needed for
> reliable operations so I thought you were talking about the paride
> fixes (also you quoted the 00_paride-... filename).

I said max_sectors fixes because I meant both.  The paride fixes are
needed, the sd ones nice - I'd like to see both merged :)

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

* Re: 2.4.10pre7aa1
  2001-09-10 18:49     ` 2.4.10pre7aa1 Christoph Hellwig
@ 2001-09-10 19:01       ` Andrea Arcangeli
  2001-09-10 19:03         ` 2.4.10pre7aa1 Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-10 19:01 UTC (permalink / raw)
  To: Christoph Hellwig, linux-kernel, Linus Torvalds

On Mon, Sep 10, 2001 at 08:49:28PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 10, 2001 at 08:03:44PM +0200, Andrea Arcangeli wrote:
> > On Mon, Sep 10, 2001 at 07:41:58PM +0200, Christoph Hellwig wrote:
> > > In article <20010910175416.A714@athlon.random> you wrote:
> > > > Only in 2.4.10pre4aa1: 00_paride-max_sectors-1
> > > > Only in 2.4.10pre7aa1: 00_paride-max_sectors-2
> > > >
> > > > 	Rediffed (also noticed the gendisk list changes deleted too much stuff
> > > > 	here so resurrected it).
> > > 
> > > Do you plan to submit the max_sectors changes to Linus & Alan?
> > > Otherwise I will do as they seem to be needed for reliable operation.
> > 
> > agreed, Linus, here it is ready for merging into mainline:
> 
> I think the sd part is much more interesting for most users..

more interesting for most users certainly ;), but it's not needed for
reliable operations so I thought you were talking about the paride
fixes (also you quoted the 00_paride-... filename).

Andrea

* Re: 2.4.10pre7aa1
  2001-09-10 18:03   ` 2.4.10pre7aa1 Andrea Arcangeli
  2001-09-10 18:49     ` 2.4.10pre7aa1 Christoph Hellwig
@ 2001-09-10 18:52     ` Christoph Hellwig
  2001-09-10 19:06       ` 2.4.10pre7aa1 Andrea Arcangeli
  1 sibling, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2001-09-10 18:52 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

On Mon, Sep 10, 2001 at 08:03:44PM +0200, Andrea Arcangeli wrote:
> > Do we really need yet-another per-CPU thread for this?  I'd prefer to have
> > the context thread per-CPU instead (like in Ben's asynchio patch) and do
> > this as well.
> 
> The first design solution I proposed to Paul and Dipankar was just to
> use ksoftirqd for that (in short set need_resched and wait it to be
> cleared), it worked out nicely and it was a sensible improvement with
> respect to their previous patches. (also it was reliable, we cannot
> afford allocations in the wait_for_rcu path to avoid having to introduce
> fail paths) it was also a noop to the ksoftirqd paths.
> 
> However they remarked ksoftirqd wasn't an RT thread so under very high
> load it could introduce a higher latency to the wait_for_rcu calls.

Hmm, I don't see why latency is important for rcu - we only want to
free datastructures.. (mm load?).

On the other hands they are the experts on RCU, not I so I'll believe them.

> So in short if you really are in pain for 8k per cpu to get the best
> runtime behaviour and cleaner code I'd at least suggest to use the
> ksoftirqd way that should be the next best step.

My problem with this approach is just that we use kernel threads for
more and more stuff - always creating new ones.  I think at some point
they will sum up badly.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

* Re: 2.4.10pre7aa1
  2001-09-10 18:03   ` 2.4.10pre7aa1 Andrea Arcangeli
@ 2001-09-10 18:49     ` Christoph Hellwig
  2001-09-10 19:01       ` 2.4.10pre7aa1 Andrea Arcangeli
  2001-09-10 18:52     ` 2.4.10pre7aa1 Christoph Hellwig
  1 sibling, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2001-09-10 18:49 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Linus Torvalds

On Mon, Sep 10, 2001 at 08:03:44PM +0200, Andrea Arcangeli wrote:
> On Mon, Sep 10, 2001 at 07:41:58PM +0200, Christoph Hellwig wrote:
> > In article <20010910175416.A714@athlon.random> you wrote:
> > > Only in 2.4.10pre4aa1: 00_paride-max_sectors-1
> > > Only in 2.4.10pre7aa1: 00_paride-max_sectors-2
> > >
> > > 	Rediffed (also noticed the gendisk list changes deleted too much stuff
> > > 	here so resurrected it).
> > 
> > Do you plan to submit the max_sectors changes to Linus & Alan?
> > Otherwise I will do as they seem to be needed for reliable operation.
> 
> agreed, Linus, here it is ready for merging into mainline:

I think the sd part is much more interesting for most users..
Version from 2.4.10pre7aa1 is here:

diff -urN 2.4.7/drivers/scsi/sd.c sd_max_sectors/drivers/scsi/sd.c
--- 2.4.7/drivers/scsi/sd.c	Sat Jul 21 00:04:23 2001
+++ sd_max_sectors/drivers/scsi/sd.c	Mon Jul 23 04:31:00 2001
@@ -90,6 +90,7 @@
 static int *sd_sizes;
 static int *sd_blocksizes;
 static int *sd_hardsizes;	/* Hardware sector size */
+static int *sd_max_sectors;
 
 static int check_scsidisk_media_change(kdev_t);
 static int fop_revalidate_scsidisk(kdev_t);
@@ -1095,15 +1096,30 @@
 	if (!sd_hardsizes)
 		goto cleanup_blocksizes;
 
+	sd_max_sectors = kmalloc((sd_template.dev_max << 4) * sizeof(int), GFP_ATOMIC);
+	if (!sd_max_sectors)
+		goto cleanup_max_sectors;
+
 	for (i = 0; i < sd_template.dev_max << 4; i++) {
 		sd_blocksizes[i] = 1024;
 		sd_hardsizes[i] = 512;
+		/*
+		 * Allow lowlevel device drivers to generate 512k large scsi
+		 * commands if they know what they're doing and they ask for it
+		 * explicitly via the SHpnt->max_sectors API.
+		 */
+		sd_max_sectors[i] = MAX_SEGMENTS*8;
 	}
 
 	for (i = 0; i < N_USED_SD_MAJORS; i++) {
 		blksize_size[SD_MAJOR(i)] = sd_blocksizes + i * (SCSI_DISKS_PER_MAJOR << 4);
 		hardsect_size[SD_MAJOR(i)] = sd_hardsizes + i * (SCSI_DISKS_PER_MAJOR << 4);
+		max_sectors[SD_MAJOR(i)] = sd_max_sectors + i * (SCSI_DISKS_PER_MAJOR << 4);
 	}
+	/*
+	 * FIXME: should unregister blksize_size, hardsect_size and max_sectors when
+	 * the module is unloaded.
+	 */
 	sd = kmalloc((sd_template.dev_max << 4) *
 					  sizeof(struct hd_struct),
 					  GFP_ATOMIC);
@@ -1155,6 +1171,8 @@
 cleanup_sd_gendisks:
 	kfree(sd);
 cleanup_sd:
+	kfree(sd_max_sectors);
+cleanup_max_sectors:
 	kfree(sd_hardsizes);
 cleanup_blocksizes:
 	kfree(sd_blocksizes);
diff -urN 2.4.7/drivers/scsi/sym53c8xx.h sd_max_sectors/drivers/scsi/sym53c8xx.h
--- 2.4.7/drivers/scsi/sym53c8xx.h	Wed Jun 20 16:50:58 2001
+++ sd_max_sectors/drivers/scsi/sym53c8xx.h	Mon Jul 23 04:30:58 2001
@@ -96,6 +96,7 @@
 			this_id:        7,			\
 			sg_tablesize:   SCSI_NCR_SG_TABLESIZE,	\
 			cmd_per_lun:    SCSI_NCR_CMD_PER_LUN,	\
+			max_sectors:    MAX_SEGMENTS*8,		\
 			use_clustering: DISABLE_CLUSTERING} 
 
 #else

* Re: 2.4.10pre7aa1
  2001-09-10 17:41 ` 2.4.10pre7aa1 Christoph Hellwig
@ 2001-09-10 18:03   ` Andrea Arcangeli
  2001-09-10 18:49     ` 2.4.10pre7aa1 Christoph Hellwig
  2001-09-10 18:52     ` 2.4.10pre7aa1 Christoph Hellwig
  0 siblings, 2 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2001-09-10 18:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, Linus Torvalds

On Mon, Sep 10, 2001 at 07:41:58PM +0200, Christoph Hellwig wrote:
> In article <20010910175416.A714@athlon.random> you wrote:
> > Only in 2.4.10pre4aa1: 00_paride-max_sectors-1
> > Only in 2.4.10pre7aa1: 00_paride-max_sectors-2
> >
> > 	Rediffed (also noticed the gendisk list changes deleted too much stuff
> > 	here so resurrected it).
> 
> Do you plan to submit the max_sectors changes to Linus & Alan?
> Otherwise I will do as they seem to be needed for reliable operation.

agreed, Linus, here it is ready for merging into mainline:

diff -urN 2.4.10pre6/drivers/block/paride/pd.c paride/drivers/block/paride/pd.c
--- 2.4.10pre6/drivers/block/paride/pd.c	Sun Sep  9 06:04:54 2001
+++ paride/drivers/block/paride/pd.c	Mon Sep 10 03:58:48 2001
@@ -287,6 +287,7 @@
 static struct hd_struct pd_hd[PD_DEVS];
 static int pd_sizes[PD_DEVS];
 static int pd_blocksizes[PD_DEVS];
+static int pd_maxsectors[PD_DEVS];
 
 #define PD_NAMELEN	8
 
@@ -385,56 +386,6 @@
 	}
 }
 
-static inline int pd_new_segment(request_queue_t *q, struct request *req, int max_segments)
-{
-	if (max_segments > cluster)
-		max_segments = cluster;
-
-	if (req->nr_segments < max_segments) {
-		req->nr_segments++;
-		return 1;
-	}
-	return 0;
-}
-
-static int pd_back_merge_fn(request_queue_t *q, struct request *req, 
-			    struct buffer_head *bh, int max_segments)
-{
-	if (req->bhtail->b_data + req->bhtail->b_size == bh->b_data)
-		return 1;
-	return pd_new_segment(q, req, max_segments);
-}
-
-static int pd_front_merge_fn(request_queue_t *q, struct request *req, 
-			     struct buffer_head *bh, int max_segments)
-{
-	if (bh->b_data + bh->b_size == req->bh->b_data)
-		return 1;
-	return pd_new_segment(q, req, max_segments);
-}
-
-static int pd_merge_requests_fn(request_queue_t *q, struct request *req,
-				struct request *next, int max_segments)
-{
-	int total_segments = req->nr_segments + next->nr_segments;
-	int same_segment;
-
-	if (max_segments > cluster)
-		max_segments = cluster;
-
-	same_segment = 0;
-	if (req->bhtail->b_data + req->bhtail->b_size == next->bh->b_data) {
-		total_segments--;
-		same_segment = 1;
-	}
-    
-	if (total_segments > max_segments)
-		return 0;
-
-	req->nr_segments = total_segments;
-	return 1;
-}
-
 int pd_init (void)
 
 {       int i;
@@ -448,9 +399,6 @@
         }
 	q = BLK_DEFAULT_QUEUE(MAJOR_NR);
 	blk_init_queue(q, DEVICE_REQUEST);
-	q->back_merge_fn = pd_back_merge_fn;
-	q->front_merge_fn = pd_front_merge_fn;
-	q->merge_requests_fn = pd_merge_requests_fn;
         read_ahead[MAJOR_NR] = 8;       /* 8 sector (4kB) read ahead */
         
 	pd_gendisk.major = major;
@@ -460,6 +408,9 @@
 	for(i=0;i<PD_DEVS;i++) pd_blocksizes[i] = 1024;
 	blksize_size[MAJOR_NR] = pd_blocksizes;
 
+	for(i=0;i<PD_DEVS;i++) pd_maxsectors[i] = cluster;
+	max_sectors[MAJOR_NR] = pd_maxsectors;
+
 	printk("%s: %s version %s, major %d, cluster %d, nice %d\n",
 		name,name,PD_VERSION,major,cluster,nice);
 	pd_init_units();
@@ -642,6 +593,11 @@
 
         devfs_unregister_blkdev(MAJOR_NR,name);
 	del_gendisk(&pd_gendisk);
+
+	for (unit=0;unit<PD_UNITS;unit++) 
+	   if (PD.present) pi_release(PI);
+
+	max_sectors[MAJOR_NR] = NULL;
 }
 
 #endif

> > Only in 2.4.10pre7aa1: 00_rcu-1
> >
> > 	wait_for_rcu and call_rcu implementation (from IBM). I did some
> > 	modifications with respect to the original version from IBM.
> > 	In particular I dropped the vmalloc_rcu/kmalloc_rcu, the
> > 	rcu_head must always be allocated in the data structures, it has
> > 	to be a field of a class, rather than hiding it in the allocation
> > 	and playing dirty and risky with casts on a bigger allocation.
> 
> Do we really need yet-another per-CPU thread for this?  I'd prefer to have
> the context thread per-CPU instead (like in Ben's asynchio patch) and do
> this as well.

The first design solution I proposed to Paul and Dipankar was just to
use ksoftirqd for that (in short, set need_resched and wait for it to be
cleared); it worked out nicely and was a sensible improvement with
respect to their previous patches. It was also reliable - we cannot
afford allocations in the wait_for_rcu path, to avoid having to introduce
failure paths - and it was a noop to the ksoftirqd paths.
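The "set need_resched and wait for it to be cleared" idea can be modeled like this - a hypothetical single-threaded sketch (names and NCPU are illustrative) where clearing the flag stands in for the CPU actually passing through the scheduler:

```c
/* Toy model of the ksoftirqd approach: the updater raises a per-CPU
 * flag; each CPU clears its own flag when it schedules, which in the
 * real kernel marks a quiescent state for that CPU. */
#define NCPU 4

static int resched_pending[NCPU];

/* Updater side: ask every CPU to go through the scheduler. */
void request_grace_period(void)
{
	int cpu;

	for (cpu = 0; cpu < NCPU; cpu++)
		resched_pending[cpu] = 1;
}

/* Called from each CPU's schedule() path. */
void note_schedule(int cpu)
{
	resched_pending[cpu] = 0;
}

/* The updater would sleep until this returns true. */
int grace_period_done(void)
{
	int cpu;

	for (cpu = 0; cpu < NCPU; cpu++)
		if (resched_pending[cpu])
			return 0;
	return 1;
}
```

Note that no allocation is needed anywhere on the wait path, which is the reliability property stressed above.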

However they remarked that ksoftirqd wasn't an RT thread, so under very high
load it could introduce a higher latency to the wait_for_rcu calls.
keventd as well isn't an RT task, and furthermore schedule_task currently
has the property that only one task in the keventd queue
will run at a time (but I guess we can probably ignore the latter issue,
since it's better not to rely on it).

So the obvious next step was to spend another 8k per CPU to get the best
runtime behaviour and also very clean and self-contained code. I think
that's reasonable.

So in short, if spending 8k per CPU to get the best runtime behaviour
and cleaner code really pains you, I'd at least suggest using the
ksoftirqd way, which should be the next best step.

> BTW, do you plan to merge patches that actually _use_ this into your tree?

Long term of course, but with my further changes before the inclusion
the plain current patches shouldn't apply any longer. I'd like the
developers of the current rcu fd patches to check my changes and
adapt them (if they agree with my changes, of course ;).

> > Only in 2.4.10pre4aa1: 10_prefetch-4
> > Only in 2.4.10pre7aa1: 10_prefetch-5
> >
> > 	Part of prefetch in mainline, rediffed the architectural parts.
> 
> In my tree I also have an ia64 prefetch patch (I think it's from redhat,
> not sure though), it's appended if you want to take it.

thanks, applied.

Andrea

* Re: 2.4.10pre7aa1
       [not found] <20010910175416.A714@athlon.random>
@ 2001-09-10 17:41 ` Christoph Hellwig
  2001-09-10 18:03   ` 2.4.10pre7aa1 Andrea Arcangeli
  2001-09-12  8:24 ` 2.4.10pre7aa1 Rusty Russell
  1 sibling, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2001-09-10 17:41 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

In article <20010910175416.A714@athlon.random> you wrote:
> Only in 2.4.10pre4aa1: 00_paride-max_sectors-1
> Only in 2.4.10pre7aa1: 00_paride-max_sectors-2
>
> 	Rediffed (also noticed the gendisk list changes deleted too much stuff
> 	here so resurrected it).

Do you plan to submit the max_sectors changes to Linus & Alan?
Otherwise I will do as they seem to be needed for reliable operation.


> Only in 2.4.10pre7aa1: 00_rcu-1
>
> 	wait_for_rcu and call_rcu implementation (from IBM). I did some
> 	modifications with respect to the original version from IBM.
> 	In particular I dropped the vmalloc_rcu/kmalloc_rcu, the
> 	rcu_head must always be allocated in the data structures, it has
> 	to be a field of a class, rather than hiding it in the allocation
> 	and playing dirty and risky with casts on a bigger allocation.

Do we really need yet-another per-CPU thread for this?  I'd prefer to have
the context thread per-CPU instead (like in Ben's asynchio patch) and do
this as well.

BTW, do you plan to merge patches that actually _use_ this into your tree?

> Only in 2.4.10pre4aa1: 10_prefetch-4
> Only in 2.4.10pre7aa1: 10_prefetch-5
>
> 	Part of prefetch in mainline, rediffed the architectural parts.

In my tree I also have an ia64 prefetch patch (I think it's from redhat,
not sure though), it's appended if you want to take it.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

--- linux/include/asm-ia64/processor.h.org	Thu Jun 28 12:43:20 2001
+++ linux/include/asm-ia64/processor.h	Thu Jun 28 12:48:28 2001
@@ -958,6 +958,25 @@
 	return result;
 }
 
+
+#define ARCH_HAS_PREFETCH
+#define ARCH_HAS_PREFETCHW
+#define ARCH_HAS_SPINLOCK_PREFETCH
+#define PREFETCH_STRIDE 256
+
+extern inline void prefetch(const void *x)
+{
+         __asm__ __volatile__ ("lfetch [%0]" : : "r"(x));
+}
+         
+extern inline void prefetchw(const void *x)
+{
+	__asm__ __volatile__ ("lfetch.excl [%0]" : : "r"(x));
+}
+
+#define spin_lock_prefetch(x)   prefetchw(x)
+
+                  
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_IA64_PROCESSOR_H */
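For reference, the same hooks can be exercised from portable C via GCC's __builtin_prefetch, which on ia64 compiles down to the lfetch family used in the patch above; the function and loop below are only an illustration, not part of the patch.

```c
/* Userspace analogue of the prefetch()/prefetchw() hooks above:
 * __builtin_prefetch's second argument selects read (0) vs write (1)
 * intent, mirroring lfetch vs lfetch.excl.  Prefetching past the end
 * of the array is harmless - prefetch hints never fault. */
#define PREFETCH_STRIDE 256	/* as in the ia64 patch */

long sum_with_prefetch(const long *a, int n)
{
	long sum = 0;
	int i;

	for (i = 0; i < n; i++) {
		/* hint the line PREFETCH_STRIDE bytes ahead, for reading */
		__builtin_prefetch((const char *)&a[i] + PREFETCH_STRIDE, 0);
		sum += a[i];
	}
	return sum;
}
```

On arrays that fit in cache the hints are a no-op; the win shows up on long streaming loops where the stride keeps the next lines in flight.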

end of thread, other threads:[~2001-09-17  9:09 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
2001-09-12 11:04 2.4.10pre7aa1 Dipankar Sarma
2001-09-12 14:03 ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-12 14:42   ` 2.4.10pre7aa1 Dipankar Sarma
2001-09-12 14:53     ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-16 12:23       ` 2.4.10pre7aa1 Rusty Russell
  -- strict thread matches above, loose matches on Subject: below --
2001-09-17  9:13 2.4.10pre7aa1 Dipankar Sarma
2001-09-11 13:05 2.4.10pre7aa1 Dipankar Sarma
2001-09-11 13:56 ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-11 14:27   ` 2.4.10pre7aa1 Dipankar Sarma
2001-09-11 12:22 2.4.10pre7aa1 Dipankar Sarma
2001-09-11 11:53 2.4.10pre7aa1 Dipankar Sarma
2001-09-11 11:57 ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-11  9:39 2.4.10pre7aa1 Maneesh Soni
2001-09-11 11:12 ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-11  8:51 2.4.10pre7aa1 Dipankar Sarma
2001-09-11 11:04 ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-11 12:40   ` 2.4.10pre7aa1 Alan Cox
2001-09-11 13:49     ` 2.4.10pre7aa1 Andrea Arcangeli
     [not found] <20010910175416.A714@athlon.random>
2001-09-10 17:41 ` 2.4.10pre7aa1 Christoph Hellwig
2001-09-10 18:03   ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-10 18:49     ` 2.4.10pre7aa1 Christoph Hellwig
2001-09-10 19:01       ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-10 19:03         ` 2.4.10pre7aa1 Christoph Hellwig
2001-09-10 19:08           ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-10 18:52     ` 2.4.10pre7aa1 Christoph Hellwig
2001-09-10 19:06       ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-16 17:00         ` 2.4.10pre7aa1 Rik van Riel
2001-09-16 17:23           ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-16 17:34             ` 2.4.10pre7aa1 Rik van Riel
2001-09-16 18:16               ` 2.4.10pre7aa1 Andrea Arcangeli
2001-09-16 19:04             ` 2.4.10pre7aa1 Christoph Hellwig
2001-09-12  8:24 ` 2.4.10pre7aa1 Rusty Russell
