* Re: 2.5.38-mm2
2002-09-23 4:20 2.5.38-mm2 Andrew Morton
@ 2002-09-23 7:16 ` Jens Axboe
2002-09-23 7:43 ` 2.5.38-mm2 Andrew Morton
2002-09-24 21:10 ` 2.5.38-mm2 Bill Davidsen
2002-09-23 9:45 ` 2.5.38-mm2 [PATCH] Dipankar Sarma
2002-09-23 9:56 ` 2.5.38-mm2 [PATCH] (dcache) Dipankar Sarma
2 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2002-09-23 7:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Sun, Sep 22 2002, Andrew Morton wrote:
> +read-latency.patch
>
> Fix the writer-starves-reader elevator problem. This is basically
> the read_latency2 patch from -ac kernels.
>
> On IDE it provides a 100x improvement in read throughput when there
> is heavy writeback happening. 40x on SCSI. You need to disable
Ah interesting. I do still think that it is worth investigating _why_
both elevator_linus and deadline do not prevent the read starvation.
The read-latency is a hack, not a solution imo.
> tagged command queueing on scsi - it appears to be quite stupidly
> implemented.
Ahem, I think you are being excessively harsh, or maybe passing judgement
on something you haven't even looked at. Did you consider that your
_drive_ may be the broken component? Excessive turn-around times for
requests when using deep tcq are not unusual, by far.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.5.38-mm2
2002-09-23 7:16 ` 2.5.38-mm2 Jens Axboe
@ 2002-09-23 7:43 ` Andrew Morton
2002-09-24 21:10 ` 2.5.38-mm2 Bill Davidsen
1 sibling, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2002-09-23 7:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: lkml, linux-mm
Jens Axboe wrote:
>
> On Sun, Sep 22 2002, Andrew Morton wrote:
> > +read-latency.patch
> >
> > Fix the writer-starves-reader elevator problem. This is basically
> > the read_latency2 patch from -ac kernels.
> >
> > On IDE it provides a 100x improvement in read throughput when there
> > is heavy writeback happening. 40x on SCSI. You need to disable
>
> Ah interesting. I do still think that it is worth investigating _why_
> both elevator_linus and deadline do not prevent the read starvation.
I did. See below.
> The read-latency is a hack, not a solution imo.
Well it clearly _is_ a solution. To a grave problem. But hopefully not
the best solution. Really, this is just me saying "ouch". This is
your stuff ;)
> > tagged command queueing on scsi - it appears to be quite stupidly
> > implemented.
>
> Ahem, I think you are being excessively harsh, or maybe passing judgement
> on something you haven't even looked at. Did you consider that your
> _drive_ may be the broken component? Excessive turn-around times for
> requests when using deep tcq are not unusual, by far.
It's a Fujitsu SCA-2 thing. Could be that other drive manufacturers
have a slight clue, but I doubt it. I bet they just went and designed
the queueing for optimum throughput, with the assumption that reads
and writes are muchly the same thing.
But they're not. They are vastly different things. Your fancy 2GHz
processor twiddles thumbs waiting for reads. But not for writes.
The "hack" _recognises_ this fact - that reads are very different
things from writes.
Let's run the numbers. 128 slot write request queue. 512k writes.
30 mbyte/sec bandwidth. That's two seconds worth of writes in the
request queue.
The reads have basically no chance of getting inserted between those
writes, so the first read has a two second latency, and that's before
adding in any of the passovers which additional writes will enjoy.
It works out that the latency per read is about three seconds. I
have all the traces of this.
Now think about what userspace wants to do. It reads a block from
the directory. Three seconds. Parse the directory, go read an
inode block. Three seconds. Go read the file. Three seconds
if it's less than 56k. Six seconds otherwise.
That's nine seconds since we read the directory block. I'm running
with mem=192m. So by now, the directory block has been reclaimed.
Move onto the next file.
So there is no bug or coding error present in the elevator. Everything
is working as it is designed to. But a streaming write slows read
performance by a factor of 4000.
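The queue arithmetic above can be checked with a few lines of C. This is only a sketch: the function and macro names are made up for illustration, and "30 mbyte/sec" is assumed to mean 30,000,000 bytes per second. The three-second per-read figure comes from the traces mentioned in the message and is passed in as a parameter, not derived here.

```c
#include <assert.h>

/* Figures quoted in the message; the names are illustrative only. */
#define QUEUE_SLOTS        128                /* write request queue depth */
#define WRITE_SIZE_BYTES   (512 * 1024)       /* 512k per write request */
#define DISK_BYTES_PER_SEC (30 * 1000 * 1000) /* assumed meaning of "30 mbyte/sec" */

/* Seconds needed to drain a full queue of equally sized writes. */
static double queue_drain_seconds(long slots, long request_bytes,
                                  long bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return (double)slots * (double)request_bytes / (double)bytes_per_sec;
}

/* Total latency of a chain of dependent reads, where each read cannot be
 * issued until the previous one has completed. */
static double dependent_read_seconds(int reads, double per_read_seconds)
{
    return reads * per_read_seconds;
}
```

With the quoted figures, queue_drain_seconds(QUEUE_SLOTS, WRITE_SIZE_BYTES, DISK_BYTES_PER_SEC) comes out a little over two seconds, and dependent_read_seconds(3, 3.0) gives the nine seconds for the directory, inode, and file reads.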
* Re: 2.5.38-mm2
2002-09-23 7:16 ` 2.5.38-mm2 Jens Axboe
2002-09-23 7:43 ` 2.5.38-mm2 Andrew Morton
@ 2002-09-24 21:10 ` Bill Davidsen
1 sibling, 0 replies; 11+ messages in thread
From: Bill Davidsen @ 2002-09-24 21:10 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, lkml, linux-mm
On Mon, 23 Sep 2002, Jens Axboe wrote:
> Ah interesting. I do still think that it is worth investigating _why_
> both elevator_linus and deadline do not prevent the read starvation.
> The read-latency is a hack, not a solution imo.
>
> > tagged command queueing on scsi - it appears to be quite stupidly
> > implemented.
>
> Ahem, I think you are being excessively harsh, or maybe passing judgement
> on something you haven't even looked at. Did you consider that your
> _drive_ may be the broken component? Excessive turn-around times for
> requests when using deep tcq are not unusual, by far.
I do think that's what he meant! I think most drives are optimized this
way, and performance would be better if the kernel used the queueing more
sparingly, so the drive couldn't just run with the writes and let the
reads take the leftovers.
I think that's a better long run solution, although the fix addresses the
immediate problem.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: 2.5.38-mm2 [PATCH]
2002-09-23 4:20 2.5.38-mm2 Andrew Morton
2002-09-23 7:16 ` 2.5.38-mm2 Jens Axboe
@ 2002-09-23 9:45 ` Dipankar Sarma
2002-09-23 16:28 ` Andrew Morton
2002-09-24 4:41 ` Rusty Russell
2002-09-23 9:56 ` 2.5.38-mm2 [PATCH] (dcache) Dipankar Sarma
2 siblings, 2 replies; 11+ messages in thread
From: Dipankar Sarma @ 2002-09-23 9:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Mon, Sep 23, 2002 at 04:22:28AM +0000, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.38/2.5.38-mm2/
> read_barrier_depends.patch
> extended barrier primitives
>
> rcu_ltimer.patch
> RCU core
>
> dcache_rcu.patch
> Use RCU for dcache
>
Hi Andrew,
The following patch fixes a typo for preemptive kernels.
Later I will submit a full rcu_ltimer patch that contains
the call_rcu_preempt() interface which can be useful for
module unloading and the like. This doesn't affect
the non-preemption path.
Thanks
--
Dipankar Sarma <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
--- include/linux/rcupdate.h Mon Sep 23 11:47:26 2002
+++ /tmp/rcupdate.h Mon Sep 23 12:45:21 2002
@@ -116,7 +116,7 @@
 	return 0;
 }
 
-#ifdef CONFIG_PREEMPTION
+#ifdef CONFIG_PREEMPT
 #define rcu_read_lock()		preempt_disable()
 #define rcu_read_unlock()	preempt_enable()
 #else
* Re: 2.5.38-mm2 [PATCH]
2002-09-23 9:45 ` 2.5.38-mm2 [PATCH] Dipankar Sarma
@ 2002-09-23 16:28 ` Andrew Morton
2002-09-23 17:33 ` Dipankar Sarma
2002-09-24 4:41 ` Rusty Russell
1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2002-09-23 16:28 UTC (permalink / raw)
To: dipankar; +Cc: lkml, linux-mm
Dipankar Sarma wrote:
>
> ...
> -#ifdef CONFIG_PREEMPTION
> +#ifdef CONFIG_PREEMPT
> #define rcu_read_lock() preempt_disable()
> #define rcu_read_unlock() preempt_enable()
> #else
Thanks. I just replaced
#ifdef CONFIG_PREEMPTION
#define rcu_read_lock() preempt_disable()
#define rcu_read_unlock() preempt_enable()
#else
#define rcu_read_lock() do {} while(0)
#define rcu_read_unlock() do {} while(0)
#endif
with
#define rcu_read_lock() preempt_disable()
#define rcu_read_unlock() preempt_enable()
because preempt_disable() is a no-op on CONFIG_PREEMPT=n anyway.
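A userspace sketch of why the #ifdef becomes redundant. The counter and the macro bodies here are stand-ins invented for illustration, not the kernel's actual definitions; the only point is that when preempt_disable() already compiles to nothing without CONFIG_PREEMPT, a single unconditional rcu_read_lock() definition covers both configurations.

```c
#include <assert.h>

/* Stand-in for the per-task preemption counter (hypothetical, userspace only). */
static int mock_preempt_count;

#ifdef CONFIG_PREEMPT
#define preempt_disable() do { mock_preempt_count++; } while (0)
#define preempt_enable() \
    do { assert(mock_preempt_count > 0); mock_preempt_count--; } while (0)
#else
/* Without preemption these expand to empty statements. */
#define preempt_disable() do { } while (0)
#define preempt_enable()  do { } while (0)
#endif

/* One definition, no #ifdef needed at this level. */
#define rcu_read_lock()   preempt_disable()
#define rcu_read_unlock() preempt_enable()

/* Returns the mock counter value seen inside a read-side critical section. */
static int depth_inside_read_section(void)
{
    int depth;

    rcu_read_lock();
    depth = mock_preempt_count;
    rcu_read_unlock();
    return depth;
}
```

Built without -DCONFIG_PREEMPT, the function returns 0 because both macros vanish. Built with it, the function sees a depth of 1 and the counter returns to 0 afterwards.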
* Re: 2.5.38-mm2 [PATCH]
2002-09-23 16:28 ` Andrew Morton
@ 2002-09-23 17:33 ` Dipankar Sarma
0 siblings, 0 replies; 11+ messages in thread
From: Dipankar Sarma @ 2002-09-23 17:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Mon, Sep 23, 2002 at 09:28:41AM -0700, Andrew Morton wrote:
> #ifdef CONFIG_PREEMPTION
> #define rcu_read_lock() preempt_disable()
> #define rcu_read_unlock() preempt_enable()
> #else
> #define rcu_read_lock() do {} while(0)
> #define rcu_read_unlock() do {} while(0)
> #endif
>
> with
>
> #define rcu_read_lock() preempt_disable()
> #define rcu_read_unlock() preempt_enable()
>
> because preempt_disable() is a no-op on CONFIG_PREEMPT=n anyway.
This is fine. The original rcu_ltimer patch needed #ifdef CONFIG_PREEMPT,
so that it could be easily used with 2.4. With preemption in 2.5,
rcu_read_xxx() can be preempt_xxx().
Thanks
--
Dipankar Sarma <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
* Re: 2.5.38-mm2 [PATCH]
2002-09-23 9:45 ` 2.5.38-mm2 [PATCH] Dipankar Sarma
2002-09-23 16:28 ` Andrew Morton
@ 2002-09-24 4:41 ` Rusty Russell
2002-09-24 10:24 ` Dipankar Sarma
1 sibling, 1 reply; 11+ messages in thread
From: Rusty Russell @ 2002-09-24 4:41 UTC (permalink / raw)
To: dipankar; +Cc: akpm, linux-kernel, linux-mm
On Mon, 23 Sep 2002 15:15:59 +0530
Dipankar Sarma <dipankar@in.ibm.com> wrote:
> Later I will submit a full rcu_ltimer patch that contains
> the call_rcu_preempt() interface which can be useful for
> module unloading and the like. This doesn't affect
> the non-preemption path.
You don't need this: I've dropped the requirement for module
unload.
Cheers!
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy
* Re: 2.5.38-mm2 [PATCH]
2002-09-24 4:41 ` Rusty Russell
@ 2002-09-24 10:24 ` Dipankar Sarma
2002-09-24 14:56 ` Rusty Russell
0 siblings, 1 reply; 11+ messages in thread
From: Dipankar Sarma @ 2002-09-24 10:24 UTC (permalink / raw)
To: Rusty Russell; +Cc: akpm, linux-kernel, linux-mm
On Tue, Sep 24, 2002 at 02:41:09PM +1000, Rusty Russell wrote:
> On Mon, 23 Sep 2002 15:15:59 +0530
> Dipankar Sarma <dipankar@in.ibm.com> wrote:
> > Later I will submit a full rcu_ltimer patch that contains
> > the call_rcu_preempt() interface which can be useful for
> > module unloading and the like. This doesn't affect
> > the non-preemption path.
>
> You don't need this: I've dropped the requirement for module
> unload.
Isn't wait_for_later() similar to synchronize_kernel(), or has the
entire module unloading design been changed since?
Thanks
--
Dipankar Sarma <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
* Re: 2.5.38-mm2 [PATCH]
2002-09-24 10:24 ` Dipankar Sarma
@ 2002-09-24 14:56 ` Rusty Russell
0 siblings, 0 replies; 11+ messages in thread
From: Rusty Russell @ 2002-09-24 14:56 UTC (permalink / raw)
To: dipankar; +Cc: akpm, linux-kernel, linux-mm
In message <20020924155428.B4085@in.ibm.com> you write:
> On Tue, Sep 24, 2002 at 02:41:09PM +1000, Rusty Russell wrote:
> > On Mon, 23 Sep 2002 15:15:59 +0530
> > Dipankar Sarma <dipankar@in.ibm.com> wrote:
> > > Later I will submit a full rcu_ltimer patch that contains
> > > the call_rcu_preempt() interface which can be useful for
> > > module unloading and the like. This doesn't affect
> > > the non-preemption path.
> >
> > You don't need this: I've dropped the requirement for module
> > unload.
>
> Isn't wait_for_later() similar to synchronize_kernel(), or has the
> entire module unloading design been changed since?
Yes, that was *days* ago 8)
I now just use a synchronize_kernel() which schedules on every CPU,
and disable preempt in magic places.
Ingo growled at me...
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: 2.5.38-mm2 [PATCH] (dcache)
2002-09-23 4:20 2.5.38-mm2 Andrew Morton
2002-09-23 7:16 ` 2.5.38-mm2 Jens Axboe
2002-09-23 9:45 ` 2.5.38-mm2 [PATCH] Dipankar Sarma
@ 2002-09-23 9:56 ` Dipankar Sarma
2 siblings, 0 replies; 11+ messages in thread
From: Dipankar Sarma @ 2002-09-23 9:56 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Mon, Sep 23, 2002 at 04:22:28AM +0000, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.38/2.5.38-mm2/
>
> read_barrier_depends.patch
> extended barrier primitives
>
> rcu_ltimer.patch
> RCU core
>
> dcache_rcu.patch
> Use RCU for dcache
>
Hi Andrew,
dcache_rcu orders writes using wmb() (list_del_rcu) while deleting from
the hash list, and the d_lookup() hash list traversal requires an rmb() on
Alpha. So we need to use the read_barrier_depends() interface there.
This isn't a problem on any other arch AFAIK.
Thanks
--
Dipankar Sarma <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
--- fs/dcache.c Mon Sep 23 11:47:26 2002
+++ /tmp/dcache.c Mon Sep 23 12:54:33 2002
@@ -870,7 +870,9 @@
 	rcu_read_lock();
 	tmp = head->next;
 	for (;;) {
-		struct dentry * dentry = list_entry(tmp, struct dentry, d_hash);
+		struct dentry * dentry;
+		read_barrier_depends();
+		dentry = list_entry(tmp, struct dentry, d_hash);
 		if (tmp == head)
 			break;
 		tmp = tmp->next;
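For readers outside the kernel, the pairing this patch relies on can be sketched in portable C11. This is an analogy under stated assumptions, not the dcache code: the node type and both functions are hypothetical, a single writer is assumed, and a simple singly linked list stands in for the hash chain. The release store plays the role of the writer's wmb(), and the memory_order_consume load plays the role of read_barrier_depends(), ordering the dereference after the pointer load on machines like Alpha that would not otherwise guarantee it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node; not struct dentry. */
struct node {
    int key;
    struct node *_Atomic next;
};

/* Writer (single writer assumed): fill in the node first, then publish it
 * at the head of the list with release ordering. */
static void publish(struct node *_Atomic *head, struct node *n, int key)
{
    struct node *old;

    assert(n != NULL);
    old = atomic_load_explicit(head, memory_order_relaxed);
    n->key = key;
    atomic_store_explicit(&n->next, old, memory_order_relaxed);
    atomic_store_explicit(head, n, memory_order_release);
}

/* Reader: each pointer is loaded with consume ordering before the node
 * it points to is dereferenced. Returns 1 if key is found, else 0. */
static int lookup(struct node *_Atomic *head, int key)
{
    struct node *n = atomic_load_explicit(head, memory_order_consume);

    while (n != NULL) {
        if (n->key == key)
            return 1;
        n = atomic_load_explicit(&n->next, memory_order_consume);
    }
    return 0;
}
```

On most architectures the consume load costs nothing extra, which matches the remark above that only Alpha needs the barrier.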