linux-kernel.vger.kernel.org archive mirror
* 2.5.38-mm2
@ 2002-09-23  4:20 Andrew Morton
  2002-09-23  7:16 ` 2.5.38-mm2 Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andrew Morton @ 2002-09-23  4:20 UTC (permalink / raw)
  To: lkml, linux-mm


url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.38/2.5.38-mm2/

+linus.patch

 Linus's current diff.

-filemap-fixes.patch

 Merged

+unbreak-writeback-mode.patch

 ext3 in data=writeback mode was oopsing on writeback of MAP_SHARED
 data.

+read-latency.patch

 Fix the writer-starves-reader elevator problem.  This is basically
 the read_latency2 patch from -ac kernels.

 On IDE it provides a 100x improvement in read throughput when there
 is heavy writeback happening.  40x on SCSI.  You need to disable
 tagged command queueing on scsi - it appears to be quite stupidly
 implemented.


linus.patch
  cset-1.580.1.4-to-1.597.txt.gz

ide-high-1.patch

ide-block-fix-1.patch

scsi_hack.patch
  Fix block-highmem for scsi

ext3-htree.patch
  Indexed directories for ext3

spin-lock-check.patch
  spinlock/rwlock checking infrastructure

rd-cleanup.patch
  Cleanup and fix the ramdisk driver (doesn't work right yet)

might_sleep.patch
  debug code to detect might-sleep-inside-spinlock bugs

unbreak-writeback-mode.patch
  Fix ext3's data=writeback mode

queue-congestion.patch
  Infrastructure for communicating request queue congestion to the VM

nonblocking-ext2-preread.patch
  avoid ext2 inode prereads if the queue is congested

nonblocking-pdflush.patch
  non-blocking writeback infrastructure, use it for pdflush

nonblocking-vm.patch
  Non-blocking page reclaim

set_page_dirty-locking-fix.patch
  don't call __mark_inode_dirty under spinlock

prepare_to_wait.patch
  prepare_to_wait/finish_wait: new sleep/wakeup API

vm-wakeups.patch
  Use the faster wakeups in the VM and block layers

sync-helper.patch
  Speed up sys_sync() against multiple spindles

slabasap.patch
  Early and smarter shrinking of slabs

write-deadlock.patch
  Fix the generic_file_write-from-same-mmapped-page deadlock

buddyinfo.patch
  Add /proc/buddyinfo - stats on the free pages pool

free_area.patch
  Remove struct free_area_struct and free_area_t, use `struct free_area'

per-node-kswapd.patch
  Per-node kswapd instance

topology-api.patch
  Simple topology API

radix_tree_gang_lookup.patch
  radix tree gang lookup

truncate_inode_pages.patch
  truncate/invalidate_inode_pages rewrite

proc_vmstat.patch
  Move the vm accounting out of /proc/stat

kswapd-reclaim-stats.patch
  Add kswapd_steal to /proc/vmstat

iowait.patch
  I/O wait statistics

sard.patch
  SARD disk accounting

remove-gfp_nfs.patch
  remove GFP_NFS

tcp-wakeups.patch
  Use fast wakeups in TCP/IPV4

swapoff-deadlock.patch
  Fix a tmpfs swapoff deadlock

dirty-and-uptodate.patch
  page state cleanup

shmem_rename.patch
  shmem_rename() directory link count fix

dirent-size.patch
  tmpfs: show a non-zero size for directories

tmpfs-trivia.patch
  tmpfs: small fixlets

per-zone-vm.patch
  separate the kswapd and direct reclaim code paths

swsusp-feature.patch
  add shrink_all_memory() for swsusp

adaptec-fix.patch
  partial fix for aic7xxx error recovery

remove-page-virtual.patch
  remove page->virtual for !WANT_PAGE_VIRTUAL

dirty-memory-clamp.patch
  sterner dirty-memory clamping

mempool-wakeup-fix.patch
  Fix for stuck tasks in mempool_alloc()

remove-write_mapping_buffers.patch
  Remove write_mapping_buffers

buffer_boundary-scheduling.patch
  IO scheduling for indirect blocks

ll_rw_block-cleanup.patch
  cleanup ll_rw_block()

lseek-ext2_readdir.patch
  remove lock_kernel() from ext2_readdir()

discontig-no-contig_page_data.patch
  undefine contig_page_data for discontigmem

per-node-zone_normal.patch
  ia32 NUMA: per-node ZONE_NORMAL

alloc_pages_node-cleanup.patch
  alloc_pages_node cleanup

read_barrier_depends.patch
  extended barrier primitives

rcu_ltimer.patch
  RCU core

dcache_rcu.patch
  Use RCU for dcache

read-latency.patch
  Elevator fix for writes-starving-reads


* Re: 2.5.38-mm2
  2002-09-23  4:20 2.5.38-mm2 Andrew Morton
@ 2002-09-23  7:16 ` Jens Axboe
  2002-09-23  7:43   ` 2.5.38-mm2 Andrew Morton
  2002-09-24 21:10   ` 2.5.38-mm2 Bill Davidsen
  2002-09-23  9:45 ` 2.5.38-mm2 [PATCH] Dipankar Sarma
  2002-09-23  9:56 ` 2.5.38-mm2 [PATCH] (dcache) Dipankar Sarma
  2 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2002-09-23  7:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Sun, Sep 22 2002, Andrew Morton wrote:
> +read-latency.patch
> 
>  Fix the writer-starves-reader elevator problem.  This is basically
>  the read_latency2 patch from -ac kernels.
> 
>  On IDE it provides a 100x improvement in read throughput when there
>  is heavy writeback happening.  40x on SCSI.  You need to disable

Ah interesting. I do still think that it is worth investigating _why_
both elevator_linus and deadline do not prevent the read starvation.
The read-latency is a hack, not a solution imo.

>  tagged command queueing on scsi - it appears to be quite stupidly
>  implemented.

Ahem I think you are being excessively harsh, or maybe passing judgement
on something you haven't even looked at. Did you consider that your
_drive_ may be the broken component? Excessive turn-around times for
requests when using deep tcq are not unusual, by far.

-- 
Jens Axboe



* Re: 2.5.38-mm2
  2002-09-23  7:16 ` 2.5.38-mm2 Jens Axboe
@ 2002-09-23  7:43   ` Andrew Morton
  2002-09-24 21:10   ` 2.5.38-mm2 Bill Davidsen
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2002-09-23  7:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: lkml, linux-mm

Jens Axboe wrote:
> 
> On Sun, Sep 22 2002, Andrew Morton wrote:
> > +read-latency.patch
> >
> >  Fix the writer-starves-reader elevator problem.  This is basically
> >  the read_latency2 patch from -ac kernels.
> >
> >  On IDE it provides a 100x improvement in read throughput when there
> >  is heavy writeback happening.  40x on SCSI.  You need to disable
> 
> Ah interesting. I do still think that it is worth investigating _why_
> both elevator_linus and deadline do not prevent the read starvation.

I did.  See below.

> The read-latency is a hack, not a solution imo.

Well it clearly _is_ a solution.  To a grave problem.  But hopefully not
the best solution.  Really, this is just me saying "ouch".  This is
your stuff ;)

> >  tagged command queueing on scsi - it appears to be quite stupidly
> >  implemented.
> 
> Ahem I think you are being excessively harsh, or maybe passing judgement
> on something you haven't even looked at. Did you consider that your
> _drive_ may be the broken component? Excessive turn-around times for
> requests when using deep tcq are not unusual, by far.

It's a Fujitsu SCA-2 thing.  Could be that other drive manufacturers
have a slight clue, but I doubt it.  I bet they just went and designed
the queueing for optimum throughput, with the assumption that reads 
and writes are muchly the same thing.

But they're not.  They are vastly different things.  Your fancy 2GHz
processor twiddles thumbs waiting for reads.  But not for writes.
The "hack" _recognises_ this fact - that reads are very different
things from writes.


Let's run the numbers.  128 slot write request queue.  512k writes.
30 mbyte/sec bandwidth.  That's two seconds worth of writes in the
request queue.

The reads have basically no chance of getting inserted between those
writes, so the first read has a two second latency, and that's before
adding in any of the passovers which additional writes will enjoy.

It works out that the latency per read is about three seconds.  I
have all the traces of this.

Now think about what userspace wants to do.  It reads a block from
the directory.  Three seconds.  Parse the directory, go read an
inode block.  Three seconds.  Go read the file.  Three seconds
if it's less than 56k.  Six seconds otherwise.

That's nine seconds since we read the directory block.  I'm running
with mem=192m.  So by now, the directory block has been reclaimed.

Move onto the next file.


So there is no bug or coding error present in the elevator.  Everything
is working as it is designed to. But a streaming write slows read
performance by a factor of 4000.
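
The queue arithmetic above can be checked with a short sketch. It assumes that "512k" means 512 KiB per request and that "30 mbyte/sec" means 30 * 10^6 bytes per second; the function names are hypothetical and are not taken from any kernel source. Under those assumptions a full 128-slot write queue holds 64 MiB and takes a little over two seconds to drain, and three dependent reads at about three seconds each give the nine seconds quoted for a small file.

```c
#include <assert.h>

/* Bytes sitting in the request queue when every write slot is occupied. */
unsigned long long queued_bytes(unsigned int slots,
				unsigned long long bytes_per_request)
{
	return (unsigned long long)slots * bytes_per_request;
}

/* Milliseconds the device needs to drain `bytes` at `bytes_per_sec`. */
unsigned long long drain_ms(unsigned long long bytes,
			    unsigned long long bytes_per_sec)
{
	assert(bytes_per_sec > 0);
	return bytes * 1000ULL / bytes_per_sec;
}

/*
 * Wall-clock seconds for a chain of dependent reads (directory block,
 * inode block, file data), where each read waits the full per-read latency
 * because the next one cannot be issued until the previous one completes.
 */
unsigned int dependent_read_seconds(unsigned int reads,
				    unsigned int latency_seconds)
{
	return reads * latency_seconds;
}
```

Calling `drain_ms(queued_bytes(128, 512 * 1024), 30000000ULL)` gives the time a newly queued read waits behind a full write queue, before counting any writes that are later allowed to pass it.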


* Re: 2.5.38-mm2 [PATCH]
  2002-09-23  4:20 2.5.38-mm2 Andrew Morton
  2002-09-23  7:16 ` 2.5.38-mm2 Jens Axboe
@ 2002-09-23  9:45 ` Dipankar Sarma
  2002-09-23 16:28   ` Andrew Morton
  2002-09-24  4:41   ` Rusty Russell
  2002-09-23  9:56 ` 2.5.38-mm2 [PATCH] (dcache) Dipankar Sarma
  2 siblings, 2 replies; 11+ messages in thread
From: Dipankar Sarma @ 2002-09-23  9:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Mon, Sep 23, 2002 at 04:22:28AM +0000, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.38/2.5.38-mm2/
> read_barrier_depends.patch
>   extended barrier primitives
> 
> rcu_ltimer.patch
>   RCU core
> 
> dcache_rcu.patch
>   Use RCU for dcache
> 

Hi Andrew,

The following patch fixes a typo for preemptive kernels.

Later I will submit a full rcu_ltimer patch that contains
the call_rcu_preempt() interface which can be useful for
module unloading and the like. This doesn't affect
the non-preemption path.

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.


--- include/linux/rcupdate.h	Mon Sep 23 11:47:26 2002
+++ /tmp/rcupdate.h	Mon Sep 23 12:45:21 2002
@@ -116,7 +116,7 @@
 		return 0;
 }
 
-#ifdef CONFIG_PREEMPTION
+#ifdef CONFIG_PREEMPT
 #define rcu_read_lock()		preempt_disable()
 #define rcu_read_unlock()	preempt_enable()
 #else


* Re: 2.5.38-mm2 [PATCH] (dcache)
  2002-09-23  4:20 2.5.38-mm2 Andrew Morton
  2002-09-23  7:16 ` 2.5.38-mm2 Jens Axboe
  2002-09-23  9:45 ` 2.5.38-mm2 [PATCH] Dipankar Sarma
@ 2002-09-23  9:56 ` Dipankar Sarma
  2 siblings, 0 replies; 11+ messages in thread
From: Dipankar Sarma @ 2002-09-23  9:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Mon, Sep 23, 2002 at 04:22:28AM +0000, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.38/2.5.38-mm2/
> 
> read_barrier_depends.patch
>   extended barrier primitives
> 
> rcu_ltimer.patch
>   RCU core
> 
> dcache_rcu.patch
>   Use RCU for dcache
> 

Hi Andrew,

dcache_rcu orders writes using wmb() (list_del_rcu) while deleting from
the hash list and the d_lookup() hash list traversal requires an rmb() for 
alpha. So, we need to use the read_barrier_depends() interface there. 
This isn't a problem with any other archs AFAIK.

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.


--- fs/dcache.c	Mon Sep 23 11:47:26 2002
+++ /tmp/dcache.c	Mon Sep 23 12:54:33 2002
@@ -870,7 +870,9 @@
 	rcu_read_lock();
 	tmp = head->next;
 	for (;;) {
-		struct dentry * dentry = list_entry(tmp, struct dentry, d_hash);
+		struct dentry * dentry;
+		read_barrier_depends();
+		dentry = list_entry(tmp, struct dentry, d_hash);
 		if (tmp == head)
 			break;
 		tmp = tmp->next;
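
The barrier placement in the patch above can be illustrated with a userspace sketch. Everything here is an assumption made for illustration: `struct node`, `hash_add()` and `hash_lookup()` are hypothetical names, `read_barrier_depends()` is approximated with a C11 acquire fence (stronger than the data-dependency barrier the kernel needs on alpha), and the loop checks for the list head before dereferencing rather than after. The accesses are plain loads and stores, so this shows where the barrier sits relative to the dereference; it is not a correct lock-free list for concurrent use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: portable stand-in for the kernel's read_barrier_depends(). */
#define read_barrier_depends()	atomic_thread_fence(memory_order_acquire)

struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical hashed object, standing in for struct dentry. */
struct node {
	int key;
	struct list_head hash;
};

/* Writer: initialise the node, order those stores, then link it in. */
void hash_add(struct list_head *head, struct node *n)
{
	n->hash.next = head->next;
	n->hash.prev = head;
	atomic_thread_fence(memory_order_release);	/* plays the role of wmb() */
	head->next->prev = &n->hash;
	head->next = &n->hash;
}

/* Reader: issue the barrier after loading the pointer, before using it. */
struct node *hash_lookup(struct list_head *head, int key)
{
	struct list_head *tmp = head->next;

	while (tmp != head) {
		struct node *n;

		read_barrier_depends();
		n = list_entry(tmp, struct node, hash);
		if (n->key == key)
			return n;
		tmp = tmp->next;
	}
	return NULL;
}
```

The point of the ordering is that the reader must not observe the new `next` pointer and then read stale contents of the node it points to; on most architectures the data dependency alone guarantees this, which is why the patch describes the problem as alpha-specific.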


* Re: 2.5.38-mm2 [PATCH]
  2002-09-23  9:45 ` 2.5.38-mm2 [PATCH] Dipankar Sarma
@ 2002-09-23 16:28   ` Andrew Morton
  2002-09-23 17:33     ` Dipankar Sarma
  2002-09-24  4:41   ` Rusty Russell
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2002-09-23 16:28 UTC (permalink / raw)
  To: dipankar; +Cc: lkml, linux-mm

Dipankar Sarma wrote:
> 
> ...
> -#ifdef CONFIG_PREEMPTION
> +#ifdef CONFIG_PREEMPT
>  #define rcu_read_lock()                preempt_disable()
>  #define rcu_read_unlock()      preempt_enable()
>  #else

Thanks.  I just replaced

#ifdef CONFIG_PREEMPTION
#define rcu_read_lock()        preempt_disable()
#define rcu_read_unlock()      preempt_enable()
#else
#define rcu_read_lock()        do {} while(0)
#define rcu_read_unlock()      do {} while(0)
#endif

with

#define rcu_read_lock()        preempt_disable()
#define rcu_read_unlock()      preempt_enable()

because preempt_disable() is a no-op on CONFIG_PREEMPT=n anyway.
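
A userspace sketch of why the #ifdef is redundant, under stated assumptions: `preempt_count` here is a hypothetical global standing in for the kernel's per-task counter, the macros are simplified (no compiler barrier, no reschedule check), and `read_side_depth()` is an invented helper. When CONFIG_PREEMPT is not defined, the preempt macros expand to empty statements, so the unconditional rcu_read_lock()/rcu_read_unlock() definitions cost nothing.

```c
#include <assert.h>

/* Assumption: simplified stand-in for the kernel's per-task preempt count. */
int preempt_count;

#ifdef CONFIG_PREEMPT
#define preempt_disable()	do { preempt_count++; } while (0)
#define preempt_enable()	do { preempt_count--; } while (0)
#else
#define preempt_disable()	do { } while (0)
#define preempt_enable()	do { } while (0)
#endif

/* Unconditional definitions, as in the replacement described above. */
#define rcu_read_lock()		preempt_disable()
#define rcu_read_unlock()	preempt_enable()

/*
 * Hypothetical helper: report the preempt count observed inside a
 * read-side critical section, and check that it is balanced on exit.
 */
int read_side_depth(void)
{
	int depth;

	rcu_read_lock();
	depth = preempt_count;
	rcu_read_unlock();
	assert(preempt_count == 0);
	return depth;
}
```

Built without `-DCONFIG_PREEMPT`, the helper sees a count of zero because both macros compile away; built with it, the count is one inside the section and returns to zero afterwards.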


* Re: 2.5.38-mm2 [PATCH]
  2002-09-23 16:28   ` Andrew Morton
@ 2002-09-23 17:33     ` Dipankar Sarma
  0 siblings, 0 replies; 11+ messages in thread
From: Dipankar Sarma @ 2002-09-23 17:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Mon, Sep 23, 2002 at 09:28:41AM -0700, Andrew Morton wrote:
> #ifdef CONFIG_PREEMPTION
> #define rcu_read_lock()        preempt_disable()
> #define rcu_read_unlock()      preempt_enable()
> #else
> #define rcu_read_lock()        do {} while(0)
> #define rcu_read_unlock()      do {} while(0)
> #endif
> 
> with
> 
> #define rcu_read_lock()        preempt_disable()
> #define rcu_read_unlock()      preempt_enable()
> 
> because preempt_disable() is a no-op on CONFIG_PREEMPT=n anyway.

This is fine. The original rcu_ltimer patch needed #ifdef CONFIG_PREEMPT,
so that it could be easily used with 2.4. With preemption in 2.5, 
rcu_read_xxx() can be preempt_xxx().

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.


* Re: 2.5.38-mm2 [PATCH]
  2002-09-23  9:45 ` 2.5.38-mm2 [PATCH] Dipankar Sarma
  2002-09-23 16:28   ` Andrew Morton
@ 2002-09-24  4:41   ` Rusty Russell
  2002-09-24 10:24     ` Dipankar Sarma
  1 sibling, 1 reply; 11+ messages in thread
From: Rusty Russell @ 2002-09-24  4:41 UTC (permalink / raw)
  To: dipankar; +Cc: akpm, linux-kernel, linux-mm

On Mon, 23 Sep 2002 15:15:59 +0530
Dipankar Sarma <dipankar@in.ibm.com> wrote:
> Later I will submit a full rcu_ltimer patch that contains
> the call_rcu_preempt() interface which can be useful for
> module unloading and the like. This doesn't affect
> the non-preemption path.

You don't need this: I've dropped the requirement for module
unload.

Cheers!
Rusty.
-- 
   there are those who do and those who hang on and you don't see too
   many doers quoting their contemporaries.  -- Larry McVoy


* Re: 2.5.38-mm2 [PATCH]
  2002-09-24  4:41   ` Rusty Russell
@ 2002-09-24 10:24     ` Dipankar Sarma
  2002-09-24 14:56       ` Rusty Russell
  0 siblings, 1 reply; 11+ messages in thread
From: Dipankar Sarma @ 2002-09-24 10:24 UTC (permalink / raw)
  To: Rusty Russell; +Cc: akpm, linux-kernel, linux-mm

On Tue, Sep 24, 2002 at 02:41:09PM +1000, Rusty Russell wrote:
> On Mon, 23 Sep 2002 15:15:59 +0530
> Dipankar Sarma <dipankar@in.ibm.com> wrote:
> > Later I will submit a full rcu_ltimer patch that contains
> > the call_rcu_preempt() interface which can be useful for
> > module unloading and the like. This doesn't affect
> > the non-preemption path.
> 
> You don't need this: I've dropped the requirement for module
> unload.

Isn't wait_for_later() similar to synchronize_kernel() or has the
entire module unloading design been changed since?

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.


* Re: 2.5.38-mm2 [PATCH]
  2002-09-24 10:24     ` Dipankar Sarma
@ 2002-09-24 14:56       ` Rusty Russell
  0 siblings, 0 replies; 11+ messages in thread
From: Rusty Russell @ 2002-09-24 14:56 UTC (permalink / raw)
  To: dipankar; +Cc: akpm, linux-kernel, linux-mm

In message <20020924155428.B4085@in.ibm.com> you write:
> On Tue, Sep 24, 2002 at 02:41:09PM +1000, Rusty Russell wrote:
> > On Mon, 23 Sep 2002 15:15:59 +0530
> > Dipankar Sarma <dipankar@in.ibm.com> wrote:
> > > Later I will submit a full rcu_ltimer patch that contains
> > > the call_rcu_preempt() interface which can be useful for
> > > module unloading and the like. This doesn't affect
> > > the non-preemption path.
> > 
> > You don't need this: I've dropped the requirement for module
> > unload.
> 
> Isn't wait_for_later() similar to synchronize_kernel() or has the
> entire module unloading design been changed since?

Yes, that was *days* ago 8)

I now just use a synchronize_kernel() which schedules on every CPU,
and disable preempt in magic places.

Ingo growled at me...
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: 2.5.38-mm2
  2002-09-23  7:16 ` 2.5.38-mm2 Jens Axboe
  2002-09-23  7:43   ` 2.5.38-mm2 Andrew Morton
@ 2002-09-24 21:10   ` Bill Davidsen
  1 sibling, 0 replies; 11+ messages in thread
From: Bill Davidsen @ 2002-09-24 21:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, lkml, linux-mm

On Mon, 23 Sep 2002, Jens Axboe wrote:

> Ah interesting. I do still think that it is worth investigating _why_
> both elevator_linus and deadline do not prevent the read starvation.
> The read-latency is a hack, not a solution imo.
> 
> >  tagged command queueing on scsi - it appears to be quite stupidly
> >  implemented.
> 
> Ahem I think you are being excessively harsh, or maybe passing judgement
> on something you haven't even looked at. Did you consider that your
> _drive_ may be the broken component? Excessive turn-around times for
> requests when using deep tcq are not unusual, by far.

I do think that's what he meant!  I think most drives are optimized this
way, and performance would be better if the kernel used the queueing more
sparingly, so the drive couldn't just run with the writes and let the
reads take the leftovers. 

I think that's a better long run solution, although the fix addresses the
immediate problem.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



