linux-kernel.vger.kernel.org archive mirror
* 2.5.34-mm2
@ 2002-09-12  6:29 Andrew Morton
  2002-09-15  3:46 ` 2.5.34-mm2 Daniel Phillips
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-09-12  6:29 UTC (permalink / raw)
  To: lkml, linux-mm


url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm2/

-throttling-fix.patch
-sleeping-release_page.patch
-dirty-state-accounting.patch
-discontig-cleanup-1.patch
-discontig-cleanup-2.patch
-writeback-thresholds.patch
-buffer-strip.patch
-rmap-speedup.patch
-wli-highpte.patch

 Merged

-lpp2.patch

 Folded into lpp.patch - hugetlb fixes

+lpp-update.patch

 More hugetlb fixes from Rohit.

+pf_nowarn.patch

 Prevent some `page allocation failure' warnings which aren't supposed
 to come out.

+jeremy.patch

 Spel Jermy's naim wright

-segq.patch

 SEGQ had an interaction with the dirty memory management.  This interaction
 was the source of Badari's IO bandwidth regression.  Removed until I have
 time to poke at it.

+wake-speedup.patch

 Badari's pagecache writeout is back up to 270 megs/sec.  The CPUs are pegged
 and the hottest functions are 

  5348 __wake_up                                111.4167
  6954 unlock_page                               72.4375
187676 generic_file_write_nolock                 71.9617
  9577 __scsi_end_request                        54.4148

 I cannot reproduce these profiles with mortal numbers of hard disks, but
 the wakeup code can be sped up heaps.

 The patch implements a new wait/wakeup mechanism which removes wait_queues
 from wait_queue_head's within __wake_up(), rather than within the woken
 process.
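
 The idea can be modeled in userspace. Below is a toy Python sketch (the
 names are illustrative, not the kernel API): the waker unlinks waiters
 from the queue head itself, so each woken task avoids re-taking the
 queue lock just to remove its own entry.

```python
from collections import deque

class WaitQueueHead:
    """Toy model of the proposed scheme: the wakeup path detaches
    waiters from the head, instead of each woken task doing it."""
    def __init__(self):
        self.waiters = deque()

    def add_wait_queue(self, task):
        self.waiters.append(task)

    def wake_up_all(self):
        # The waker removes entries while it already holds the queue
        # lock, saving every woken task a lock round-trip of its own.
        while self.waiters:
            task = self.waiters.popleft()
            task["state"] = "RUNNING"

wq = WaitQueueHead()
tasks = [{"name": f"t{i}", "state": "SLEEPING"} for i in range(3)]
for t in tasks:
    wq.add_wait_queue(t)
wq.wake_up_all()
```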

+buddyinfo.patch

 /proc/buddyinfo - stats on free page fragmentation.
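
 Each /proc/buddyinfo line lists, per zone, the count of free blocks at
 each order (a block at order N is 2^N contiguous pages). A small parser
 sketch; the sample line is made up for illustration:

```python
def parse_buddyinfo(line):
    """Parse one /proc/buddyinfo line into (node, zone, counts),
    where counts[order] is the number of free 2**order-page blocks."""
    left, right = line.split("zone")
    node = int(left.split(",")[0].split()[1])
    fields = right.split()
    return node, fields[0], [int(x) for x in fields[1:]]

def free_pages(counts):
    # Total free pages implied by the per-order block counts.
    return sum(n * (1 << order) for order, n in enumerate(counts))

sample = "Node 0, zone   Normal  10  5  2  1  0  0  0  0  0  0  0"
node, zone, counts = parse_buddyinfo(sample)
```

 Lots of free pages concentrated at order 0 with nothing at the higher
 orders is exactly the fragmentation this file is meant to expose.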

+free_area.patch

 Nail another gratuitous typedef

+radix_tree_gang_lookup.patch

 Multipage pagecache scan and lookup.

+truncate_inode_pages.patch

 Redo the truncate/invalidate code to use gang lookups.
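
 The gang-lookup idea, as a toy model: instead of probing the pagecache
 one index at a time, fetch up to N pages at or after a start index per
 lookup, then process the whole batch. A Python sketch with a dict
 standing in for the radix tree (a real gang lookup walks the tree and
 does not sort; the names are illustrative):

```python
def gang_lookup(tree, start, max_items):
    """Return up to max_items (index, page) pairs with index >= start,
    ascending -- a stand-in for radix_tree_gang_lookup()."""
    hits = sorted((i, p) for i, p in tree.items() if i >= start)
    return hits[:max_items]

def truncate_inode_pages(tree, from_index):
    """Remove every page at or after from_index, a batch at a time."""
    while True:
        batch = gang_lookup(tree, from_index, 16)
        if not batch:
            break
        for index, _page in batch:
            del tree[index]

pagecache = {i: f"page{i}" for i in range(0, 100, 3)}  # sparse file
truncate_inode_pages(pagecache, 30)
```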





linus.patch
  cset-1.568.17.13-to-1.648.txt.gz

scsi_hack.patch
  Fix block-highmem for scsi

ext3-htree.patch
  Indexed directories for ext3

spin-lock-check.patch
  spinlock/rwlock checking infrastructure

rd-cleanup.patch
  Cleanup and fix the ramdisk driver (doesn't work right yet)

readv-writev.patch
  O_DIRECT support for readv/writev

llzpr.patch
  Reduce scheduling latency across zap_page_range

buffermem.patch
  Resurrect buffermem accounting

lpp.patch
  ia32 huge tlb pages

lpp-update.patch
  hugetlbpage fixes

sharedmem.patch
  Add /proc/meminfo:Mapped - the amount of memory which is mapped into pagetables

ext3-sb.patch
  u.ext3_sb -> generic_sbp

oom-fix.patch
  Fix an OOM condition on big highmem machines

tlb-cleanup.patch
  Clean up the tlb gather code

dump-stack.patch
  arch-neutral dump_stack() function

wli-cleanup.patch
  random cleanups

madvise-move.patch
  move madvise implementation into mm/madvise.c

split-vma.patch
  VMA splitting patch

mmap-fixes.patch
  mmap.c cleanup and lock ranking fixes

buffer-ops-move.patch
  Move submit_bh() and ll_rw_block() into fs/buffer.c

slab-stats.patch
  Display total slab memory in /proc/meminfo

writeback-control.patch
  Cleanup and extension of the writeback paths

free_area_init-cleanup.patch
  free_area_init() code cleanup

alloc_pages-cleanup.patch
  alloc_pages cleanup and optimisation

statm_pgd_range-sucks.patch
  Remove the pagetable walk from /proc/stat

remove-sync_thresh.patch
  Remove /proc/sys/vm/dirty_sync_thresh

pf_nowarn.patch
  Fix up the handling of PF_NOWARN

jeremy.patch
  Spel Jermy's naim wright

queue-congestion.patch
  Infrastructure for communicating request queue congestion to the VM

nonblocking-ext2-preread.patch
  avoid ext2 inode prereads if the queue is congested

nonblocking-pdflush.patch
  non-blocking writeback infrastructure, use it for pdflush

nonblocking-vm.patch
  Non-blocking page reclaim

wake-speedup.patch
  Faster wakeup code

sync-helper.patch
  Speed up sys_sync() against multiple spindles

slabasap.patch
  Early and smarter shrinking of slabs

write-deadlock.patch
  Fix the generic_file_write-from-same-mmapped-page deadlock

buddyinfo.patch
  Add /proc/buddyinfo - stats on the free pages pool

free_area.patch
  Remove struct free_area_struct and free_area_t, use `struct free_area'

radix_tree_gang_lookup.patch
  radix tree gang lookup

truncate_inode_pages.patch
  truncate/invalidate_inode_pages rewrite

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.5.34-mm2
  2002-09-12  6:29 2.5.34-mm2 Andrew Morton
@ 2002-09-15  3:46 ` Daniel Phillips
  2002-09-15  4:12   ` 2.5.34-mm2 Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Phillips @ 2002-09-15  3:46 UTC (permalink / raw)
  To: Andrew Morton, lkml, linux-mm

On Thursday 12 September 2002 08:29, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm2/
>
> -sleeping-release_page.patch

What's this one?  Couldn't find it as a broken-out patch.

On the nonblocking vm front, does it rule or suck?  I heard you
mention, on the one hand, huge speedups on some load (dbench I think)
but your in-patch comments mention slowdown by 1.7X on kernel
compile.

-- 
Daniel


* Re: 2.5.34-mm2
  2002-09-15  3:46 ` 2.5.34-mm2 Daniel Phillips
@ 2002-09-15  4:12   ` Andrew Morton
  2002-09-15  4:23     ` 2.5.34-mm2 Daniel Phillips
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-09-15  4:12 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: lkml, linux-mm

Daniel Phillips wrote:
> 
> On Thursday 12 September 2002 08:29, Andrew Morton wrote:
> > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm2/
> >
> > -sleeping-release_page.patch
> 
> What's this one?  Couldn't find it as a broken-out patch.

The `-' means it was removed from the patchset.  Linus merged it.
See  2.5.34/2.5.34-mm1/broken-out/sleeping-release_page.patch

> On the nonblocking vm front, does it rule or suck?

It rules, until someone finds something at which it sucks.

>  I heard you
> mention, on the one hand, huge speedups on some load (dbench I think)
> but your in-patch comments mention slowdown by 1.7X on kernel
> compile.

You misread.  Relative times for running `make -j6 bzImage' with mem=512m:

Unloaded system:                                     1.0
2.5.34-mm4, while running 4 x `dbench 100'           1.7
Any other kernel while running 4 x `dbench 100'      basically infinity


* Re: 2.5.34-mm2
  2002-09-15  4:12   ` 2.5.34-mm2 Andrew Morton
@ 2002-09-15  4:23     ` Daniel Phillips
  2002-09-15  5:37       ` 2.5.34-mm2 Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Phillips @ 2002-09-15  4:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Sunday 15 September 2002 06:12, Andrew Morton wrote:
> Daniel Phillips wrote:
> >  I heard you
> > mention, on the one hand, huge speedups on some load (dbench I think)
> > but your in-patch comments mention slowdown by 1.7X on kernel
> > compile.
> 
> You misread.  Relative times for running `make -j6 bzImage' with mem=512m:
> 
> Unloaded system:                                     1.0
> 2.5.34-mm4, while running 4 x `dbench 100'           1.7
> Any other kernel while running 4 x `dbench 100'      basically infinity

Oh good :-)

We can make the rescanning go away in time, with more lru lists, but
that sure looks like the low hanging fruit.

-- 
Daniel


* Re: 2.5.34-mm2
  2002-09-15  4:23     ` 2.5.34-mm2 Daniel Phillips
@ 2002-09-15  5:37       ` Andrew Morton
  2002-09-15 14:58         ` 2.5.34-mm2 Rik van Riel
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-09-15  5:37 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: lkml, linux-mm

Daniel Phillips wrote:
> 
> On Sunday 15 September 2002 06:12, Andrew Morton wrote:
> > Daniel Phillips wrote:
> > >  I heard you
> > > mention, on the one hand, huge speedups on some load (dbench I think)
> > > but your in-patch comments mention slowdown by 1.7X on kernel
> > > compile.
> >
> > You misread.  Relative times for running `make -j6 bzImage' with mem=512m:
> >
> > Unloaded system:                                   1.0
> > 2.5.34-mm4, while running 4 x `dbench 100'           1.7
> > Any other kernel while running 4 x `dbench 100'      basically infinity
> 
> Oh good :-)
> 
> We can make the rescanning go away in time, with more lru lists,

We don't actually need more lists, I expect.  Dirty and under-writeback
pages just don't go on a list at all - cut them off the LRU and
bring them back at IO completion.  We can't do anything useful with
a list of dirty/writeback pages anyway, so why have the list?
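
A toy model of that scheme: dirty or under-writeback pages leave the
LRU entirely and reappear on it at IO completion, so reclaim only ever
scans pages it can actually do something with.  Sketch with
hypothetical names:

```python
from collections import deque

class Lru:
    def __init__(self):
        self.inactive = deque()   # reclaimable pages only
        self.off_lru = set()      # dirty / under-writeback pages

    def add(self, page):
        self.inactive.append(page)

    def set_dirty(self, page):
        # Cut dirty pages off the LRU: reclaim never scans them.
        self.inactive.remove(page)
        self.off_lru.add(page)

    def end_io(self, page):
        # The interrupt-time page motion: back onto the LRU when clean.
        self.off_lru.discard(page)
        self.inactive.append(page)

    def reclaim(self, n):
        # Every page scanned here is immediately usable.
        n = min(n, len(self.inactive))
        return [self.inactive.popleft() for _ in range(n)]

lru = Lru()
for p in ("a", "b", "c", "d"):
    lru.add(p)
lru.set_dirty("b")
got = lru.reclaim(4)      # "b" is invisible to the scan
lru.end_io("b")           # ...and returns once its IO completes
```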

It kind of depends whether we want to put swapcache on that list.  I
may just give swapper_inode a superblock and let pdflush write swap.

The interrupt-time page motion is of course essential if we are to
avoid long scans of that list.

That, and replacing the blk_congestion_wait() throttling with a per-classzone
wait_for_some_pages_to_come_clean() throttling pretty much eliminates the
remaining pointless scan activity from the VM, and fixes a current false OOM
scenario in -mm4.
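
The per-classzone throttling could look roughly like this: each zone
keeps its own waitqueue, and IO completion wakes only tasks throttled
against the zone whose page actually came clean, so a GFP_KERNEL
allocator is never woken (and fooled) by ZONE_HIGHMEM completions.
Toy single-threaded Python sketch; a real version would sleep on a
per-zone waitqueue:

```python
class Zone:
    def __init__(self, name):
        self.name = name
        self.waiters = []     # tasks throttled against this zone

def wait_for_some_pages_to_come_clean(zone, task):
    zone.waiters.append(task)

def end_page_writeback(zone):
    # Per-classzone wakeup: only waiters on *this* zone are woken.
    woken, zone.waiters = zone.waiters, []
    return woken

normal, highmem = Zone("ZONE_NORMAL"), Zone("ZONE_HIGHMEM")
wait_for_some_pages_to_come_clean(normal, "gfp_kernel_allocator")
woken = end_page_writeback(highmem)   # highmem IO completes: nobody woken
```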

> but that sure looks like the low hanging fruit.

It's low alright.  AFAIK Linux has always had this problem of
seizing up when there's a lot of dirty data around.

Let me quantify infinity:


With mem=512m, on the quad:

`make -j6 bzImage' takes two minutes and two seconds.

On 2.5.34, a concurrent 4 x `dbench 100' slows that same kernel
build down to 35 minutes and 16 seconds.

On 2.5.34-mm4, while running 4 x `dbench 100' that kernel build
takes three minutes and 45 seconds.
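
For the record, the slowdown ratios implied by those wall-clock times
(a quick check in Python):

```python
def seconds(m, s):
    return 60 * m + s

base  = seconds(2, 2)     # unloaded: 2m02s
plain = seconds(35, 16)   # 2.5.34 under 4 x `dbench 100'
mm4   = seconds(3, 45)    # 2.5.34-mm4 under the same load

slowdown_plain = plain / base   # ~17.3x -- "basically infinity"
slowdown_mm4 = mm4 / base       # ~1.8x
```

The ~1.8x here is consistent with the 1.7 quoted upthread, presumably
from a different run.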



That's with seven disks: four for the dbenches, one for the kernel
build, one for swap and one for the executables.  Things would be
worse with fewer disks because of seek contention.  But that's
to be expected.  The intent of this work is to eliminate this
crosstalk between different activities.  And to avoid blocking things
which aren't touching disk at all.


* Re: 2.5.34-mm2
  2002-09-15  5:37       ` 2.5.34-mm2 Andrew Morton
@ 2002-09-15 14:58         ` Rik van Riel
  2002-09-15 17:13           ` 2.5.34-mm2 Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Rik van Riel @ 2002-09-15 14:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Daniel Phillips, lkml, linux-mm

On Sat, 14 Sep 2002, Andrew Morton wrote:
> Daniel Phillips wrote:

> > but that sure looks like the low hanging fruit.
>
> It's low alright.  AFAIK Linux has always had this problem of
> seizing up when there's a lot of dirty data around.

Somehow I doubt the "seizing up" problem is caused by too much
scanning.  In fact, I'm pretty convinced it is caused by having
too much IO submitted at once (and stalling in __get_request_wait).

The scanning is probably not relevant at all and it may be
beneficial to just ignore the scanning for now and do our best
to keep the pages in better LRU order.

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org



* Re: 2.5.34-mm2
  2002-09-15 17:13           ` 2.5.34-mm2 Andrew Morton
@ 2002-09-15 17:08             ` Daniel Phillips
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Phillips @ 2002-09-15 17:08 UTC (permalink / raw)
  To: Andrew Morton, Rik van Riel; +Cc: lkml, linux-mm

On Sunday 15 September 2002 19:13, Andrew Morton wrote:
> Yes, I'm not particularly fussed about (moderate) excess CPU use in these
> situations, and nor about page replacement accuracy, really - pages
> are being slushed through the system so fast that correct aging of the
> ones on the inactive list probably just doesn't count.

What you really mean is, it hasn't gotten to the top of the list
of things that suck.  When we do get around to fashioning a really
effective page ager (LRU-er, more likely) the further improvement
will be obvious, especially under heavy streaming IO load, which
is getting more important all the time.

-- 
Daniel


* Re: 2.5.34-mm2
  2002-09-15 14:58         ` 2.5.34-mm2 Rik van Riel
@ 2002-09-15 17:13           ` Andrew Morton
  2002-09-15 17:08             ` 2.5.34-mm2 Daniel Phillips
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-09-15 17:13 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Daniel Phillips, lkml, linux-mm

Rik van Riel wrote:
> 
> On Sat, 14 Sep 2002, Andrew Morton wrote:
> > Daniel Phillips wrote:
> 
> > > but that sure looks like the low hanging fruit.
> >
> > It's low alright.  AFAIK Linux has always had this problem of
> > seizing up when there's a lot of dirty data around.
> 
> Somehow I doubt the "seizing up" problem is caused by too much
> scanning.  In fact, I'm pretty convinced it is caused by having
> too much IO submitted at once (and stalling in __get_request_wait).

Yes, the latency is due to request queue contention.

Dirty data reaches the tail of the LRU and "innocent" processes are
forced to write it.  But the queue is full.  They sleep until 32
requests are free.  They wake; but so does the heavy dirtier.  The
heavy dirtier immediately fills the queue again.  The innocent
page allocator finds some more dirty data.  Repeat.
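
The starvation pattern can be sketched as a toy round-based simulation
(a deliberate caricature: the queue holds 32 requests, a batch of
requests completes each round, but the heavy dirtier is woken first and
refills the queue before the innocent allocator ever gets a request):

```python
QUEUE_DEPTH = 32

def simulate(rounds):
    """Each round: a batch of requests completes, the heavy dirtier
    immediately refills the queue, and the innocent task finds it
    full again and goes back to sleep.  Returns how many writes the
    innocent task managed."""
    in_flight = QUEUE_DEPTH
    innocent_progress = 0
    for _ in range(rounds):
        in_flight -= QUEUE_DEPTH      # a batch completes
        in_flight += QUEUE_DEPTH      # heavy dirtier refills it first
        if in_flight < QUEUE_DEPTH:   # never true in this caricature
            innocent_progress += 1
    return innocent_progress
```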

It's DoS-via-request queue.  It's made worse by the fact that
kswapd is also DoS'ed, so pretty much all tasks need to perform
direct reclaim.

There are also latency problems, with similar causes, when page-allocating
processes encounter under-writeback pages at the tail of the LRU, but
this happens less often.

> The scanning is probably not relevant at all and it may be
> beneficial to just ignore the scanning for now and do our best
> to keep the pages in better LRU order.
> 

Yes, I'm not particularly fussed about (moderate) excess CPU use in these
situations, and nor about page replacement accuracy, really - pages
are being slushed through the system so fast that correct aging of the
ones on the inactive list probably just doesn't count.

The use of "how much did we scan" to determine when we're out
of memory is a bit of a problem; but the main problem (of which
I'm aware) is that the global throttling via blk_congestion_wait()
is not a sufficiently accurate indication that "pages came clean
in ZONE_NORMAL" on big highmem boxes.

Processes which are performing GFP_KERNEL allocations can keep
on getting woken up for ZONE_HIGHMEM completion, and they eventually
decide it's OOM.  This has only been observed when the dirty memory
limits are manually increased a lot, but it points to a design problem.

I don't know what's going on in `contest', nor in Alex's X build.  We'll
see...

