From: Andrew Morton <akpm@zip.com.au>
To: Rik van Riel <riel@conectiva.com.br>
Cc: William Lee Irwin III <wli@holomorphy.com>,
	Dave McCracken <dmccr@us.ibm.com>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [PATCH] Optimize out pte_chain take three
Date: Thu, 11 Jul 2002 13:42:21 -0700
Message-ID: <3D2DEDAD.A38AFF25@zip.com.au>
In-Reply-To: Pine.LNX.4.44L.0207111703080.14432-100000@imladris.surriel.com

Rik van Riel wrote:
> 
> ...
> > useful pagecache and swapping everything out.  Our kernels have
> > O_STREAMING because of this.   It simply removes as much pagecache
> > as it can, each time ->nrpages reaches 256.  It's rather effective.
> 
> Now why does that remind me of drop-behind ? ;)

I looked at 2.4-ac as well.  Seems that the dropbehind there only
addresses reads?

This is a specialised application and frankly, I don't think
magical voodoo kernel logic will ever work as well as exposing
capabilities to the application.   The posix_fadvise() API is
basically ideal for this, but it's quite hard for Linux to
implement efficiently.   How do we efficiently discard the 
10,000 pages starting at page offset 25,000,000?

We can do that in O(not much) time with

	radix_tree_gang_lookup(void **pointers, int how_many, int starting_offset)

but that hasn't been written.  It would make truncate/invalidate_inode_pages
tons faster and cleaner too.
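
For reference, here is roughly what the application side of that request could look like on a system that provides posix_fadvise().  This is only a sketch: drop_cached_pages() is a hypothetical helper name, not anything from the kernel or from this thread, and it says nothing about how the kernel would service the call internally.

```c
#define _POSIX_C_SOURCE 200112L   /* exposes posix_fadvise() */
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t, so huge page offsets fit */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint that the cached pages in a page-granular
 * range of fd are no longer needed.
 *
 * Returns 0 on success or an error number.  posix_fadvise() reports
 * errors through its return value rather than through errno.
 */
int drop_cached_pages(int fd, off_t first_page, off_t nr_pages)
{
	long page_size = sysconf(_SC_PAGESIZE);

	/* A length of 0 would mean "to end of file", so insist on > 0. */
	assert(first_page >= 0 && nr_pages > 0);
	if (page_size <= 0)
		return EINVAL;

	return posix_fadvise(fd, first_page * page_size,
			     nr_pages * page_size, POSIX_FADV_DONTNEED);
}
```

The case in the text would be drop_cached_pages(fd, 25000000, 10000).  The call is only a hint, and on Linux it does not discard pages that are still dirty, so a writer would normally call fdatasync(fd) first.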
 
> > I installed 2.5.25+rmap on my desktop yesterday.  Come in this morning
> > to discover half of memory is inodes, quarter of memory is dentries and
> > I'm 40 megs into swap.  Sigh.
> 
> As requested by Linus, this patch only has the mechanism
> and none of the balancing changes.
> 
> I suspect Ed Tomlinson's patch will fix this issue.

yup.


btw, I was looking into many-spindle writeback performance
yesterday.  It's pretty bad.  Test case is simply four disks,
four ext2 filesystems, four processes flat-out writing to each
disk.

Throughput is only 60% of O_DIRECT because one of the disk's
queues fills up and everybody ends up blocking on that queue.

2.4 has the same problem, and it's basically unsolvable there
because of the global buffer LRU.

In 2.5, the balance_dirty() path is trivially solved by making
the caller of balance_dirty_pages only write back data against 
the superblock which he just dirtied.

However unless I set the dirty memory thresholds super-low
so that in fact none of the queues ever fills, we still hit
the same interqueue contention in the page reclaim code.

I was scratching my head over this for some time:  how come
there are dirty pages at the tail of the LRU, when the inactive
list is quite enormous?  I need to confirm this, but I suspect
it's metadata: we're moving pages to the head of the LRU when
they are first added to the inode, and when writeback is started.
But we're *not* performing that motion when the fs does
mark_buffer_dirty(bitmap block), for example.

So that dirty-against-a-full-queue bitmap block is a little
timebomb, worming its way to the tail of the LRU.

Probably, a touch_buffer() in mark_buffer_dirty() will plug this,
but that's even more atomic operations, even more banging on
the pagemap_lru_lock.
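
To make the suspected mechanism concrete, here is a toy userspace model.  It is not kernel code: struct toy_lru and both functions are made-up names, the list is a plain array, and there is no locking.  It only shows that a page which is never moved on dirtying ends up at the reclaim end of the list, while a touch puts it back at the head.

```c
#include <assert.h>
#include <string.h>

#define LRU_MAX 64

/* Toy LRU: slot 0 is the head (newest), slot count-1 is the tail. */
struct toy_lru {
	int page[LRU_MAX];	/* page ids, head first */
	int count;
};

/* Return the slot holding page id, or -1 if it is not on the list. */
static int lru_find(const struct toy_lru *lru, int id)
{
	for (int i = 0; i < lru->count; i++)
		if (lru->page[i] == id)
			return i;
	return -1;
}

/* Add a page at the head, or move it there if already present. */
void lru_move_to_head(struct toy_lru *lru, int id)
{
	int pos = lru_find(lru, id);

	if (pos < 0) {
		assert(lru->count < LRU_MAX);
		pos = lru->count++;
	}
	/* Shift everything ahead of pos down one slot, then insert. */
	memmove(&lru->page[1], &lru->page[0], pos * sizeof(lru->page[0]));
	lru->page[0] = id;
}

/* 0 means the page is the next reclaim candidate; -1 means absent. */
int lru_distance_from_tail(const struct toy_lru *lru, int id)
{
	int pos = lru_find(lru, id);

	return pos < 0 ? -1 : lru->count - 1 - pos;
}
```

In this model, add a "bitmap" page and then several data pages: the bitmap page sits at distance 0 from the tail, and dirtying it changes nothing because dirtying does not move it.  Calling lru_move_to_head() on it, standing in for the touch_buffer() idea, sends it back to the head at the cost of one more list operation per dirtying.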

I suspect the best fix here is to not have dirty or writeback 
pagecache pages on the LRU at all.  Throttle on memory becoming
reclaimable, put the pages back on the LRU when they're clean,
etc.  As we have often discussed.  Big change.
