linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Van Maren, Kevin" <kevin.vanmaren@unisys.com>
To: "'Andrew Morton'" <akpm@zip.com.au>
Cc: "'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>
Subject: RE: The cause of the "VM" performance problem with 2.4.X
Date: Wed, 22 Aug 2001 17:23:46 -0500	[thread overview]
Message-ID: <245F259ABD41D511A07000D0B71C4CBA289F29@us-slc-exch-3.slc.unisys.com> (raw)

Thanks Andrew!  I'll try out your patch shortly, and I'll let you
know how it goes.

> Note how fsync_dev() passes the target device to sync_buffers().  But
> the dirty buffer list is global.  So to write out the dirty buffers
> for a particular device, write_locked_buffers() has to do a linear
> walk of the dirty buffers for *other* devices to find the target
> device.

I had just figured that out.  I hadn't changed the code yet, though.

The other thing that exacerbates the start-from-0 approach is only
flushing 32 blocks at a time.  Especially if we take a long time to
find the first one because of the linear search, not doing much
work once we get there is also a problem.

> And write_unlocked_buffers() uses a quite common construct - it
> scans a list but when it drops the lock, it restarts the scan
> from the start of the list.  (We do this all over the kernel, and
> it keeps on biting us).

It sounds like we want a per-device list, not a global (linear) one.
But I'm sure there are other places in the kernel where the per-
device list would introduce problems (but would make things like
sync() better, since we could flush all devices more in parallel).
But there are probably also problems there when scaling to 100's or
1000's of disks.

> So if the dirty buffer list has 10,000 buffers for device A and
> then 10,000 buffers for device B, and you call fsync_dev(B),
> we end up traversing the 10,000 buffers of device A 10,000/32 times,
> which is a lot.
> 
> In fact, write_unlocked_buffers(A) shoots itself in the foot by
> moving buffers for device A onto BUF_LOCKED, and then restarting the
> scan.  So of *course* we end up with zillions on non-A buffers at the
> head of the list.

Is it safe to restart from the middle of the list after dropping
the lock?  It looks like you try to solve that problem by only using
the hint if the block is still on the dirty list.

I was going to cache that per-device info globally as a quick hack;
I was thinking about hanging it off the kdev_t, since I didn't want
a fixed-size array in write_some_buffers(), but your approach of just
having the caller track it works too.


I'm off to try the patch...


But before I go, here is another profile run, but without your patch,
to validate your hypothesis (seem to have lost several interrupts)

[ia64_spinlock_contention is time spinning on a spinlock (you know
which one); myskupbuffer1 is called each time it "skips" a buffer
in the LRU list (no dev match).
Note that in this case, 26000 elements are skipped on the "average"
call to write_some_buffers.]

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 50.36     40.43    40.43
ia64_spinlock_contention
 23.98     59.68    19.25     2658     7.24     8.17  write_some_buffers
 17.10     73.40    13.72                             cg_record_arc
  4.48     77.00     3.59                             mcount
  1.36     78.09     1.10    85039     0.01     0.01  _make_request
  0.82     78.75     0.66 69385716     0.00     0.00  myskipbuffer1
  0.77     79.37     0.62    85039     0.01     0.01  blk_get_queue

index % time    self  children    called     name
                                                 <spontaneous>
[1]     52.7   40.43    0.00                 ia64_spinlock_contention [1]
-----------------------------------------------
                2.68    0.34     370/2658        sync_old_buffers [11]
               16.57    2.12    2288/2658        write_unlocked_buffers [6]
[2]     28.3   19.25    2.47    2658         write_some_buffers [2]
                0.00    1.77    2658/2658        write_locked_buffers [12]
                0.66    0.00 69385716/69385716     myskipbuffer1 [16]
                0.01    0.01   85056/123329      _refile_buffer [54]
                0.01    0.00   85056/130255      _insert_into_lru_list [70]
                0.01    0.00   85056/85056       myatomic_set_buffer_clean
[93]
                0.00    0.00   85056/85056       my__refile_buffer [104]
                0.00    0.00   85056/85056       mytest_and_set_bit [113]
                0.00    0.00    2658/418442      spin_unlock_ [97]
-----------------------------------------------


Kevin Van Maren

             reply	other threads:[~2001-08-22 22:24 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-08-22 22:23 Van Maren, Kevin [this message]
  -- strict thread matches above, loose matches on Subject: below --
2001-08-28 17:35 The cause of the "VM" performance problem with 2.4.X Van Maren, Kevin
2001-08-28 18:52 ` Linus Torvalds
2001-08-28 19:29   ` André Dahlqvist
2001-08-29 13:49     ` Rik van Riel
2001-08-29  8:22   ` Jens Axboe
2001-08-29  8:25     ` Jens Axboe
2001-08-23 17:26 Van Maren, Kevin
2001-08-23 17:06 Van Maren, Kevin
2001-08-23 17:18 ` Andrew Morton
2001-08-23  1:48 Van Maren, Kevin
2001-08-23 16:33 ` Andrew Morton
2001-08-22  5:31 Van Maren, Kevin
2001-08-22 20:19 ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=245F259ABD41D511A07000D0B71C4CBA289F29@us-slc-exch-3.slc.unisys.com \
    --to=kevin.vanmaren@unisys.com \
    --cc=akpm@zip.com.au \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).