linux-kernel.vger.kernel.org archive mirror
* 2.4.18: lru_list_lock contention in write_unlocked_buffers()
@ 2003-04-02  7:50 j-nomura
  2003-04-02  8:23 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: j-nomura @ 2003-04-02  7:50 UTC
  To: linux-kernel

Hello,

When I run mkfs while doing other large file I/O in parallel,
system response becomes very poor on the 2.4.18 kernel
(and probably on other 2.4 kernels as well).

I found heavy contention on lru_list_lock, which is mostly held
by write_unlocked_buffers().
It happens only on large-memory machines, because lru_list can grow very long
and write_some_buffers() scans the long list from the head on each call.

The low-latency patch in the aa tree did not help in this situation.

The patch below is a hasty workaround for it.
Any comments, or suggestions for a better fix?


For example, on a system with 8 CPUs and 16GB of memory, I ran:
  - 4 instances of 'dd if=/dev/zero of=somefile'
  - 1 instance of mke2fs
  - more than 5 instances of top to keep the other CPUs busy

The SGI lockstat output for the test looked like this:
SPINLOCKS         HOLD            WAIT
  UTIL    MEAN(  MAX )   MEAN(  MAX )     TOTAL   NAME
 93.6%    83us( 549ms)   56ms(1591ms)   5076092   lru_list_lock
 92.9%    45ms( 520ms)   26us(6703us)      9283     write_unlocked_buffers+0x30
 0.27%  1354us( 519ms)  4.2us(  16us)       888     wait_for_locked_buffers+0x30
 0.26%    58ms( 549ms)   20us(  25us)        20     invalidate_bdev+0x60

Best regards.
--
NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com, nomura@hpc.bs1.fc.nec.co.jp>
Enterprise Linux Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.


--- linux/fs/buffer.c
+++ linux/fs/buffer.c
@@ -120,6 +120,8 @@ union bdflush_param {
 /* These are the min and max parameter values that we will allow to be assigned */
 int bdflush_min[N_PARAM] = {  0,  10,    5,   25,  0,   1*HZ,   0, 0, 0};
 int bdflush_max[N_PARAM] = {100,50000, 20000, 20000,10000*HZ, 6000*HZ, 100, 0, 0};
+int max_nr_seek_lrulist = 1024;
+int lrulist_seek_delay  = 1;
 
 
 static int 
@@ -231,10 +233,19 @@ static int write_some_buffers(kdev_t dev
 	struct buffer_head *array[NRSYNC];
 	unsigned int count;
 	int nr;
+	int fast_exit = 0;
+	int pass = 0;
 
+repeat:
 	next = lru_list[BUF_DIRTY];
 	nr = nr_buffers_type[BUF_DIRTY];
 	count = 0;
+
+	if (dev && max_nr_seek_lrulist && max_nr_seek_lrulist < nr && !pass) {
+		nr = max_nr_seek_lrulist;
+		fast_exit = 1;
+	}
+
 	while (next && --nr >= 0) {
 		struct buffer_head * bh = next;
 		next = bh->b_next_free;
@@ -267,6 +278,17 @@ static int write_some_buffers(kdev_t dev
 
 	if (count)
 		write_locked_buffers(array, count);
+	if (fast_exit && next) {
+		if (lrulist_seek_delay) {
+			set_current_state(TASK_UNINTERRUPTIBLE);
+			schedule_timeout(lrulist_seek_delay*HZ);
+			spin_lock(&lru_list_lock);
+			fast_exit = 0;
+			pass++;
+			goto repeat;
+		}
+		return -EAGAIN;
+	}
 	return 0;
 }
 


* Re: 2.4.18: lru_list_lock contention in write_unlocked_buffers()
  2003-04-02  7:50 2.4.18: lru_list_lock contention in write_unlocked_buffers() j-nomura
@ 2003-04-02  8:23 ` Andrew Morton
  2003-04-03 11:57   ` j-nomura
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2003-04-02  8:23 UTC
  To: j-nomura; +Cc: linux-kernel

j-nomura@ce.jp.nec.com wrote:
>
> Hello,
> 
> When I run mkfs while doing other large file I/O in parallel,
> system response becomes very poor on the 2.4.18 kernel
> (and probably on other 2.4 kernels as well).
> 
> I found heavy contention on lru_list_lock, which is mostly held
> by write_unlocked_buffers().
> It happens only on large-memory machines, because lru_list can grow very long
> and write_some_buffers() scans the long list from the head on each call.
> 
> The low-latency patch in the aa tree did not help in this situation.
> 
> The patch below is a hasty workaround for it.
> Any comments, or suggestions for a better fix?
> 

I don't think there's a sane fix for this in the 2.4 context.

What you can do is to convert fsync_dev() to sync _all_ devices and not just
the one which is being closed.

It will take longer, but it converts the O(n*n) search into O(n).

diff -puN fs/buffer.c~a fs/buffer.c
--- 24/fs/buffer.c~a	2003-04-02 00:21:39.000000000 -0800
+++ 24-akpm/fs/buffer.c	2003-04-02 00:21:51.000000000 -0800
@@ -343,6 +343,7 @@ int fsync_no_super(kdev_t dev)
 
 int fsync_dev(kdev_t dev)
 {
+	dev = NODEV;
 	sync_buffers(dev, 0);
 
 	lock_kernel();

_
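
To illustrate the O(n*n) versus O(n) point, here is a minimal user-space
sketch. It is not kernel code and only models the scan. The names struct buf,
build_list(), write_some() and sync_visits() are hypothetical, BATCH stands in
for NRSYNC, and ANY_DEV stands in for NODEV. A "written" buffer is simply
unlinked from the list, on the assumption that it leaves the dirty list once it
has been queued for I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH   32   /* stands in for NRSYNC (assumed batch size) */
#define ANY_DEV 0    /* stands in for NODEV: match every device */

struct buf {
	int dev;
	struct buf *next;
};

/* Build a list of n_other buffers for device 1 followed by n_target for device 2. */
static struct buf *build_list(size_t n_other, size_t n_target)
{
	struct buf *head = NULL, **tail = &head;
	size_t i;

	for (i = 0; i < n_other + n_target; i++) {
		struct buf *b = malloc(sizeof(*b));
		assert(b != NULL);
		b->dev = (i < n_other) ? 1 : 2;
		b->next = NULL;
		*tail = b;
		tail = &b->next;
	}
	return head;
}

/*
 * One pass: scan from the head, unlink up to BATCH matching buffers.
 * Returns the number unlinked and adds every visited node to *visits.
 */
static size_t write_some(struct buf **head, int dev, unsigned long *visits)
{
	struct buf **pp = head;
	size_t count = 0;

	while (*pp && count < BATCH) {
		struct buf *b = *pp;
		(*visits)++;
		if (dev == ANY_DEV || b->dev == dev) {
			*pp = b->next;   /* "written": leaves the dirty list */
			free(b);
			count++;
		} else {
			pp = &b->next;   /* other device: skipped, rescanned next pass */
		}
	}
	return count;
}

/* Repeat passes until nothing matches; return the total nodes visited. */
static unsigned long sync_visits(size_t n_other, size_t n_target, int dev)
{
	struct buf *head = build_list(n_other, n_target);
	unsigned long visits = 0;

	while (write_some(&head, dev, &visits) > 0)
		;
	while (head) {               /* free whatever was never matched */
		struct buf *b = head;
		head = b->next;
		free(b);
	}
	return visits;
}
```

With 1000 buffers for another device ahead of 64 target buffers,
sync_visits(1000, 64, 2) visits 3064 nodes, because every pass walks past the
1000 unrelated buffers again. sync_visits(1000, 64, ANY_DEV) visits 1064 nodes,
one visit per buffer, at the cost of writing all 1064 buffers instead of 64.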



* Re: 2.4.18: lru_list_lock contention in write_unlocked_buffers()
  2003-04-02  8:23 ` Andrew Morton
@ 2003-04-03 11:57   ` j-nomura
  0 siblings, 0 replies; 3+ messages in thread
From: j-nomura @ 2003-04-03 11:57 UTC
  To: akpm; +Cc: linux-kernel, j-nomura

Hi,

> > I found heavy contention on lru_list_lock, which is mostly held
> > by write_unlocked_buffers().
> > It happens only on large-memory machines, because lru_list can grow very long
> > and write_some_buffers() scans the long list from the head on each call.
<snip>
> I don't think there's a sane fix for this in the 2.4 context.
> 
> What you can do is to convert fsync_dev() to sync _all_ devices and not just
> the one which is being closed.
> 
> It will take longer, but it converts the O(n*n) search into O(n).

Thank you.
Putting the same modification in __block_fsync() reduces the contention
considerably.

Another solution might be to adapt the nfract value to the memory size,
so that lru_list does not grow too long.
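
As a rough sketch of the nfract idea, the arithmetic could look like the
following. This is only an illustration: nfract_for_cap() is a hypothetical
helper, not an existing kernel function, and it assumes nfract acts as a
percentage of the memory available to the buffer cache. The 0..100 clamp
matches the first entries of bdflush_min[] and bdflush_max[].

```c
#include <assert.h>

/*
 * Hypothetical helper: pick a percentage so that at most cap_mb of
 * total_mb may be dirty before writeback starts. Result is in 0..100.
 */
static int nfract_for_cap(unsigned long total_mb, unsigned long cap_mb)
{
	unsigned long pct;

	if (total_mb == 0)
		return 0;            /* no memory information: be conservative */
	if (cap_mb >= total_mb)
		return 100;          /* cap is not restrictive */

	pct = cap_mb * 100UL / total_mb;   /* integer division rounds down */
	assert(pct <= 100);
	return (int)pct;
}
```

For a 16GB machine (16384 MB) and an assumed cap of 512 MB of dirty buffers,
nfract_for_cap(16384, 512) gives 3. The cap itself is a tuning choice, not a
value taken from the measurements above.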

Best regards.
--
NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com, nomura@hpc.bs1.fc.nec.co.jp>
Enterprise Linux Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.


--- linux/fs/block_dev.c
+++ linux/fs/block_dev.c
@@ -174,7 +174,7 @@ static int __block_fsync(struct inode * 
 	int ret, err;
 
 	ret = filemap_fdatasync(inode->i_mapping);
-	err = sync_buffers(inode->i_rdev, 1);
+	err = sync_buffers(NODEV, 1);
 	if (err && !ret)
 		ret = err;
 	err = filemap_fdatawait(inode->i_mapping);
--- linux/fs/buffer.c
+++ linux/fs/buffer.c
@@ -384,7 +384,7 @@ int fsync_no_super(kdev_t dev)
 
 int fsync_dev(kdev_t dev)
 {
-	sync_buffers(dev, 0);
+	sync_buffers(NODEV, 0);
 
 	lock_kernel();
 	sync_inodes(dev);
