linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] Significant performance improvements on reiserfs systems
       [not found] ` <20010920170812.CCCACE641B@ns1.suse.com>
@ 2001-09-20 17:19   ` Chris Mason
  2001-09-20 17:39     ` Andrew Morton
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Chris Mason @ 2001-09-20 17:19 UTC (permalink / raw)
  To: Dieter Nützel, Beau Kuiper, Andrew Morton
  Cc: Andrea Arcangeli, Robert Love, Linux Kernel List, ReiserFS List



On Thursday, September 20, 2001 07:08:25 PM +0200 Dieter Nützel
<Dieter.Nuetzel@hamburg.de> wrote:


> Please have a look at Robert Love's Linux kernel preemption patches and
> the  conversation about my reported latency results.
> 

Andrew Morton has patches that significantly improve reiserfs latency;
it looks like the last one he sent me was against 2.4.7-pre9.  He and I did
a bunch of work to make sure they introduce schedules only where it is safe.

Andrew, are these still maintained or should I pull out the reiserfs bits?

-chris


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 17:19   ` [PATCH] Significant performance improvements on reiserfs systems Chris Mason
@ 2001-09-20 17:39     ` Andrew Morton
  2001-09-20 18:47     ` Andrea Arcangeli
  2001-09-20 20:52     ` Robert Love
  2 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2001-09-20 17:39 UTC (permalink / raw)
  To: Chris Mason
  Cc: Dieter Nützel, Beau Kuiper, Andrea Arcangeli, Robert Love,
	Linux Kernel List, ReiserFS List

Chris Mason wrote:
> 
> On Thursday, September 20, 2001 07:08:25 PM +0200 Dieter Nützel
> <Dieter.Nuetzel@hamburg.de> wrote:
> 
> > Please have a look at Robert Love's Linux kernel preemption patches and
> > the  conversation about my reported latency results.
> >
> 
> Andrew Morton has patches that significantly improve reiserfs latency;
> it looks like the last one he sent me was against 2.4.7-pre9.  He and I did
> a bunch of work to make sure they introduce schedules only where it is safe.
> 
> Andrew, are these still maintained or should I pull out the reiserfs bits?
> 

This is the reiserfs part - it applies to 2.4.10-pre12 OK.

For the purposes of Robert's patch, conditional_schedule()
should be defined as 

	if (current->need_resched && current->lock_depth == 0) {
		unlock_kernel();
		lock_kernel();
	}

which is somewhat crufty, because the implementation of lock_kernel()
is arch-specific.  But all architectures seem to implement it the same way.
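
To make the idiom concrete, here is a small user-space mock-up of that
conditional_schedule(); the task struct, BKL counters, and names below are
illustrative stand-ins, not the kernel's real definitions:

```c
#include <assert.h>

/* Mock kernel state -- stand-ins for illustration only. */
struct task { int need_resched; int lock_depth; };

static struct task current_task = { 0, 0 };  /* lock_depth 0: outermost BKL hold */
static int bkl_depth = 1;                    /* pretend the BKL is held once */
static int resched_points = 0;               /* times the scheduler could have run */

static void unlock_kernel(void) { bkl_depth--; resched_points++; }
static void lock_kernel(void)   { bkl_depth++; }

/* Drop and retake the BKL only when a reschedule is pending and this
 * is the outermost BKL hold, as in the snippet quoted above. */
static void conditional_schedule(void)
{
    if (current_task.need_resched && current_task.lock_depth == 0) {
        unlock_kernel();    /* the scheduler may preempt us here */
        lock_kernel();
    }
}
```

The point of the lock_depth test is that releasing the BKL is only a
scheduling opportunity when this is the outermost hold.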



--- linux-2.4.7/include/linux/reiserfs_fs.h	Sat Jul 21 12:37:14 2001
+++ linux-akpm/include/linux/reiserfs_fs.h	Sun Jul 22 22:06:17 2001
@@ -1165,8 +1165,8 @@ extern inline loff_t max_reiserfs_offset
 #define fs_generation(s) ((s)->u.reiserfs_sb.s_generation_counter)
 #define get_generation(s) atomic_read (&fs_generation(s))
 #define FILESYSTEM_CHANGED_TB(tb)  (get_generation((tb)->tb_sb) != (tb)->fs_gen)
-#define fs_changed(gen,s) (gen != get_generation (s))
-
+#define __fs_changed(gen,s) (gen != get_generation (s))
+#define fs_changed(gen,s) ({conditional_schedule(); __fs_changed(gen,s);})
 
 /***************************************************************************/
 /*                  FIXATE NODES                                           */
--- linux-2.4.7/fs/reiserfs/bitmap.c	Wed May  2 22:00:06 2001
+++ linux-akpm/fs/reiserfs/bitmap.c	Sun Jul 22 22:06:17 2001
@@ -420,19 +420,31 @@ free_and_return:
     }
        
 
-    reiserfs_prepare_for_journal(s, SB_AP_BITMAP(s)[i], 1) ;
 
+/* this check needs to go before preparing the buffer because that can
+** schedule when low-latency patches are in use.  It is ok if the buffer
+** is locked, preparing it will wait on it, and we handle the case where
+** this block was allocated while we sleep below.
+*/
 #ifdef CONFIG_REISERFS_CHECK
-    if (buffer_locked (SB_AP_BITMAP (s)[i]) || is_reusable (s, search_start, 0) == 0)
-	reiserfs_panic (s, "vs-4140: reiserfs_new_blocknrs: bitmap block is locked or bad block number found");
+    if (is_reusable (s, search_start, 0) == 0)
+	reiserfs_panic (s, "vs-4140: reiserfs_new_blocknrs: bad block number found");
 #endif
 
+    reiserfs_prepare_for_journal(s, SB_AP_BITMAP(s)[i], 1) ;
+
     /* if this bit was already set, we've scheduled, and someone else
     ** has allocated it.  loop around and try again
     */
     if (reiserfs_test_and_set_le_bit (j, SB_AP_BITMAP (s)[i]->b_data)) {
 	reiserfs_warning("vs-4150: reiserfs_new_blocknrs, block not free");
 	reiserfs_restore_prepared_buffer(s, SB_AP_BITMAP(s)[i]) ;
+	/* if this block has been allocated while we slept, it is
+	** impossible to find any more contiguous blocks for ourselves.
+	** If we are doing preallocation, give up now and return.
+	*/
+	if (for_prealloc)
+	    goto free_and_return ;
 	amount_needed++ ;
 	continue ;
     }    
--- linux-2.4.7/fs/reiserfs/buffer2.c	Tue Jan 16 10:31:19 2001
+++ linux-akpm/fs/reiserfs/buffer2.c	Sun Jul 22 22:06:17 2001
@@ -73,7 +73,9 @@ void wait_buffer_until_released (struct 
 
 struct buffer_head  * reiserfs_bread (kdev_t n_dev, int n_block, int n_size) 
 {
-    return bread (n_dev, n_block, n_size);
+    struct buffer_head *ret = bread (n_dev, n_block, n_size);
+    conditional_schedule();
+    return ret;
 }
 
 /* This function looks for a buffer which contains a given block.  If
--- linux-2.4.7/fs/reiserfs/journal.c	Sat Jul 21 12:37:14 2001
+++ linux-akpm/fs/reiserfs/journal.c	Sun Jul 22 22:06:17 2001
@@ -577,6 +577,7 @@ inline void insert_journal_hash(struct r
 
 /* lock the current transaction */
 inline static void lock_journal(struct super_block *p_s_sb) {
+  conditional_schedule();
   while(atomic_read(&(SB_JOURNAL(p_s_sb)->j_wlock)) > 0) {
     sleep_on(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
   }
@@ -706,6 +707,7 @@ reiserfs_panic(s, "journal-539: flush_co
 	mark_buffer_dirty(tbh) ;
       }
       ll_rw_block(WRITE, 1, &tbh) ;
+      conditional_schedule();
       count++ ;
       put_bh(tbh) ; /* once for our get_hash */
     } 
@@ -833,6 +835,7 @@ static int update_journal_header_block(s
     set_bit(BH_Dirty, &(SB_JOURNAL(p_s_sb)->j_header_bh->b_state)) ;
     ll_rw_block(WRITE, 1, &(SB_JOURNAL(p_s_sb)->j_header_bh)) ;
     wait_on_buffer((SB_JOURNAL(p_s_sb)->j_header_bh)) ; 
+    conditional_schedule();
     if (!buffer_uptodate(SB_JOURNAL(p_s_sb)->j_header_bh)) {
       reiserfs_panic(p_s_sb, "journal-712: buffer write failed\n") ;
     }
@@ -2076,6 +2079,7 @@ int journal_join(struct reiserfs_transac
 }
 
 int journal_begin(struct reiserfs_transaction_handle *th, struct super_block  * p_s_sb, unsigned long nblocks) {
+  conditional_schedule();
   return do_journal_begin_r(th, p_s_sb, nblocks, 0) ;
 }
 
@@ -2213,6 +2217,7 @@ int journal_mark_dirty_nolog(struct reis
 }
 
 int journal_end(struct reiserfs_transaction_handle *th, struct super_block *p_s_sb, unsigned long nblocks) {
+  conditional_schedule();
   return do_journal_end(th, p_s_sb, nblocks, 0) ;
 }
 
@@ -2648,6 +2653,7 @@ void reiserfs_prepare_for_journal(struct
       }
 #endif
       wait_on_buffer(bh) ;
+      conditional_schedule();
     }
     retry_count++ ;
   }
@@ -2820,6 +2826,7 @@ printk("journal-2020: do_journal_end: BA
     /* copy all the real blocks into log area.  dirty log blocks */
     if (test_bit(BH_JDirty, &cn->bh->b_state)) {
       struct buffer_head *tmp_bh ;
+      conditional_schedule();		/* getblk can sleep, so... */
       tmp_bh = getblk(p_s_sb->s_dev, reiserfs_get_journal_block(p_s_sb) + 
 		     ((cur_write_start + jindex) % JOURNAL_BLOCK_COUNT), 
 				       p_s_sb->s_blocksize) ;
--- linux-2.4.7/fs/reiserfs/stree.c	Sat Jul 21 12:37:14 2001
+++ linux-akpm/fs/reiserfs/stree.c	Sun Jul 22 22:06:17 2001
@@ -681,9 +681,9 @@ int search_by_key (struct super_block * 
                                        DISK_LEAF_NODE_LEVEL */
     ) {
     kdev_t n_dev = p_s_sb->s_dev;
-    int  n_block_number = SB_ROOT_BLOCK (p_s_sb),
-      expected_level = SB_TREE_HEIGHT (p_s_sb),
-      n_block_size    = p_s_sb->s_blocksize;
+    int n_block_number; 
+    int expected_level;
+    int n_block_size = p_s_sb->s_blocksize;
     struct buffer_head  *       p_s_bh;
     struct path_element *       p_s_last_element;
     int				n_node_level, n_retval;
@@ -694,6 +694,8 @@ int search_by_key (struct super_block * 
     int n_repeat_counter = 0;
 #endif
 
+    conditional_schedule();
+
     /* As we add each node to a path we increase its count.  This means that
        we must be careful to release all nodes in a path before we either
        discard the path struct or re-use the path struct, as we do here. */
@@ -705,6 +707,9 @@ int search_by_key (struct super_block * 
     /* With each iteration of this loop we search through the items in the
        current node, and calculate the next current node(next path element)
        for the next iteration of this loop.. */
+    fs_gen = get_generation (p_s_sb);
+    n_block_number = SB_ROOT_BLOCK (p_s_sb);
+    expected_level = SB_TREE_HEIGHT (p_s_sb);
     while ( 1 ) {
 
 #ifdef CONFIG_REISERFS_CHECK
@@ -1174,6 +1179,8 @@ static char  prepare_for_delete_or_cut(
 	    for ( n_retry = 0, n_counter = *p_n_removed;
 		  n_counter < n_unfm_number; n_counter++, p_n_unfm_pointer-- )  {
 
+		conditional_schedule();
+
 		if (item_moved (&s_ih, p_s_path)) {
 		    need_research = 1 ;
 		    break;
@@ -1191,6 +1198,16 @@ static char  prepare_for_delete_or_cut(
 		}
 		/* Search for the buffer in cache. */
 		p_s_un_bh = get_hash_table(p_s_sb->s_dev, *p_n_unfm_pointer, n_blk_size);
+
+		/* AKPM: this is not _really_ needed.  It takes us from 2,000 usecs to 500 */
+		if (p_s_un_bh && conditional_schedule_needed()) {
+		  unconditional_schedule();
+		  if ( item_moved (&s_ih, p_s_path) )  {
+		      need_research = 1;
+		      brelse(p_s_un_bh) ;
+		      break ;
+		  }
+		}
 
 		if (p_s_un_bh) {
 		    mark_buffer_clean(p_s_un_bh) ;
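
The key enabler in the patch is the fs_changed()/__fs_changed() split in
reiserfs_fs.h: callers already restart their tree search whenever the
generation counter moves, so it is safe to slip a schedule in first.  A
user-space sketch of that idiom, with mocked-up state standing in for the
superblock's generation counter:

```c
#include <assert.h>

/* fs_generation stands in for the superblock generation counter, which
 * the real code bumps on every tree modification. */
static int fs_generation = 0;
static int need_resched = 0;

/* Pretend that scheduling away lets another task modify the tree. */
static void schedule(void)             { fs_generation++; }
static void conditional_schedule(void) { if (need_resched) schedule(); }

/* __fs_changed(): the raw generation comparison, as in the patch. */
static int __fs_changed(int gen) { return gen != fs_generation; }

/* fs_changed(): schedule first, then check -- callers must be ready to
 * restart their search when this returns true. */
static int fs_changed(int gen)
{
    conditional_schedule();
    return __fs_changed(gen);
}
```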


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 17:19   ` [PATCH] Significant performance improvements on reiserfs systems Chris Mason
  2001-09-20 17:39     ` Andrew Morton
@ 2001-09-20 18:47     ` Andrea Arcangeli
  2001-09-20 18:58       ` Andrew Morton
  2001-09-20 20:52     ` Robert Love
  2 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2001-09-20 18:47 UTC (permalink / raw)
  To: Chris Mason
  Cc: Dieter Nützel, Beau Kuiper, Andrew Morton, Robert Love,
	Linux Kernel List, ReiserFS List

On Thu, Sep 20, 2001 at 01:19:53PM -0400, Chris Mason wrote:
> 
> 
> On Thursday, September 20, 2001 07:08:25 PM +0200 Dieter Nützel
> <Dieter.Nuetzel@hamburg.de> wrote:
> 
> 
> > Please have a look at Robert Love's Linux kernel preemption patches and
> > the  conversation about my reported latency results.
> > 
> 
> Andrew Morton has patches that significantly improve reiserfs latency;
> it looks like the last one he sent me was against 2.4.7-pre9.  He and I did
> a bunch of work to make sure they introduce schedules only where it is safe.
> 
> Andrew, are these still maintained or should I pull out the reiserfs bits?

This may not help latency, but I suspect it could help reiserfs; it should
basically be a no-op for ext2.

--- 2.4.10pre12aa2/fs/buffer.c.~1~	Thu Sep 20 20:14:19 2001
+++ 2.4.10pre12aa2/fs/buffer.c	Thu Sep 20 20:45:58 2001
@@ -2506,7 +2506,7 @@
 	spin_unlock(&free_list[isize].lock);
 
 	page->buffers = bh;
-	page->flags &= ~(1 << PG_referenced);
+	page->flags |= 1 << PG_referenced;
 	lru_cache_add(page);
 	UnlockPage(page);
 	atomic_inc(&buffermem_pages);

You may want to give it a spin.

Andrea


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 18:47     ` Andrea Arcangeli
@ 2001-09-20 18:58       ` Andrew Morton
  2001-09-20 19:13         ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2001-09-20 18:58 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Mason, Dieter Nützel, Beau Kuiper, Andrew Morton,
	Robert Love, Linux Kernel List, ReiserFS List

Andrea Arcangeli wrote:
> 
> ...
> --- 2.4.10pre12aa2/fs/buffer.c.~1~      Thu Sep 20 20:14:19 2001
> +++ 2.4.10pre12aa2/fs/buffer.c  Thu Sep 20 20:45:58 2001
> @@ -2506,7 +2506,7 @@
>         spin_unlock(&free_list[isize].lock);
> 
>         page->buffers = bh;
> -       page->flags &= ~(1 << PG_referenced);
> +       page->flags |= 1 << PG_referenced;

I don't see how this can change anything - getblk() calls
touch_buffer() shortly afterwards, which does the same
thing?


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 18:58       ` Andrew Morton
@ 2001-09-20 19:13         ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2001-09-20 19:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chris Mason, Dieter Nützel, Beau Kuiper, Andrew Morton,
	Robert Love, Linux Kernel List, ReiserFS List

On Thu, Sep 20, 2001 at 11:58:53AM -0700, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> > 
> > ...
> > --- 2.4.10pre12aa2/fs/buffer.c.~1~      Thu Sep 20 20:14:19 2001
> > +++ 2.4.10pre12aa2/fs/buffer.c  Thu Sep 20 20:45:58 2001
> > @@ -2506,7 +2506,7 @@
> >         spin_unlock(&free_list[isize].lock);
> > 
> >         page->buffers = bh;
> > -       page->flags &= ~(1 << PG_referenced);
> > +       page->flags |= 1 << PG_referenced;
> 
> I don't see how this can change anything - getblk() calls
> touch_buffer() shortly afterwards, which does the same
> thing?

I'm worried the page is freed before it has a chance to be found in the
freelist on SMP.  The refill_freelist logic is broken too, and the right
fix would be much more invasive than the above one-liner: we should
return one bh with b_count set to 1, to be sure it isn't collected away
under us, and then try again.  But this one will probably do the trick
for now.  And maybe it really doesn't matter, as you say.

Also, get_hash_table needs to touch the buffer, since reiserfs makes
heavy use of it.

Just a few ideas.

--- 2.4.10pre12aa2/fs/buffer.c.~1~	Thu Sep 20 20:14:19 2001
+++ 2.4.10pre12aa2/fs/buffer.c	Thu Sep 20 21:06:16 2001
@@ -598,8 +598,10 @@
 		    bh->b_size    == size	&&
 		    bh->b_dev     == dev)
 			break;
-	if (bh)
+	if (bh) {
 		get_bh(bh);
+		touch_buffer(bh);
+	}
 
 	return bh;
 }
@@ -1181,10 +1183,10 @@
 
 		/* Insert the buffer into the regular lists */
 		__insert_into_queues(bh);
+		touch_buffer(bh);
 	out:
 		write_unlock(&hash_table_lock);
 		spin_unlock(&lru_list_lock);
-		touch_buffer(bh);
 		return bh;
 	}
 

Andrea


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 17:19   ` [PATCH] Significant performance improvements on reiserfs systems Chris Mason
  2001-09-20 17:39     ` Andrew Morton
  2001-09-20 18:47     ` Andrea Arcangeli
@ 2001-09-20 20:52     ` Robert Love
  2001-09-20 21:11       ` Dieter Nützel
  2001-09-20 22:24       ` Robert Love
  2 siblings, 2 replies; 21+ messages in thread
From: Robert Love @ 2001-09-20 20:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chris Mason, Dieter Nützel, Beau Kuiper, Andrea Arcangeli,
	Linux Kernel List, ReiserFS List

On Thu, 2001-09-20 at 13:39, Andrew Morton wrote:
> > Andrew, are these still maintained or should I pull out the reiserfs bits?
> 
> This is the reiserfs part - it applies to 2.4.10-pre12 OK.
> 
> For the purposes of Robert's patch, conditional_schedule()
> should be defined as 
> 
> 	if (current->need_resched && current->lock_depth == 0) {
> 		unlock_kernel();
> 		lock_kernel();
> 	}
> 
> which is somewhat crufty, because the implementation of lock_kernel()
> is arch-specific.  But all architectures seem to implement it the same way.

Note that we only support preemption on i386 thus far, so this is not an
issue (yet).

One other note: it is not always OK to just drop the one lock (above,
it is).  Say we hold a spinlock, but another lock (most notably the BKL)
is also held.  Since the preemption counter is recursive, we still won't
allow preemption until all locks are dropped.  In that case, we need to
see what else needs to be dropped, or just call schedule() explicitly.
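
As a toy model of that recursive counter (all names made up for
illustration, not the preemption patch's real API): dropping one lock while
another is held must not trigger preemption; only the outermost unlock may.

```c
#include <assert.h>

/* Each held lock bumps the count; only the final unlock may preempt. */
static int preempt_count = 0;
static int need_resched  = 0;
static int preemptions   = 0;

static void preempt_schedule(void) { preemptions++; }

static void any_lock(void) { preempt_count++; }

static void any_unlock(void)
{
    /* Dropping one lock while others are held must not preempt. */
    if (--preempt_count == 0 && need_resched)
        preempt_schedule();
}
```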

> <patch snipped>

Looks nice, Andrew.

Anyone try this? (I don't use ReiserFS).

I am putting together a conditional scheduling patch to fix some of the
worst cases, for use in conjunction with the preemption patch, and this
might be useful.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 20:52     ` Robert Love
@ 2001-09-20 21:11       ` Dieter Nützel
  2001-09-20 22:24       ` Robert Love
  1 sibling, 0 replies; 21+ messages in thread
From: Dieter Nützel @ 2001-09-20 21:11 UTC (permalink / raw)
  To: Robert Love, Andrew Morton
  Cc: Chris Mason, Beau Kuiper, Andrea Arcangeli, Linux Kernel List,
	ReiserFS List

On Thursday, 20 September 2001 at 22:52, Robert Love wrote:
> On Thu, 2001-09-20 at 13:39, Andrew Morton wrote:
> > > Andrew, are these still maintained or should I pull out the reiserfs
> > > bits?
> >
> > This is the reiserfs part - it applies to 2.4.10-pre12 OK.
> >
> > For the purposes of Robert's patch, conditional_schedule()
> > should be defined as
> >
> > 	if (current->need_resched && current->lock_depth == 0) {
> > 		unlock_kernel();
> > 		lock_kernel();
> > 	}
> >
> > which is somewhat crufty, because the implementation of lock_kernel()
> > is arch-specific.  But all architectures seem to implement it the same
> > way.

> > <patch snipped>
>
> Looks nice, Andrew.
>
> Anyone try this? (I don't use ReiserFS).

Yes, I will...:-)
Send it along.

>
> I am putting together a conditional scheduling patch to fix some of the
> worst cases, for use in conjunction with the preemption patch, and this
> might be useful.

The missing conditional_schedule() definition has kept me from running it so far.

-Dieter


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 20:52     ` Robert Love
  2001-09-20 21:11       ` Dieter Nützel
@ 2001-09-20 22:24       ` Robert Love
  2001-09-20 22:37         ` Andrea Arcangeli
  2001-09-20 22:56         ` Robert Love
  1 sibling, 2 replies; 21+ messages in thread
From: Robert Love @ 2001-09-20 22:24 UTC (permalink / raw)
  To: Dieter Nützel
  Cc: Andrew Morton, Chris Mason, Beau Kuiper, Andrea Arcangeli,
	Linux Kernel List, ReiserFS List

On Thu, 2001-09-20 at 17:11, Dieter Nützel wrote:
> > I am putting together a conditional scheduling patch to fix some of the
> > worst cases, for use in conjunction with the preemption patch, and this
> > might be useful.
> 
> The missing conditional_schedule() definition has kept me from running it so far.

Hrm, I didn't notice that conditional_schedule() wasn't defined in that
patch.  You will need to define it yourself, but do something like

if (current->need_resched && current->lock_depth == 0) {
	unlock_kernel();
	lock_kernel();
}

like Andrew wrote.

If you don't jump on the idea of trying this :) I will work out a
patch that does some other low-latency things and send it out.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 22:24       ` Robert Love
@ 2001-09-20 22:37         ` Andrea Arcangeli
  2001-09-20 22:56         ` Robert Love
  1 sibling, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2001-09-20 22:37 UTC (permalink / raw)
  To: Robert Love
  Cc: Dieter Nützel, Andrew Morton, Chris Mason, Beau Kuiper,
	Linux Kernel List, ReiserFS List

On Thu, Sep 20, 2001 at 06:24:48PM -0400, Robert Love wrote:
> On Thu, 2001-09-20 at 17:11, Dieter Nützel wrote:
> > > I am putting together a conditional scheduling patch to fix some of the
> > > worst cases, for use in conjunction with the preemption patch, and this
> > > might be useful.
> > 
> > The missing conditional_schedule() definition has kept me from running it so far.
> 
> Hrm, I didn't notice that conditional_schedule() wasn't defined in that
> patch.  You will need to define it yourself, but do something like
> 
> if (current->need_resched && current->lock_depth == 0) {
> 	unlock_kernel();
> 	lock_kernel();
> }
> 
> like Andrew wrote.

Nitpicking: the above is fine, but it isn't complete.  It may work for
most cases, but a generic function would be better implemented
similarly to release_kernel_lock_save/restore, so that you take care of
lock_depth > 0 too:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.20pre10aa1/72_copy-user-unlock-1
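
A mocked-up sketch of that save/restore pattern, assuming 2.4's convention
that lock_depth is -1 when the BKL is not held; the names and state below
are stand-ins for illustration, not the actual implementation behind the
link:

```c
#include <assert.h>

/* Handle any lock_depth >= 0, not just the outermost hold. */
static int lock_depth = -1;   /* -1 means the BKL is not held */
static int schedules  = 0;

static void schedule(void) { schedules++; }

static int release_kernel_lock_save(void)
{
    int saved = lock_depth;
    lock_depth = -1;          /* drop the BKL completely, whatever the depth */
    return saved;
}

static void restore_kernel_lock(int saved) { lock_depth = saved; }

static void conditional_schedule_bkl(void)
{
    if (lock_depth >= 0) {    /* covers recursive holds too */
        int saved = release_kernel_lock_save();
        schedule();
        restore_kernel_lock(saved);
    }
}
```

The difference from the simpler unlock/lock pair is that a recursive
holder (lock_depth > 0) also gets a scheduling point, and its depth is
restored intact afterwards.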

Andrea


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 22:24       ` Robert Love
  2001-09-20 22:37         ` Andrea Arcangeli
@ 2001-09-20 22:56         ` Robert Love
  2001-09-20 23:15           ` Andrea Arcangeli
  2001-09-20 23:41           ` Robert Love
  1 sibling, 2 replies; 21+ messages in thread
From: Robert Love @ 2001-09-20 22:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Dieter Nützel, Andrew Morton, Chris Mason, Beau Kuiper,
	Linux Kernel List, ReiserFS List

On Thu, 2001-09-20 at 18:37, Andrea Arcangeli wrote:
> On Thu, Sep 20, 2001 at 06:24:48PM -0400, Robert Love wrote:
> >
> > if (current->need_resched && current->lock_depth == 0) {
> > 	unlock_kernel();
> > 	lock_kernel();
> > }

> Nitpicking: the above is fine, but it isn't complete.  It may work for
> most cases, but a generic function would be better implemented
> similarly to release_kernel_lock_save/restore, so that you take care of
> lock_depth > 0 too:

Let me explain a little about the patch, and then I am interested in
whether your opinion changes.

When unlock_kernel() is called, the preemption code is enabled and
checks whether the preemption count is non-zero -- it's handled just like a
recursive lock.  If other locks are held, preemption won't
happen.  Thus, we return to the calling code path, lock_kernel() is
called, and everyone is happy.

If unlock_kernel() is called and the preemption code (which hangs off
the bottom of the locking code) sees that the lock depth is now 0, it
will call preempt_schedule() (if needed) and allow a new process to run.
Then we return to the original code path, call lock_kernel(), and
continue.

Or am I missing something?

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 22:56         ` Robert Love
@ 2001-09-20 23:15           ` Andrea Arcangeli
  2001-09-21  0:37             ` george anzinger
  2001-09-20 23:41           ` Robert Love
  1 sibling, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2001-09-20 23:15 UTC (permalink / raw)
  To: Robert Love
  Cc: Dieter Nützel, Andrew Morton, Chris Mason, Beau Kuiper,
	Linux Kernel List, ReiserFS List

On Thu, Sep 20, 2001 at 06:56:04PM -0400, Robert Love wrote:
> On Thu, 2001-09-20 at 18:37, Andrea Arcangeli wrote:
> > On Thu, Sep 20, 2001 at 06:24:48PM -0400, Robert Love wrote:
> > >
> > > if (current->need_resched && current->lock_depth == 0) {
> > > 	unlock_kernel();
> > > 	lock_kernel();
> > > }
> 
> > Nitpicking: the above is fine, but it isn't complete.  It may work for
> > most cases, but a generic function would be better implemented
> > similarly to release_kernel_lock_save/restore, so that you take care of
> > lock_depth > 0 too:
> 
> Let me explain a little about the patch, and then I am interested in
> whether your opinion changes.
> 
> When unlock_kernel() is called, the preemption code is enabled and
> checks whether the preemption count is non-zero -- it's handled just like a

All I'm saying is that you should check for >= 0, not == 0.

But anyways, it's pretty depressing to see such a costly check needed to
get latency right with the preemptive-kernel approach; with a
non-preemptive kernel we'd need to just check need_resched and call
schedule in the unlikely case, so it would be even lighter :) and with no
fixed costs in UP spinlocks, per-CPU data structures, etc...  The point of
a preemptive kernel would be precisely to save us from putting such
explicit (costly) checks in the code.  My theory was that the CPU-costly
loops are mostly protected by some lock anyway, and the fact that you're
writing such horrid (but helpful) code is a kind of proof.

Andrea


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 22:56         ` Robert Love
  2001-09-20 23:15           ` Andrea Arcangeli
@ 2001-09-20 23:41           ` Robert Love
  1 sibling, 0 replies; 21+ messages in thread
From: Robert Love @ 2001-09-20 23:41 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Dieter Nützel, Andrew Morton, Chris Mason, Beau Kuiper,
	Linux Kernel List, ReiserFS List

On Thu, 2001-09-20 at 19:15, Andrea Arcangeli wrote:
> All I'm saying is that you should check for >= 0, not == 0.

And I am saying we already keep track of that: we have a preemption
counter.

> But anyways, it's pretty depressing to see such a costly check needed to
> get latency right with the preemptive-kernel approach; with a
> non-preemptive kernel we'd need to just check need_resched and call
> schedule in the unlikely case, so it would be even lighter :) and with no
> fixed costs in UP spinlocks, per-CPU data structures, etc...  The point of
> a preemptive kernel would be precisely to save us from putting such
> explicit (costly) checks in the code.  My theory was that the CPU-costly
> loops are mostly protected by some lock anyway, and the fact that you're
> writing such horrid (but helpful) code is a kind of proof.

Well, with the preemptive kernel we already account for 90% of the
high-latency areas.

Doing the "horrid" solution may solve some other issues, but admittedly
you are right that it's not the prettiest solution.

I don't agree, however, that it's that much more costly, and maybe I am
missing something.  Assuming we are SMP (and thus have locks), where is
there a lot more overhead?  We check current->need_resched (which we
don't _have_ to), call unlock_kernel() and then call lock_kernel(), with
preemption happening automatically in between.  The preemption code
merely checks a counter and calls preempt_schedule() if needed.

Now, yes, this is not ideal.  Ideally we don't need any of this muck. 
Ideally preemption provides everything we need.  So, towards the future,
we can work towards that.  For very short locks, we could just disable
interrupts (a lot less overhead).  For long-held locks, we can replace
them with a more efficient lock -- spin-then-sleep or a
priority-inheriting mutex lock, for example.
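
As a user-space sketch of the spin-then-sleep idea (illustrative only, not
MontaVista's or anyone's actual implementation): spin briefly in the hope
that the holder finishes, then yield the CPU instead of burning it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <sched.h>

#define SPIN_TRIES 100    /* arbitrary spin budget for illustration */

typedef struct { atomic_flag held; } spin_sleep_lock;

static void ssl_lock(spin_sleep_lock *l)
{
    int tries = 0;
    while (atomic_flag_test_and_set(&l->held)) {
        if (++tries > SPIN_TRIES) {   /* give up spinning... */
            sched_yield();            /* ...and let another task run */
            tries = 0;
        }
    }
}

static void ssl_unlock(spin_sleep_lock *l) { atomic_flag_clear(&l->held); }
```

A real kernel version would sleep on a wait queue (or inherit priority)
rather than merely yielding, but the two-phase structure is the same.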

I don't want to confuse the above, which is perhaps an ideal system for
inclusion in 2.5, with trying to lower latency further for those who
care via conditional scheduling and the such.

We already have average latency of <1ms with peaks 10-50ms.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-20 23:15           ` Andrea Arcangeli
@ 2001-09-21  0:37             ` george anzinger
  2001-09-21  1:20               ` Andrew Morton
                                 ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: george anzinger @ 2001-09-21  0:37 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Robert Love, Dieter Nützel, Andrew Morton, Chris Mason,
	Beau Kuiper, Linux Kernel List, ReiserFS List

Andrea Arcangeli wrote:
> 
> On Thu, Sep 20, 2001 at 06:56:04PM -0400, Robert Love wrote:
> > On Thu, 2001-09-20 at 18:37, Andrea Arcangeli wrote:
> > > On Thu, Sep 20, 2001 at 06:24:48PM -0400, Robert Love wrote:
> > > >
> > > > if (current->need_resched && current->lock_depth == 0) {
> > > >   unlock_kernel();
> > > >   lock_kernel();
> > > > }
> >
> > > Nitpicking: the above is fine, but it isn't complete.  It may work for
> > > most cases, but a generic function would be better implemented
> > > similarly to release_kernel_lock_save/restore, so that you take care of
> > > lock_depth > 0 too:
> >
> > Let me explain a little about the patch, and then I am interested in
> > whether your opinion changes.
> >
> > When unlock_kernel() is called, the preemption code is enabled and
> > checks whether the preemption count is non-zero -- it's handled just like a
> 
> All I'm saying is that you should check for >= 0, not == 0.
> 
> But anyways, it's pretty depressing to see such a costly check needed to
> get latency right with the preemptive-kernel approach; with a
> non-preemptive kernel we'd need to just check need_resched and call
> schedule in the unlikely case, so it would be even lighter :) and with no
> fixed costs in UP spinlocks, per-CPU data structures, etc...  The point of
> a preemptive kernel would be precisely to save us from putting such
> explicit (costly) checks in the code.  My theory was that the CPU-costly
> loops are mostly protected by some lock anyway, and the fact that you're
> writing such horrid (but helpful) code is a kind of proof.
> 
Actually, I rather think that the problem is lock granularity.  These
issues are present on SMP systems as well.  A good solution would be one
that shortened the spinlock hold time.  No horrid preempt code, just
tight, fast code.

And the ones to address these issues are the owners of the code.  Those
who want the lower latency are left, through lack of knowledge and time,
with crude tools like conditional_schedule().

George


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-21  0:37             ` george anzinger
@ 2001-09-21  1:20               ` Andrew Morton
  2001-09-21  3:14               ` Robert Love
  2001-09-21  9:32               ` [reiserfs-list] " Nikita Danilov
  2 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2001-09-21  1:20 UTC (permalink / raw)
  To: george anzinger
  Cc: Andrea Arcangeli, Robert Love, Dieter Nützel, Chris Mason,
	Beau Kuiper, Linux Kernel List, ReiserFS List

george anzinger wrote:
> 
> ...
> Actually, I rather think that the problem is lock granularity.  These
> issues are present in the SMP systems as well.  A good solution would be
> one that shortened the spinlock time.  No horrid preempt code, just
> tight fast code.
> 

This may not be practical.

Take, for example, zap_page_range().   It simply has a lot
of work to do, and it does it inside a spinlock.  By doing
it in a tight loop, it's optimal.

There is no way to speed this function up by two or three orders
of magnitude.  (Well, there is: don't take the lock at all if
the mm isn't shared, but this is merely an example.  There are
other instances).

It seems that for a preemptive kernel to be successful, we need
to globally alter the kernel so that it never holds locks for
more than 500 microseconds.  Which is what the conditional_schedule()
(aka cooperative multitasking :)) patches do.

It seems that there are no magic bullets, and low latency will
forever have a global impact on kernel design, unless a way is
found to reschedule with locks held.  I recall that a large
part of the MontaVista patch involved turning spinlocks into
semaphores, yes?  That would seem to be the way to go.
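
The 500-microsecond rule amounts to "lock breaking": chunk a long
spinlocked loop and drop the lock periodically so a waiting task can run.
A mocked-up sketch of the pattern, with stand-in lock and counter names
(not the real zap_page_range()):

```c
#include <assert.h>

enum { CHUNK = 256 };   /* pages processed per lock hold */

static int lock_drops   = 0;
static int need_resched = 1;   /* pretend someone is waiting to run */

static void spin_lock(void)   { }
static void spin_unlock(void) { lock_drops++; }

static long zap_range_chunked(long npages)
{
    long done = 0;
    spin_lock();
    while (done < npages) {
        long n = npages - done > CHUNK ? CHUNK : npages - done;
        done += n;                      /* do a bounded amount of work */
        if (need_resched && done < npages) {
            spin_unlock();              /* breathe: let the waiter in */
            /* schedule() would run here */
            spin_lock();
        }
    }
    spin_unlock();
    return done;
}
```

Latency is bounded by one chunk's worth of work instead of the whole
range, at the cost of extra lock traffic.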


* Re: [PATCH] Significant performance improvements on reiserfs systems
  2001-09-21  0:37             ` george anzinger
  2001-09-21  1:20               ` Andrew Morton
@ 2001-09-21  3:14               ` Robert Love
  2001-09-21  9:32               ` [reiserfs-list] " Nikita Danilov
  2 siblings, 0 replies; 21+ messages in thread
From: Robert Love @ 2001-09-21  3:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: george anzinger, Andrea Arcangeli, Dieter Nützel,
	Chris Mason, Beau Kuiper, Linux Kernel List, ReiserFS List

On Thu, 2001-09-20 at 21:20, Andrew Morton wrote:
> This may not be practical.
> 
> Take, for example, zap_page_range().   It simply has a lot
> of work to do, and it does it inside a spinlock.  By doing
> it in a tight loop, it's optimal.
> 
> There is no way to speed this function up by two or three orders
> of magnitude.  (Well, there is: don't take the lock at all if
> the mm isn't shared, but this is merely an example.  There are
> other instances).

Agreed, but...

> It seems that for a preemptive kernel to be successful, we need
> to globally alter the kernel so that it never holds locks for
> more than 500 microseconds.  Which is what the conditional_schedule()
> (aka cooperative multitasking :)) patches do.
>
> It seems that there are no magic bullets, and low latency will
> forever have a global impact on kernel design, unless a way is
> found to reschedule with locks held.  I recall that a large
> part of the MontaVista patch involved turning spinlocks into
> semaphores, yes?  That would seem to be the way to go.

This would be the situation that solved the problem with little
complaint, huh?

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [reiserfs-list] Re: [PATCH] Significant performace improvements on reiserfs systems
  2001-09-21  0:37             ` george anzinger
  2001-09-21  1:20               ` Andrew Morton
  2001-09-21  3:14               ` Robert Love
@ 2001-09-21  9:32               ` Nikita Danilov
  2001-09-21 12:18                 ` Alan Cox
                                   ` (2 more replies)
  2 siblings, 3 replies; 21+ messages in thread
From: Nikita Danilov @ 2001-09-21  9:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: george anzinger, Andrea Arcangeli, Robert Love,
	Dieter Nützel, Chris Mason, Beau Kuiper, Linux Kernel List,
	ReiserFS List

Andrew Morton writes:
 > george anzinger wrote:
 > > 
 > > ...
 > > Actually, I rather think that the problem is lock granularity.  These
 > > issues are present in the SMP systems as well.  A good solution would be
 > > one that shortened the spinlock time.  No horrid preempt code, just
 > > tight fast code.
 > > 
 > 
 > This may not be practical.
 > 
 > Take, for example, zap_page_range().   It simply has a lot
 > of work to do, and it does it inside a spinlock.  By doing
 > it in a tight loop, it's optimal.
 > 
 > There is no way to speed this function up by two or three orders
 > of magnitude.  (Well, there is: don't take the lock at all if
 > the mm isn't shared, but this is merely an example.  There are
 > other instances).
 > 
 > It seems that for a preemptive kernel to be successful, we need
 > to globally alter the kernel so that it never holds locks for
 > more than 500 microseconds.  Which is what the conditional_schedule()
 > (aka cooperative multitasking :)) patches do.
 > 
 > It seems that there are no magic bullets, and low latency will
 > forever have a global impact on kernel design, unless a way is
 > found to reschedule with locks held.  I recall that a large

In Solaris, before spinning on a busy spin-lock, a thread checks whether
the spin-lock holder is running on the same processor. If so, the thread
goes to sleep and the holder wakes it up on spin-lock release. The same,
I guess, goes for interrupts that are served as separate threads. This
way, one can re-schedule with spin-locks held.
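Roughly, that adaptive decision looks like the sketch below. The struct and names are illustrative, not Solaris source; the real adaptive mutex also implements the wakeup side on release.

```c
#include <assert.h>

/* Sketch of the adaptive-lock policy described above: spin only when the
 * holder is running on a *different* CPU (it should release soon); sleep
 * when the holder shares our CPU or has been preempted, since spinning
 * would only burn the cycles the holder needs in order to finish. */

enum contended_action { SPIN, SLEEP };

struct adaptive_lock {
    int held;
    int holder_cpu;     /* CPU the holder runs on, or -1 if not running */
};

static enum contended_action contend(const struct adaptive_lock *lk, int my_cpu)
{
    if (!lk->held)
        return SPIN;    /* lock is free: a spin acquires it at once */
    if (lk->holder_cpu >= 0 && lk->holder_cpu != my_cpu)
        return SPIN;    /* holder is making progress elsewhere */
    return SLEEP;       /* block; the holder wakes us on release */
}
```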

 > part of the MontaVista patch involved turning spinlocks into
 > semaphores, yes?  That would seem to be the way to go.

Nikita.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [reiserfs-list] Re: [PATCH] Significant performace improvements on reiserfs systems
  2001-09-21  9:32               ` [reiserfs-list] " Nikita Danilov
@ 2001-09-21 12:18                 ` Alan Cox
  2001-09-21 12:31                 ` Nikita Danilov
  2001-09-23 23:49                 ` Rusty Russell
  2 siblings, 0 replies; 21+ messages in thread
From: Alan Cox @ 2001-09-21 12:18 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: Andrew Morton, george anzinger, Andrea Arcangeli, Robert Love,
	Dieter Nützel, Chris Mason, Beau Kuiper, Linux Kernel List,
	ReiserFS List

> In Solaris, before spinning on a busy spin-lock, thread checks whether
> spin-lock holder runs on the same processor. If so, thread goes to sleep
> and holder wakes it up on spin-lock release. The same, I guess is going


> for interrupts that are served as separate threads. This way, one can
> re-schedule with spin-locks held.

This is one of the things interrupt handling by threads gives you, but the
performance cost is not nice. When you consider that ksoftirqd when it
kicks in (currently far too often) takes up to 10% off gigabit ethernet
performance, you can appreciate why we don't want to go that path.

Our spinlock paths are supposed to be very small and predictable. Where 
there is sleeping involved we have semaphores.

As lockmeter shows, we still have a few io_request_lock cases at least
where we lock for far too long.

Alan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [reiserfs-list] Re: [PATCH] Significant performace improvements on reiserfs systems
  2001-09-21  9:32               ` [reiserfs-list] " Nikita Danilov
  2001-09-21 12:18                 ` Alan Cox
@ 2001-09-21 12:31                 ` Nikita Danilov
  2001-09-23 23:49                 ` Rusty Russell
  2 siblings, 0 replies; 21+ messages in thread
From: Nikita Danilov @ 2001-09-21 12:31 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, george anzinger, Andrea Arcangeli, Robert Love,
	Dieter Nützel, Chris Mason, Beau Kuiper, Linux Kernel List,
	ReiserFS List

Alan Cox writes:
 > > In Solaris, before spinning on a busy spin-lock, thread checks whether
 > > spin-lock holder runs on the same processor. If so, thread goes to sleep
 > > and holder wakes it up on spin-lock release. The same, I guess is going
 > 
 > 
 > > for interrupts that are served as separate threads. This way, one can
 > > re-schedule with spin-locks held.
 > 
 > This is one of the things interrupt handling by threads gives you, but the
 > performance cost is not nice. When you consider that ksoftirqd when it
 > kicks in (currently far too often) takes up to 10% off gigabit ethernet
 > performance, you can appreciate why we don't want to go that path.

I guess the reasoning behind the Solaris design was that you have to
disable interrupts during critical sections far more frequently than
interrupt threads would actually block in them. So, you lose on interrupt
thread creation and when interrupts really block on a lock, but you gain
because there is no longer any need to disable interrupts when accessing
data shared between an interrupt handler and the rest of the kernel. This
can ultimately amount to a net advantage, especially when coupled with
some sort of lazy interrupt thread creation.

 > 
 > Our spinlock paths are supposed to be very small and predictable. Where 
 > there is sleeping involved we have semaphores.
 > 
 > As lockmeter shows we still have a few io_request_lock cases at least where
 > we lock for far too long

Reiserfs also likes to keep BKL while doing binary searches within nodes
of a tree.

 > 
 > Alan

Nikita.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [reiserfs-list] Re: [PATCH] Significant performace improvements on reiserfs systems
  2001-09-21  9:32               ` [reiserfs-list] " Nikita Danilov
  2001-09-21 12:18                 ` Alan Cox
  2001-09-21 12:31                 ` Nikita Danilov
@ 2001-09-23 23:49                 ` Rusty Russell
  2 siblings, 0 replies; 21+ messages in thread
From: Rusty Russell @ 2001-09-23 23:49 UTC (permalink / raw)
  To: Alan Cox
  Cc: Nikita, akpm, george, andrea, rml, Dieter.Nuetzel, mason,
	kuib-kl, linux-kernel, andrea

On Fri, 21 Sep 2001 13:18:31 +0100 (BST)
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > In Solaris, before spinning on a busy spin-lock, thread checks whether
> > spin-lock holder runs on the same processor. If so, thread goes to sleep
> > and holder wakes it up on spin-lock release. The same, I guess is going
> 
> 
> > for interrupts that are served as separate threads. This way, one can
> > re-schedule with spin-locks held.
> 
> This is one of the things interrupt handling by threads gives you, but the
> performance cost is not nice. When you consider that ksoftirqd when it
> kicks in (currently far too often) takes up to 10% off gigabit ethernet
> performance, you can appreciate why we don't want to go that path.

I've been thinking about this: I know some people have been fiddling with making
do_softirq spin X times before kicking ksoftirqd, but how about the following?

Cheers,
Rusty.

--- linux-pmac/kernel/softirq.c	Sun Sep  9 15:11:37 2001
+++ working-pmac-ksoftirq/kernel/softirq.c	Mon Sep 24 09:44:07 2001
@@ -63,11 +63,12 @@
 	int cpu = smp_processor_id();
 	__u32 pending;
 	long flags;
-	__u32 mask;
+	long start;
 
 	if (in_interrupt())
 		return;
 
+	start = jiffies;
 	local_irq_save(flags);
 
 	pending = softirq_pending(cpu);
@@ -75,32 +76,32 @@
 	if (pending) {
 		struct softirq_action *h;
 
-		mask = ~pending;
 		local_bh_disable();
-restart:
-		/* Reset the pending bitmask before enabling irqs */
-		softirq_pending(cpu) = 0;
+		do {
+			/* Reset the pending bitmask before enabling irqs */
+			softirq_pending(cpu) = 0;
 
-		local_irq_enable();
+			local_irq_enable();
 
-		h = softirq_vec;
+			h = softirq_vec;
 
-		do {
-			if (pending & 1)
-				h->action(h);
-			h++;
-			pending >>= 1;
-		} while (pending);
-
-		local_irq_disable();
-
-		pending = softirq_pending(cpu);
-		if (pending & mask) {
-			mask &= ~pending;
-			goto restart;
-		}
+			do {
+				if (pending & 1)
+					h->action(h);
+				h++;
+				pending >>= 1;
+			} while (pending);
+
+			local_irq_disable();
+
+			pending = softirq_pending(cpu);
+
+			/* Don't spin here forever... */
+		} while (pending && start == jiffies);
 		__local_bh_enable();
 
+		/* If a timer tick went off, assume we're overloaded,
+                   and kick in ksoftirqd */
 		if (pending)
 			wakeup_softirqd(cpu);
 	}
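In rough user-space terms, the bound the patch places on the softirq loop looks like this. ticks() stands in for the jiffies counter, and the round counters are simplifications of the real pending-bit handling.

```c
#include <assert.h>

/* User-space sketch of the bound in the patch above: keep re-running
 * newly raised softirq work, but stop as soon as a timer tick elapses
 * and leave the remainder to the ksoftirqd thread, instead of spinning
 * here forever at the expense of the interrupted task. */

static int fake_jiffies;                    /* stand-in for jiffies */
static int ticks(void) { return fake_jiffies; }

/* Runs up to 'pending_rounds' rounds of softirq work.  Returns rounds
 * completed; *woke_ksoftirqd is set if work was left over. */
static int run_softirqs(int pending_rounds, int *woke_ksoftirqd)
{
    int start = ticks();
    int rounds = 0;

    do {
        rounds++;                   /* handle one batch of pending bits */
        pending_rounds--;
        if (rounds == 2)
            fake_jiffies++;         /* simulate a tick during the loop */
    } while (pending_rounds > 0 && start == ticks());

    *woke_ksoftirqd = (pending_rounds > 0);  /* kernel: wakeup_softirqd() */
    return rounds;
}
```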



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] Significant performace improvements on reiserfs systems
       [not found] <200109201708.f8KH8sG15617@zero.tech9.net>
@ 2001-09-20 20:48 ` Robert Love
  0 siblings, 0 replies; 21+ messages in thread
From: Robert Love @ 2001-09-20 20:48 UTC (permalink / raw)
  To: Dieter Nützel
  Cc: Beau Kuiper, Chris Mason, Andrea Arcangeli, Linux Kernel List,
	ReiserFS List

On Thu, 2001-09-20 at 13:08, Dieter Nützel wrote:
> I have found that ReiserFS suffers from kupdated since 2.4.7-ac3.
> Whenever I do "kill -STOP kupdated" the performance is much better.
> I know this is unsafe...

The patches that are going around in this thread (stuff from Andrew
Morton and Andrea) should address some of the ReiserFS issues.  Have you
tried them?

Note that many of the patches will _not_ improve recorded latency,
because anytime current->need_resched is unset, the spinlocks will not
be unlocked and thus preemption will not be enabled.  The patches only
measure the time preemption is not enabled.

Of course, true latency will decrease, so see if your `hiccups'
decrease.

Perhaps if we start putting in some conditional schedules like this, I
can write a macro to use the instrumentation stuff to report the shorter
latencies.

> Please have a look at Robert Love's Linux kernel preemption patches and the 
> conversation about my reported latency results.
> 
> It seems that ReiserFS is involved in the poor audio behavior (hiccups during 
> MP2/MP3/Ogg-Vorbis playback).
> 
> Re: [PATCH] Preemption Latency Measurement Tool
> http://marc.theaimsgroup.com/?l=linux-kernel&m=100097432006605&w=2

> Taken from Andrea's latest post:
> 
> I can (with Randy Dunlap's ksysmap, 
> http://www.osdlab.org/sw_resources/scripts/ksysmap).
> 
> SunWave1>./ksysmap /boot/System.map c0114db3
> ksysmap: searching '/boot/System.map' for 'c0114db3'
> 
> c0114d60 T preempt_schedule
> c0114db3 ..... <<<<<
> c0114e10 T wake_up_process

It's really just as easy, if not easier, to look up the file and line number
that /proc/latencytimes reports.

> Unneeded kernel locks/stalls which hurt latency and (global) throughput.
> 
> I will do some benchmarks against Andrea's VM
> 2.4.10-pre12 + patch-rml-2.4.10-pre12-preempt-kernel-1 + 
> patch-rml-2.4.10-pre12-preempt-stats-1
> 
> Hope this post isn't too long and nobody feels offended.

No, thank you for it.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] Significant performace improvements on reiserfs systems
@ 2001-09-20 17:08 Dieter Nützel
  0 siblings, 0 replies; 21+ messages in thread
From: Dieter Nützel @ 2001-09-20 17:08 UTC (permalink / raw)
  To: Beau Kuiper
  Cc: Chris Mason, Andrea Arcangeli, Robert Love, Linux Kernel List,
	ReiserFS List

On Thu, 20 Sep 2001, Beau Kuiper wrote:
> > On Thu, 20 Sep 2001, Chris Mason wrote:
> > >
> > On Thursday, September 20, 2001 03:12:44 PM +0800 Beau Kuiper
> <kuib-kl@ljbc.wa.edu.au> wrote:
> >
> > > Hi,
> > >
> > > Reiserfs on 2.4 has always been bog slow.
> > >
> > > I have identified kupdated as the culprit, and have 3 patches that fix
> > > the performance problems I have been suffering.
> >
> > Thanks for sending these along.
> >
> > >
> > > I would like these patches to be reviewed and put into the mainline
> > > kernel so that others can test the changes.
> >
> > > Patch 1.
> > >
> > > This patch fixes reiserfs to use the kupdated code path when told to
> > > resync its super block, like it did in 2.2.19. This is the culprit for
> > > bad reiserfs performance in 2.4. Unfortunately, this fix relies on the
> > > second patch to work properly.
> >
> > I promised linus I would never reactivate this code, it is just too nasty
> > ;-)  The problem is that write_super doesn't know if it is called from
> > sync or from kupdated.  The right fix is to have an extra param to
> > write_super, or another super_block method that gets called instead of
> > write_super when an immediate commit is not required.

I have found that ReiserFS suffers from kupdated since 2.4.7-ac3.
Whenever I do "kill -STOP kupdated" the performance is much better.
I know this is unsafe...

> I don't think that this could happen until 2.5.x though, as either
> solution touches every file system. However, if we added an extra method,
> we could do this while only slightly touching the other filesystems (where
> kupdated sync == real sync) Simply see if the method exists (is non-null)
> and call that instead with a kupdate sync instead of the normal
> super_sync. Are you interested in me writing a patch to do this?
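The optional-method dispatch being proposed might look roughly like this. Every name here is hypothetical, chosen for illustration, not the real 2.4 super_operations API.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the proposal above: add an optional hook (hypothetically
 * named write_super_kupdate) and have the kupdated path call it when
 * present, falling back to the existing write_super for filesystems
 * where a kupdated sync and a real sync are the same thing. */

struct super;

struct super_ops {
    void (*write_super)(struct super *);          /* full, immediate sync */
    void (*write_super_kupdate)(struct super *);  /* optional lazy sync */
};

struct super {
    const struct super_ops *op;
    int commits;        /* counts expensive immediate commits */
    int lazy_flushes;   /* counts cheap kupdated-style flushes */
};

static void generic_write_super(struct super *s)  { s->commits++; }
static void reiser_kupdate_super(struct super *s) { s->lazy_flushes++; }

/* What kupdated would do per dirty superblock under the proposal. */
static void kupdate_sync_super(struct super *s)
{
    if (s->op->write_super_kupdate)
        s->op->write_super_kupdate(s);  /* fs opted in: cheap path */
    else
        s->op->write_super(s);          /* unchanged for all other fs */
}
```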
>
> >
> > It is possible to get almost the same behaviour as 2.2.x by changing the
> > metadata sync interval in bdflush to 30 seconds.
> >
>
> But then kupdate doesn't flush normal data as regularly as it should, plus
> it is almost as messy as Patch 1 :-)
>
> > >
> > > Patch 2
> > >
> > > This patch implements a simple mechinism to ensure that each superblock
> > > only gets told to be flushed once. With reiserfs and the first patch,
> > > the superblock is still dirty after being told to sync (probably
> > > becasue it doesn't want to write out the entire journal every 5 seconds
> > > when kupdate calls it). This caused an infinite loop because sync_supers
> > > would always find the reiserfs superblock dirty when called from
> > > kupdated. I am not convinced that this patch is the best one for this
> > > problem (suggestions?)
> >
> > It is ok to leave the superblock dirty, after all, since the commit wasn't
> > done, the super is still dirty.  If the checks from reiserfs_write_super
> > are actually slowing things down, then it is probably best to fix the
> > checks.
>
> I meant, there might be a better way to prevent the endless loop than
> adding an extra field to the superblock data structure. I believe (I
> haven't explored reiserfs code much) the slowdown is caused by the journal
> being synced with the superblock, thus causing:
>
> 1) Too much contention for disk resources.
> 2) A huge increase in the number of times programs must be suspended to
> wait for the disk

Please have a look at Robert Love's Linux kernel preemption patches and the 
conversation about my reported latency results.

It seems that ReiserFS is involved in the poor audio behavior (hiccups during 
MP2/MP3/Ogg-Vorbis playback).

Re: [PATCH] Preemption Latency Measurement Tool
http://marc.theaimsgroup.com/?l=linux-kernel&m=100097432006605&w=2

Taken from Andrea's latest post:

> those are kernel addresses, can you resolve them via System.map rather
> than trying to find their start/end line number?
>
> > Worst 20 latency times of 8033 measured in this period.
> >   usec      cause     mask   start line/file      address   end line/file
> >  10856  spin_lock        1  1376/sched.c         c0114db3   697/sched.c

I can (with Randy Dunlap's ksysmap, 
http://www.osdlab.org/sw_resources/scripts/ksysmap).

SunWave1>./ksysmap /boot/System.map c0114db3
ksysmap: searching '/boot/System.map' for 'c0114db3'

c0114d60 T preempt_schedule
c0114db3 ..... <<<<<
c0114e10 T wake_up_process

> with dbench 48 we gone to 10msec latency as far I can see (still far
> from 0.5~1 sec). dbench 48 is longer so more probability to get the
> higher latency, and it does more I/O, probably also seeks more, so thre
> are many variables (slower insection in I/O queues first of all, etcll).
> However 10msec isn't that bad, it means 100hz, something that the human
> eye cannot see. 0.5~1 sec would been horribly bad latency instead.. :)
>
> >   10705        BKL        1  1302/inode.c         c016f359   697/sched.c

c016f300 T reiserfs_dirty_inode
c016f359 ..... <<<<<
c016f3f0 T reiserfs_sync_inode

> >   10577  spin_lock        1  1376/sched.c         c0114db3   303/namei.c

c0114d60 T preempt_schedule
c0114db3 ..... <<<<<
c0114e10 T wake_up_process

> >   9427  spin_lock        1   547/sched.c         c0112fe4   697/sched.c

c0112fb0 T schedule
c0112fe4 ..... <<<<<
c0113500 T __wake_up

> >   8526   reacqBKL        1  1375/sched.c         c0114d94   697/sched.c

c0114d60 T preempt_schedule
c0114d94 ..... <<<<<
c0114e10 T wake_up_process

> >   4492   reacqBKL        1  1375/sched.c         c0114d94  1381/sched.c

c0114d60 T preempt_schedule
c0114d94 ..... <<<<<
c0114e10 T wake_up_process

> >   4171        BKL        1  1302/inode.c         c016f359  1381/sched.c

c016f300 T reiserfs_dirty_inode
c016f359 ..... <<<<<
c016f3f0 T reiserfs_sync_inode

> >   3902   reacqBKL        0  1375/sched.c         c0114d94  1306/inode.c

c0114d60 T preempt_schedule
c0114d94 ..... <<<<<
c0114e10 T wake_up_process

> >   3376  spin_lock        0  1376/sched.c         c0114db3  1380/sched.c

c0114d60 T preempt_schedule
c0114db3 ..... <<<<<
c0114e10 T wake_up_process

> >   3132        BKL        0  1302/inode.c         c016f359  1380/sched.c

c016f300 T reiserfs_dirty_inode
c016f359 ..... <<<<<
c016f3f0 T reiserfs_sync_inode

> >   3096  spin_lock        1   547/sched.c         c0112fe4  1380/sched.c

c0112fb0 T schedule
c0112fe4 ..... <<<<<
c0113500 T __wake_up

> >   2808        BKL        0    30/inode.c         c016ce51  1381/sched.c

c016ce20 T reiserfs_delete_inode
c016ce51 ..... <<<<<
c016cf30 t _make_cpu_key

> >   2807  spin_lock        0   547/sched.c         c0112fe4  1381/sched.c

c0112fb0 T schedule
c0112fe4 ..... <<<<<
c0113500 T __wake_up

> >   2782        BKL        0   452/exit.c          c011af61  1380/sched.c

c011ae30 T do_exit
c011af61 ..... <<<<<
c011b190 T complete_and_exit

> >   2631  spin_lock        0   483/dcache.c        c0153efa   520/dcache.c

c0153ec0 t select_parent
c0153efa ..... <<<<<
c0153fc0 T shrink_dcache_parent

> >   2533        BKL        0   533/inode.c         c016d9cd  1380/sched.c

c016d930 T reiserfs_get_block
c016d9cd ..... <<<<<
c016e860 t init_inode

> >   2489        BKL        0   927/namei.c         c014b2bf  1380/sched.c

c014b210 T vfs_create
c014b2bf ..... <<<<<
c014b360 T open_namei

> >   2389        BKL        1   452/exit.c          c011af61    52/inode.c

c011ae30 T do_exit
c011af61 ..... <<<<<
c011b190 T complete_and_exit

> >   2369        BKL        1  1302/inode.c         c016f359   842/inode.c

c016f300 T reiserfs_dirty_inode
c016f359 ..... <<<<<
c016f3f0 T reiserfs_sync_inode

> >   2327        BKL        1    30/inode.c         c016ce51  1380/sched.c

c016ce20 T reiserfs_delete_inode
c016ce51 ..... <<<<<
c016cf30 t _make_cpu_key

> 3) Poor CPU utilization in code that uses the filesystem regularly (like
> compiling)

Unneeded kernel locks/stalls which hurt latency and (global) throughput.

>
> >
> > >
> > > Patch 3
> > >
> > > This patch was generated as I was exploring the buffer cache, wondering
> > > why reiserfs was so slow on 2.4. I found that kupdated may write buffers
> > > that are not actually old back to disk. E.g.
> > >
> > > Imagine that there are 20 dirty buffers. 16 of them are more that 30
> > > seconds old (and should be written back), but the other 4 are younger
> > > than 30 seconds. The current code would force all 20 out to disk,
> > > interrupting
> > > programs still using the young 4 until the disk write was complete.
> > >
> > > I know that it isn't a major problem, but I found it and I have written
> > > the patch for it :-)
> > >
> > > Please try out these patches and give comments about style, performance
> > > etc. They fixed my problems, sliced almost a minute off 2.2.19 kernel
> > > compile time on my duron 700 (from 4min 30sec to 3min 45sec)
> >
> > Do you have the results of the individual fixes?
>
> Patch 3 doesn't improve performance much (even in theory the number of
> dirty buffers being wrongly flushed is pretty low)
>
> Patch 2 doesn't improve performance at all (unless you apply patch 1,
> without it, the computer will bog itself into the ground on the
> first kupdated)
>
> Patch 1 makes a huge difference because it stops reiserfs from reacting
> badly on a kupdated.
>
> Are there any good benchmarks you want me to run on the plain and modded
> kernels?

I will do some benchmarks against Andrea's VM
2.4.10-pre12 + patch-rml-2.4.10-pre12-preempt-kernel-1 + 
patch-rml-2.4.10-pre12-preempt-stats-1

Hope this post isn't too long and nobody feels offended.

Regards,
	Dieter

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2001-09-23 23:55 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200109202112.f8KLCXG16849@zero.tech9.net>
     [not found] ` <20010920170812.CCCACE641B@ns1.suse.com>
2001-09-20 17:19   ` [PATCH] Significant performace improvements on reiserfs systems Chris Mason
2001-09-20 17:39     ` Andrew Morton
2001-09-20 18:47     ` Andrea Arcangeli
2001-09-20 18:58       ` Andrew Morton
2001-09-20 19:13         ` Andrea Arcangeli
2001-09-20 20:52     ` Robert Love
2001-09-20 21:11       ` Dieter Nützel
2001-09-20 22:24       ` Robert Love
2001-09-20 22:37         ` Andrea Arcangeli
2001-09-20 22:56         ` Robert Love
2001-09-20 23:15           ` Andrea Arcangeli
2001-09-21  0:37             ` george anzinger
2001-09-21  1:20               ` Andrew Morton
2001-09-21  3:14               ` Robert Love
2001-09-21  9:32               ` [reiserfs-list] " Nikita Danilov
2001-09-21 12:18                 ` Alan Cox
2001-09-21 12:31                 ` Nikita Danilov
2001-09-23 23:49                 ` Rusty Russell
2001-09-20 23:41           ` Robert Love
     [not found] <200109201708.f8KH8sG15617@zero.tech9.net>
2001-09-20 20:48 ` Robert Love
2001-09-20 17:08 Dieter Nützel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).