* Latency issues with MD-RAID
@ 2011-03-01 21:13 Jansen, Frank
  2011-03-01 22:04 ` NeilBrown
  0 siblings, 1 reply; 3+ messages in thread
From: Jansen, Frank @ 2011-03-01 21:13 UTC (permalink / raw)
  To: linux-raid

We're doing some testing to determine performance of MD-RAID and suitability for our environment.

One particular test is giving some cause for concern:

- Run heavy I/O to a raw partition:
 # time dd if=/dev/zero of=/dev/md0p1 bs=131072 count=1000000
- Run single sync I/Os to the partition:
 # time dd if=/dev/zero of=/dev/md0p1 bs=4096 count=1 oflag=sync

When we run this, latency for the single I/O completion can go as high as 5-10 seconds.

In investigating this, it looks like the following code in md_write_start causes most of the slowdown:

        if (mddev->in_sync) {
                spin_lock_irq(&mddev->write_lock);
                if (mddev->in_sync) {
                        mddev->in_sync = 0;
                        set_bit(MD_CHANGE_CLEAN, &mddev->flags);
                        set_bit(MD_CHANGE_PENDING, &mddev->flags);
                        md_wakeup_thread(mddev->thread);
                        did_change = 1;
                }
                spin_unlock_irq(&mddev->write_lock);
        }

When we change this to run about once every 10 seconds, our latency goes way down to a reasonable number of milliseconds.

Questions:
- is the high latency for single sync I/Os something that we should expect?
- the first time the thread runs, it was seen to take a lot longer.  Is this due to more outstanding metadata or similar?
- is the approach to run the thread less frequently reasonable, or does that open up huge problems?

Thanks,

Frank


* Re: Latency issues with MD-RAID
  2011-03-01 21:13 Latency issues with MD-RAID Jansen, Frank
@ 2011-03-01 22:04 ` NeilBrown
  2011-03-02 19:17   ` Jansen, Frank
  0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2011-03-01 22:04 UTC (permalink / raw)
  To: Jansen, Frank; +Cc: linux-raid

On Tue, 1 Mar 2011 21:13:46 +0000 "Jansen, Frank" <fjansen@CROSSBEAMSYS.COM>
wrote:

> We're doing some testing to determine performance of MD-RAID and suitability for our environment.

RAID0?  RAID1?  RAID5?
It helps to be specific.

> 
> One particular test is giving some cause for concern:
> 
> - Run heavy I/O to a raw partition:
>  # time dd if=/dev/zero of=/dev/md0p1 bs=131072 count=1000000
> - Run single sync I/Os to the partition:
>  # time dd if=/dev/zero of=/dev/md0p1 bs=4096 count=1 oflag=sync
> 
> When we run this, latency for the single I/O completion can go as high as 5-10 seconds
> 
> In investigating this, it looks like the following code in md_write_start causes most of the slow down:
> 
>         if (mddev->in_sync) {
>                 spin_lock_irq(&mddev->write_lock);
>                 if (mddev->in_sync) {
>                         mddev->in_sync = 0;
>                         set_bit(MD_CHANGE_CLEAN, &mddev->flags);
>                         set_bit(MD_CHANGE_PENDING, &mddev->flags);
>                         md_wakeup_thread(mddev->thread);
>                         did_change = 1;
>                 }
>                 spin_unlock_irq(&mddev->write_lock);
>         }
> 
> When we change this to run about once every 10 seconds, our latency goes way down to a reasonable number of milliseconds.

What did you change, exactly?

This code can be tuned by changing
   /sys/block/mdXXX/md/safe_mode_delay
which is measured in seconds and sets how long the array must see no
writes before it is marked clean again; the next write to a clean array
then has to wait for the superblock update that marks it dirty.
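
For example, assuming the array is md0 (illustrative; the default delay
is a fraction of a second):

 # cat /sys/block/md0/md/safe_mode_delay
 # echo 10 > /sys/block/md0/md/safe_mode_delay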

> 
> Questions:
> - is the high latency for single sync I/Os something that we should expect?

Not necessarily.

> - the first time the thread runs, it was seen to take a lot longer.  Is this due to more outstanding metadata or similar?

No idea without a lot more details.  What is "the thread"?  How much is "a
lot longer"?


> - is the approach to run the thread less frequently reasonable, or does that open up huge problems?

Seeing you haven't said exactly what you mean by "run the thread less
frequently", that is a very hard question to answer.

NeilBrown



> 
> Thanks,
> 
> Frank



* RE: Latency issues with MD-RAID
  2011-03-01 22:04 ` NeilBrown
@ 2011-03-02 19:17   ` Jansen, Frank
  0 siblings, 0 replies; 3+ messages in thread
From: Jansen, Frank @ 2011-03-02 19:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Neil,

Thank you for your response, and my apologies for the incomplete nature of the e-mail; I didn't do all the work myself, so I have collected the rest of the data to help complete the picture.

> > We're doing some testing to determine performance of MD-RAID and
> > suitability for our environment.
> 
> RAID0?  RAID1?  RAID5?
> It helps to be specific.
Sorry, I should have mentioned that we're seeing this with both RAID1 and RAID5, but not with RAID0.
> 
> >
> > One particular test is giving some cause for concern:
> >
> > - Run heavy I/O to a raw partition:
> >  # time dd if=/dev/zero of=/dev/md0p1 bs=131072 count=1000000
> > - Run single sync I/Os to the partition:
> >  # time dd if=/dev/zero of=/dev/md0p1 bs=4096 count=1 oflag=sync
> >
> > When we run this, latency for the single I/O completion can go as
> > high as 5-10 seconds
> >
> > In investigating this, it looks like the following code in
> > md_write_start causes most of the slowdown:
> >
> >         if (mddev->in_sync) {
> >                 spin_lock_irq(&mddev->write_lock);
> >                 if (mddev->in_sync) {
> >                         mddev->in_sync = 0;
> >                         set_bit(MD_CHANGE_CLEAN, &mddev->flags);
> >                         set_bit(MD_CHANGE_PENDING, &mddev->flags);
> >                         md_wakeup_thread(mddev->thread);
> >                         did_change = 1;
> >                 }
> >                 spin_unlock_irq(&mddev->write_lock);
> >         }
> >
> > When we change this to run about once every 10 seconds, our latency
> > goes way down to a reasonable number of milliseconds.
> 
> What did you change, exactly?
> 
> This code can be tuned by changing
>    /sys/block/mdXXX/md/safe_mode_delay
> which is measured in seconds and sets how long the array must see no
> writes before it is marked clean again; the next write to a clean array
> then has to wait for the superblock update that marks it dirty.
> 
I have put the code changes at the end of this message, and I'll test the safe_mode_delay setting.
> >
> > Questions:
> > - is the high latency for single sync I/Os something that we should
> > expect?
> 
> Not necessarily.
> 
> > - the first time the thread runs, it was seen to take a lot longer.
> > Is this due to more outstanding metadata or similar?
> 
> No idea without a lot more details.  What is "the thread"?  How much is
> "a
> lot longer"?
> 
Should have been clearer: the thread is the appropriate raid thread, i.e. raid1d or raid5d.  When we put some timers in the code, without other changes, and then issue the sync I/O once per second, the first sync write often takes as much as 5-10 seconds, whereas most of the others average around 1 second with spikes of 2-5 seconds.  Occasional spikes of up to 15 seconds to complete a write were seen, but those are infrequent.
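
For reference, the once-per-second sync writes were driven by essentially the following loop (a reconstruction; the timers themselves were printk calls in the kernel, as in the diff at the end of this message):

 # while :; do time dd if=/dev/zero of=/dev/md0p1 bs=4096 count=1 oflag=sync; sleep 1; done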
> 
> > - is the approach to run the thread less frequently reasonable, or
> > does that open up huge problems?
> 
> Seeing you haven't said exactly what you mean by "run the thread less
> frequently", that is a very hard question to answer.
> 
The change is to delay the superblock update for up to 10 seconds in the raid thread.

> NeilBrown
> 
> 
> 
> >
> > Thanks,
> >
> > Frank

drivers/md$ diff -c /kernels/linux_src-2.6.18-53.el5_64/drivers/md/raid1.c raid1.c
*** /kernels/linux_src-2.6.18-53.el5_64/drivers/md/raid1.c    2008-11-19 15:02:05.000000000 -0500
--- raid1.c    2011-03-01 14:10:21.347880000 -0500
***************
*** 750,755 ****
--- 750,756 ----
       struct page **behind_pages = NULL;
       const int rw = bio_data_dir(bio);
       int do_barriers;
+     unsigned long start, sbsync, diska, diskb, end;

       /*
        * Register the new request and wait if the reconstruction
***************
*** 760,766 ****
        * if barriers work.
        */

!     md_write_start(mddev, bio); /* wait on superblock update early */

       if (unlikely(!mddev->barriers_work && bio_barrier(bio))) {
           if (rw == WRITE)
--- 761,785 ----
        * if barriers work.
        */

!     diska = diskb = end = start = 0;
!     if(IOPRIO_PRIO_CLASS(current->ioprio) == IOPRIO_CLASS_RT)
!     {
!         static int count;
!         static unsigned long lastmw;
!
!         if(lastmw == 0)
!             lastmw = jiffies;
!         start = jiffies;
!         if((count++ > 40) || ((jiffies - lastmw) > (HZ*10)))
!         {
!             md_write_start(mddev, bio); /* wait on superblock update early */
!             count = 0;
!             lastmw = jiffies;
!         }
!     }
!     else
!         md_write_start(mddev, bio); /* wait on superblock update early */
!     sbsync = jiffies;

       if (unlikely(!mddev->barriers_work && bio_barrier(bio))) {
           if (rw == WRITE)
***************
*** 920,925 ****
--- 939,948 ----
           generic_make_request(bio);
   #endif

+     end = jiffies;
+     //if(start != 0)
+         //printk("Raid1 make_request sbsync %ld, total %ld\n",sbsync-start,end-start);
+
       return 0;
   }
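
One note on the patch above: it only defers md_write_start for I/O issued with the realtime ioprio class, so the latency-sensitive writer has to run under that class for the change to take effect, e.g. (illustrative):

 # ionice -c1 -n0 dd if=/dev/zero of=/dev/md0p1 bs=4096 count=1 oflag=sync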




