linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Martin Knoblauch <spamtrap@knobisoft.de>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>,
	Mike Snitzer <snitzer@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	jplatte@naasa.net, Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	James.Bottomley@steeleye.com
Subject: Re: regression: 100% io-wait with 2.6.24-rcX
Date: Thu, 17 Jan 2008 20:23:57 +0000	[thread overview]
Message-ID: <20080117202356.GA25975@csn.ul.ie> (raw)
In-Reply-To: <411558.92960.qm@web32601.mail.mud.yahoo.com>

On (17/01/08 09:44), Martin Knoblauch didst pronounce:
> > > > > > > On Wed, Jan 16, 2008 at 01:26:41AM -0800, Martin Knoblauch wrote:
> > > > > > For those interested in using your writeback improvements in
> > > > > > production sooner rather than later (primarily with ext3); what
> > > > > > recommendations do you have?  Just heavily test our own 2.6.24
> > > > > > evolving "close, but not ready for merge" -mm writeback patchset?
> > > > > > 
> > > > > 
> > > > >  I can add myself to Mikes question. It would be good to know a
> > > > 
> > > > "roadmap" for the writeback changes. Testing 2.6.24-rcX so far has
> > > > been showing quite nice improvement of the overall writeback situation and
> > > > it would be sad to see this [partially] gone in 2.6.24-final.
> > > > Linus apparently already has reverted  "...2250b". I will definitely
> > > > repeat my tests  with -rc8. and report.
> > > > 
> > > Thank you, Martin. Can you help test this patch on 2.6.24-rc7?
> > > Maybe we can push it to 2.6.24 after your testing.
> > > 
> > Hi Fengguang,
> > 
> > something really bad has happened between -rc3 and -rc6.
> > Embarrassingly I did not catch that earlier :-(
> > Compared to the numbers I posted in
> > http://lkml.org/lkml/2007/10/26/208 , dd1 is now at 60 MB/sec
> > (slight plus), while dd2/dd3 suck the same way as in pre 2.6.24.
> > The only test that is still good is mix3, which I attribute to
> > the per-BDI stuff.

I suspect that the IO hardware you have is very sensitive to the color of the
physical page. I wonder, do you boot the system cleanly and then run these
tests? If so, it would be interesting to know what happens if you stress
the system first (many kernel compiles for example, basically anything that
would use a lot of memory in different ways for some time) to randomise the
free lists a bit and then run your test. You'd need to run the test
three times for 2.6.23, 2.6.24-rc8 and 2.6.24-rc8 with the patch you
identified reverted.

> > At the moment I am frantically trying to find when things went down. I
> > did run -rc8 and rc8+yourpatch. No difference to what I see with -rc6.
> > 
> > Sorry that I cannot provide any input to your patch.
> 
>  OK, the change happened between rc5 and rc6. Just following a gut feeling, I reverted
> 
> #commit 81eabcbe0b991ddef5216f30ae91c4b226d54b6d
> #Author: Mel Gorman <mel@csn.ul.ie>
> #Date:   Mon Dec 17 16:20:05 2007 -0800
> #
> #    mm: fix page allocation for larger I/O segments
> #    
> #    In some cases the IO subsystem is able to merge requests if the pages are
> #    adjacent in physical memory.  This was achieved in the allocator by having
> #    expand() return pages in physically contiguous order in situations were a
> #    large buddy was split.  However, list-based anti-fragmentation changed the
> #    order pages were returned in to avoid searching in buffered_rmqueue() for a
> #    page of the appropriate migrate type.
> #    
> #    This patch restores behaviour of rmqueue_bulk() preserving the physical
> #    order of pages returned by the allocator without incurring increased search
> #    costs for anti-fragmentation.
> #    
> #    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> #    Cc: James Bottomley <James.Bottomley@steeleye.com>
> #    Cc: Jens Axboe <jens.axboe@oracle.com>
> #    Cc: Mark Lord <mlord@pobox.com
> #    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> #    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> diff -urN linux-2.6.24-rc5/mm/page_alloc.c linux-2.6.24-rc6/mm/page_alloc.c
> --- linux-2.6.24-rc5/mm/page_alloc.c    2007-12-21 04:14:11.305633890 +0000
> +++ linux-2.6.24-rc6/mm/page_alloc.c    2007-12-21 04:14:17.746985697 +0000
> @@ -847,8 +847,19 @@
>                 struct page *page = __rmqueue(zone, order, migratetype);
>                 if (unlikely(page == NULL))
>                         break;
> +
> +               /*
> +                * Split buddy pages returned by expand() are received here
> +                * in physical page order. The page is added to the callers and
> +                * list and the list head then moves forward. From the callers
> +                * perspective, the linked list is ordered by page number in
> +                * some conditions. This is useful for IO devices that can
> +                * merge IO requests if the physical pages are ordered
> +                * properly.
> +                */
>                 list_add(&page->lru, list);
>                 set_page_private(page, migratetype);
> +               list = &page->lru;
>         }
>         spin_unlock(&zone->lock);
>         return i;
> 
> This has brought back the good results I observed and reported.
> I do not know what to make out of this. At least on the systems I care
> about (HP/DL380g4, dual CPUs, HT-enabled, 8 GB Memory, SmartaArray6i
> controller with 4x72GB SCSI disks as RAID5 (battery protected writeback
> cache enabled) and gigabit networking (tg3)) this optimisation is a dissaster.
> 

That patch was not an optimisation, it was a regression fix against 2.6.23
and I don't believe reverting it is an option. Other IO hardware benefits
from having the allocator supply pages in PFN order. Your controller would
seem to suffer when presented with the same situation but I don't know why
that is. I've added James to the cc in case he has seen this sort of
situation before.

> On the other hand, it is not a regression against 2.6.22/23. Those had
> bad IO scaling to. It would just be a shame to loose an apparently great
> performance win.
> is

Could you try running your tests again when the system has been stressed
with some other workload first?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

  reply	other threads:[~2008-01-17 20:24 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-17 17:44 regression: 100% io-wait with 2.6.24-rcX Martin Knoblauch
2008-01-17 20:23 ` Mel Gorman [this message]
  -- strict thread matches above, loose matches on Subject: below --
2008-01-23 11:12 Martin Knoblauch
2008-01-22 18:51 Martin Knoblauch
2008-01-22 15:25 Martin Knoblauch
2008-01-22 23:40 ` Alasdair G Kergon
2008-01-19 10:24 Martin Knoblauch
2008-01-18  8:19 Martin Knoblauch
2008-01-18 16:01 ` Mel Gorman
2008-01-18 17:46   ` Linus Torvalds
2008-01-18 19:01     ` Martin Knoblauch
2008-01-18 19:23       ` Linus Torvalds
2008-01-22 14:39       ` Alasdair G Kergon
2008-01-18 20:00     ` Mike Snitzer
2008-01-18 22:47       ` Mike Snitzer
2008-01-17 21:50 Martin Knoblauch
2008-01-17 22:12 ` Mel Gorman
2008-01-17 17:51 Martin Knoblauch
2008-01-17 13:52 Martin Knoblauch
2008-01-17 16:11 ` Mike Snitzer
2008-01-16 14:15 Martin Knoblauch
2008-01-16 16:27 ` Mike Snitzer
2008-01-16  9:26 Martin Knoblauch
     [not found] ` <E1JF6w8-0000vs-HM@localhost.localdomain>
2008-01-16 12:00   ` Fengguang Wu
2008-01-07 10:51 Joerg Platte
2008-01-07 11:19 ` Ingo Molnar
2008-01-07 13:24   ` Joerg Platte
2008-01-07 13:32     ` Peter Zijlstra
2008-01-07 13:40       ` Joerg Platte
     [not found]         ` <E1JCRbA-0002bh-3c@localhost.localdomain>
2008-01-09  3:27           ` Fengguang Wu
2008-01-09  6:13             ` Joerg Platte
     [not found]               ` <E1JCZg2-0001DE-RP@localhost.localdomain>
2008-01-09 12:04                 ` Fengguang Wu
2008-01-09 12:22                   ` Joerg Platte
     [not found]                     ` <E1JCaUd-0001Ko-Tt@localhost.localdomain>
2008-01-09 12:57                       ` Fengguang Wu
2008-01-09 13:04                         ` Joerg Platte
     [not found]                           ` <E1JCrMj-0001HR-SZ@localhost.localdomain>
2008-01-10  6:58                             ` Fengguang Wu
     [not found]                           ` <E1JCrsE-0000v4-Dz@localhost.localdomain>
2008-01-10  7:30                             ` Fengguang Wu
     [not found]                           ` <20080110073046.GA3432@mail.ustc.edu.cn>
     [not found]                             ` <E1JCsDr-0002cl-0e@localhost.localdomain>
2008-01-10  7:53                               ` Fengguang Wu
2008-01-10  8:37                                 ` Joerg Platte
     [not found]                                   ` <E1JCt0n-00048n-AD@localhost.localdomain>
2008-01-10  8:43                                     ` Fengguang Wu
2008-01-10 10:03                                       ` Joerg Platte
     [not found]                                         ` <E1JDBk4-0000UF-03@localhost.localdomain>
2008-01-11  4:43                                           ` Fengguang Wu
2008-01-11  5:29                                             ` Joerg Platte
2008-01-11  6:41                                               ` Joerg Platte
2008-01-12 23:32                                             ` Joerg Platte
     [not found]                                               ` <E1JDwaA-00017Q-W6@localhost.localdomain>
2008-01-13  6:44                                                 ` Fengguang Wu
2008-01-13  8:05                                                   ` Joerg Platte
     [not found]                                                     ` <E1JDy5a-0001al-Tk@localhost.localdomain>
2008-01-13  8:21                                                       ` Fengguang Wu
2008-01-13  9:49                                                         ` Joerg Platte
     [not found]                                                           ` <E1JE1Uz-0002w5-6z@localhost.localdomain>
2008-01-13 11:59                                                             ` Fengguang Wu
     [not found]                                                           ` <20080113115933.GA11045@mail.ustc.edu.cn>
     [not found]                                                             ` <E1JEGPH-0001uw-Df@localhost.localdomain>
2008-01-14  3:54                                                               ` Fengguang Wu
     [not found]                                                             ` <20080114035439.GA7330@mail.ustc.edu.cn>
     [not found]                                                               ` <E1JEM2I-00010S-5U@localhost.localdomain>
2008-01-14  9:55                                                                 ` Fengguang Wu
2008-01-14 11:30                                                                   ` Joerg Platte
2008-01-14 11:41                                                                     ` Peter Zijlstra
     [not found]                                                                       ` <E1JEOmD-0001Ap-U7@localhost.localdomain>
2008-01-14 12:50                                                                         ` Fengguang Wu
2008-01-15 21:13                                                                           ` Mike Snitzer
     [not found]                                                                             ` <E1JF0m1-000101-OK@localhost.localdomain>
2008-01-16  5:25                                                                               ` Fengguang Wu
2008-01-15 21:42                                                                         ` Ingo Molnar
     [not found]                                                                           ` <E1JF0bJ-0000zU-FG@localhost.localdomain>
2008-01-16  5:14                                                                             ` Fengguang Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080117202356.GA25975@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=James.Bottomley@steeleye.com \
    --cc=jplatte@naasa.net \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=snitzer@gmail.com \
    --cc=spamtrap@knobisoft.de \
    --cc=torvalds@linux-foundation.org \
    --cc=wfg@mail.ustc.edu.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).