From: Wu Fengguang <fengguang.wu@intel.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Kernel List <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 09/10] vmscan: Do not writeback filesystem pages in direct reclaim
Date: Mon, 13 Sep 2010 22:33:01 +0800	[thread overview]
Message-ID: <20100913143301.GB14158@localhost> (raw)
In-Reply-To: <20100913135510.GH23508@csn.ul.ie>

> > >  	/* Check if we should synchronously wait for writeback */
> > >  	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
> > 
> > It is possible to OOM if the LRU list is small and/or the storage is slow, so
> > that the flusher cannot clean enough pages before the LRU is fully scanned.
> > 
> 
> To go OOM, nr_reclaimed would have to be 0 and for that, the entire list
> would have to be dirty or unreclaimable. If that situation happens, is
> the dirty throttling not also broken?

My worry is that even if the dirty throttling limit were instantly set
to 0, it would still take time to knock down the number of dirty pages.
Think of 500MB of dirty pages waiting to be flushed to a slow USB stick.
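
To put a rough number on it (assuming, say, a stick that writes at 5MB/s):

        500MB / 5MB/s = 100 seconds

which can easily be longer than it takes to fully scan a small LRU list.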

> > So we may need to wait on dirty/writeback pages in *order-0* direct
> > reclaims, when priority goes rather low (such as < 3).
> > 
> 
> If this is really necessary, the stalling could be done by removing
> the check for lumpy reclaim in should_reclaim_stall(). What do you
> think of the following replacement?

I merely want to provide a guarantee, so it may be enough to add this:

        if (nr_freed == nr_taken)
                return false;

+       if (!priority)
+               return true;

This ensures the last full LRU scan will do the necessary waits to
prevent an OOM.

> /*
>  * Returns true if the caller should wait to clean dirty/writeback pages.
>  *
>  * If we are direct reclaiming for contiguous pages and we do not reclaim
>  * everything in the list, try again and wait for writeback IO to complete.
>  * This will stall high-order allocations noticeably. Only do that when we
>  * really need to free the pages under high memory pressure.
>  *
>  * Alternatively, if priority is getting high, it may be because there are
>  * too many dirty pages on the LRU. Rather than returning nr_reclaimed == 0
>  * and potentially causing an OOM, we stall on writeback.
>  */
> static inline bool should_reclaim_stall(unsigned long nr_taken,
>                                         unsigned long nr_freed,
>                                         int priority,
>                                         struct scan_control *sc)
> {
>         int stall_priority;
> 
>         /* kswapd should not stall on sync IO */
>         if (current_is_kswapd())
>                 return false;
> 
>         /* If we cannot writeback, there is no point stalling */
>         if (!sc->may_writepage)
>                 return false;
> 
>         /* If we have reclaimed everything on the isolated list, no stall */
>         if (nr_freed == nr_taken)
>                 return false;
> 
>         /*
>          * For high-order allocations, there are two stall thresholds.
>          * High-cost allocations stall immediately whereas lower
>          * order allocations such as stacks require the scanning
>          * priority to be much higher before stalling.
>          */
>         if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
>                 stall_priority = DEF_PRIORITY;
>         else
>                 stall_priority = DEF_PRIORITY / 3;
> 
>         return priority <= stall_priority;
> }
> 
> 
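
To make my suggestion above concrete, here is how the guard would slot
into your replacement, right after the nr_freed check (an untested
sketch):

        /* If we have reclaimed everything on the isolated list, no stall */
        if (nr_freed == nr_taken)
                return false;

        /* Last full scan at priority 0: stall rather than risk an OOM */
        if (!priority)
                return true;
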
> > > +		int dirty_retry = MAX_SWAP_CLEAN_WAIT;
> > >  		set_lumpy_reclaim_mode(priority, sc, true);
> > > -		nr_reclaimed += shrink_page_list(&page_list, sc);
> > > +
> > > +		while (nr_reclaimed < nr_taken && nr_dirty && dirty_retry--) {
> > > +			struct page *page, *tmp;
> > > +
> > 
> > > +			/* Take off the clean pages marked for activation */
> > > +			list_for_each_entry_safe(page, tmp, &page_list, lru) {
> > > +				if (PageDirty(page) || PageWriteback(page))
> > > +					continue;
> > > +
> > > +				list_del(&page->lru);
> > > +				list_add(&page->lru, &putback_list);
> > > +			}
> > 
> > nitpick: I guess the above loop is optional code to avoid the overhead
> > of shrink_page_list() repeatedly going through unfreeable pages?
> 
> Pretty much. If they are to be activated, there is no point trying to
> reclaim them again; it's unnecessary overhead. A strong motivation for
> this series is to reduce overhead in the reclaim paths and avoid
> unnecessary retrying of unfreeable pages.

We do so many waits in this loop that users will be upset by the
iowait stalls far more than by the CPU overhead. The best option is
always to avoid entering this loop in the first place; if we succeed
at that, these optimization lines will be nothing but puzzles for
newbie developers.

> > Considering this is the slow code path, I'd prefer to keep the code
> > simple rather than do such optimizations.
> > 
> > > +			wakeup_flusher_threads(laptop_mode ? 0 : nr_dirty);
> > 
> > how about 
> >                         if (!laptop_mode)
> >                                 wakeup_flusher_threads(nr_dirty);
> > 
> 
> It's not the same thing. wakeup_flusher_threads(0) in laptop_mode is to
> clean all pages if any are dirty. laptop_mode cleans all pages to
> minimise disk spinups.

Ah, sure, that's fine. I wonder if the flusher could be smart enough
to automatically extend the number of pages to write in laptop mode.
That could simplify some callers.
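
Something along these lines inside the flusher, say (hypothetical,
just to illustrate the idea):

        void wakeup_flusher_threads(long nr_pages)
        {
                /*
                 * In laptop mode, write out all dirty pages regardless
                 * of what the caller asked for, to minimise disk spinups.
                 */
                if (laptop_mode)
                        nr_pages = 0;
                /* ... existing wakeup logic unchanged ... */
        }

Then vmscan could simply call wakeup_flusher_threads(nr_dirty).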

Thanks,
Fengguang
