From: Arnd Bergmann <arnd@arndb.de>
To: Stephan Uphoff <ups@google.com>
Cc: Minchan Kim <minchan@kernel.org>,
	linaro-kernel@lists.linaro.org, android-kernel@googlegroups.com,
	linux-mm@kvack.org, "Luca Porzio (lporzio)" <lporzio@micron.com>,
	Alex Lemberg <alex.lemberg@sandisk.com>,
	linux-kernel@vger.kernel.org,
	Saugata Das <saugata.das@linaro.org>,
	Venkatraman S <venkat@linaro.org>,
	Yejin Moon <yejin.moon@samsung.com>,
	Hyojin Jeong <syr.jeong@samsung.com>,
	"linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>
Subject: Re: swap on eMMC and other flash
Date: Mon, 16 Apr 2012 18:59:32 +0000
Message-ID: <201204161859.32436.arnd@arndb.de>
In-Reply-To: <CAKL-ytsXbe4=u94PjqvhZo=ZLiChQ0FmZC84GNrFHa0N1mDjFw@mail.gmail.com>

On Monday 16 April 2012, Stephan Uphoff wrote:
> opportunity to plant a few ideas.
> 
> In contrast to rotational disks read/write operation overhead and
> costs are not symmetric.
> While random reads are much faster on flash - the number of write
> operations is limited by wearout and garbage collection overhead.
> To further improve swapping on eMMC or similar flash media I believe
> that the following issues need to be addressed:
> 
> 1) Limit average write bandwidth to eMMC to a configurable level to
> guarantee a minimum device lifetime
> 2) Aim for a low write amplification factor to maximize useable write bandwidth
> 3) Strongly favor read over write operations
> 
> Lowering write amplification (2) has been discussed in this email
> thread - and the only observation I would like to add is that
> over-provisioning the internal swap space compared to the exported
> swap space significantly can guarantee a lower write amplification
> factor with the indirection and GC techniques discussed.

Yes, good point.
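
As a rough illustration of how points (1) and (2) interact: the total
amount of host data a flash device can absorb is roughly its capacity
times its erase-cycle endurance, divided by the write amplification
factor. Spreading that over the desired lifetime gives the average write
bandwidth that can be sustained. The sketch below assumes this simple
model; the function name and all parameters are hypothetical and do not
come from any kernel interface.

```c
/*
 * Hypothetical helper (not a kernel API): average host write rate,
 * in bytes per second, that a flash device can sustain over a target
 * lifetime under a simple wear model.
 *
 * capacity_bytes       raw capacity used for the swap area
 * pe_cycles            assumed erase-cycle endurance per block
 * write_amplification  assumed device-internal writes per host write
 * lifetime_seconds     minimum lifetime we want to guarantee
 */
static double sustainable_write_rate(double capacity_bytes,
				     double pe_cycles,
				     double write_amplification,
				     double lifetime_seconds)
{
	/* Total host bytes the device can take before wearing out. */
	double host_bytes = capacity_bytes * pe_cycles / write_amplification;

	return host_bytes / lifetime_seconds;
}
```

Under this model, halving the write amplification factor doubles the
write bandwidth budget for the same lifetime, which is why
over-provisioning can pay off.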

> I believe the swap functionality is currently optimized for storage
> media where read and write costs are nearly identical.
> As this is not the case on flash I propose splitting the anonymous
> inactive queue (at least conceptually) - keeping clean anonymous pages
> with swap slots on a separate queue as the cost of swapping them
> out/in is only an inexpensive read operation. A variable similar to
> swappiness (or a more dynamic algorithm) could determine the
> preference for swapping out clean pages or dirty pages. ( A similar
> argument could be made for splitting up the file inactive queue )

I'm not sure I understand yet how this would be different from swappiness.

> The problem of limiting the average write bandwidth reminds me of
> enforcing cpu utilization limits on interactive workloads.
> Just as with cpu workloads - using the resources to the limit produces
> poor interactivity.
> When interactivity suffers too much I believe the only sane response
> for an interactive device is to limit usage of the swap device and
> transition into a low memory situation - and if needed - either
> allowing userspace to reduce memory usage or invoking the OOM killer.
> As a result low memory situations could not only be encountered on new
> memory allocations but also on workload changes that increase the
> number of dirty pages.

While swap is just a special case for anonymous memory in writeback
rather than file backed pages, I think what you want here is a tuning
knob that decides whether we should discard a clean page or write back
a dirty page under memory pressure. I have to say that I don't know
whether we already have such a knob or whether we already treat them
differently, but it is certainly a valid observation that on hard
drives, discarding a clean page that is likely going to be needed
again has about the same overhead as writing back a dirty page
(i.e. one seek operation), while on flash the former would be much
cheaper than the latter.
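
A minimal sketch of what such a knob could feed into, assuming a
per-device pair of cost weights and an estimate of how likely each page
is to be needed again. None of these names exist in the kernel; the
structure, the function and the percentage inputs are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-device cost weights, in arbitrary units. */
struct reclaim_costs {
	unsigned int read_cost;		/* reading one page back in */
	unsigned int write_cost;	/* writing one page out */
};

/*
 * Return true if reclaiming the clean page is expected to be strictly
 * cheaper than reclaiming the dirty one.
 *
 * Dropping a clean page costs a read only if the page is needed again.
 * A dirty page always costs a write, plus a read if it is needed again.
 * The refault estimates are percentages (0..100) and are assumed inputs.
 */
static bool prefer_clean_page(const struct reclaim_costs *c,
			      unsigned int clean_refault_pct,
			      unsigned int dirty_refault_pct)
{
	unsigned int clean = c->read_cost * clean_refault_pct;
	unsigned int dirty = c->write_cost * 100 +
			     c->read_cost * dirty_refault_pct;

	return clean < dirty;
}
```

With equal read and write weights (the hard drive case), a clean page
that is certain to be needed again costs the same as a dirty page that
is not, so there is no preference. With a write weight ten times the
read weight (a flash-like case), the clean page wins clearly.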

> A wild idea to avoid some writes altogether is to see if
> de-duplication techniques can be used to (partially?) match pages
> previously written to swap.

Interesting! We already have KSM (kernel samepage merging) to do
the same thing in memory, but I don't know how that works
during swapout. It might already be there, waiting to get switched
on, or might not be possible until we implement an extra remapping
layer in swap as has been proposed. It's certainly worth remembering
this as we work on the design for that remapping layer.

> In case of unencrypted swap  (or encrypted swap with a static key)
> swap pages on eMMC could even be re-used across multiple reboots.
> A simple version would just compare dirty pages with data in their
> swap slots as I suspect (but really don't know) that some user space
> algorithms (garbage collection?) dirty a page just temporarily -
> eventually reverting it to the previous content.

I think that would incur overhead for indexing the pages in swap space
in a persistent way, something that by itself would contribute to
write amplification because for every swapout, we would have to write
both the page and the index (eventually), and that index would likely
be a random write.
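
For comparison, the "simple version" from the quoted text, checking a
dirty page against the data already in its own swap slot, can be
sketched as below. This assumes the kernel still knows which slot holds
the previous copy and has read it into a buffer, so it trades one read
for a possibly avoided write. The function name and the fixed page size
are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/*
 * Return true if the in-memory page differs from the copy read back
 * from its swap slot, i.e. if the page really has to be written out.
 * Both buffers must be SKETCH_PAGE_SIZE bytes long.
 */
static bool swap_write_needed(const unsigned char *page,
			      const unsigned char *slot_copy)
{
	return memcmp(page, slot_copy, SKETCH_PAGE_SIZE) != 0;
}
```

A page that was dirtied and later reverted to its old content compares
equal and could be treated as clean again without writing anything.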

Thanks for your thoughts!

	Arnd


