* [PATCH 0/2] swap: improve swap I/O rate - V2
From: ehrhardt @ 2012-05-21  8:09 UTC
To: linux-mm; +Cc: axboe, Ehrhardt Christian

From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>

* Update in V2 *
- Adapted the documentation patch according to feedback of Minchan Kim
- Added the Acks I got to V1 so far

In a memory overcommitment scenario with KVM I ran into a lot of waits for
swap. While checking the I/O done on the swap disks I found almost all I/Os
to be done as single page 4k requests, despite the fact that swap-in is a
batch of 1<<page-cluster pages as swap readahead and swap-out is a list of
pages written in shrink_page_list.

[1/2 swap in improvement]
The read patch shows improvements of up to 50% swap throughput, much happier
guest systems and, even when running with comparable throughput, a lot of
I/Os per second saved, leaving resources in the SAN for other consumers.

[2/2 documentation]
While doing so I also realized that the documentation for
/proc/sys/vm/page-cluster no longer matches the code.

Kind regards,
Christian Ehrhardt

Christian Ehrhardt (2):
  swap: allow swap readahead to be merged
  documentation: update how page-cluster affects swap I/O

 Documentation/sysctl/vm.txt |   12 ++++++++++--
 mm/swap_state.c             |    5 +++++
 2 files changed, 15 insertions(+), 2 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 1/2] swap: allow swap readahead to be merged
From: ehrhardt @ 2012-05-21  8:09 UTC
To: linux-mm; +Cc: axboe, Christian Ehrhardt

From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>

Swap readahead works fine, but the I/O to disk is almost always done in page
size requests, despite the fact that readahead submits 1<<page-cluster pages
at a time.
On older kernels the old per device plugging behavior might have captured
this and merged the requests, but currently it all comes down to many more
I/Os than required.

On a single device this might not be an issue, but as soon as a server runs
on shared SAN resources, saving I/Os not only improves swap-in throughput but
also provides a lower resource utilization.

With a load running KVM with a lot of memory overcommitment (the hot memory
is 1.5 times the host memory) swapping throughput improves significantly
and the load feels more responsive as well as achieves more throughput.

In a test setup with 16 swap disks, running blktrace on one of those disks
shows the improved merging:

Prior:
Reads Queued:     560,888,    2,243MiB  Writes Queued:     226,242,  904,968KiB
Read Dispatches:  544,701,    2,243MiB  Write Dispatches:  159,318,  904,968KiB
Reads Requeued:         0               Writes Requeued:         0
Reads Completed:  544,716,    2,243MiB  Writes Completed:  159,321,  904,980KiB
Read Merges:       16,187,   64,748KiB  Write Merges:       61,744,  246,976KiB
IO unplugs:       149,614               Timer unplugs:       2,940

With the patch:
Reads Queued:     734,315,    2,937MiB  Writes Queued:     300,188,    1,200MiB
Read Dispatches:  214,972,    2,937MiB  Write Dispatches:  215,176,    1,200MiB
Reads Requeued:         0               Writes Requeued:         0
Reads Completed:  214,971,    2,937MiB  Writes Completed:  215,177,    1,200MiB
Read Merges:      519,343,    2,077MiB  Write Merges:       73,325,  293,300KiB
IO unplugs:       337,130               Timer unplugs:      11,184

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
---
 mm/swap_state.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 4c5ff7f..c85b559 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -14,6 +14,7 @@
 #include <linux/init.h>
 #include <linux/pagemap.h>
 #include <linux/backing-dev.h>
+#include <linux/blkdev.h>
 #include <linux/pagevec.h>
 #include <linux/migrate.h>
 #include <linux/page_cgroup.h>
@@ -376,6 +377,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	unsigned long offset = swp_offset(entry);
 	unsigned long start_offset, end_offset;
 	unsigned long mask = (1UL << page_cluster) - 1;
+	struct blk_plug plug;
 
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
@@ -383,6 +385,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (!start_offset)	/* First page is swap header. */
 		start_offset++;
 
+	blk_start_plug(&plug);
 	for (offset = start_offset; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
 		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
@@ -391,6 +394,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			continue;
 		page_cache_release(page);
 	}
+	blk_finish_plug(&plug);
+
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 	return read_swap_cache_async(entry, gfp_mask, vma, addr);
 }
--
1.7.0.4

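Note on the readahead window and the blktrace totals: the following is a minimal userspace sketch, not kernel code, of the arithmetic that swapin_readahead() performs before the plugged loop. The start_offset line and the swap-header check come from the diff context above; the end_offset computation is not visible in the diff and is assumed here to be "offset | mask". The names swap_window, readahead_window, window_pages and avg_bytes_per_dispatch are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for illustration, not a kernel structure. */
struct swap_window {
	unsigned long start;	/* first swap offset to read */
	unsigned long end;	/* last swap offset to read (inclusive) */
};

/*
 * Userspace model of the window arithmetic in swapin_readahead().
 * The end computation is an assumption: it is not shown in the diff.
 */
static inline struct swap_window readahead_window(unsigned long offset,
						  unsigned int page_cluster)
{
	struct swap_window w;
	unsigned long mask;

	/* Shifting by the full word width would be undefined behavior. */
	assert(page_cluster < 8 * sizeof(unsigned long));
	mask = (1UL << page_cluster) - 1;

	w.start = offset & ~mask;	/* align down to the cluster */
	w.end = offset | mask;		/* assumed: last slot of the cluster */
	if (!w.start)			/* first page is the swap header */
		w.start++;
	return w;
}

/* Number of single-page reads submitted between plug and unplug. */
static inline unsigned long window_pages(struct swap_window w)
{
	return w.end - w.start + 1;
}

/* Average bytes per dispatched request, from the blktrace totals. */
static inline uint64_t avg_bytes_per_dispatch(uint64_t total_mib,
					      uint64_t dispatches)
{
	assert(dispatches != 0);
	return total_mib * 1024 * 1024 / dispatches;
}
```

With page_cluster = 3, a fault on swap offset 13 yields the window 8..15, so eight adjacent single-page reads are queued while the plug is held, which gives the block layer the chance to merge them. Applied to the read totals quoted above, avg_bytes_per_dispatch(2243, 544701) gives 4317 bytes per dispatched read before the patch and avg_bytes_per_dispatch(2937, 214972) gives 14325 bytes with it, that is roughly one page versus three to four pages per request.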
* Re: [PATCH 1/2] swap: allow swap readahead to be merged
From: Minchan Kim @ 2012-05-21  8:51 UTC
To: ehrhardt; +Cc: linux-mm, axboe

On 05/21/2012 05:09 PM, ehrhardt@linux.vnet.ibm.com wrote:

> From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
>
> Swap readahead works fine, but the I/O to disk is almost always done in page
> size requests, despite the fact that readahead submits 1<<page-cluster pages
> at a time.
> On older kernels the old per device plugging behavior might have captured
> this and merged the requests, but currently it all comes down to many more
> I/Os than required.
>
> On a single device this might not be an issue, but as soon as a server runs
> on shared SAN resources, saving I/Os not only improves swap-in throughput but
> also provides a lower resource utilization.
>
> With a load running KVM with a lot of memory overcommitment (the hot memory
> is 1.5 times the host memory) swapping throughput improves significantly
> and the load feels more responsive as well as achieves more throughput.
>
> In a test setup with 16 swap disks, running blktrace on one of those disks
> shows the improved merging:
>
> Prior:
> Reads Queued:     560,888,    2,243MiB  Writes Queued:     226,242,  904,968KiB
> Read Dispatches:  544,701,    2,243MiB  Write Dispatches:  159,318,  904,968KiB
> Reads Requeued:         0               Writes Requeued:         0
> Reads Completed:  544,716,    2,243MiB  Writes Completed:  159,321,  904,980KiB
> Read Merges:       16,187,   64,748KiB  Write Merges:       61,744,  246,976KiB
> IO unplugs:       149,614               Timer unplugs:       2,940
>
> With the patch:
> Reads Queued:     734,315,    2,937MiB  Writes Queued:     300,188,    1,200MiB
> Read Dispatches:  214,972,    2,937MiB  Write Dispatches:  215,176,    1,200MiB
> Reads Requeued:         0               Writes Requeued:         0
> Reads Completed:  214,971,    2,937MiB  Writes Completed:  215,177,    1,200MiB
> Read Merges:      519,343,    2,077MiB  Write Merges:       73,325,  293,300KiB
> IO unplugs:       337,130               Timer unplugs:      11,184
>
> Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Acked-by: Jens Axboe <axboe@kernel.dk>

Reviewed-by: Minchan Kim <minchan@kernel.org>

Didn't I add my Reviewed-by on your previous version?

--
Kind regards,
Minchan Kim

* Re: [PATCH 1/2] swap: allow swap readahead to be merged
From: Christian Ehrhardt @ 2012-05-21  9:07 UTC
To: Minchan Kim; +Cc: linux-mm, axboe

On 05/21/2012 10:51 AM, Minchan Kim wrote:
> On 05/21/2012 05:09 PM, ehrhardt@linux.vnet.ibm.com wrote:
>
>> From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
>>
[...]
>>
>> Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
>> Acked-by: Rik van Riel <riel@redhat.com>
>> Acked-by: Jens Axboe <axboe@kernel.dk>
>
>
> Reviewed-by: Minchan Kim <minchan@kernel.org>
>
> Didn't I add my Reviewed-by on your previous version?
>

Sorry I missed it since you provided the good feedback on all three mails.
I had your "otherwise looks good to me to mail #2" still in mind and
didn't want to be so offensive to convert that to a review or ack statement.

--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance

* [PATCH 2/2] documentation: update how page-cluster affects swap I/O
From: ehrhardt @ 2012-05-21  8:09 UTC
To: linux-mm; +Cc: axboe, Christian Ehrhardt

From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>

Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of
the code and add some comments about what the tunable will change in that
behavior.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
---
 Documentation/sysctl/vm.txt |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 96f0ee8..4d87dc0 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -574,16 +574,24 @@ of physical RAM.  See above.
 
 page-cluster
 
-page-cluster controls the number of pages which are written to swap in
-a single attempt.  The swap I/O size.
+page-cluster controls the number of pages up to which consecutive pages
+are read in from swap in a single attempt. This is the swap counterpart
+to page cache readahead.
+The mentioned consecutivity is not in terms of virtual/physical addresses,
+but consecutive on swap space - that means they were swapped out together.
 
 It is a logarithmic value - setting it to zero means "1 page", setting
 it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.
 
 The default value is three (eight pages at a time).  There may be some
 small benefits in tuning this to a different value if your workload is
 swap-intensive.
 
+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for following faults if they would have been part of
+that consecutive pages readahead would have brought in.
+
 =============================================================
 
 panic_on_oom
--
1.7.0.4

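Note on the logarithmic tunable: the documentation text in this patch says page-cluster is an exponent, so the number of pages read per attempt is 1 << page-cluster. The sketch below illustrates that mapping and reads the current setting from a path given by the caller. The helper names cluster_pages and read_page_cluster are hypothetical and exist only for this example; the sketch makes no claim about how the kernel itself parses the sysctl.

```c
#include <stdio.h>

/* Pages read from swap per attempt for a given page-cluster value. */
static inline unsigned long cluster_pages(unsigned int page_cluster)
{
	return 1UL << page_cluster;
}

/*
 * Read the tunable from a file such as /proc/sys/vm/page-cluster.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
static inline int read_page_cluster(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

For the values named in the documentation, cluster_pages(0) is 1 (readahead disabled), cluster_pages(1) is 2, cluster_pages(2) is 4, and the default cluster_pages(3) is 8. On a Linux system, read_page_cluster("/proc/sys/vm/page-cluster") would return the current setting.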
* Re: [PATCH 2/2] documentation: update how page-cluster affects swap I/O
From: Minchan Kim @ 2012-05-21  8:48 UTC
To: ehrhardt; +Cc: linux-mm, axboe

On 05/21/2012 05:09 PM, ehrhardt@linux.vnet.ibm.com wrote:

> From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
>
> Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of
> the code and add some comments about what the tunable will change in that
> behavior.
>
> Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
> Acked-by: Jens Axboe <axboe@kernel.dk>

Reviewed-by: Minchan Kim <minchan@kernel.org>

--
Kind regards,
Minchan Kim

* [PATCH 0/2] swap: improve swap I/O rate - V2
From: ehrhardt @ 2012-06-04  8:33 UTC
To: linux-mm, akpm; +Cc: axboe, hughd, minchan, Ehrhardt Christian

From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>

* Update in V3 *
- Added another reviewed by
- should be ready for upstream inclusion now

* Update in V2 *
- Adapted the documentation patch according to feedback of Minchan Kim
- Added the Acks I got to V1 so far

In a memory overcommitment scenario with KVM I ran into a lot of waits for
swap. While checking the I/O done on the swap disks I found almost all I/Os
to be done as single page 4k requests, despite the fact that swap-in is a
batch of 1<<page-cluster pages as swap readahead and swap-out is a list of
pages written in shrink_page_list.

[1/2 swap in improvement]
The read patch shows improvements of up to 50% swap throughput, much happier
guest systems and, even when running with comparable throughput, a lot of
I/Os per second saved, leaving resources in the SAN for other consumers.

[2/2 documentation]
While doing so I also realized that the documentation for
/proc/sys/vm/page-cluster no longer matches the code.

Kind regards,
Christian Ehrhardt

Christian Ehrhardt (2):
  swap: allow swap readahead to be merged
  documentation: update how page-cluster affects swap I/O

 Documentation/sysctl/vm.txt |   12 ++++++++++--
 mm/swap_state.c             |    5 +++++
 2 files changed, 15 insertions(+), 2 deletions(-)
