linux-kernel.vger.kernel.org archive mirror
* [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not
@ 2018-12-21 21:40 Yang Shi
  2018-12-21 21:40 ` [v3 PATCH 2/2] mm: swap: add comment for swap_vma_readahead Yang Shi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Yang Shi @ 2018-12-21 21:40 UTC (permalink / raw)
  To: ying.huang, tim.c.chen, minchan, akpm; +Cc: yang.shi, linux-mm, linux-kernel

Swap readahead reads in a few pages regardless of whether the
underlying device is busy.  This may incur long waits if the device
is congested, and it may also exacerbate the congestion.

Use inode_read_congested() to check whether the underlying device is
busy, as file page readahead does.  Get the inode from
swap_info_struct.  Although we could add the inode information to
swap_address_space (address_space->host), that may lead to unexpected
side effects, e.g. it may break mapping_cap_account_dirty().  Using
the inode from swap_info_struct seems simple and good enough.

Do the check only in swap_cluster_readahead(), since
swap_vma_readahead() is used only for non-rotational devices, which
are much less likely to be congested than traditional HDDs.

Although swap slots may be consecutive on a swap partition, they may
still be fragmented on a swap file.  This check helps reduce
excessive stalls in such cases.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v3: Move inode dereference under the swap device type check, per Tim Chen
v2: Check the swap device type, per Tim Chen

 mm/swap_state.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index fd2f21e..78d500e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -538,11 +538,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	bool do_poll = true, page_allocated;
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
+	struct inode *inode = NULL;
 
 	mask = swapin_nr_pages(offset) - 1;
 	if (!mask)
 		goto skip;
 
+	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
+		inode = si->swap_file->f_mapping->host;
+		if (inode_read_congested(inode))
+			goto skip;
+	}
+
 	do_poll = false;
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
-- 
1.8.3.1
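
For readers following the hunk above, here is a minimal userspace sketch
of the decision it adds. It is not kernel code: ra_plan() and
struct ra_window are hypothetical names, nr_pages stands in for the
result of swapin_nr_pages(), and the congested flag stands in for the
result of inode_read_congested(). The sketch also leaves out the
clamping of the window to the bounds of the swap area.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a readahead window; not kernel code. */
struct ra_window {
	unsigned long start;	/* first swap offset to read */
	unsigned long end;	/* last swap offset to read (inclusive) */
};

/*
 * Compute the window read for a fault at @offset.
 * @nr_pages:  readahead size, must be a power of two
 * @congested: models the result of the congestion check
 */
static struct ra_window ra_plan(unsigned long offset, unsigned long nr_pages,
				bool congested)
{
	unsigned long mask = nr_pages - 1;
	struct ra_window w = { offset, offset };

	assert(nr_pages && !(nr_pages & mask));

	/* One-page window or busy device: read only the faulting page. */
	if (!mask || congested)
		return w;

	/* Aligned cluster of nr_pages slots around offset. */
	w.start = offset & ~mask;
	w.end = offset | mask;
	return w;
}
```

With an 8-page window, a fault at offset 13 reads slots 8..15 when the
device is idle and only slot 13 when it is reported as congested.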



* [v3 PATCH 2/2] mm: swap: add comment for swap_vma_readahead
  2018-12-21 21:40 [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
@ 2018-12-21 21:40 ` Yang Shi
  2018-12-21 22:42 ` [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not Tim Chen
  2018-12-29  0:42 ` Andrew Morton
  2 siblings, 0 replies; 6+ messages in thread
From: Yang Shi @ 2018-12-21 21:40 UTC (permalink / raw)
  To: ying.huang, tim.c.chen, minchan, akpm; +Cc: yang.shi, linux-mm, linux-kernel

swap_vma_readahead() is missing its kernel-doc comment; add it.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/swap_state.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 78d500e..dd8f698 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -698,6 +698,23 @@ static void swap_ra_info(struct vm_fault *vmf,
 	pte_unmap(orig_pte);
 }
 
+/**
+ * swap_vma_readahead - swap in pages in hope we need them soon
+ * @fentry: swap entry of this memory
+ * @gfp_mask: memory allocation flags
+ * @vmf: fault information
+ *
+ * Returns the struct page for fentry and addr, after queueing swapin.
+ *
+ * Primitive swap readahead code. We simply read in a few pages whose
+ * virtual addresses are around the fault address in the same vma.
+ *
+ * This has been extended to use the NUMA policies from the mm triggering
+ * the readahead.
+ *
+ * Caller must hold vma->vm_mm->mmap_sem for read if vmf->vma is not NULL.
+ *
+ */
 static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 				       struct vm_fault *vmf)
 {
-- 
1.8.3.1



* Re: [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2018-12-21 21:40 [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
  2018-12-21 21:40 ` [v3 PATCH 2/2] mm: swap: add comment for swap_vma_readahead Yang Shi
@ 2018-12-21 22:42 ` Tim Chen
  2018-12-22  1:09   ` Yang Shi
  2018-12-29  0:42 ` Andrew Morton
  2 siblings, 1 reply; 6+ messages in thread
From: Tim Chen @ 2018-12-21 22:42 UTC (permalink / raw)
  To: Yang Shi, ying.huang, tim.c.chen, minchan, akpm; +Cc: linux-mm, linux-kernel

On 12/21/18 1:40 PM, Yang Shi wrote:
> Swap readahead reads in a few pages regardless of whether the
> underlying device is busy.  This may incur long waits if the device
> is congested, and it may also exacerbate the congestion.
> 
> Use inode_read_congested() to check whether the underlying device is
> busy, as file page readahead does.  Get the inode from
> swap_info_struct.  Although we could add the inode information to
> swap_address_space (address_space->host), that may lead to unexpected
> side effects, e.g. it may break mapping_cap_account_dirty().  Using
> the inode from swap_info_struct seems simple and good enough.
> 
> Do the check only in swap_cluster_readahead(), since
> swap_vma_readahead() is used only for non-rotational devices, which
> are much less likely to be congested than traditional HDDs.
> 
> Although swap slots may be consecutive on a swap partition, they may
> still be fragmented on a swap file.  This check helps reduce
> excessive stalls in such cases.
> 
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Tim Chen <tim.c.chen@intel.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> v3: Move inode dereference under the swap device type check, per Tim Chen
> v2: Check the swap device type, per Tim Chen
> 
>  mm/swap_state.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index fd2f21e..78d500e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -538,11 +538,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  	bool do_poll = true, page_allocated;
>  	struct vm_area_struct *vma = vmf->vma;
>  	unsigned long addr = vmf->address;
> +	struct inode *inode = NULL;
>  
>  	mask = swapin_nr_pages(offset) - 1;
>  	if (!mask)
>  		goto skip;
>  
> +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
> +		inode = si->swap_file->f_mapping->host;
> +		if (inode_read_congested(inode))
> +			goto skip;
> +	}
> +
>  	do_poll = false;
>  	/* Read a page_cluster sized and aligned cluster around offset. */
>  	start_offset = offset & ~mask;
> 

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>


* Re: [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2018-12-21 22:42 ` [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not Tim Chen
@ 2018-12-22  1:09   ` Yang Shi
  0 siblings, 0 replies; 6+ messages in thread
From: Yang Shi @ 2018-12-22  1:09 UTC (permalink / raw)
  To: Tim Chen, ying.huang, tim.c.chen, minchan, akpm; +Cc: linux-mm, linux-kernel



On 12/21/18 2:42 PM, Tim Chen wrote:
> On 12/21/18 1:40 PM, Yang Shi wrote:
>> Swap readahead reads in a few pages regardless of whether the
>> underlying device is busy.  This may incur long waits if the device
>> is congested, and it may also exacerbate the congestion.
>>
>> Use inode_read_congested() to check whether the underlying device is
>> busy, as file page readahead does.  Get the inode from
>> swap_info_struct.  Although we could add the inode information to
>> swap_address_space (address_space->host), that may lead to unexpected
>> side effects, e.g. it may break mapping_cap_account_dirty().  Using
>> the inode from swap_info_struct seems simple and good enough.
>>
>> Do the check only in swap_cluster_readahead(), since
>> swap_vma_readahead() is used only for non-rotational devices, which
>> are much less likely to be congested than traditional HDDs.
>>
>> Although swap slots may be consecutive on a swap partition, they may
>> still be fragmented on a swap file.  This check helps reduce
>> excessive stalls in such cases.
>>
>> Cc: Huang Ying <ying.huang@intel.com>
>> Cc: Tim Chen <tim.c.chen@intel.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>> v3: Move inode dereference under the swap device type check, per Tim Chen
>> v2: Check the swap device type, per Tim Chen
>>
>>   mm/swap_state.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/mm/swap_state.c b/mm/swap_state.c
>> index fd2f21e..78d500e 100644
>> --- a/mm/swap_state.c
>> +++ b/mm/swap_state.c
>> @@ -538,11 +538,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>>   	bool do_poll = true, page_allocated;
>>   	struct vm_area_struct *vma = vmf->vma;
>>   	unsigned long addr = vmf->address;
>> +	struct inode *inode = NULL;
>>   
>>   	mask = swapin_nr_pages(offset) - 1;
>>   	if (!mask)
>>   		goto skip;
>>   
>> +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
>> +		inode = si->swap_file->f_mapping->host;
>> +		if (inode_read_congested(inode))
>> +			goto skip;
>> +	}
>> +
>>   	do_poll = false;
>>   	/* Read a page_cluster sized and aligned cluster around offset. */
>>   	start_offset = offset & ~mask;
>>
> Acked-by: Tim Chen <tim.c.chen@linux.intel.com>

Thanks. Happy holiday.



* Re: [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2018-12-21 21:40 [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
  2018-12-21 21:40 ` [v3 PATCH 2/2] mm: swap: add comment for swap_vma_readahead Yang Shi
  2018-12-21 22:42 ` [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not Tim Chen
@ 2018-12-29  0:42 ` Andrew Morton
  2018-12-29  1:41   ` Yang Shi
  2 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2018-12-29  0:42 UTC (permalink / raw)
  To: Yang Shi; +Cc: ying.huang, tim.c.chen, minchan, linux-mm, linux-kernel

On Sat, 22 Dec 2018 05:40:19 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:

> Swap readahead reads in a few pages regardless of whether the
> underlying device is busy.  This may incur long waits if the device
> is congested, and it may also exacerbate the congestion.
> 
> Use inode_read_congested() to check whether the underlying device is
> busy, as file page readahead does.  Get the inode from
> swap_info_struct.  Although we could add the inode information to
> swap_address_space (address_space->host), that may lead to unexpected
> side effects, e.g. it may break mapping_cap_account_dirty().  Using
> the inode from swap_info_struct seems simple and good enough.
> 
> Do the check only in swap_cluster_readahead(), since
> swap_vma_readahead() is used only for non-rotational devices, which
> are much less likely to be congested than traditional HDDs.
> 
> Although swap slots may be consecutive on a swap partition, they may
> still be fragmented on a swap file.  This check helps reduce
> excessive stalls in such cases.

Some words about the observed effects of the patch would be more than
appropriate!



* Re: [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2018-12-29  0:42 ` Andrew Morton
@ 2018-12-29  1:41   ` Yang Shi
  0 siblings, 0 replies; 6+ messages in thread
From: Yang Shi @ 2018-12-29  1:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ying.huang, tim.c.chen, minchan, linux-mm, linux-kernel



On 12/28/18 4:42 PM, Andrew Morton wrote:
> On Sat, 22 Dec 2018 05:40:19 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>> Swap readahead reads in a few pages regardless of whether the
>> underlying device is busy.  This may incur long waits if the device
>> is congested, and it may also exacerbate the congestion.
>>
>> Use inode_read_congested() to check whether the underlying device is
>> busy, as file page readahead does.  Get the inode from
>> swap_info_struct.  Although we could add the inode information to
>> swap_address_space (address_space->host), that may lead to unexpected
>> side effects, e.g. it may break mapping_cap_account_dirty().  Using
>> the inode from swap_info_struct seems simple and good enough.
>>
>> Do the check only in swap_cluster_readahead(), since
>> swap_vma_readahead() is used only for non-rotational devices, which
>> are much less likely to be congested than traditional HDDs.
>>
>> Although swap slots may be consecutive on a swap partition, they may
>> still be fragmented on a swap file.  This check helps reduce
>> excessive stalls in such cases.
> Some words about the observed effects of the patch would be more than
> appropriate!

Yes, sure. This reduces the long-tail latency of do_swap_page() on a
congested system.

A test on my virtual machine with an emulated HDD shows:

Without swap congestion check:
  page_fault1_thr-1490  [023]   129.311706: funcgraph_entry:      # 57377.796 us |  do_swap_page();
  page_fault1_thr-1490  [023]   129.369103: funcgraph_entry:        5.642 us     |  do_swap_page();
  page_fault1_thr-1490  [023]   129.369119: funcgraph_entry:      # 1289.592 us  |  do_swap_page();
  page_fault1_thr-1490  [023]   129.370411: funcgraph_entry:        4.957 us     |  do_swap_page();
  page_fault1_thr-1490  [023]   129.370419: funcgraph_entry:        1.940 us     |  do_swap_page();
  page_fault1_thr-1490  [023]   129.378847: funcgraph_entry:      # 1411.385 us  |  do_swap_page();
  page_fault1_thr-1490  [023]   129.380262: funcgraph_entry:        3.916 us     |  do_swap_page();
  page_fault1_thr-1490  [023]   129.380275: funcgraph_entry:      # 4287.751 us  |  do_swap_page();


With swap congestion check:
       runtest.py-1417  [020]   301.925911: funcgraph_entry:      # 9870.146 us  |  do_swap_page();
       runtest.py-1417  [020]   301.935785: funcgraph_entry:        9.802 us     |  do_swap_page();
       runtest.py-1417  [020]   301.935799: funcgraph_entry:        3.551 us     |  do_swap_page();
       runtest.py-1417  [020]   301.935806: funcgraph_entry:        2.142 us     |  do_swap_page();
       runtest.py-1417  [020]   301.935853: funcgraph_entry:        6.938 us     |  do_swap_page();
       runtest.py-1417  [020]   301.935864: funcgraph_entry:        3.765 us     |  do_swap_page();
       runtest.py-1417  [020]   301.935871: funcgraph_entry:        3.600 us     |  do_swap_page();
       runtest.py-1417  [020]   301.935878: funcgraph_entry:        7.202 us     |  do_swap_page();


The long tail latency (>1000us) is reduced significantly.
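
As a rough illustration (not part of the patch), the >1000us tail in the
samples above can be counted with a small userspace helper. count_tail()
and the two array names are made up for this sketch; the values are the
do_swap_page() durations copied from the two traces.

```c
#include <stddef.h>

/* Count samples whose latency exceeds threshold_us. */
static size_t count_tail(const double *lat_us, size_t n, double threshold_us)
{
	size_t i, tail = 0;

	for (i = 0; i < n; i++)
		if (lat_us[i] > threshold_us)
			tail++;
	return tail;
}

/* do_swap_page() durations in microseconds, without the congestion check. */
static const double without_check[] = {
	57377.796, 5.642, 1289.592, 4.957, 1.940, 1411.385, 3.916, 4287.751,
};

/* do_swap_page() durations in microseconds, with the congestion check. */
static const double with_check[] = {
	9870.146, 9.802, 3.551, 2.142, 6.938, 3.765, 3.600, 7.202,
};
```

With a 1000 us threshold, count_tail() reports 4 of 8 samples without the
check and 1 of 8 with it. Eight samples per run is a small set, so this
only summarizes the traces shown here.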

BTW, do you need me to resend the patch with the above information
appended to the commit log?

Thanks,
Yang




