* [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not
@ 2019-01-03 19:27 Yang Shi
  2019-01-03 19:27 ` [v5 PATCH 2/2] mm: swap: add comment for swap_vma_readahead Yang Shi
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-03 19:27 UTC
  To: ying.huang, tim.c.chen, minchan, daniel.m.jordan, akpm
  Cc: yang.shi, linux-mm, linux-kernel

Swap readahead reads in a few pages regardless of whether the underlying
device is busy.  This may incur a long wait if the device is congested,
and it may also exacerbate the congestion.

Use inode_read_congested() to check whether the underlying device is
busy, as file page readahead does.  Get the inode from swap_info_struct.
Although we could add inode information to swap_address_space
(address_space->host), that may lead to unexpected side effects, e.g.
it may break mapping_cap_account_dirty().  Using the inode from
swap_info_struct seems simple and good enough.

Do the check only in swap_cluster_readahead(), since
swap_vma_readahead() is used only for non-rotational devices, which are
much less likely to be congested than a traditional HDD.

Although swap slots may be consecutive on a swap partition, they may
still be fragmented on a swap file.  This check helps to reduce
excessive stalls in that case.

The test uses page_fault1 of will-it-scale (tracing may sometimes show
runtest.py, the wrapper script of page_fault1), which launches NR_CPU
threads, each generating 128MB of anonymous pages.  On my virtual
machine with a congested HDD, long tail latency is reduced
significantly.

Without the patch
 page_fault1_thr-1490  [023]   129.311706: funcgraph_entry:      #57377.796 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.369103: funcgraph_entry:           5.642 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.369119: funcgraph_entry:       #1289.592 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.370411: funcgraph_entry:           4.957 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.370419: funcgraph_entry:           1.940 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.378847: funcgraph_entry:       #1411.385 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.380262: funcgraph_entry:           3.916 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.380275: funcgraph_entry:       #4287.751 us |  do_swap_page();

With the patch
      runtest.py-1417  [020]   301.925911: funcgraph_entry:      #9870.146 us |  do_swap_page();
      runtest.py-1417  [020]   301.935785: funcgraph_entry:          9.802 us |  do_swap_page();
      runtest.py-1417  [020]   301.935799: funcgraph_entry:          3.551 us |  do_swap_page();
      runtest.py-1417  [020]   301.935806: funcgraph_entry:          2.142 us |  do_swap_page();
      runtest.py-1417  [020]   301.935853: funcgraph_entry:          6.938 us |  do_swap_page();
      runtest.py-1417  [020]   301.935864: funcgraph_entry:          3.765 us |  do_swap_page();
      runtest.py-1417  [020]   301.935871: funcgraph_entry:          3.600 us |  do_swap_page();
      runtest.py-1417  [020]   301.935878: funcgraph_entry:          7.202 us |  do_swap_page();

Acked-by: Tim Chen <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v5: Elaborate more about the test case per Daniel
v4: Added observed effects in the commit log per Andrew
v3: Move inode dereference under swap device type check per Tim Chen
v2: Check the swap device type per Tim Chen

 mm/swap_state.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index fd2f21e..78d500e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -538,11 +538,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	bool do_poll = true, page_allocated;
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
+	struct inode *inode = NULL;
 
 	mask = swapin_nr_pages(offset) - 1;
 	if (!mask)
 		goto skip;
 
+	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
+		inode = si->swap_file->f_mapping->host;
+		if (inode_read_congested(inode))
+			goto skip;
+	}
+
 	do_poll = false;
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
-- 
1.8.3.1
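
The gating logic in the hunk above can be modelled in userspace roughly as
follows.  This is only a sketch, not kernel code: the sketch_* names, the
flag bit values, and the read_congested field (a stand-in for the result of
inode_read_congested()) are assumptions made for illustration.  The
inclusive upper bound "offset | mask" is likewise an assumption, since the
hunk only shows the start_offset computation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the kernel's values differ. */
#define SKETCH_SWP_BLKDEV (1u << 0)
#define SKETCH_SWP_FS     (1u << 1)

/* Simplified stand-in for swap_info_struct plus its backing inode. */
struct sketch_swap_dev {
	unsigned int flags;
	bool read_congested;	/* models inode_read_congested(inode) */
};

/* Range of swap slots to read, both ends inclusive. */
struct sketch_window {
	unsigned long start;
	unsigned long end;
};

/*
 * Pick the slots to read for a fault at @offset when the readahead
 * window is @nr_pages (a power of two).  Returning only the faulting
 * slot corresponds to the "goto skip" path in the patch.
 */
static struct sketch_window
sketch_readahead_window(const struct sketch_swap_dev *dev,
			unsigned long offset, unsigned long nr_pages)
{
	struct sketch_window only_fault = { offset, offset };
	unsigned long mask;

	assert(dev != NULL);
	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);

	mask = nr_pages - 1;
	if (!mask)
		return only_fault;	/* window of one page: nothing to read ahead */

	/* Only consult congestion state for block-device or fs-backed swap. */
	if ((dev->flags & (SKETCH_SWP_BLKDEV | SKETCH_SWP_FS)) &&
	    dev->read_congested)
		return only_fault;

	/* Aligned cluster around offset. */
	return (struct sketch_window){ offset & ~mask, offset | mask };
}
```

With an 8-page window and offset 13, an idle device yields slots 8 through
15, while a congested block device falls back to slot 13 alone.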



* [v5 PATCH 2/2] mm: swap: add comment for swap_vma_readahead
  2019-01-03 19:27 [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
@ 2019-01-03 19:27 ` Yang Shi
  2019-01-04  2:25     ` Huang, Ying
  2019-01-10 20:49 ` [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Yang Shi @ 2019-01-03 19:27 UTC
  To: ying.huang, tim.c.chen, minchan, daniel.m.jordan, akpm
  Cc: yang.shi, linux-mm, linux-kernel

swap_vma_readahead() is missing its kernel-doc comment; add it.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v5: Fixed the comments per Ying Huang

 mm/swap_state.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 78d500e..c8730d7 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -523,7 +523,7 @@ static unsigned long swapin_nr_pages(unsigned long offset)
  * This has been extended to use the NUMA policies from the mm triggering
  * the readahead.
  *
- * Caller must hold down_read on the vma->vm_mm if vmf->vma is not NULL.
+ * Caller must hold read mmap_sem if vmf->vma is not NULL.
  */
 struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 				struct vm_fault *vmf)
@@ -698,6 +698,20 @@ static void swap_ra_info(struct vm_fault *vmf,
 	pte_unmap(orig_pte);
 }
 
+/**
+ * swap_vma_readahead - swap in pages in hope we need them soon
+ * @fentry: swap entry of this memory
+ * @gfp_mask: memory allocation flags
+ * @vmf: fault information
+ *
+ * Returns the struct page for fentry and addr, after queueing swapin.
+ *
+ * Primitive swap readahead code. We simply read in a few pages whose
+ * virtual addresses are around the fault address in the same vma.
+ *
+ * Caller must hold read mmap_sem if vmf->vma is not NULL.
+ *
+ */
 static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 				       struct vm_fault *vmf)
 {
-- 
1.8.3.1



* Re: [v5 PATCH 2/2] mm: swap: add comment for swap_vma_readahead
  2019-01-03 19:27 ` [v5 PATCH 2/2] mm: swap: add comment for swap_vma_readahead Yang Shi
@ 2019-01-04  2:25     ` Huang, Ying
  0 siblings, 0 replies; 10+ messages in thread
From: Huang, Ying @ 2019-01-04  2:25 UTC
  To: Yang Shi
  Cc: tim.c.chen, minchan, daniel.m.jordan, akpm, linux-mm, linux-kernel

Yang Shi <yang.shi@linux.alibaba.com> writes:

> swap_vma_readahead()'s comment is missed, just add it.
>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Tim Chen <tim.c.chen@intel.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>

Thanks!

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

Best Regards,
Huang, Ying

> ---
> v5: Fixed the comments per Ying Huang
>
>  mm/swap_state.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 78d500e..c8730d7 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -523,7 +523,7 @@ static unsigned long swapin_nr_pages(unsigned long offset)
>   * This has been extended to use the NUMA policies from the mm triggering
>   * the readahead.
>   *
> - * Caller must hold down_read on the vma->vm_mm if vmf->vma is not NULL.
> + * Caller must hold read mmap_sem if vmf->vma is not NULL.
>   */
>  struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  				struct vm_fault *vmf)
> @@ -698,6 +698,20 @@ static void swap_ra_info(struct vm_fault *vmf,
>  	pte_unmap(orig_pte);
>  }
>  
> +/**
> + * swap_vma_readahead - swap in pages in hope we need them soon
> + * @entry: swap entry of this memory
> + * @gfp_mask: memory allocation flags
> + * @vmf: fault information
> + *
> + * Returns the struct page for entry and addr, after queueing swapin.
> + *
> + * Primitive swap readahead code. We simply read in a few pages whoes
> + * virtual addresses are around the fault address in the same vma.
> + *
> + * Caller must hold read mmap_sem if vmf->vma is not NULL.
> + *
> + */
>  static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
>  				       struct vm_fault *vmf)
>  {


* Re: [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2019-01-03 19:27 [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
  2019-01-03 19:27 ` [v5 PATCH 2/2] mm: swap: add comment for swap_vma_readahead Yang Shi
@ 2019-01-10 20:49 ` Yang Shi
  2019-01-10 23:20 ` Andrew Morton
  2019-01-10 23:31 ` Andrew Morton
  3 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-10 20:49 UTC
  To: ying.huang, tim.c.chen, minchan, daniel.m.jordan, akpm
  Cc: linux-mm, linux-kernel

Hi Andrew,


What do you think of these patches? They have been reviewed, and the
commit log has been updated per your and Daniel's comments.


Thanks,

Yang



On 1/3/19 11:27 AM, Yang Shi wrote:
> Swap readahead would read in a few pages regardless if the underlying
> device is busy or not.  It may incur long waiting time if the device is
> congested, and it may also exacerbate the congestion.
>
> Use inode_read_congested() to check if the underlying device is busy or
> not like what file page readahead does.  Get inode from swap_info_struct.
> Although we can add inode information in swap_address_space
> (address_space->host), it may lead some unexpected side effect, i.e.
> it may break mapping_cap_account_dirty().  Using inode from
> swap_info_struct seems simple and good enough.
>
> Just does the check in vma_cluster_readahead() since
> swap_vma_readahead() is just used for non-rotational device which
> much less likely has congestion than traditional HDD.
>
> Although swap slots may be consecutive on swap partition, it still may be
> fragmented on swap file. This check would help to reduce excessive stall
> for such case.
>
> The test with page_fault1 of will-it-scale (sometimes tracing may just
> show runtest.py that is the wrapper script of page_fault1), which basically
> launches NR_CPU threads to generate 128MB anonymous pages for each thread,
> on my virtual machine with congested HDD shows long tail latency is reduced
> significantly.
>
> Without the patch
>   page_fault1_thr-1490  [023]   129.311706: funcgraph_entry:      #57377.796 us |  do_swap_page();
>   page_fault1_thr-1490  [023]   129.369103: funcgraph_entry:        5.642us   |  do_swap_page();
>   page_fault1_thr-1490  [023]   129.369119: funcgraph_entry:      #1289.592 us |  do_swap_page();
>   page_fault1_thr-1490  [023]   129.370411: funcgraph_entry:        4.957us   |  do_swap_page();
>   page_fault1_thr-1490  [023]   129.370419: funcgraph_entry:        1.940us   |  do_swap_page();
>   page_fault1_thr-1490  [023]   129.378847: funcgraph_entry:      #1411.385 us |  do_swap_page();
>   page_fault1_thr-1490  [023]   129.380262: funcgraph_entry:        3.916us   |  do_swap_page();
>   page_fault1_thr-1490  [023]   129.380275: funcgraph_entry:      #4287.751 us |  do_swap_page();
>
> With the patch
>        runtest.py-1417  [020]   301.925911: funcgraph_entry:      #9870.146 us |  do_swap_page();
>        runtest.py-1417  [020]   301.935785: funcgraph_entry:        9.802us   |  do_swap_page();
>        runtest.py-1417  [020]   301.935799: funcgraph_entry:        3.551us   |  do_swap_page();
>        runtest.py-1417  [020]   301.935806: funcgraph_entry:        2.142us   |  do_swap_page();
>        runtest.py-1417  [020]   301.935853: funcgraph_entry:        6.938us   |  do_swap_page();
>        runtest.py-1417  [020]   301.935864: funcgraph_entry:        3.765us   |  do_swap_page();
>        runtest.py-1417  [020]   301.935871: funcgraph_entry:        3.600us   |  do_swap_page();
>        runtest.py-1417  [020]   301.935878: funcgraph_entry:        7.202us   |  do_swap_page();
>
> Acked-by: Tim Chen <tim.c.chen@intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> v5: Elaborate more about the test case per Daniel
> v4: Added observed effects in the commit log per Andrew
> v3: Move inode deference under swap device type check per Tim Chen
> v2: Check the swap device type per Tim Chen
>
>   mm/swap_state.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index fd2f21e..78d500e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -538,11 +538,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>   	bool do_poll = true, page_allocated;
>   	struct vm_area_struct *vma = vmf->vma;
>   	unsigned long addr = vmf->address;
> +	struct inode *inode = NULL;
>   
>   	mask = swapin_nr_pages(offset) - 1;
>   	if (!mask)
>   		goto skip;
>   
> +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
> +		inode = si->swap_file->f_mapping->host;
> +		if (inode_read_congested(inode))
> +			goto skip;
> +	}
> +
>   	do_poll = false;
>   	/* Read a page_cluster sized and aligned cluster around offset. */
>   	start_offset = offset & ~mask;



* Re: [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2019-01-03 19:27 [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
  2019-01-03 19:27 ` [v5 PATCH 2/2] mm: swap: add comment for swap_vma_readahead Yang Shi
  2019-01-10 20:49 ` [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
@ 2019-01-10 23:20 ` Andrew Morton
  2019-01-10 23:31 ` Andrew Morton
  3 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2019-01-10 23:20 UTC
  To: Yang Shi
  Cc: ying.huang, tim.c.chen, minchan, daniel.m.jordan, linux-mm, linux-kernel

On Fri,  4 Jan 2019 03:27:52 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:

> Swap readahead would read in a few pages regardless if the underlying
> device is busy or not.  It may incur long waiting time if the device is
> congested, and it may also exacerbate the congestion.
> 
> Use inode_read_congested() to check if the underlying device is busy or
> not like what file page readahead does.  Get inode from swap_info_struct.
> Although we can add inode information in swap_address_space
> (address_space->host), it may lead some unexpected side effect, i.e.
> it may break mapping_cap_account_dirty().  Using inode from
> swap_info_struct seems simple and good enough.
> 
> ...
>
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -538,11 +538,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  	bool do_poll = true, page_allocated;
>  	struct vm_area_struct *vma = vmf->vma;
>  	unsigned long addr = vmf->address;
> +	struct inode *inode = NULL;
>  
>  	mask = swapin_nr_pages(offset) - 1;
>  	if (!mask)
>  		goto skip;
>  
> +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
> +		inode = si->swap_file->f_mapping->host;
> +		if (inode_read_congested(inode))
> +			goto skip;
> +	}
> +
>  	do_poll = false;
>  	/* Read a page_cluster sized and aligned cluster around offset. */
>  	start_offset = offset & ~mask;

Neater:

--- a/mm/swap_state.c~mm-swap-check-if-swap-backing-device-is-congested-or-not-fix
+++ a/mm/swap_state.c
@@ -538,14 +538,13 @@ struct page *swap_cluster_readahead(swp_
 	bool do_poll = true, page_allocated;
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
-	struct inode *inode = NULL;
 
 	mask = swapin_nr_pages(offset) - 1;
 	if (!mask)
 		goto skip;
 
 	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
-		inode = si->swap_file->f_mapping->host;
+		struct inode *inode = si->swap_file->f_mapping->host;
 		if (inode_read_congested(inode))
 			goto skip;
 	}
_



* Re: [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2019-01-03 19:27 [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not Yang Shi
                   ` (2 preceding siblings ...)
  2019-01-10 23:20 ` Andrew Morton
@ 2019-01-10 23:31 ` Andrew Morton
  2019-01-10 23:40   ` Chen, Tim C
  2019-01-11  0:56   ` Yang Shi
  3 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2019-01-10 23:31 UTC
  To: Yang Shi
  Cc: ying.huang, tim.c.chen, minchan, daniel.m.jordan, linux-mm, linux-kernel

On Fri,  4 Jan 2019 03:27:52 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:

> Swap readahead would read in a few pages regardless if the underlying
> device is busy or not.  It may incur long waiting time if the device is
> congested, and it may also exacerbate the congestion.
> 
> Use inode_read_congested() to check if the underlying device is busy or
> not like what file page readahead does.  Get inode from swap_info_struct.
> Although we can add inode information in swap_address_space
> (address_space->host), it may lead some unexpected side effect, i.e.
> it may break mapping_cap_account_dirty().  Using inode from
> swap_info_struct seems simple and good enough.
> 
> ...
>
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -538,11 +538,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  	bool do_poll = true, page_allocated;
>  	struct vm_area_struct *vma = vmf->vma;
>  	unsigned long addr = vmf->address;
> +	struct inode *inode = NULL;
>  
>  	mask = swapin_nr_pages(offset) - 1;
>  	if (!mask)
>  		goto skip;
>  
> +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {

I re-read your discussion with Tim and I must say the reasoning behind
this test remains foggy.

What goes wrong if we just remove it?

What is the status of shmem swap readahead?

Can we at least get a comment in here which explains the reasoning?

Thanks.

> +		inode = si->swap_file->f_mapping->host;
> +		if (inode_read_congested(inode))
> +			goto skip;
> +	}
> +
>  	do_poll = false;
>  	/* Read a page_cluster sized and aligned cluster around offset. */



* RE: [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2019-01-10 23:31 ` Andrew Morton
@ 2019-01-10 23:40   ` Chen, Tim C
  2019-01-11  0:56   ` Yang Shi
  1 sibling, 0 replies; 10+ messages in thread
From: Chen, Tim C @ 2019-01-10 23:40 UTC
  To: Andrew Morton, Yang Shi
  Cc: Huang, Ying, minchan, daniel.m.jordan, linux-mm, linux-kernel

> >
> > +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
> 
> I re-read your discussion with Tim and I must say the reasoning behind this
> test remain foggy.

I was worried that the dereference

inode = si->swap_file->f_mapping->host;

is not always safe for corner cases.

So the test makes sure that the dereference is valid.

> 
> What goes wrong if we just remove it?

If the dereference to get inode is always safe, we can remove it.


Thanks.

Tim
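
The concern about the pointer chain can be illustrated with a small
userspace model.  It is a sketch under stated assumptions, not kernel code:
the demo_* structures only mirror the shape of
si->swap_file->f_mapping->host, and the flag bit values and the
read_congested field are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the kernel's values differ. */
#define DEMO_SWP_BLKDEV (1u << 0)
#define DEMO_SWP_FS     (1u << 1)

/* Minimal stand-ins mirroring the pointer chain used in the patch. */
struct demo_inode { bool read_congested; };
struct demo_address_space { struct demo_inode *host; };
struct demo_file { struct demo_address_space *f_mapping; };
struct demo_swap_info {
	unsigned int flags;
	struct demo_file *swap_file;
};

/*
 * Report congestion only when the swap type says the chain is valid.
 * For any other type the pointers are never touched, so a NULL
 * swap_file cannot be dereferenced here.
 */
static bool demo_swap_read_congested(const struct demo_swap_info *si)
{
	assert(si != NULL);

	if (!(si->flags & (DEMO_SWP_BLKDEV | DEMO_SWP_FS)))
		return false;

	return si->swap_file->f_mapping->host->read_congested;
}
```

A swap area with neither flag set and a NULL swap_file simply reports "not
congested", which is the behaviour the flag test is meant to guarantee.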


* Re: [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not
  2019-01-10 23:31 ` Andrew Morton
  2019-01-10 23:40   ` Chen, Tim C
@ 2019-01-11  0:56   ` Yang Shi
  1 sibling, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-11  0:56 UTC
  To: Andrew Morton
  Cc: ying.huang, tim.c.chen, minchan, daniel.m.jordan, linux-mm, linux-kernel



On 1/10/19 3:31 PM, Andrew Morton wrote:
> On Fri,  4 Jan 2019 03:27:52 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>> Swap readahead would read in a few pages regardless if the underlying
>> device is busy or not.  It may incur long waiting time if the device is
>> congested, and it may also exacerbate the congestion.
>>
>> Use inode_read_congested() to check if the underlying device is busy or
>> not like what file page readahead does.  Get inode from swap_info_struct.
>> Although we can add inode information in swap_address_space
>> (address_space->host), it may lead some unexpected side effect, i.e.
>> it may break mapping_cap_account_dirty().  Using inode from
>> swap_info_struct seems simple and good enough.
>>
>> ...
>>
>> --- a/mm/swap_state.c
>> +++ b/mm/swap_state.c
>> @@ -538,11 +538,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>>   	bool do_poll = true, page_allocated;
>>   	struct vm_area_struct *vma = vmf->vma;
>>   	unsigned long addr = vmf->address;
>> +	struct inode *inode = NULL;
>>   
>>   	mask = swapin_nr_pages(offset) - 1;
>>   	if (!mask)
>>   		goto skip;
>>   
>> +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
> I re-read your discussion with Tim and I must say the reasoning behind
> this test remain foggy.
>
> What goes wrong if we just remove it?

I saw Tim already answered this.

>
> What is the status of shmem swap readahead?

shmem swap readahead will be skipped too if the underlying device is 
congested.

>
> Can we at least get a comment in here which explains the reasoning?

How about this:

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3f63bb7..85245fd 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -543,7 +543,8 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (!mask)
 		goto skip;
 
-	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
+	/* Test swap type to make sure the dereference is safe */
+	if (likely(si->flags & (SWP_BLKDEV | SWP_FS))) {
 		struct inode *inode = si->swap_file->f_mapping->host;
 		if (inode_read_congested(inode))
 			goto skip;

Tim is worried that the dereference might not be safe in some corner
cases, though by code inspection those corner cases seem unlikely. So I
added "likely" to the if statement.

Thanks,
Yang

>
> Thanks.
>
>> +		inode = si->swap_file->f_mapping->host;
>> +		if (inode_read_congested(inode))
>> +			goto skip;
>> +	}
>> +
>>   	do_poll = false;
>>   	/* Read a page_cluster sized and aligned cluster around offset. */
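
For readers unfamiliar with the likely() annotation, the following
userspace sketch shows that such a hint affects only code layout, not the
result of the test.  The hint_likely() macro and the flags_match_*()
helpers are hypothetical names for this illustration; the macro relies on
the GCC/Clang __builtin_expect() builtin.

```c
#include <stdbool.h>

/* Branch-prediction hint in the style of the kernel's likely() macro. */
#define hint_likely(x) __builtin_expect(!!(x), 1)

/* Plain flag test with no hint. */
static bool flags_match_plain(unsigned int flags, unsigned int wanted)
{
	return (flags & wanted) != 0;
}

/* Same test wrapped in the hint; the truth value is unchanged. */
static bool flags_match_hinted(unsigned int flags, unsigned int wanted)
{
	if (hint_likely(flags & wanted))
		return true;
	return false;
}
```

Both helpers return the same value for every input, so adding the hint
does not change which swap types take the congestion check.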



end of thread, other threads:[~2019-01-11  0:58 UTC | newest]
